[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187755#comment-15187755 ]
Colin Patrick McCabe commented on HDFS-9924:
--------------------------------------------

Currently the NameNode can handle between 10k and 100k operations per second, depending on configuration and the nature of the operations. It seems like you should be able to comfortably dispatch that many operations from a few thousand client threads performing synchronous RPC calls... bearing in mind that each operation will take a few milliseconds on average. This is assuming that you want to consume all the available NN RPC bandwidth from a single client node.

Perhaps I'm missing something, but I don't see how async operations will improve performance here. The overhead of a few thousand threads on the client is small, and certainly not what is limiting HDFS performance. Rather, performance is limited by considerations like the locking on the NameNode, Java garbage collections on the NameNode, and serialization/deserialization overheads.

Please keep in mind that you don't need async operations to reuse connections and sockets... we do that already via mechanisms like the {{PeerCache}} (formerly {{SocketCache}}). Clearly, Hive can also dispatch operations in parallel using standard mechanisms like an Executor or ThreadPool.

I certainly don't object to implementing this, but if the goal is better performance, I think you are going to be disappointed. Perhaps I have missed something, though... I'm curious if there are reasons for implementing this that I have not considered.

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished. It is inefficient if a client needs to create a
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is
> not blocked. The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
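
For reference, the "standard mechanisms like an Executor or ThreadPool" mentioned in the comment could look roughly like the sketch below. It is only an illustration under stated assumptions: blockingRename is a hypothetical stand-in for a synchronous NameNode RPC (it just sleeps for a couple of milliseconds), and the class name, paths, and pool size are made up. It is not the API proposed in this JIRA, although the caller ends up with the same kind of Future objects and collects results with Future.get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDispatchSketch {
    // Hypothetical stand-in for a blocking NameNode RPC; a real client
    // would invoke a synchronous FileSystem method here instead.
    static boolean blockingRename(String src, String dst) throws InterruptedException {
        Thread.sleep(2); // simulate a few milliseconds of RPC latency
        return true;
    }

    // Submits one blocking call per source path to a fixed-size pool,
    // then waits on each Future and counts the calls that returned true.
    static int renameAll(List<String> sources, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> pending = new ArrayList<>();
            for (String src : sources) {
                Callable<Boolean> task = () -> blockingRename(src, src + ".done");
                pending.add(pool.submit(task)); // returns immediately
            }
            int succeeded = 0;
            for (Future<Boolean> f : pending) {
                if (f.get()) { // blocks until that call has finished
                    succeeded++;
                }
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> sources = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sources.add("/tmp/file-" + i); // made-up paths
        }
        System.out.println(renameAll(sources, 16));
    }
}
```

With this pattern the number of calls in flight is bounded by the pool size, which is the thread cost the issue description calls inefficient and the comment argues is small next to NameNode-side limits.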