[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330890#comment-15330890 ]
Duo Zhang commented on HDFS-9924:
---------------------------------

My concern is that you cannot tell people that Hive is only compatible with hadoop-2.8.x, right? For example, we set HBase to be compatible with hadoop-2.4+, so we usually optimize for all hadoop-2.4+ versions where possible instead of relying on a feature only introduced in a newer version. Here, a thread pool solution works for all hadoop-2.x versions. And a 1MB stack size per thread is not that terrible: it is off-heap and only adds 1MB of VSZ, not RSS; RSS grows on demand. You can also set a smaller stack size if you want to reduce the overhead.

For the implementation, what [~stack] said above is the experience we got from our write-ahead-log implementation. For the Hive case here, yes, you have a different pattern. But it is not a good idea to wait on Futures sequentially. For example, say you have requests 0-99, request 1 is blocked for a long time, and requests 2-99 have all failed. With your solution, you will block on request 1 for a long time before resubmitting the failed requests 2-99. This is an inherent defect of lacking callback support. A better solution is, sorry, but again, to use multiple threads. With a thread pool and {{CompletionService}}, you can (sometimes) get the failed requests first.

Hope this helps. Thanks.

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished.
> It is inefficient if a client needs to create a large number of threads to
> invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is
> not blocked. The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.
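
The comment's point about draining a {{CompletionService}} instead of calling Future.get() in submission order can be illustrated with a minimal Java sketch. Everything named here is hypothetical and only for illustration: `fakeCall` is a stand-in that sleeps rather than a real HDFS call, `Outcome` and `completionOrder` are invented helpers, and failure is modelled as a flag rather than an exception. The stack size passed to the `Thread` constructor is only a hint that the JVM may round or ignore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class CompletionOrderSketch {

    /** Result of one request: its id and whether it succeeded. */
    static final class Outcome {
        final int id;
        final boolean ok;
        Outcome(int id, boolean ok) { this.id = id; this.ok = ok; }
    }

    /** Hypothetical stand-in for one blocking call: sleeps, then reports success or failure. */
    static Outcome fakeCall(int id, long delayMillis, boolean fail) throws InterruptedException {
        Thread.sleep(delayMillis);
        return new Outcome(id, !fail);
    }

    /** Runs every request on its own pool thread and returns the ids in completion order. */
    static List<Integer> completionOrder(long[] delayMillis, boolean[] fail, long stackSizeBytes)
            throws InterruptedException, ExecutionException {
        int n = delayMillis.length;
        // The stack size is a hint to the JVM; 0 means "use the default".
        ThreadFactory factory = r -> new Thread(null, r, "async-sketch", stackSizeBytes);
        ExecutorService pool = Executors.newFixedThreadPool(n, factory);
        try {
            CompletionService<Outcome> cs = new ExecutorCompletionService<>(pool);
            for (int i = 0; i < n; i++) {
                final int id = i;
                cs.submit(() -> fakeCall(id, delayMillis[id], fail[id]));
            }
            List<Integer> order = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // take() blocks until ANY request finishes, not a specific one,
                // so a slow request does not hide failures that are already known.
                Outcome o = cs.take().get();
                order.add(o.id);
            }
            return order;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] delays = {0, 500, 0, 0};              // request 1 is the slow one
        boolean[] fail = {false, false, true, true}; // requests 2 and 3 fail quickly
        System.out.println(completionOrder(delays, fail, 512 * 1024));
    }
}
```

In this sketch the slow request 1 comes back last, so the quickly failed requests 2 and 3 are visible (and could be resubmitted) while request 1 is still pending. Waiting on the Futures in submission order would instead block on request 1 before learning about those failures.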