[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

Duo Zhang (JIRA) Tue, 14 Jun 2016 16:58:20 -0700

    [ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330890#comment-15330890 ]


Duo Zhang commented on HDFS-9924:
---------------------------------

My concern is that you cannot tell people that Hive is only compatible with
hadoop-2.8.x, right?
For example, we keep HBase compatible with hadoop-2.4+, so we usually
optimize for all hadoop-2.4+ versions where possible instead of relying on a
feature introduced only in a newer version.

Here, a thread pool solution works for all hadoop-2.x versions. And a 1MB
stack per thread is not that terrible: the stack is off-heap and only adds 1MB
to VSZ, not to RSS; RSS grows on demand. You can also set a smaller stack size
(e.g. with the JVM's -Xss option) if you want to reduce the overhead.
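A minimal sketch of a pool whose threads request a smaller stack, using only java.util.concurrent and the four-argument java.lang.Thread constructor. The pool size, the 256 KB value, and the "hdfs-io-" thread name are illustrative assumptions, and per the Thread javadoc the stackSize argument is only a hint that some platforms ignore.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStackPool {
    // Builds a fixed-size pool whose threads request stackSizeBytes of stack.
    // The stack size is a hint: the JVM may ignore it on some platforms.
    static ExecutorService newSmallStackPool(int nThreads, long stackSizeBytes) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            // Hypothetical naming scheme, only to make the threads easy to spot in a dump.
            Thread t = new Thread(null, r, "hdfs-io-" + counter.getAndIncrement(), stackSizeBytes);
            t.setDaemon(true); // idle I/O threads should not keep the JVM alive
            return t;
        };
        return Executors.newFixedThreadPool(nThreads, factory);
    }

    public static void main(String[] args) throws Exception {
        // 256 KB per thread instead of the 1 MB discussed above (illustrative value).
        ExecutorService pool = newSmallStackPool(64, 256 * 1024);
        for (int i = 0; i < 100; i++) {
            final int id = i;
            pool.submit(() -> {
                // Stand-in for a blocking HDFS call made from a pool thread.
                return id;
            });
        }
        pool.shutdown();
        System.out.println(pool.awaitTermination(10, TimeUnit.SECONDS));
    }
}
```

Because this uses only the blocking client API plus standard JDK classes, the same code works against any hadoop-2.x release.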

As for the implementation, what [~stack] said above reflects the experience we
gained from our write-ahead-log implementation. For the Hive case here, yes,
you have a different pattern. But it is not a good idea to wait on Futures
sequentially. For example, suppose you issue requests 0-99, request 1 is
blocked for a long time, and requests 2-99 have all failed. With your approach,
you will block on request 1 for a long time before you can resubmit the failed
requests 2-99. This is an inherent defect of an API without callback support.
A better solution is, sorry, but again, to use multiple threads. With a thread
pool and a {{CompletionService}}, you can (sometimes) get the failed requests
first.
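A minimal sketch of the thread pool plus CompletionService idea, using the JDK's ExecutorCompletionService. The workload is simulated: the hypothetical simulatedRequest stands in for a real HDFS call, with request 1 sleeping and requests 2-99 failing immediately. Results are taken in completion order, so the failures can be handled (and resubmitted) before the slow request returns; the exact order still depends on thread scheduling, which is why it only works "sometimes".

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CompletionOrderDemo {
    // Simulated workload (hypothetical): request 1 hangs for a while,
    // requests 2..n-1 fail immediately, request 0 succeeds immediately.
    static int simulatedRequest(int id) throws InterruptedException {
        if (id == 1) {
            Thread.sleep(500);
        } else if (id >= 2) {
            throw new IllegalStateException("request " + id + " failed");
        }
        return id;
    }

    // Submits all requests, then handles each one as soon as it completes.
    // Returns "ok <id>" / "failed <id>" events in completion order.
    static List<String> drainInCompletionOrder(int n, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
            // take() returns the same Future object that submit() returned,
            // so an identity map recovers the request id.
            Map<Future<Integer>, Integer> ids = new IdentityHashMap<>();
            for (int i = 0; i < n; i++) {
                final int id = i;
                ids.put(cs.submit(() -> simulatedRequest(id)), id);
            }
            List<String> events = new ArrayList<>();
            for (int done = 0; done < n; done++) {
                Future<Integer> f = cs.take(); // waits only until *some* request finishes
                int id = ids.get(f);
                try {
                    f.get(); // already complete, so this does not block
                    events.add("ok " + id);
                } catch (ExecutionException e) {
                    // A real client could resubmit request `id` here instead of
                    // waiting behind the slow request 1.
                    events.add("failed " + id);
                }
            }
            return events;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> events = drainInCompletionOrder(100, 10);
        System.out.println("first: " + events.get(0) + ", last: " + events.get(events.size() - 1));
    }
}
```

Calling get() on each Future in submission order would instead block on request 1 before seeing any of the 98 failures.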

Hope this helps. Thanks.

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.
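The calling pattern the proposal describes (issue many calls from one thread, each returning a Future immediately, then collect results with Future.get()) can be sketched as below. AsyncRenameClient, rename, and renameAll are hypothetical names, not the HDFS API; the toy client uses CompletableFuture only so the example runs. The collection loop waits on the Futures in submission order, which is the pattern the comment above warns about when one call is slow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncCallPattern {
    // Hypothetical non-blocking client: each call returns a Future immediately.
    // This only mirrors the shape described in the proposal; it is not the HDFS API.
    interface AsyncRenameClient {
        Future<Boolean> rename(String src, String dst);
    }

    // Issues all calls from a single thread without waiting, then collects results.
    static int renameAll(AsyncRenameClient client, List<String> srcs)
            throws InterruptedException, ExecutionException {
        List<Future<Boolean>> pending = new ArrayList<>();
        for (String src : srcs) {
            pending.add(client.rename(src, src + ".done")); // returns without blocking
        }
        int succeeded = 0;
        for (Future<Boolean> f : pending) {
            if (f.get()) { // blocks until this particular call completes
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) throws Exception {
        // Toy client that completes asynchronously on the common ForkJoinPool.
        AsyncRenameClient client = (src, dst) -> CompletableFuture.supplyAsync(() -> !src.isEmpty());
        System.out.println(renameAll(client, List.of("/a", "/b", "")));
    }
}
```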



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

