[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335054#comment-15335054
 ] 

Andrew Wang commented on HDFS-9924:
-----------------------------------

Thanks for posting the performance report [~xiaobingo]. Some comments:

* Hardware is listed as "32 X 8 cores Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz". Could you clarify? Is that 32 CPUs, or is this a typo?
* 500 threads is a lot to start with, and it gets slower beyond that, so 500
already looks like an upper bound on the sweet spot. Considering the
best-case speedup ranges from 30-60x, I'm betting the sweet spot is closer to
30-60 threads. I'd be interested in seeing e.g. 25, 50, 100, 250. I'd expect
to see an upside-down U-shaped curve.

Overall though, async and a 500-thread pool aren't that different in terms of
performance (one of the conclusions from the doc). I expect we can get better
performance and less overhead by using a smaller number of threads.
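
For reference, a sweep over pool sizes could be driven by something like the sketch below. This is only a harness under stated assumptions, not the benchmark from the report: `blockingCall` is a hypothetical stand-in that just sleeps, and the call count and latency are made-up values. With a sleep-based stand-in the timings will not show the real contention effects, so the body of `blockingCall` would need to be replaced with an actual blocking HDFS call to get a meaningful curve.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ThreadSweep {
    // Hypothetical stand-in for one blocking call; a real run would call HDFS here.
    static int blockingCall(long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return 1;
    }

    // Runs `calls` blocking calls on a fixed pool of `threads` threads
    // and returns how many of them completed.
    static int runBatch(int threads, int calls, long latencyMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                tasks.add(() -> blockingCall(latencyMillis));
            }
            int completed = 0;
            // invokeAll blocks until every task has finished.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                completed += f.get();
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int calls = 2000; // assumed workload size, for illustration only
        for (int threads : new int[] {25, 50, 100, 250, 500}) {
            long start = System.nanoTime();
            int done = runBatch(threads, calls, 2);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(threads + " threads: " + done + " calls in " + elapsedMs + " ms");
        }
    }
}
```

Plotting elapsed time against pool size for the real workload is what should show whether the curve has the expected shape.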

So, regarding branching, where do we stand on this? I don't understand the 
resistance to a feature branch. What overhead are we concerned about? I don't 
understand what is achieved by the proposal to move AsyncDFS to a test package; 
isn't it better to just do this on a feature branch? Why is it easier to 
develop on trunk? The discussion we're having here can continue unabated with 
the code on a branch.

I'll hold off for another day, but seriously, let's just put it on a feature 
branch.

> [umbrella] Nonblocking HDFS Access
> ----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: Async-HDFS-Performance-Report.pdf, AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.
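
As a rough illustration of the calling pattern described above, the sketch below issues several independent calls from a single thread and collects the results afterwards with `Future.get()`. The method `renameAsync` and its behaviour are hypothetical stand-ins built on `CompletableFuture`, not the API proposed in this JIRA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

public class NonblockingSketch {
    // Hypothetical nonblocking call: returns immediately with a Future.
    // The work here is a placeholder, not a real HDFS rename.
    static Future<Boolean> renameAsync(String src, String dst) {
        return CompletableFuture.supplyAsync(() -> !src.equals(dst));
    }

    // Issues every call without waiting, then gathers the results.
    static int renameAll(List<String> names) throws Exception {
        List<Future<Boolean>> pending = new ArrayList<>();
        for (String name : names) {
            pending.add(renameAsync(name, name + ".bak")); // caller is not blocked here
        }
        int succeeded = 0;
        for (Future<Boolean> f : pending) {
            if (f.get()) { // blocks only until this particular result is ready
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(renameAll(Arrays.asList("/a", "/b", "/c")));
    }
}
```

The point of the pattern is that the single caller thread never waits between submissions, so no large thread pool is needed on the client side just to keep many calls in flight.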



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

