[
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335054#comment-15335054
]
Andrew Wang commented on HDFS-9924:
-----------------------------------
Thanks for posting the performance report [~xiaobingo]. Some comments:
* Hardware is listed as "32 X 8 cores Intel(R) Xeon(R) CPU E52630 v3 @
2.40GHz". Could you clarify? Is that 32 CPUs, or is it a typo?
* 500 threads is a lot to start with, and performance gets worse beyond that,
so 500 already looks like an upper bound on the sweet spot. Considering the
best-case speedup ranges from 30-60x, I'm betting the sweet spot is closer to
30-60 threads. I'd be interested in seeing e.g. 25, 50, 100, and 250. I'd
expect to see an upside-down U-shaped curve.
Overall though, async and the 500-thread pool aren't that different in terms
of performance (one of the conclusions from the doc). I expect we can get
better performance and less overhead by using a smaller number of threads.
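A rough sketch of the kind of pool-size sweep suggested above might look like the following. Everything here is an assumption made for illustration: the class name, `simulatedBlockingCall`, the 5 ms latency, and the call count are hypothetical, and the sleep merely stands in for a blocking HDFS call. Because a sleep creates no server-side or lock contention, this harness alone will not reproduce the expected upside-down U; it only shows how the measurement could be structured before pointing it at a real cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ThreadPoolSweep {
    // Hypothetical stand-in for one blocking HDFS call; it only sleeps.
    static int simulatedBlockingCall(long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return 1;
    }

    // Runs `calls` simulated calls on a fixed pool of `threads` threads
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeRun(int threads, int calls, long latencyMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                Callable<Integer> task = () -> simulatedBlockingCall(latencyMillis);
                futures.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Integer> f : futures) {
                completed += f.get(); // wait for each call to finish
            }
            if (completed != calls) {
                throw new IllegalStateException("expected " + calls + ", got " + completed);
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pool sizes taken from the comment above, plus the original 500.
        int[] poolSizes = {25, 50, 100, 250, 500};
        for (int threads : poolSizes) {
            System.out.printf("%d threads: %d ms%n", threads, timeRun(threads, 2000, 5));
        }
    }
}
```

Replacing the sleep with the real client call and plotting elapsed time against pool size would show where the curve turns.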
So, regarding branching, where do we stand on this? I don't understand the
resistance to a feature branch. What overhead are we concerned about? I don't
understand what is achieved by the proposal to move AsyncDFS to a test package;
isn't it better to just do this on a feature branch? Why is it easier to
develop on trunk? The discussion we're having here can continue unabated with
the code on a branch.
I'll hold off for another day, but seriously, let's just put it on a feature
branch.
> [umbrella] Nonblocking HDFS Access
> ----------------------------------
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Xiaobing Zhou
> Attachments: Async-HDFS-Performance-Report.pdf, AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished. It is inefficient if a client needs to create a
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is
> not blocked. The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.
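To illustrate the calling pattern described above, here is a minimal sketch in plain Java. It does not use the actual HDFS-9924 API: `renameAsync` is a hypothetical stand-in built on `CompletableFuture`, and the paths are made up. It shows a single thread issuing many independent calls without waiting, then collecting the results with `Future.get()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

class NonblockingSketch {
    // Hypothetical nonblocking call; NOT the API proposed in this JIRA.
    // It returns a Future immediately and does the work on another thread.
    static Future<Boolean> renameAsync(String src, String dst) {
        return CompletableFuture.supplyAsync(() -> !src.equals(dst));
    }

    // Issues all calls from one thread, then waits for the results.
    static int renameAll(List<String> paths) throws Exception {
        List<Future<Boolean>> pending = new ArrayList<>();
        for (String p : paths) {
            pending.add(renameAsync(p, p + ".bak")); // caller is not blocked here
        }
        int succeeded = 0;
        for (Future<Boolean> f : pending) {
            if (f.get()) { // blocks only until this particular call is done
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = Arrays.asList("/tmp/a", "/tmp/b", "/tmp/c");
        System.out.println(renameAll(paths) + " of " + paths.size() + " calls succeeded");
    }
}
```

The caller pays the waiting cost once, when collecting results, instead of once per call.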