[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300256#comment-15300256
 ] 

Daryn Sharp commented on HDFS-9924:
-----------------------------------

I'm late to the game due to time constraints, but this feature greatly concerns 
me.

It's true the NN can handle over 100k ops/sec, but only with a read-dominated 
workload.  Even then, I've had to do _a lot_ of internal (hopefully soon to be 
published) performance work to prevent blowing the heap under such a sustained 
load - a recent user pushed a NN to 90k ops/sec for most of a weekend and 
barely dented the heap.  BUT it was 81% read ops.  In the past that would have 
caused an 8-10 min GC.  I digress.

More on point: the intended use case is mass write operations.  Consider 
this: on multiple large clusters, offloading just a few thousand write ops/sec 
for log aggregation reduced 95th percentile processing time from 4ms to <0.5ms 
and queue time from 20ms to 4ms.  The extremely wild variance in the metrics 
also stabilized.

I've already been having performance concerns with Hive's mass 
setOwner/setPermission, which I believe is single-threaded.  This feature 
appears intended for Hive.  I'm really hesitant about a feature that makes it 
trivial to destroy a NN.

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.
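
To make the quoted proposal concrete, here is a minimal sketch of the call
pattern it describes, using only java.util.concurrent. It is an illustration
under assumptions, not the proposed HDFS API: blockingRename, renameAsync,
renameAll and the thread pool are hypothetical stand-ins invented for this
example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncCallSketch {
    // Hypothetical stand-in for a blocking filesystem call (NOT an HDFS API).
    // It "succeeds" whenever source and destination differ.
    static boolean blockingRename(String src, String dst) {
        return !src.equals(dst);
    }

    // Hypothetical asynchronous variant: hands the work to a pool and
    // returns a Future immediately, so the caller is not blocked.
    static Future<Boolean> renameAsync(ExecutorService pool, String src, String dst) {
        Callable<Boolean> task = () -> blockingRename(src, dst);
        return pool.submit(task);
    }

    // A single caller thread issues n independent calls without waiting
    // for any of them, then collects the results with Future.get().
    static int renameAll(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                pending.add(renameAsync(pool, "/tmp/src" + i, "/tmp/dst" + i));
            }
            int succeeded = 0;
            for (Future<Boolean> f : pending) {
                if (f.get()) {  // blocks only until this one call completes
                    succeeded++;
                }
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(renameAll(100) + " renames completed");
    }
}
```

Note that the submission loop never waits on a previous call, so one thread
can put all n operations in flight at once. That is the convenience the
proposal is after, and also the load pattern the comment above is worried
about.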



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
