[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300256#comment-15300256 ]
Daryn Sharp commented on HDFS-9924:
-----------------------------------

I'm late to the game due to time constraints, but this feature greatly concerns me. It's true the NN can handle over 100k ops/sec, but only with a read-dominated workload. Even then, I've had to do _a lot_ of internal (hopefully soon to be published) performance work to prevent blowing the heap under such a sustained load. A recent user pushed a NN to 90k ops/sec for most of a weekend and barely dented the heap, BUT it was 81% read ops. In the past that would have caused an 8-10 min GC. I digress.

More on point: the intended use case is mass write operations. Consider this: on multiple large clusters, offloading just a few thousand write ops/sec for log aggregation reduced 95th percentile processing time from 4ms to <0.5ms and queue time from 20ms to 4ms. The extremely wild variance in the metrics also stabilized.

I've already been having performance concerns with Hive's mass setOwner/setPermission, which I believe is single-threaded. This feature appears intended for Hive. I'm really hesitant about a feature that makes it trivial to destroy a NN.

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished. It is inefficient if a client needs to create a
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is
> not blocked.
> The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
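The blocking-vs-Future contrast in the quoted description can be illustrated with plain `java.util.concurrent`. This is a minimal sketch, not the HDFS async API: `blockingCall`, `asyncCall`, and `runAll` are hypothetical names, the pool size of 4 is arbitrary, and the simulated call just sleeps instead of issuing a NameNode RPC. The caller submits every call without waiting, then collects results with `Future.get()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncCallSketch {
    // Hypothetical stand-in for a blocking NN call such as setPermission:
    // it sleeps briefly and echoes the path back (no real RPC is made).
    static String blockingCall(String path) throws InterruptedException {
        Thread.sleep(10);
        return "done:" + path;
    }

    // Hypothetical async wrapper: returns a Future immediately instead of
    // blocking the caller until the call has finished.
    static Future<String> asyncCall(ExecutorService pool, String path) {
        Callable<String> task = () -> blockingCall(path);
        return pool.submit(task);
    }

    // Issues all calls up front, then gathers results with Future.get().
    static List<String> runAll(List<String> paths) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(asyncCall(pool, p)); // does not wait for the result
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks here, in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(List.of("/a", "/b", "/c")));
    }
}
```

Because one client thread can now keep many calls in flight, the client no longer naturally paces its own request rate. That is exactly the property behind the concern above about mass write operations overwhelming a NN.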