[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332670#comment-15332670 ]
stack commented on HDFS-9924:
-----------------------------

bq. Quoting Tsz Wo Nicholas Sze words, I understand your concern but it is a different problem. We should not protect NN by making the client slow. We should add protection in NN instead

The above quote is magical thinking (see the response to it given by Daryn, an operator of one of our largest deploys). We are talking branch-2 here for this Future hack. The NN is not going to sprout scale all of a sudden in the branch-2 line to support 'thousands' of concurrent ops coming in from an adjacent, blame-shifting Hive metadata server. Some form of parsimony, some concern for NN loading, is in order.

Rereading this issue from the top down (including the design doc -- it needs numbers: what is a large number of calls? why wouldn't a thread pool work, given you need to throttle?) and seeing where we have arrived, this issue is not about 'Asynchronous HDFS Access' as the summary and original description advertise. It is instead an expedient hack-for-hive, for late in branch-2 only. The 'change' will have a short shelf-life, it seems, given it arrives in 2.9.0+ (?) and branch-3 is looking to have a different API (see the discussion on HADOOP-12910).

The two distinct positions I discern in the discussion so far -- those who want a true async API on HDFS and those working on a hive fix -- are having trouble finding common ground. If this characterization is correct, I'd suggest we just call this issue a hack-for-hive explicitly and annotate it as such. A good few of the participants in this issue are likely not much interested in the latter (e.g. myself) as long as this work does not get in the way of our having a 'real' async API (HADOOP-12910) or confuse downstreamers on what the async story on HDFS is.
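To make the thread-pool question concrete, here is a minimal sketch of the kind of thing being asked about: blocking calls submitted to a fixed-size pool, each returning a Java Future, with the pool size acting as the throttle on concurrent ops. Everything named here is hypothetical. `blockingCall` is a stand-in for any blocking HDFS client method and is not a real HDFS API, and the pool size of 4 is an arbitrary illustration, not a recommended limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncCalls {
    // Counters used only to observe how many calls run at once.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for a blocking client call (assumption, not an HDFS method).
    static String blockingCall(int id) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // simulate the round trip to the server
            return "done-" + id;
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Submits every call up front; at most maxConcurrent of them execute at a time.
    static List<String> runAll(int calls, int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                final int id = i;
                // submit() returns immediately; the caller is not blocked here.
                futures.add(pool.submit(() -> blockingCall(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // the usual Future.get() to collect each result
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> results = runAll(20, 4);
        System.out.println(results.size() + " results, peak concurrency " + maxInFlight.get());
    }
}
```

The caller still gets Futures back, but the server never sees more than `maxConcurrent` ops from this client at once. Whether that cost in client threads is acceptable is the question the design doc would need numbers to answer.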
> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished. It is inefficient if a client needs to create a
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is
> not blocked. The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)