[ https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982721#comment-16982721 ]
Erik Krogen commented on HDFS-14973:
------------------------------------

Thanks for the review [~shv]! I addressed your comments in [^HDFS-14973-branch-2.005.patch]:
# Great idea, thanks.
# I updated it to avoid the use of {{Preconditions}}. It still throws an {{IllegalArgumentException}}, as I think this is a much better semantic representation of the issue than an {{IOException}}. The description of an IAE: "Thrown to indicate that a method has been passed an illegal or inappropriate argument."
# Yup, totally makes sense. My mistake. Fixed.
# I had a bug in the implementation of {{setAtomicLongToMinMax()}} that would cause the {{FSNamesystem}} spy to hang sometimes. I fixed this.

> Balancer getBlocks RPC dispersal does not function properly
> -----------------------------------------------------------
>
>                 Key: HDFS-14973
>                 URL: https://issues.apache.org/jira/browse/HDFS-14973
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>             Fix For: 3.3.0, 3.1.4, 3.2.2
>
>         Attachments: HDFS-14973-branch-2.003.patch, HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls issued by the Balancer/Mover more dispersed, to alleviate load on the NameNode, since {{getBlocks}} can be very expensive and the Balancer should not impact normal cluster operation.
>
> Unfortunately, this functionality does not function as expected, especially when the dispatcher thread count is low.
> The primary issue is that the delay is applied only to the first N threads that are submitted to the dispatcher's executor, where N is the size of the dispatcher's threadpool, but *not* to the first R threads, where R is the number of allowed {{getBlocks}} QPS (currently hardcoded to 20). For example, if the threadpool size is 100 (the default), threads 0-19 have no delay, 20-99 have increasing levels of delay, and 100+ have no delay. As I understand it, the intent of the logic was that the delay applied to the first 100 threads would force the dispatcher executor's threads to all be consumed, thus blocking subsequent (non-delayed) threads until the delay period has expired. However, threads 0-19 can finish very quickly (their work can often be fulfilled in the time it takes to execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), thus opening up 20 new slots in the executor, which are then consumed by non-delayed threads 100-119, and so on. So, although 80 threads have had a delay applied, the non-delayed threads rush through in the 20 non-delayed slots.
>
> This problem gets even worse when the dispatcher threadpool size is less than the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no threads ever have a delay applied_, and the feature is not enabled at all.
>
> This problem wasn't surfaced in the original JIRA because the test incorrectly measured the period across which {{getBlocks}} RPCs were distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} were used to track the time over which the {{getBlocks}} calls were made. However, {{startGetBlocksTime}} was initialized at the time of creation of the {{FSNamesystem}} spy, which is before the mock DataNodes are started.
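The delay-assignment flaw described above can be illustrated with a minimal sketch. All names and the exact delay formula here are hypothetical simplifications, not the actual Dispatcher code: the point is only that delay is assigned per QPS-sized batch of submitted tasks, but never to task indices at or beyond the pool size.

```java
// Minimal sketch of the flawed delay assignment described above.
// Names and the delay formula are hypothetical; the real logic lives
// in the Balancer's Dispatcher.
public class GetBlocksDelaySketch {
    static final int MAX_GETBLOCKS_QPS = 20; // hardcoded QPS limit per the description

    /** Delay (ms) assigned to the i-th task submitted to the dispatcher executor. */
    static long delayMs(int i, int poolSize, long batchIntervalMs) {
        if (i >= poolSize) {
            return 0; // tasks beyond the pool size get no delay at all -- the bug
        }
        // Within the pool, each batch of MAX_GETBLOCKS_QPS tasks waits one more
        // interval; the first batch (i < 20) runs immediately.
        return (long) (i / MAX_GETBLOCKS_QPS) * batchIntervalMs;
    }

    public static void main(String[] args) {
        // Default pool size of 100: threads 0-19 undelayed, 20-99 delayed, 100+ undelayed.
        System.out.println(delayMs(0, 100, 1000));   // 0
        System.out.println(delayMs(25, 100, 1000));  // 1000
        System.out.println(delayMs(100, 100, 1000)); // 0 -- rushes through a freed slot
        // Pool size 10 (less than the QPS limit): no task is ever delayed.
        System.out.println(delayMs(5, 10, 1000));    // 0
        System.out.println(delayMs(15, 10, 1000));   // 0
    }
}
```

Under this rule, the undelayed tasks at index 100+ simply occupy the 20 slots freed by the undelayed first batch, so the intended back-pressure never materializes.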
> Even worse, the Balancer in this test takes 2 iterations to complete balancing the cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen during the period of initial block fetching.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
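The dilution caused by the test's over-wide measurement window can be sketched with made-up numbers (all durations below are hypothetical, chosen only to show the effect):

```java
// Sketch of how an over-wide measurement window dilutes the reported
// getBlocks QPS, as described in the issue. All durations are illustrative.
public class QpsWindowSketch {
    static double qps(int calls, long windowMs) {
        return calls * 1000.0 / windowMs;
    }

    public static void main(String[] args) {
        int getBlocksCalls = 200;
        long rpcBurstMs = 2_000;       // actual span of the getBlocks submissions
        long dnStartupMs = 8_000;      // mock DataNode startup, wrongly included
        long moveIterationMs = 30_000; // a Dispatcher move iteration, wrongly included

        // True burst rate during initial block fetching:
        System.out.println(qps(getBlocksCalls, rpcBurstMs)); // 100.0
        // Rate implied by endGetBlocksTime - startGetBlocksTime:
        System.out.println(qps(getBlocksCalls,
                rpcBurstMs + dnStartupMs + moveIterationMs)); // 5.0
    }
}
```

With the extra time lumped into the denominator, a 100 QPS burst reads as 5 QPS, which is why the original test never caught the undispersed burst.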