[ https://issues.apache.org/jira/browse/HDFS-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daryn Sharp updated HDFS-7967:
------------------------------
Attachment: HDFS-7967-branch-2.patch
            HDFS-7967-branch-2.8.patch

We've been using a similar patch since 2.6 to prevent the balancer from destroying the performance of large and/or dense clusters. When getBlocks queries took tens or hundreds of ms, we had to reduce the balancer dispatcher thread count from 200 down to 1-5 to avoid call queue overflow. This change allows us to use 200 dispatchers with little to no performance impact.

Basic design changes:
# getBlocks queries are O(1) instead of O(n).
# Recently completed blocks are not moved, to avoid disrupting active clients.
# Blocks are returned evenly from the storages with the least remaining space.

The main issue is that the triplets currently form a terminated LIFO list. New blocks are inserted before the head and become the new head. Hence the current implementation's O(n) seek to a random location in the block list. Worse, the random seek needlessly iterates through all the blocks of the preceding storages.

This design converts the triplets into a cyclic FIFO list. New blocks are inserted as the tail, i.e. before the head, but the current head is not changed. The getBlocks query becomes O(1) by starting from the current head, returning the "oldest" completed blocks, and then updating the head so the next query resumes where it left off. The block iterators track the size of the returned blocks, so an "expected" remaining storage capacity determines the sort order of a node's storage iterators, keeping roughly the same free space across all storages.

FBR (full block report) processing currently reconciles inconsistencies (phantom blocks to remove from the blocks map) by adding a delimiter block as the head, moving all reported blocks to the head, and then removing the blocks from the delimiter to the list termination. This change adds the delimiter as the tail, moves all reported blocks to the tail, and removes the blocks from the head to the delimiter.
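The cyclic FIFO list and the delimiter-based FBR reconciliation described above can be illustrated with a minimal Python sketch for a single storage. It assumes a plain doubly linked list in place of the real triplets array and a dict standing in for the blocks map; `CyclicBlockList`, `get_blocks` and `reconcile` are hypothetical names, not HDFS APIs, and the "recently completed" filter is not modeled. `get_blocks` walks forward from the head and then moves the head, so its cost depends only on how many blocks are returned, not on how many blocks the storage holds.

```python
class _Node:
    __slots__ = ("block_id", "size", "prev", "next")

    def __init__(self, block_id, size):
        self.block_id = block_id
        self.size = size
        self.prev = self
        self.next = self


class CyclicBlockList:
    """Hypothetical per-storage cyclic FIFO block list (illustrative only)."""

    def __init__(self):
        self.head = None   # next block get_blocks will hand out
        self.nodes = {}    # block_id -> node; stands in for the blocks map

    def _link_before_head(self, node):
        # Insert as the tail, i.e. just before the head; the head is unchanged.
        if self.head is None:
            node.prev = node.next = node
            self.head = node
            return
        tail = self.head.prev
        tail.next = node
        node.prev = tail
        node.next = self.head
        self.head.prev = node

    def _unlink(self, node):
        if node.next is node:          # node is the only element
            self.head = None
        else:
            if node is self.head:      # keep the head valid
                self.head = node.next
            node.prev.next = node.next
            node.next.prev = node.prev
        node.prev = node.next = node

    def add_block(self, block_id, size):
        node = _Node(block_id, size)
        self.nodes[block_id] = node
        self._link_before_head(node)

    def get_blocks(self, max_blocks):
        """Return up to max_blocks blocks from the head, then advance the head
        so the next call resumes where this one stopped."""
        result = []
        node = self.head
        for _ in range(min(max_blocks, len(self.nodes))):
            result.append((node.block_id, node.size))
            node = node.next
        if node is not None:
            self.head = node
        return result

    def reconcile(self, reported_ids):
        """Drop blocks missing from a full block report; return their ids."""
        delimiter = _Node(None, 0)
        self._link_before_head(delimiter)      # delimiter becomes the tail
        for block_id in reported_ids:
            node = self.nodes.get(block_id)
            if node is not None:
                self._unlink(node)
                self._link_before_head(node)   # move reported block to the tail
        removed = []
        while self.head is not delimiter:      # head..delimiter = unreported blocks
            stale = self.head
            self._unlink(stale)
            del self.nodes[stale.block_id]
            removed.append(stale.block_id)
        self._unlink(delimiter)                # head moves to first reported block
        return removed
```

For example, after adding blocks 0 through 4, three calls to `get_blocks(2)` return `[0, 1]`, `[2, 3]` and `[4, 0]` (by id), and a following `reconcile([1, 3])` removes blocks 0, 2 and 4, leaving 1 and 3.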
Note that HDFS-9260 replaced the triplets with an RB tree on trunk, so trunk requires a completely new design, which I unfortunately do not have the cycles to implement. [~sfriberg], can you create a trunk patch?

> Reduce the performance impact of the balancer
> ---------------------------------------------
>
>                 Key: HDFS-7967
>                 URL: https://issues.apache.org/jira/browse/HDFS-7967
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7967-branch-2.8.patch, HDFS-7967-branch-2.patch
>
>
> The balancer needs to query for blocks to move from overly full DNs. The
> block lookup is extremely inefficient. An iterator of the node's blocks is
> created from the iterators of its storages' blocks. A random number is
> chosen corresponding to how many blocks will be skipped via the iterator.
> Each skip requires costly scanning of triplets.
> The current design also only considers node imbalances while ignoring
> imbalances within the node's storages. A more efficient and intelligent
> design may eliminate the costly skipping of blocks via round-robin selection
> of blocks from the storages based on remaining capacity.
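Round-robin selection of blocks from a node's storages based on remaining capacity can be sketched with a min-heap keyed on each storage's expected remaining space. This is an illustrative Python sketch, not the patch's code: `pick_blocks` and its input layout (storage id mapped to remaining bytes plus a list of `(block_id, size)` pairs) are hypothetical. Each pick takes the next block from the storage with the least expected remaining space, then credits that storage with the block's size, on the assumption that moving the block off will free that much space.

```python
import heapq


def pick_blocks(storages, max_bytes):
    """Hypothetical selector: storages maps storage_id -> (remaining_bytes,
    [(block_id, size), ...]). Returns [(storage_id, block_id), ...]."""
    iters = {sid: iter(blocks) for sid, (_, blocks) in storages.items()}
    # Min-heap on expected remaining space: the fullest storage is popped first.
    heap = [(remaining, sid) for sid, (remaining, _) in storages.items()]
    heapq.heapify(heap)
    chosen, total = [], 0
    while heap and total < max_bytes:
        expected_remaining, sid = heapq.heappop(heap)
        nxt = next(iters[sid], None)
        if nxt is None:
            continue  # storage has no more candidate blocks; drop it
        block_id, size = nxt
        chosen.append((sid, block_id))
        total += size
        # Moving this block away is expected to free `size` bytes on sid.
        heapq.heappush(heap, (expected_remaining + size, sid))
    return chosen
```

For example, with `s1` at 100 bytes remaining and `s2` at 120, selecting 150 bytes of 50-byte blocks alternates `s1`, `s2`, `s1` instead of draining `s1` first, which keeps the free space of the two storages roughly level.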