[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338395#comment-14338395 ]
Charles Lamb commented on HDFS-7836:
------------------------------------

Hi [~arpit99],

Thanks for reading over the design doc and commenting on it.

bq. The DataNode can now split block reports per storage directory (post HDFS-2832), controlled by DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY. Did you get a chance to try it out and see if it helps? Splitting reports addresses all of the above. (edit: does not address network bandwidth gains from compression though)

I think you may mean your work on HDFS-5153, right? If I understand that correctly, it sends one report per storage. We have seen block reports in the 100MB+ sizes, so we suspect that an even smaller chunk size than a storage may yield benefits. That said, I am also watching [~daryn]'s work on HDFS-7435, which addresses a lot of this piece of this JIRA's proposal. I think that once HDFS-7435 is committed, we will make some measurements and see whether anything else in the area of chunking is necessary. As you point out, compression should also help.

bq. Do you have any estimates for startup time overhead due to GCs? We know of at least one large deployment which experiences a full GC pause during startup.

I'm not sure of the time, but in general, the off-heaping will help with NN throughput just by reducing the number of objects on the heap.

bq. How does this affect block report processing? We cannot assume DataNodes will sort blocks by target stripe. Will the NameNode sort received reports or will it acquire+release a lock per block? If the former, then there should probably be some randomization of order across threads to avoid unintended serialization, e.g. lock convoys.

The idea is that currently, processing a block report requires taking the FSN lock, so this proposal is two-part. First, use better locking semantics so that we don't have to take the FSN lock. Second, shard the blocksMap structure so that multiple threads can operate concurrently on that structure.
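To make the sharding idea concrete, here is a minimal sketch (illustrative only, not HDFS code) of a striped map guarded by per-stripe locks, with the stripe chosen by a modulo on the block ID. All names here (StripedBlockMap, stripeOf, etc.) are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a striped block map: the map is split into
// nStripes shards, each guarded by its own lock, so threads working on
// blocks in different stripes never contend with each other.
class StripedBlockMap<V> {
    private final int nStripes;          // e.g. a small prime ~= thread count
    private final ReentrantLock[] locks; // one lock per stripe
    private final Map<Long, V>[] shards; // one map per stripe

    @SuppressWarnings("unchecked")
    StripedBlockMap(int nStripes) {
        this.nStripes = nStripes;
        this.locks = new ReentrantLock[nStripes];
        this.shards = new Map[nStripes];
        for (int i = 0; i < nStripes; i++) {
            locks[i] = new ReentrantLock();
            shards[i] = new HashMap<>();
        }
    }

    private int stripeOf(long blockId) {
        // blockId % nStripes, floorMod keeps the result non-negative
        return (int) Math.floorMod(blockId, (long) nStripes);
    }

    void put(long blockId, V info) {
        int s = stripeOf(blockId);
        locks[s].lock();                 // only this stripe is locked
        try {
            shards[s].put(blockId, info);
        } finally {
            locks[s].unlock();
        }
    }

    V get(long blockId) {
        int s = stripeOf(blockId);
        locks[s].lock();
        try {
            return shards[s].get(blockId);
        } finally {
            locks[s].unlock();
        }
    }
}
```

As long as a per-block operation touches exactly one stripe, threads processing blocks that land in different stripes proceed in parallel without any global lock.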
Even if we continue to process BRs under one big happy FSN lock, having multiple threads operate concurrently will yield benefits. The sharding ("stripes") is along arbitrary boundaries. For instance, the design doc suggests that it could be striped by doing blockId % nStripes. nStripes would be configurable to a relatively small number (the design doc suggests 4 to 16), and if the modulo calculation is used, then nStripes would be a prime that is roughly equal to the number of threads available. As long as block report processing per block does not need to access more than one shard at a time, this will be fine -- multiple threads can process blocks in parallel. It is a technique that Berkeley DB Java Edition uses for its lock table to improve concurrency.

> BlockManager Scalability Improvements
> -------------------------------------
>
>                 Key: HDFS-7836
>                 URL: https://issues.apache.org/jira/browse/HDFS-7836
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>     Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
> Improvements to BlockManager scalability.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)