[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338395#comment-14338395 ]
Charles Lamb commented on HDFS-7836:
------------------------------------

Hi [~arpit99],

Thanks for reading over the design doc and commenting on it.

bq. The DataNode can now split block reports per storage directory (post HDFS-2832), controlled by DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY. Did you get a chance to try it out and see if it helps? Splitting reports addresses all of the above. (edit: does not address network bandwidth gains from compression though)

I think you may mean your work on HDFS-5153, right? If I understand that correctly, it sends one report per storage. We have seen block reports in the 100MB+ sizes, so we suspect that an even smaller chunk size than a storage may yield benefits. That said, I am also watching [~daryn]'s work on HDFS-7435, which addresses a lot of this piece of this JIRA's proposal. I think that once HDFS-7435 is committed, we will make some measurements and see whether anything else in the area of chunking is necessary. As you point out, compression should also help.

bq. Do you have any estimates for startup time overhead due to GCs? We know of at least one large deployment which experiences a full GC pause during startup.

I'm not sure of the time, but in general, the off-heaping will help with NN throughput just by reducing the number of objects on the heap.

bq. How does this affect block report processing? We cannot assume DataNodes will sort blocks by target stripe. Will the NameNode sort received reports or will it acquire+release a lock per block? If the former, then there should probably be some randomization of order across threads to avoid unintended serialization, e.g. lock convoys.

The idea is that currently, processing a block report requires taking the FSN lock, so this proposal is two-part. First, use better locking semantics so that we don't have to take the FSN lock. Second, shard the blocksMap structure so that multiple threads can operate concurrently on that structure.
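To make the sharding idea concrete, here is a minimal sketch (illustrative only, not HDFS code) of a striped map guarded by per-stripe locks, with the stripe chosen by a modulo on the block ID. All names here (StripedBlockMap, stripeOf, etc.) are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a striped block map: the map is split into
// nStripes shards, each guarded by its own lock, so threads working on
// blocks in different stripes never contend with each other.
class StripedBlockMap<V> {
    private final int nStripes;          // e.g. a small prime ~= thread count
    private final ReentrantLock[] locks; // one lock per stripe
    private final Map<Long, V>[] shards; // one map per stripe

    @SuppressWarnings("unchecked")
    StripedBlockMap(int nStripes) {
        this.nStripes = nStripes;
        this.locks = new ReentrantLock[nStripes];
        this.shards = new Map[nStripes];
        for (int i = 0; i < nStripes; i++) {
            locks[i] = new ReentrantLock();
            shards[i] = new HashMap<>();
        }
    }

    private int stripeOf(long blockId) {
        // blockId % nStripes, floorMod keeps the result non-negative
        return (int) Math.floorMod(blockId, (long) nStripes);
    }

    void put(long blockId, V info) {
        int s = stripeOf(blockId);
        locks[s].lock();                 // only this stripe is locked
        try {
            shards[s].put(blockId, info);
        } finally {
            locks[s].unlock();
        }
    }

    V get(long blockId) {
        int s = stripeOf(blockId);
        locks[s].lock();
        try {
            return shards[s].get(blockId);
        } finally {
            locks[s].unlock();
        }
    }
}
```

As long as a per-block operation touches exactly one stripe, threads processing blocks that land in different stripes proceed in parallel without any global lock.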
Even if we continue to process BRs under one big happy FSN lock, having multiple threads operate concurrently will yield benefits. The sharding ("stripes") is along arbitrary boundaries. For instance, the design doc suggests that it could be striped by doing blockId % nStripes. nStripes would be configurable to a relatively small number (the design doc suggests 4 to 16), and if the modulo calculation is used, then nStripes would be a prime that is roughly equal to the number of threads available. As long as block report processing per block does not need to access more than one shard at a time, this will be fine -- multiple threads can process blocks in parallel. It is a technique that Berkeley DB Java Edition uses for its lock table to improve concurrency.

> BlockManager Scalability Improvements
> -------------------------------------
>
>                 Key: HDFS-7836
>                 URL: https://issues.apache.org/jira/browse/HDFS-7836
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>     Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
> Improvements to BlockManager scalability.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)