[ 
https://issues.apache.org/jira/browse/HDFS-17864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17864:
----------------------------------
    Labels: pull-request-available  (was: )

> Improve fsimage load time by making LightWeightGSet and NameCache thread-safe
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17864
>                 URL: https://issues.apache.org/jira/browse/HDFS-17864
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: khazhen
>            Priority: Major
>              Labels: pull-request-available
>
> HDFS-14617 allows the inode and inode directory sections of the fsimage to be 
> loaded in parallel.
> However, increasing the configured number of sections and threads has
> diminishing returns, because the loading code still contains synchronization
> points that protect several in-memory structures.
> Currently, three main data structures need to be protected by
> synchronized blocks:
> # INodeMap (backed by LightWeightGSet, which is not thread-safe)
> # BlocksMap (backed by LightWeightGSet, which is not thread-safe)
> # NameCache (not thread-safe by itself)
> To further improve FSImage loading speed, this PR makes the above three
> data structures thread-safe and then uses multiple threads to initialize
> them when the NameNode starts.
> Additionally, some optimizations have been made to reduce GC overhead during
> FSImage parsing.
> In our tests, the FSImage loading time (165M inodes & 258M blocks) was 
> reduced from 180s to 73s.
> *1. Making LightWeightGSet thread-safe*
>    LightWeightGSet is a HashMap-like data structure that uses a fixed-length 
> array as hash buckets, with each array element storing the head node of an 
> independent linked list.
>    Since each linked list is independent, we can allocate a lock for each 
> bucket to protect the corresponding linked list. 
>    To trade off between memory consumption and concurrency, we can let 
> multiple buckets share a lock and use a hash-based mapping.
>    To minimize changes, we don't plan to implement a completely thread-safe
> GSet to replace LightWeightGSet. That would require significant changes and
> is unnecessary, since all operations on LightWeightGSet are already
> synchronized once the NameNode finishes starting up.
>    Instead, we introduce an external synchronization helper,
> GSetConcurrencyController, to ensure the thread safety of LightWeightGSet
> during NameNode startup.
>    Another issue that needs to be addressed is the GSet's size. Currently,
> the size field in LightWeightGSet is not an atomic variable, so even with
> segmented locks protecting the hash buckets, concurrent updates can leave
> the size inaccurate.
>    Fortunately, in the FSImage loading scenario, we know the expected sizes
> of INodeMap and BlocksMap in advance, so we can correct the size of each
> map after loading is complete.
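The lock-per-group-of-buckets idea described above can be sketched as follows. This is only an illustration under stated assumptions: `StripedBucketSet`, `loadInParallel`, and `setSize` are hypothetical names, keys are plain `long` values, and none of this is the `GSetConcurrencyController` code from the patch. The size field is deliberately left untouched by `put` and is set once after all loader threads have joined, mirroring the "correct the size after loading" step.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: NOT the GSetConcurrencyController from the patch.
class StripedBucketSet {
  private static final class Node {
    final long key;
    Node next;
    Node(long key, Node next) { this.key = key; this.next = next; }
  }

  private final Node[] buckets;         // fixed-length array of list heads
  private final ReentrantLock[] locks;  // each lock guards several buckets
  private int size;                     // not maintained by put(); see setSize()

  StripedBucketSet(int bucketCount, int lockCount) {
    buckets = new Node[bucketCount];
    locks = new ReentrantLock[lockCount];
    for (int i = 0; i < lockCount; i++) {
      locks[i] = new ReentrantLock();
    }
  }

  private int bucketOf(long key) {
    return (Long.hashCode(key) & 0x7fffffff) % buckets.length;
  }

  // Hash-based mapping from a bucket to the lock it shares with other buckets.
  private ReentrantLock lockOf(int bucket) {
    return locks[bucket % locks.length];
  }

  // Inserts the key if absent; returns true if it was added.
  boolean put(long key) {
    int b = bucketOf(key);
    ReentrantLock lock = lockOf(b);
    lock.lock();
    try {
      for (Node n = buckets[b]; n != null; n = n.next) {
        if (n.key == key) {
          return false;
        }
      }
      buckets[b] = new Node(key, buckets[b]);
      return true;
    } finally {
      lock.unlock();
    }
  }

  boolean contains(long key) {
    int b = bucketOf(key);
    ReentrantLock lock = lockOf(b);
    lock.lock();
    try {
      for (Node n = buckets[b]; n != null; n = n.next) {
        if (n.key == key) {
          return true;
        }
      }
      return false;
    } finally {
      lock.unlock();
    }
  }

  // Called once, after all loader threads have finished.
  void setSize(int expected) { size = expected; }

  int size() { return size; }

  // Single-threaded check that walks every list; only safe after loading.
  int countByTraversal() {
    int count = 0;
    for (Node head : buckets) {
      for (Node n = head; n != null; n = n.next) {
        count++;
      }
    }
    return count;
  }

  // Usage example: several threads insert disjoint key ranges concurrently.
  static StripedBucketSet loadInParallel(int threads, int keysPerThread) {
    StripedBucketSet set = new StripedBucketSet(1 << 12, 64);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final long base = (long) t * keysPerThread;
      workers[t] = new Thread(() -> {
        for (int i = 0; i < keysPerThread; i++) {
          set.put(base + i);
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    set.setSize(threads * keysPerThread);  // expected size is known up front
    return set;
  }
}
```

Fewer locks than buckets keeps memory overhead small, at the cost of occasional contention between unrelated buckets that map to the same lock.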
> *2. Making NameCache thread-safe*
>    This is simpler than the LightWeightGSet case. We only need to combine
> ConcurrentHashMap and AtomicInteger to implement a thread-safe version of
> NameCache.
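One way the ConcurrentHashMap plus AtomicInteger combination might look is sketched below. It is a simplified illustration, not the NameCache implementation from the patch: `ConcurrentNameCacheSketch` is a hypothetical class, names are plain `String` values, and the promotion rule (a name becomes cached once it has been seen `useThreshold` times) is an assumption made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: NOT the NameCache code from the patch.
class ConcurrentNameCacheSketch {
  private final int useThreshold;
  // Promoted names, mapped to their canonical instance.
  private final ConcurrentHashMap<String, String> cache =
      new ConcurrentHashMap<>();
  // Per-name use counters for names that are not promoted yet.
  private final ConcurrentHashMap<String, AtomicInteger> useCounts =
      new ConcurrentHashMap<>();
  private final AtomicInteger lookups = new AtomicInteger();

  ConcurrentNameCacheSketch(int useThreshold) {
    this.useThreshold = useThreshold;
  }

  // Returns the canonical instance if the name is (or just became) cached,
  // otherwise null.
  String put(String name) {
    String cached = cache.get(name);
    if (cached != null) {
      lookups.incrementAndGet();
      return cached;
    }
    int uses = useCounts
        .computeIfAbsent(name, k -> new AtomicInteger())
        .incrementAndGet();
    if (uses >= useThreshold) {
      // putIfAbsent lets exactly one instance win if threads race here.
      String previous = cache.putIfAbsent(name, name);
      return previous != null ? previous : name;
    }
    return null;
  }

  int size() { return cache.size(); }

  int getLookupCount() { return lookups.get(); }
}
```

Because both the counter update and the promotion are atomic per key, callers on different threads need no external lock; they may only disagree briefly about whether a name has been promoted yet.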
> *3. Reducing GC pressure during FSImage loading*
>    After completing steps 1 and 2, we found that GC gradually became the new
> bottleneck. After analysis, we discovered that the Protocol Buffers
> parseDelimitedFrom method allocates a 4096-byte array as a buffer each time
> it parses an INode object.
>    To address this, we introduced the DelimitedProtoBufParseHelper
> utility class, which reuses the buffer array.
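The buffer-reuse idea can be illustrated without the protobuf library by reading the length-delimited framing by hand: a varint length prefix followed by that many payload bytes. The sketch below is an assumption-laden illustration, not the DelimitedProtoBufParseHelper from the patch; `ReusableDelimitedReader` and its methods are hypothetical names. In real code, the filled buffer could presumably then be handed to a protobuf `Parser.parseFrom(byte[], int, int)` call instead of creating a new stream per record.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: NOT the DelimitedProtoBufParseHelper from the patch.
class ReusableDelimitedReader {
  private byte[] buf = new byte[4096];  // allocated once, reused per record
  private int length;

  // Reads the next length-delimited record into the shared buffer.
  // Returns false on a clean end of stream.
  boolean next(InputStream in) throws IOException {
    int first = in.read();
    if (first < 0) {
      return false;
    }
    length = readVarint32(first, in);
    if (length < 0) {
      throw new IOException("negative record length");
    }
    if (length > buf.length) {
      // Grow only when a record does not fit; later records reuse it.
      buf = new byte[Math.max(length, buf.length * 2)];
    }
    int off = 0;
    while (off < length) {
      int n = in.read(buf, off, length - off);
      if (n < 0) {
        throw new EOFException("truncated record");
      }
      off += n;
    }
    return true;
  }

  // Valid bytes are buffer()[0 .. length() - 1] until the next call to next().
  byte[] buffer() { return buf; }

  int length() { return length; }

  // Decodes a base-128 varint whose first byte has already been read.
  private static int readVarint32(int first, InputStream in)
      throws IOException {
    int result = first & 0x7f;
    int shift = 7;
    int b = first;
    while ((b & 0x80) != 0) {
      if (shift >= 35) {
        throw new IOException("malformed varint");
      }
      b = in.read();
      if (b < 0) {
        throw new EOFException("truncated varint");
      }
      result |= (b & 0x7f) << shift;
      shift += 7;
    }
    return result;
  }
}
```

The caller must finish with the buffer contents before calling `next` again, which is the usual trade-off of reusing one array instead of allocating per record.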
> Appendix: Test environment and configuration information
> *Hadoop version*: current master, including previous fsimage loading 
> optimizations: HDFS-13694, HDFS-14617, HDFS-15493
> *FSImage information*:
>     Size: 20G (165M inodes & 258M blocks)
> *Config:*
>       dfs.image.parallel.threads=16
>       dfs.image.parallel.target.sections=128
>       dfs.image.parallel.load=true
> *New config in this patch:*
>       dfs.image.concurrent.init.inode.map.enable=true
>       dfs.image.name.cache.init.thread.num=16
>       dfs.image.block.map.init.thread.num=16



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

