[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292496#comment-14292496 ]
Zhe Zhang commented on HDFS-7339:
---------------------------------

Thanks for the analysis [~szetszwo]. The basic tradeoff is the compactness of the ID space versus lookup overhead. I agree option #1 should be ruled out (most compact allocation, slowest lookup).

From options #2 through #5 the trend is toward sparser ID allocation, with more invariants guaranteed in return. However, it seems all of them require an additional lookup (either in {{blocksMap}} or in the map of inodes) to identify a non-EC block? For example, when a block report for *0x331* arrives, we don't know whether it's a non-EC block or an EC block in the group *0x330*. So we must look up *0x330* in {{blocksMap}} and either get a miss, or find the inode and obtain the storage policy.

In contrast, separating the ID space with a binary flag needs only one lookup (except for legacy, randomly generated block IDs).

> Allocating and persisting block groups in NameNode
> --------------------------------------------------
>
>                 Key: HDFS-7339
>                 URL: https://issues.apache.org/jira/browse/HDFS-7339
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch,
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch,
> HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of a _block group_;
> block groups are formed in initial encoding and looked up in recoveries and
> conversions. A lightweight class {{BlockGroup}} is created to record the
> original and parity blocks in a coding group, as well as a pointer to the
> codec schema (pluggable codec schemas will be supported in HDFS-7337). With
> the striping layout, the HDFS client needs to operate on all blocks in a
> {{BlockGroup}} concurrently. Therefore we propose to extend a file’s inode to
> switch between _contiguous_ and _striping_ modes, with the current mode
> recorded in a binary flag.
> An array of BlockGroups (or BlockGroup IDs) is added, which remains empty for
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new
> {{ECManager}} component; the attached figure illustrates the architecture.
> As a simple example, when a {_Striping+EC_} file is created and written to,
> {{ECManager}} serves requests from the client to allocate new
> {{BlockGroups}} and stores them under the {{INodeFile}}. In the current phase,
> {{BlockGroups}} are allocated both in initial online encoding and in the
> conversion from replication to EC. {{ECManager}} also facilitates the lookup
> of {{BlockGroup}} information for block recovery work.
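
To make the lookup comparison in the comment concrete, here is a minimal sketch of the two ways a reported block ID could be resolved. Everything in it is an assumption for illustration, not the patch's actual code: the class {{BlockIdSketch}}, the choice of the top bit as the EC flag, the 4-bit in-group index (picked only so that *0x331* maps to group *0x330*), and a plain map standing in for {{blocksMap}}. Legacy, randomly generated block IDs are ignored here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: names and bit layout are assumptions, not HDFS code.
class BlockIdSketch {
    // Assumed layout: the most significant bit marks an EC (striped) block ID.
    static final long EC_FLAG = 1L << 63;
    // Assumed layout: the low 4 bits index a block inside its group,
    // so 0x331 would belong to the group whose ID ends in 0x330.
    static final long INDEX_MASK = 0xFL;

    static boolean isEcBlock(long blockId) {
        return (blockId & EC_FLAG) != 0;
    }

    static long groupIdOf(long blockId) {
        return blockId & ~INDEX_MASK;
    }

    // With a flag bit: classify from the ID alone, then do exactly one lookup.
    static String resolveWithFlag(Map<Long, String> blocksMap, long reportedId) {
        long key = isEcBlock(reportedId) ? groupIdOf(reportedId) : reportedId;
        return blocksMap.get(key);
    }

    // Without a flag: probe the candidate group ID first and fall back to the
    // block's own ID on a miss, so a non-EC block costs two lookups.
    static String resolveWithoutFlag(Map<Long, String> blocksMap, long reportedId) {
        String group = blocksMap.get(groupIdOf(reportedId));
        return group != null ? group : blocksMap.get(reportedId);
    }

    public static void main(String[] args) {
        Map<Long, String> flagged = new HashMap<>();
        flagged.put(EC_FLAG | 0x330L, "EC group");
        flagged.put(0x331L, "contiguous block");
        System.out.println(resolveWithFlag(flagged, EC_FLAG | 0x331L));
        System.out.println(resolveWithFlag(flagged, 0x331L));

        Map<Long, String> unflagged = new HashMap<>();
        unflagged.put(0x345L, "contiguous block");
        // Misses on 0x340 first, then finds 0x345.
        System.out.println(resolveWithoutFlag(unflagged, 0x345L));
    }
}
```

In the flagged map the IDs *0x331* and (flag | *0x331*) never collide, which is why no probing is needed. The unflagged variant also glosses over the case where the probe hits and the inode's storage policy still has to be checked.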