[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292295#comment-14292295
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7339:
-------------------------------------------

> Are you proposing we enforce 4 zero bits for all blocks, striped or regular?

No, only for the first block in a block group.  We have a few choices:

# Allocate IDs for normal blocks as usual.  Allocate *consecutive IDs for 
blocks in a group*.
#* e.g. allocateBlock -> 0x301, allocateBlock -> 0x302, allocateBlockGroup -> 
0x303..0x30B, allocateBlockGroup -> 0x30C..0x314, allocateBlock -> 0x315, ...
#* Since the block IDs in a block group could cross a low 4-bit boundary, a 
BlocksMap lookup needs to be executed twice.  e.g. for 0x312, first try 
lookup(0x312), which returns null, and then try lookup(0x30F), which returns 
the block group whose first block is 0x30C.
#* *All lookups need to be executed twice!*  It does not seem a good solution 
(which is why I did not mention it previously).
# Allocate IDs for normal blocks as usual.  Allocate consecutive IDs for blocks 
in a group and *skip to the next ID whose low 4 bits are zero if the group 
would cross a low 4-bit boundary*.
#* e.g. allocateBlock -> 0x301, allocateBlock -> 0x302, allocateBlockGroup -> 
0x303..0x30B, allocateBlockGroup -> 0x310..0x318, allocateBlock -> 0x319, ...
#* Only one lookup is needed.
# Allocate IDs for normal blocks as usual.  Allocate consecutive IDs for blocks 
in a group and *always skip to the next ID whose low 4 bits are zero*.
#* e.g. allocateBlock -> 0x301, allocateBlock -> 0x302, allocateBlockGroup -> 
0x310..0x318, allocateBlockGroup -> 0x320..0x328, allocateBlock -> 0x329, ...
#* The low 4 bits of the first block ID in a block group are always zero.
# Allocate IDs for normal blocks and *skip any ID whose low 4 bits are zero*.  
Allocate consecutive IDs for blocks in a group and always skip to the next ID 
whose low 4 bits are zero.
#* e.g. allocateBlock -> 0x301, allocateBlock -> 0x302, allocateBlockGroup -> 
0x310..0x318, allocateBlockGroup -> 0x320..0x328, allocateBlock -> 0x329, 
allocateBlock -> 0x32A, allocateBlock -> 0x32B, allocateBlock -> 0x32C, 
allocateBlock -> 0x32D, allocateBlock -> 0x32E, allocateBlock -> 0x32F, 
*allocateBlock -> 0x331*, ...
#* If the low 4 bits of an ID are zero, it must be the first block in a block 
group.
# Allocate IDs for normal blocks, skip any ID whose low 4 bits are zero, and 
*skip to the next high 60-bit prefix if the previous allocation was for a 
block group*.  Allocate consecutive IDs for blocks in a group and always skip 
to the next ID whose low 4 bits are zero.
#* e.g. allocateBlock -> 0x301, allocateBlock -> 0x302, allocateBlockGroup -> 
0x310..0x318, allocateBlockGroup -> 0x320..0x328, *allocateBlock -> 0x331*, ...
#* Normal blocks and blocks in a block group cannot share the same high 60-bit 
prefix.
# The same as the previous choice, except that normal blocks *do not skip IDs 
whose low 4 bits are zero*.
#* e.g. allocateBlock -> 0x301, allocateBlock -> 0x302, allocateBlockGroup -> 
0x310..0x318, allocateBlockGroup -> 0x320..0x328, *allocateBlock -> 0x330*, ...
#* A normal block ID could have zero low 4 bits.
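
To make the choices easier to compare, here is a minimal Java sketch of choice 4 only (normal blocks skip IDs whose low 4 bits are zero; block groups always start at the next such ID). The class name, method names, and the caller-supplied group size are hypothetical and are not taken from any patch on this issue. The helper firstIdInGroup assumes a block group would be keyed by the ID of its first block.

```java
/** Hypothetical sketch of choice 4; not code from any HDFS-7339 patch. */
final class BlockIdAllocatorSketch {
    // Assumption from the discussion: the low 4 bits index a block within its group.
    static final long LOW_MASK = 0xFL;

    private long lastId; // most recently allocated ID

    BlockIdAllocatorSketch(long lastId) {
        this.lastId = lastId;
    }

    /** Normal block: take the next ID, skipping any ID whose low 4 bits are zero. */
    long allocateBlock() {
        long id = lastId + 1;
        if ((id & LOW_MASK) == 0) {
            id++; // reserved for the first block of a block group
        }
        lastId = id;
        return id;
    }

    /**
     * Block group: jump to the next ID whose low 4 bits are zero and reserve
     * groupSize consecutive IDs. Returns the ID of the first block in the group.
     */
    long allocateBlockGroup(int groupSize) {
        if (groupSize < 1 || groupSize > 16) {
            throw new IllegalArgumentException("group must fit in 4 bits: " + groupSize);
        }
        long first = (lastId | LOW_MASK) + 1;
        lastId = first + groupSize - 1;
        return first;
    }

    /** ID the first block would have if blockId belonged to a block group. */
    static long firstIdInGroup(long blockId) {
        return blockId & ~LOW_MASK;
    }

    public static void main(String[] args) {
        BlockIdAllocatorSketch ids = new BlockIdAllocatorSketch(0x300L);
        System.out.println(Long.toHexString(ids.allocateBlock()));
        System.out.println(Long.toHexString(ids.allocateBlock()));
        System.out.println(Long.toHexString(ids.allocateBlockGroup(9)));
        System.out.println(Long.toHexString(ids.allocateBlockGroup(9)));
        System.out.println(Long.toHexString(ids.allocateBlock()));
    }
}
```

Starting from a last-used ID of 0x300 and using nine IDs per group as in the examples, the calls follow the sequence listed for choice 4: 0x301, 0x302, a group starting at 0x310, a group starting at 0x320, then 0x329. The other choices differ only in when the two allocate methods skip ahead.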


Which one is the best?


> Allocating and persisting block groups in NameNode
> --------------------------------------------------
>
>                 Key: HDFS-7339
>                 URL: https://issues.apache.org/jira/browse/HDFS-7339
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
