[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhe Zhang updated HDFS-7339:
----------------------------
    Attachment: HDFS-7339-001.patch

> Create block groups for initial block encoding
> ----------------------------------------------
>
>                 Key: HDFS-7339
>                 URL: https://issues.apache.org/jira/browse/HDFS-7339
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: Encoding-design-NN.jpg, HDFS-7339-001.patch
>
>
> All erasure codec operations center around the concept of _block groups_,
> which are formed in encoding and looked up in decoding. This JIRA creates a
> lightweight {{BlockGroup}} class to record the original and parity blocks in
> an encoding group, as well as a pointer to the codec schema. Pluggable codec
> schemas will be supported in HDFS-7337.
> The NameNode creates and maintains {{BlockGroup}} instances through 2 new
> components; the attached figure illustrates the architecture.
> {{ECManager}}: This module manages {{BlockGroups}} and associated codec
> schemas. As a simple example, it stores the codec schema of the Reed-Solomon
> algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each
> {{BlockGroup}} points to the schema it uses. To facilitate lookups during
> recovery requests, {{BlockGroups}} should be organized as a map keyed by
> {{Blocks}}.
> {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events.
> This module analyzes the incoming events and dispatches tasks to
> {{UnderReplicatedBlocks}} to create parity blocks. A new queue
> ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues
> to maintain the relative order of encoding and replication tasks.
> * Whenever a block is finalized and meets the EC criteria -- including 1) the
> block is full-sized; 2) the file’s storage policy allows EC --
> {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. To do so, it
> needs to store a set of blocks waiting to be encoded. Different grouping
> algorithms can be applied -- e.g., always grouping blocks in the same file.
> Blocks in a group should also reside on different DataNodes, and ideally on
> different racks, to tolerate node and rack failures. If grouping succeeds, it
> records the formed group with {{ECManager}} and inserts the parity blocks into
> {{QUEUE_INITIAL_ENCODING}}.
> * When a parity block or a raw block in {{ENCODED}} state is found missing,
> {{ErasureCodingBlocks}} adds it to the existing priority queues in
> {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost,
> they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be
> added for fine-grained differentiation (e.g., loss of a raw block versus a
> parity one).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
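
The block-group bookkeeping described in this issue (a group that records its original and parity blocks plus a pointer to its codec schema, kept in a map keyed by block) might be sketched roughly as below. This is only an illustration and is not taken from HDFS-7339-001.patch: the {{CodecSchema}} type, the {{addGroup}} and {{lookup}} methods, the {{groupByBlockId}} field, and the use of plain {{long}} block IDs as map keys are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; these types are hypothetical, not from the patch. */
public class BlockGroupSketch {

    /** Hypothetical codec schema: an algorithm name plus block counts. */
    static final class CodecSchema {
        final String name;
        final int dataBlocks;
        final int parityBlocks;

        CodecSchema(String name, int dataBlocks, int parityBlocks) {
            this.name = name;
            this.dataBlocks = dataBlocks;
            this.parityBlocks = parityBlocks;
        }

        int groupSize() {
            return dataBlocks + parityBlocks;
        }
    }

    /** Records the original and parity block IDs of one encoding group. */
    static final class BlockGroup {
        final List<Long> dataBlockIds;
        final List<Long> parityBlockIds;
        final CodecSchema schema; // pointer to the schema this group uses

        BlockGroup(List<Long> data, List<Long> parity, CodecSchema schema) {
            if (data.size() != schema.dataBlocks
                    || parity.size() != schema.parityBlocks) {
                throw new IllegalArgumentException("block counts do not match schema");
            }
            this.dataBlockIds = new ArrayList<>(data);
            this.parityBlockIds = new ArrayList<>(parity);
            this.schema = schema;
        }
    }

    /** Keeps groups in a map keyed by block ID (an assumed key type). */
    static final class ECManager {
        private final Map<Long, BlockGroup> groupByBlockId = new HashMap<>();

        /** Registers every member block of the group as a lookup key. */
        void addGroup(BlockGroup group) {
            for (long id : group.dataBlockIds) {
                groupByBlockId.put(id, group);
            }
            for (long id : group.parityBlockIds) {
                groupByBlockId.put(id, group);
            }
        }

        /** Returns the group containing the block, or null if it is in none. */
        BlockGroup lookup(long blockId) {
            return groupByBlockId.get(blockId);
        }
    }

    public static void main(String[] args) {
        // Reed-Solomon with 3 original and 2 parity blocks, as in the description.
        CodecSchema rs = new CodecSchema("RS", 3, 2);
        ECManager manager = new ECManager();
        manager.addGroup(new BlockGroup(
                Arrays.asList(1L, 2L, 3L), Arrays.asList(101L, 102L), rs));

        // A recovery request for a lost parity block finds its whole group.
        BlockGroup group = manager.lookup(102L);
        System.out.println(group.schema.name + " group of "
                + group.schema.groupSize() + " blocks");
    }
}
```

Indexing every member block, rather than only the first one, lets a recovery request for any lost raw or parity block reach its group with a single map lookup, at the cost of one map entry per block.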