[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296210#comment-14296210 ]

Zhe Zhang commented on HDFS-7285:
---------------------------------

We had a very productive meetup today. Please find a summary below:
*Attendees*: [~szetszwo], [~zhz], [~jingzhao]

*NameNode handling of block groups* (HDFS-7339):
# Under the striping layout, it's viable to use the first block to represent 
the entire block group.
# A separate map for block groups is not necessary; {{blocksMap}} can be used 
for both regular blocks and striped block groups.
# Block ID allocation: we will use the following protocol, which partitions the 
entire ID space with a binary flag (see the ID-layout sketch after this list):
{code}
Contiguous: {reserved block IDs | flag | block ID}
Striped:    {reserved block IDs | flag | reserved block group IDs | block group ID | index in group}
{code}
# When the cluster contains randomly generated block IDs (from legacy code), the 
block group ID generator needs to check for conflicts across the entire range of 
IDs it generates. We should file a follow-on JIRA to investigate possible 
optimizations for efficient conflict detection.
# To make HDFS-7339 more trackable, we should shrink its scope and remove the 
client RPC code. It should be limited to block management and INode handling.
# Existing block states are sufficient to represent a block group. A client 
should {{COMMIT}} a block group just as it does a regular block. Moving to the 
{{COMPLETE}} state requires collecting acks from all participating DNs in the group.
# We should subclass {{BlockInfo}} to remember the block group layout. This is 
an optimization to avoid repeatedly retrieving that info from the file INode (a 
rough shape of what the subclass would cache is sketched after this list).
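
A minimal sketch of how the flag-based ID partition in item 3 could be checked. The sign bit as the flag and a 4-bit in-group index are illustrative assumptions, not the constants agreed today; HDFS-7339 will settle the real values.
{code}
// Hypothetical helpers for the flag-based ID partition sketched above.
// The flag position and index width are assumptions for illustration only.
public final class BlockIdLayoutSketch {
  private static final int  INDEX_BITS = 4;                     // bits for "index in group"
  private static final long INDEX_MASK = (1L << INDEX_BITS) - 1;

  /** Assume the sign bit is the contiguous-vs-striped flag. */
  static boolean isStripedId(long blockId) {
    return blockId < 0;
  }

  /** The first block's ID stands for the whole group (item 1 above). */
  static long groupIdOf(long blockId) {
    return blockId & ~INDEX_MASK;
  }

  static int indexInGroup(long blockId) {
    return (int) (blockId & INDEX_MASK);
  }
}
{code}
With a layout like this, {{blocksMap}} can key striped entries by {{groupIdOf(blockId)}}, so one entry per group serves lookups for every internal block, matching item 1.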
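
And a rough, standalone shape of the information the {{BlockInfo}} subclass from the last item would cache. All names here are placeholders; the real class would of course extend {{BlockInfo}} rather than stand alone.
{code}
// Placeholder sketch of the group layout cached on the block object so the
// NameNode need not re-read it from the file INode on every block operation.
public class BlockGroupLayoutSketch {
  private final short dataBlockNum;    // e.g. 6 or 10 data blocks per group
  private final short parityBlockNum;  // e.g. 3 or 4 parity blocks per group

  public BlockGroupLayoutSketch(short dataBlockNum, short parityBlockNum) {
    this.dataBlockNum = dataBlockNum;
    this.parityBlockNum = parityBlockNum;
  }

  public short groupSize() {
    return (short) (dataBlockNum + parityBlockNum);
  }

  /** COMPLETE only after every DN in the group has acked its internal block. */
  public boolean hasAllAcks(int ackedCount) {
    return ackedCount >= groupSize();
  }
}
{code}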

*EC and storage policy*:
# We agreed that _EC vs. replication_ is another configuration dimension, 
orthogonal to the current storage-type-based policies (HOT, WARM, COLD). Folding 
EC into the storage policy space would create too many combinations to be 
explicitly listed and chosen from.
# On-going development can still use HDFS-7347, which embeds EC as one of the 
storage policies (it has already been committed to HDFS-EC). HDFS-7337 should 
take the EC policy out of the file header and store it as an XAttr. Other EC 
parameters, including the codec algorithm and schema, should also be stored in 
XAttrs (a minimal client-side sketch follows this list).
# HDFS-7343 fundamentally addresses the issue of complex storage policy space. 
It's a hard problem and should be kept separate from the HDFS-EC project.
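
A minimal client-side sketch of keeping the EC policy in an XAttr, as item 2 suggests. The XAttr key name and value encoding below are assumptions for illustration only; HDFS-7337 will define the actual key, namespace, and serialized schema.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class EcPolicyXAttrSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/data/cold");

    // Hypothetical key/value: policy name carrying codec and schema; the real
    // implementation will choose its own key and encoding.
    fs.setXAttr(dir, "user.ec.policy",
        "RS-10-4".getBytes(StandardCharsets.UTF_8));

    byte[] policy = fs.getXAttr(dir, "user.ec.policy");
    System.out.println(new String(policy, StandardCharsets.UTF_8));
  }
}
{code}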

*Client and DataNode*:
# At this point the design of HDFS-7545 -- which wraps around the 
{{DataStreamer}} logic -- looks reasonable. In the future we can consider 
adding a simpler and more efficient output class for the _one replica_ scenario.

We also went over the *list of subtasks*. Several high level comments:
# The list is already pretty long. We should reorder the items to have better 
grouping and more appropriate priorities. I will make a first pass.
# It seems HDFS-7689 should extend the {{ReplicationMonitor}} rather than 
creating another checker.
# We agreed the best way to support hflush/hsync is to write temporary parity 
data and update it later, once a complete stripe has accumulated (a toy sketch 
follows this list).
# We need another JIRA for truncate/append support.
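
A toy sketch of the temporary-parity idea from item 3. XOR stands in for the real erasure codec, and every name here is illustrative; the eventual writer will be shaped by HDFS-7545 and the striping layout, not by this class.
{code}
// On hflush, encode *temporary* parity over the partial stripe (missing cells
// treated as zeros); once the stripe is complete, re-encode and overwrite it.
class StripedFlushSketch {
  private final int cellSize;
  private final byte[][] cells;   // one cell buffer per data block in the stripe
  private int filledCells = 0;

  StripedFlushSketch(int dataBlocks, int cellSize) {
    this.cellSize = cellSize;
    this.cells = new byte[dataBlocks][cellSize];
  }

  /** Buffer one full cell of client data. */
  void writeCell(byte[] data) {
    System.arraycopy(data, 0, cells[filledCells++], 0, Math.min(data.length, cellSize));
  }

  /** hflush/hsync: parity over whatever cells have accumulated so far. */
  byte[] temporaryParity() {
    return xorParity(filledCells);
  }

  /** Stripe complete: final parity replaces the temporary one. */
  byte[] finalParity() {
    return xorParity(cells.length);
  }

  private byte[] xorParity(int numCells) {
    byte[] parity = new byte[cellSize];
    for (int c = 0; c < numCells; c++) {
      for (int i = 0; i < cellSize; i++) {
        parity[i] ^= cells[c][i];
      }
    }
    return parity;
  }
}
{code}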

> Erasure Coding Support inside HDFS
> ----------------------------------
>
>                 Key: HDFS-7285
>                 URL: https://issues.apache.org/jira/browse/HDFS-7285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Weihua Jiang
>            Assignee: Zhe Zhang
>         Attachments: ECAnalyzer.py, ECParser.py, 
> HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, 
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, with a 10+4 Reed-Solomon coding we can tolerate the loss of 4 blocks 
> at a storage overhead of only 40% (4 parity blocks for every 10 data blocks). 
> This makes EC a quite attractive alternative for big data storage, 
> particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contributed packages in HDFS but was removed as of Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to run encoding and decoding tasks; 2) it can only be used for 
> cold files that will no longer be appended to; 3) its pure-Java EC coding 
> implementation is extremely slow in practical use. For these reasons, it 
> might not be a good idea to simply bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of external dependencies, making it self-contained and independently 
> maintained. The design layers the EC feature on top of the storage type 
> support and aims to stay compatible with existing HDFS features such as 
> caching, snapshots, encryption, and high availability. It will also support 
> different EC coding schemes, implementations, and policies for different 
> deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L 
> library), an implementation can greatly improve the performance of EC 
> encoding/decoding and make the EC solution even more attractive. We will 
> post the design document soon. 


