[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704910#comment-14704910
 ] 

Walter Su commented on HDFS-7285:
---------------------------------

Comments on HDFS-7285-merge branch.
1. Unused imports.
{code}
+import com.google.common.base.Preconditions; //DFSInputStream.java
{code}
{code}
+import org.apache.commons.logging.Log;  //DFSOutputStream.java
+import org.apache.commons.logging.LogFactory;
{code}

3. Please replace the {{stat.getReplication() == 0}} check with {{stat.getErasureCodingPolicy() != null}}:
{code}
+    if(stat.getReplication() == 0) {  //DFSOutputStream.java
+      throw new IOException("Not support appending to a striping layout file yet.");
+    }
{code}
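For reference, a minimal self-contained sketch of the suggested condition ({{FileStatusStub}} here is a hypothetical stand-in for {{HdfsFileStatus}}, not the real API):

```java
import java.io.IOException;

// Hypothetical stand-in for HdfsFileStatus, used only to illustrate the check.
class FileStatusStub {
    private final Object ecPolicy; // non-null when the file has a striped (EC) layout

    FileStatusStub(Object ecPolicy) { this.ecPolicy = ecPolicy; }

    Object getErasureCodingPolicy() { return ecPolicy; }
}

public class AppendCheckSketch {
    // Mirrors the intent of the patch hunk: refuse to append to a striped file.
    static void checkAppendable(FileStatusStub stat) throws IOException {
        if (stat.getErasureCodingPolicy() != null) {
            throw new IOException(
                "Not support appending to a striping layout file yet.");
        }
    }

    public static void main(String[] args) throws IOException {
        checkAppendable(new FileStatusStub(null)); // replicated file: no exception
        try {
            checkAppendable(new FileStatusStub(new Object())); // striped file
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the EC policy directly states the intent, instead of relying on the convention that striped files report a replication factor of 0.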

4. Redundant space character.
{code}
-public abstract class BlockInfo extends Block  //BlockInfo.java
+public abstract class  BlockInfo extends Block
{code}

5. Indentation.
{code}
-  private LocatedBlock createLocatedBlock(final BlockInfo blk, final long pos) {  //BlockManager.java
+  private LocatedBlock createLocatedBlock(final BlockInfo blk, final long pos
+  ) throws IOException {
{code}

6. Truncation is file-level. Maybe {{assert !file.isStriped()}} would be less confusing? Likewise, {{createNewBlock(false)}} would be less confusing.
{code}
+     BlockInfo oldBlock = file.getLastBlock();  //FSDirTruncateOp.java
++    assert !oldBlock.isStriped();
...
++          fsn.createNewBlock(file.isStriped()) : new Block(
{code}
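A minimal sketch of the suggested shape ({{FileStub}} and {{createNewBlock}} are hypothetical stand-ins for {{INodeFile}} and the namesystem call, not the real APIs):

```java
// Hypothetical stand-in for INodeFile, used only for illustration.
class FileStub {
    private final boolean striped;

    FileStub(boolean striped) { this.striped = striped; }

    boolean isStriped() { return striped; }
}

public class TruncateSketch {
    // Truncation is only reachable for replicated (contiguous) files, so
    // asserting the file is not striped states the invariant directly.
    static String prepareTruncateBlock(FileStub file) {
        assert !file.isStriped();
        // Because a striped file can never reach this point, a literal false
        // reads more clearly than file.isStriped().
        return createNewBlock(false);
    }

    // Stand-in for fsn.createNewBlock(isStriped).
    static String createNewBlock(boolean isStriped) {
        return isStriped ? "striped" : "contiguous";
    }

    public static void main(String[] args) {
        System.out.println(prepareTruncateBlock(new FileStub(false)));
    }
}
```

Passing a literal {{false}} plus the assertion makes the "truncate never sees a striped file" invariant visible at the call site instead of hiding it behind {{file.isStriped()}}.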

7. Unnecessary change.
{code}
-    fsn.getBlockManager().setBlockToken(lBlk, //FSDirWriteFileOp.java
-        BlockTokenIdentifier.AccessMode.WRITE);
+    fsn.getFSDirectory().getBlockManager().
+        setBlockToken(lBlk, BlockTokenIdentifier.AccessMode.WRITE);
{code}
8. Unnecessary change. (The wiki says to use [Sun's conventions|http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-136091.html], which say "Break before an operator".)
{code}
-          if (blockInfo.getBlockCollection().getStoragePolicyID()  //FSNamesystem.java
-              == lpPolicy.getId()) {  // feature branch
+          BlockInfo blockInfo = getStoredBlock(b);
-          if (blockInfo.getBlockCollection().getStoragePolicyID() == lpPolicy.getId()) {  // trunk
++          if (blockInfo.getBlockCollection().getStoragePolicyID() ==
++              lpPolicy.getId()) {
{code}
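A self-contained illustration of the convention (method and parameter names are hypothetical): when an expression must wrap, the break goes before the operator, so the operator starts the continuation line.

```java
public class OperatorBreakExample {
    // Per Sun's Java code conventions, a wrapped expression breaks BEFORE
    // the operator; the operator beginning the continuation line makes the
    // continuation easy to spot.
    static boolean matchesPolicy(int storagePolicyId, int lazyPersistPolicyId) {
        return storagePolicyId
            == lazyPersistPolicyId;
    }

    public static void main(String[] args) {
        System.out.println(matchesPolicy(7, 7)); // true
        System.out.println(matchesPolicy(7, 8)); // false
    }
}
```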
Thanks [~zhz] for the great work!

> Erasure Coding Support inside HDFS
> ----------------------------------
>
>                 Key: HDFS-7285
>                 URL: https://issues.apache.org/jira/browse/HDFS-7285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Weihua Jiang
>            Assignee: Zhe Zhang
>         Attachments: Consolidated-20150707.patch, 
> Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, 
> ECParser.py, HDFS-7285-initial-PoC.patch, 
> HDFS-7285-merge-consolidated-01.patch, 
> HDFS-7285-merge-consolidated-trunk-01.patch, 
> HDFS-7285-merge-consolidated.trunk.03.patch, 
> HDFS-7285-merge-consolidated.trunk.04.patch, 
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
> HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
> HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, with a 10+4 Reed-Solomon coding we can tolerate the loss of 4 
> blocks with a storage overhead of only 40%. This makes EC a quite 
> attractive alternative for big data storage, particularly for cold data. 
> Facebook had a related open-source project called HDFS-RAID. It used to be 
> one of the contributed packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and 
> depends on MapReduce for encoding and decoding tasks; 2) it can only be 
> used for cold files that are not meant to be appended anymore; 3) its pure 
> Java EC coding implementation is extremely slow in practical use. For these 
> reasons, simply bringing HDFS-RAID back might not be a good idea.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies, making it self-contained and 
> independently maintained. The design layers the EC feature on top of the 
> storage-type support and aims to be compatible with existing HDFS features 
> such as caching, snapshots, encryption, and high availability. It will also 
> support different EC coding schemes, implementations, and policies for 
> different deployment scenarios. By utilizing advanced libraries (e.g. the 
> Intel ISA-L library), an implementation can greatly improve EC 
> encoding/decoding performance and make the EC solution even more 
> attractive. We will post the design document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
