[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718197#comment-14718197
 ] 

GAO Rui commented on HDFS-7285:
-------------------------------

Thank you very much [~brahmareddy], [~zhz]. 

{quote}
   1) snapshot feature 
   2) balancer feature
{quote}
may be developed in future EC work. We could add these to the system 
test plan now and implement the tests later. 
{quote}
   4) parallel writes  
   5) parallel reads
{quote}
I think {{parallel reads}} means more than one client trying to read the same 
EC file from HDFS, right? What does {{parallel writes}} refer to in EC system 
testing? Could you explain the scenario?

{quote}
1. Good points from Brahma Reddy Battula, I suggest that we also add HSM/mover 
tests to the list.
2. In reading tests we can distinguish stateful read and pread. Maybe we should 
test seek-and-read scenario too.
3. It seems each test scenario in the "Tips for EC Writing/Reading" section is 
systematically labeled. Will the labels be used to drive automatic testing?
{quote}

We can also add {{HSM/mover}} tests to the test plan and implement them in 
future work. 

As for distinguishing the read modes: we currently run the system tests with 
FSShell commands in a terminal, such as {{CopyFromLocal}} and {{CopyToLocal}}. 
Is there a way to make the client read an EC file with a particular mechanism, 
such as stateful read or pread, from a terminal command? 
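
To make the two read modes concrete, here is a minimal sketch of the difference, using only JDK classes on a local file as a stand-in for an HDFS stream. It is an analogy under stated assumptions, not HDFS client code: the class and method names ({{ReadModes}}, {{statefulRead}}, {{positionalRead}}) are hypothetical. On the HDFS side, {{FSDataInputStream}} offers the corresponding calls: {{read(byte[], int, int)}} plus {{seek(long)}} for stateful and seek-and-read access, and {{read(long position, byte[] buffer, int offset, int length)}} for pread.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; a local-file analogy, not an HDFS client.
public class ReadModes {

    /** Stateful read: consumes bytes and advances the stream's own offset. */
    public static String statefulRead(RandomAccessFile in, int len) throws IOException {
        byte[] buf = new byte[len];
        in.readFully(buf); // moves the file pointer forward by len bytes
        return new String(buf, StandardCharsets.US_ASCII);
    }

    /** Positional read (pread): reads at an explicit offset; the channel position is untouched. */
    public static String positionalRead(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) {
                break; // reached end of file before len bytes were read
            }
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-modes", ".txt");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r");
             FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            System.out.println(statefulRead(in, 4));      // 0123
            System.out.println(statefulRead(in, 4));      // 4567 (continues where the last read stopped)
            in.seek(2);                                   // seek-and-read scenario
            System.out.println(statefulRead(in, 3));      // 234
            System.out.println(positionalRead(ch, 6, 3)); // 678
            System.out.println(ch.position());            // 0 (pread did not move the position)
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A system test that needs to separate these modes would probably have to go through a small client program along these lines rather than a shell copy command, since the caller has to choose which read call to issue.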

The labels in the EC Writing/Reading tests were generated by the test script 
during the test run, but it is also possible to go the other way and drive 
automatic testing from the scenario labels.


> Erasure Coding Support inside HDFS
> ----------------------------------
>
>                 Key: HDFS-7285
>                 URL: https://issues.apache.org/jira/browse/HDFS-7285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Weihua Jiang
>            Assignee: Zhe Zhang
>         Attachments: Compare-consolidated-20150824.diff, 
> Consolidated-20150707.patch, Consolidated-20150806.patch, 
> Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, 
> HDFS-7285-initial-PoC.patch, HDFS-7285-merge-consolidated-01.patch, 
> HDFS-7285-merge-consolidated-trunk-01.patch, 
> HDFS-7285-merge-consolidated.trunk.03.patch, 
> HDFS-7285-merge-consolidated.trunk.04.patch, 
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
> HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
> HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
> HDFSErasureCodingSystemTestPlan-20150824.pdf, 
> HDFSErasureCodingSystemTestReport-20150826.pdf, fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, if we use a 10+4 Reed-Solomon code, we can tolerate the loss of 4 
> blocks with a storage overhead of only 40%. This makes EC a quite attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but has been removed since Hadoop 2.0 
> for maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and 
> depends on MapReduce to do encoding and decoding tasks; 2) it can only be 
> used for cold files that will not be appended to anymore; 3) the pure Java EC 
> coding implementation is extremely slow in practical use. Because of these, 
> it might not be a good idea to just bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies and makes it self-contained and 
> independently maintained. This design layers the EC feature on the storage 
> type support and takes into account compatibility with existing HDFS features 
> such as caching, snapshots, encryption, and high availability. This design 
> will also support different EC coding schemes, implementations and policies 
> for different deployment scenarios. By utilizing advanced libraries (e.g. 
> Intel ISA-L library), an implementation can greatly improve the performance 
> of EC encoding/decoding and make the EC solution even more attractive. We 
> will post the design document soon. 
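
The overhead figure quoted in the description follows from simple arithmetic, sketched below. The class and method names ({{EcOverhead}}, {{ecOverhead}}, {{replicationOverhead}}) are hypothetical and only serve the illustration: a k+m code stores m parity blocks for every k data blocks, while n-way replication stores n-1 extra copies.

```java
// Hypothetical helper for illustration only; not part of HDFS.
public class EcOverhead {

    /** Extra storage as a fraction of the raw data size for a k+m erasure code. */
    public static double ecOverhead(int dataBlocks, int parityBlocks) {
        return (double) parityBlocks / dataBlocks;
    }

    /** Extra storage as a fraction of the raw data size for n-way replication. */
    public static double replicationOverhead(int replicas) {
        return replicas - 1;
    }

    public static void main(String[] args) {
        // 10+4 Reed-Solomon: 4 parity blocks per 10 data blocks, any 4 blocks may be lost.
        System.out.printf("RS 10+4:   %.0f%% overhead%n", 100 * ecOverhead(10, 4));
        // 3-replica: two extra full copies, any 2 replicas may be lost.
        System.out.printf("3-replica: %.0f%% overhead%n", 100 * replicationOverhead(3));
    }
}
```

Under these definitions the 10+4 code costs 40% extra storage, against 200% for three replicas.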



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
