[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384743#comment-14384743 ]
Zhe Zhang commented on HDFS-7285: --------------------------------- Thanks [~drankye] for the thoughts. I just talked to some HIVE folks about the HDFS directory structure in their typical workloads. In a nutshell, it looks like the following: {code} warehouse / \ db1 db2 / \ / \ ... ... table_1 table_2 / | \ part_1 part_2 part_3 ... {code} Each DB table is represented as a directory (usually with huge fan-out), under which each partition is stored as a file. Each partition maps to a fixed range in the key space. I was told that it's quietly common to see skewed partitions. A likely scenario is thousands of small partitions along with a few outliers that are much larger than average. In the EC context, it indicates a potential need for per-file policies (e.g. EC for large files, replication for small files). I still plan to look at a few more cases. At this stage, I think _extended storage policy_ is a good term to use in our APIs (maybe we can abbreviate it as {{XStoragePolicy}}). > Erasure Coding Support inside HDFS > ---------------------------------- > > Key: HDFS-7285 > URL: https://issues.apache.org/jira/browse/HDFS-7285 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: Weihua Jiang > Assignee: Zhe Zhang > Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, > HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, > HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, > fsimage-analysis-20150105.pdf > > > Erasure Coding (EC) can greatly reduce the storage overhead without sacrifice > of data reliability, comparing to the existing HDFS 3-replica approach. For > example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, > with storage overhead only being 40%. This makes EC a quite attractive > alternative for big data storage, particularly for cold data. > Facebook had a related open source project called HDFS-RAID. It used to be > one of the contribute packages in HDFS but had been removed since Hadoop 2.0 > for maintain reason. The drawbacks are: 1) it is on top of HDFS and depends > on MapReduce to do encoding and decoding tasks; 2) it can only be used for > cold files that are intended not to be appended anymore; 3) the pure Java EC > coding implementation is extremely slow in practical use. Due to these, it > might not be a good idea to just bring HDFS-RAID back. > We (Intel and Cloudera) are working on a design to build EC into HDFS that > gets rid of any external dependencies, makes it self-contained and > independently maintained. This design lays the EC feature on the storage type > support and considers compatible with existing HDFS features like caching, > snapshot, encryption, high availability and etc. This design will also > support different EC coding schemes, implementations and policies for > different deployment scenarios. By utilizing advanced libraries (e.g. Intel > ISA-L library), an implementation can greatly improve the performance of EC > encoding/decoding and makes the EC solution even more attractive. We will > post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)