[ https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jing Zhao updated HDFS-16875:
-----------------------------
    Attachment: Erasure Coding Access Proxy.pdf

> Erasure Coding: data access proxy to allow old clients to read EC data
> ----------------------------------------------------------------------
>
>                 Key: HDFS-16875
>                 URL: https://issues.apache.org/jira/browse/HDFS-16875
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ec, erasure-coding
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>            Priority: Major
>         Attachments: Erasure Coding Access Proxy.pdf
>
>
> Erasure Coding is supported only in Hadoop 3, while many production
> deployments still depend on Hadoop 2. Upgrading the whole data tech stack to
> a Hadoop 3 release may involve significant migration effort and even
> reliability risks. This is because of the incompatibilities between the two
> Hadoop major releases, as well as potential undiscovered issues and risks in
> newer releases. We therefore need a solution that adopts Erasure Coding for
> cost efficiency with the least migration effort and risk, while still
> allowing HDFS clients of older versions (Hadoop 2.x) to access EC data
> transparently.
> Internally we have developed an EC access proxy that translates EC data for
> old clients. We have also extended the NameNode RPC so that it can
> distinguish HDFS clients with and without EC support and redirect the old
> clients to the proxy. Using the proxy, we have set up separate Erasure Coding
> clusters storing hundreds of PB of data. Other production clusters and all
> upper-layer applications were left untouched.
> Because some of the changes touch fundamental components of HDFS (e.g., the
> client-NN RPC header), we do not intend to merge them into trunk. We will use
> this ticket to share the design and implementation details (including the
> code) and to collect feedback. We may later open source the implementation in
> a separate GitHub repo.
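The redirect step described above (the NameNode recognizes whether a client supports EC and sends old clients to the proxy) can be illustrated with the minimal Java sketch below. It is a hypothetical illustration, not the implementation from the attached design document. `EcRedirectSketch`, `ClientInfo`, `ReadTarget`, the `supportsErasureCoding` flag, the proxy host names, and the hash-based choice among proxies are all assumptions introduced here; none of them are Hadoop APIs. The sketch covers only the routing decision, not the data translation the proxy performs.

```java
import java.util.List;

/**
 * Hypothetical sketch of a NameNode-side routing decision: clients that did not
 * advertise EC support (e.g. via a flag in the client-NN RPC header) are sent to
 * an EC access proxy when they read an erasure-coded file. None of these types
 * exist in Hadoop; all names are illustrative assumptions.
 */
public class EcRedirectSketch {

    /** Hypothetical client capability info assumed to be parsed from the RPC header. */
    record ClientInfo(String clientName, boolean supportsErasureCoding) {}

    /** Where the client is told to read block data from. */
    record ReadTarget(String host, boolean viaProxy) {}

    private final List<String> proxyHosts;

    EcRedirectSketch(List<String> proxyHosts) {
        if (proxyHosts.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy host is required");
        }
        this.proxyHosts = List.copyOf(proxyHosts);
    }

    /**
     * Old clients reading EC files go through a proxy; every other case reads
     * directly from the given DataNode host.
     */
    ReadTarget route(ClientInfo client, boolean fileIsErasureCoded, String dataNodeHost) {
        if (fileIsErasureCoded && !client.supportsErasureCoding()) {
            // Assumed policy: hash the client name so a given client tends to hit the same proxy.
            int idx = Math.floorMod(client.clientName().hashCode(), proxyHosts.size());
            return new ReadTarget(proxyHosts.get(idx), true);
        }
        return new ReadTarget(dataNodeHost, false);
    }

    public static void main(String[] args) {
        EcRedirectSketch nn = new EcRedirectSketch(List.of("ec-proxy-1", "ec-proxy-2"));
        ReadTarget oldClient = nn.route(new ClientInfo("hadoop2-client", false), true, "dn-7");
        ReadTarget newClient = nn.route(new ClientInfo("hadoop3-client", true), true, "dn-7");
        // Prints "true false": only the client without EC support is redirected.
        System.out.println(oldClient.viaProxy() + " " + newClient.viaProxy());
    }
}
```

Keeping the decision on the NameNode side matches the ticket's goal that Hadoop 2.x clients need no changes: they simply receive a different read target.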