[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495442#comment-14495442 ]
Zhe Zhang commented on HDFS-7859:
---------------------------------

[~szetszwo] / [~drankye]:

The [phasing plan | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14391207&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391207] I posted might be a little confusing with regard to schemas. My apologies. In the offline meetup on 03/31, we didn't reach a clear conclusion on how much of the schema work to include before merging. Therefore I left it in phase I, but marked it as optional. My thought was that we could make a better decision after observing how fast the work could proceed. Up to this point I think this thread is going pretty well, and it seems we can have a multi-schema implementation when the other HDFS-7285 tasks are done (see details below).

Good [questions | https://issues.apache.org/jira/browse/HDFS-7859?focusedCommentId=14494933&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14494933] on schema design. I think we eventually need to answer them in the broader scope of HDFS-7337. IIUC HDFS-7859 / HDFS-7866 are not touching most of the tricky scenarios.

Based on Kai's latest [comment | https://issues.apache.org/jira/browse/HDFS-7866?focusedCommentId=14494050&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14494050], HDFS-7866 will mostly handle _default_ schemas embedded in the {{ECSchemaManager}} code. The patch under this JIRA handles saving / loading these default schemas in fsimage. I think this is necessary even without loading custom schemas from XML. Otherwise we cannot guarantee that the NameNode which loads the fsimage has the same default schemas as the NameNode which saved it. It is obviously even more necessary when we add custom schemas. The logic in the patch is quite straightforward; it's mostly about serializing / deserializing schemas.

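To illustrate the save / load point, here is a minimal sketch of a schema round trip. It is not the patch and not the real fsimage format: the {{Schema}} class, its fields, the {{save}} / {{load}} helpers and the example schema values are all hypothetical, and the byte layout is just a count followed by each schema's fields. What it shows is that the loading side rebuilds its schema list purely from the saved bytes, so it does not depend on whatever defaults are compiled into the loading NameNode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: not the HDFS-7859 patch, not the real fsimage layout.
class SchemaCodecSketch {

  /** Hypothetical stand-in for an EC schema: a name, a codec, and unit counts. */
  static final class Schema {
    final String name;
    final String codec;
    final int dataUnits;
    final int parityUnits;

    Schema(String name, String codec, int dataUnits, int parityUnits) {
      this.name = name;
      this.codec = codec;
      this.dataUnits = dataUnits;
      this.parityUnits = parityUnits;
    }

    @Override
    public String toString() {
      return name + "(" + codec + ", " + dataUnits + "+" + parityUnits + ")";
    }
  }

  /** Serialize: write the schema count, then each schema's fields in a fixed order. */
  static byte[] save(List<Schema> schemas) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeInt(schemas.size());
      for (Schema s : schemas) {
        out.writeUTF(s.name);
        out.writeUTF(s.codec);
        out.writeInt(s.dataUnits);
        out.writeInt(s.parityUnits);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  /** Deserialize: read the fields back in the same order they were written. */
  static List<Schema> load(byte[] image) {
    List<Schema> schemas = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        String codec = in.readUTF();
        int dataUnits = in.readInt();
        int parityUnits = in.readInt();
        schemas.add(new Schema(name, codec, dataUnits, parityUnits));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return schemas;
  }

  public static void main(String[] args) {
    // Example values are made up for illustration.
    List<Schema> saved = Arrays.asList(
        new Schema("RS-6-3", "rs", 6, 3),
        new Schema("RS-10-4", "rs", 10, 4));
    List<Schema> loaded = load(save(saved));
    System.out.println(loaded);
  }
}
```

In this sketch the reader never consults a built-in schema table, which is the property argued for above: the schemas seen after loading are exactly the ones that were saved.
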
So here's my proposal:
# Shrink this patch to get rid of the logic for modifying and removing schemas ({{ECSchemaManager#modifyECSchema}} and {{OP_MODIFY_EC_SCHEMA}}).
# Repurpose HDFS-7866 to focus on loading custom schemas from site XML files.

[~szetszwo], [~drankye], [~vinayrpet]: let me know if you agree with the above. If we are all synced on this, how about moving this JIRA back to HDFS-7285 and keeping HDFS-7866 under HDFS-8031?

> Erasure Coding: Persist EC schemas in NameNode
> ----------------------------------------------
>
>                 Key: HDFS-7859
>                 URL: https://issues.apache.org/jira/browse/HDFS-7859
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Xinwei Qin
>         Attachments: HDFS-7859.001.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we
> persist EC schemas in NameNode centrally and reliably, so that EC zones can
> reference them by name efficiently.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)