[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186762#comment-15186762 ] Hudson commented on HDFS-7866: -- FAILURE: Integrated in Hadoop-trunk-Commit #9446 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9446/]) HDFS-7866. Erasure coding: NameNode manages multiple erasure coding (zhz: rev 7600e3c48ff2043654dbe9f415a186a336b5ea6c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReconstructStripedBlocksWithRackAwareness.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ErasureCodingPolicy.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestStripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStripedINodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFileAttributes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirErasureCodingOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ErasureCodingPolicyManager.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestErasureCodingPolicies.java > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Fix For: 3.0.0 > > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186632#comment-15186632 ] Kai Zheng commented on HDFS-7866: - Thanks [~zhz], [~lirui] and all for making of this major moving! > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Fix For: 3.0.0 > > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186623#comment-15186623 ] Rui Li commented on HDFS-7866: -- Thanks guys for the review! > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Fix For: 3.0.0 > > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186426#comment-15186426 ] Jing Zhao commented on HDFS-7866: - Yes, the plan sounds good to me. +1 on the v13 patch. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186359#comment-15186359 ] Zhe Zhang commented on HDFS-7866: - Thanks Jing for the clarification. bq. So we're good to leave the field as required right? I think so. As Jing suggested above we can do a follow-on to systematically change {{ErasureCodingPolicyProto}} (or even {{BlockStoragePolicyProto}}) fields to optional. +1 on the v13 patch. [~jingzhao] Could you confirm if the above plan sounds good? If so I'll commit the patch. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186314#comment-15186314 ] Rui Li commented on HDFS-7866: -- Thanks [~jingzhao] and [~zhz] for the great discussions. So we're good to leave the field as required right? I also checked the test failures and they cannot be reproduced locally. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186286#comment-15186286 ] Kai Zheng commented on HDFS-7866: - Thanks for the nice discussions. bq. Because EC is only in 3.0, we can ask existing EC users to upgrade to the new version, and not to allow new clients and old NN coexisting in the cluster. Makes sense. So we'll allow old clients to access the new clusters using 3.0 right. I guess we'll need to collect such information or discussion conclusions somewhere as a part of release note. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186270#comment-15186270 ] Jing Zhao commented on HDFS-7866: - Sorry for the confusion. I mean the change on {{ErasureCodingPolicyProto}} only affects RPC calls. It will break the wire compatibility between new client and old NN, as you mentioned. This is the scenario we have in rolling upgrade or different versions are mixed in the same cluster. Because EC is only in 3.0, we can ask existing EC users to upgrade to the new version, and not to allow new clients and old NN coexisting in the cluster. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186217#comment-15186217 ] Zhe Zhang commented on HDFS-7866: - bq. ErasureCodingPolicyProto is only used in getFileStatus RPC [~jingzhao] Seems {{getErasureCodingPolicy}} and {{getErasureCodingPolicies}} are also using the proto? Could you also elaborate a bit why this means the new filed only breaks compatibility during upgrades? IIUC, the incompatibility comes from a newer client (a jar after this commit) accessing an older NN (a jar before this commit), right? > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186205#comment-15186205 ] Zhe Zhang commented on HDFS-7866: - Thanks for pointing out the change in protobuf 3. Yes I think we can always put the validation logic on protobuf receivers. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186089#comment-15186089 ] Jing Zhao commented on HDFS-7866: - I checked the current code again. {{ErasureCodingPolicyProto}} is only used in getFileStatus RPC thus looks like the new field will only break wire compatibility during rolling upgrade. I think this is acceptable for EC which is only in trunk. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186073#comment-15186073 ] Jing Zhao commented on HDFS-7866: - There are already too many debates about optional and required. Maybe that's why these two keywords are removed from protobuf 3. To me we should use optional for all the fields in ErasureCodingPolicyProto. This change can be done separately. But I think at least here we should use optional for the new field. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185855#comment-15185855 ] Zhe Zhang commented on HDFS-7866: - Thanks Jing for the review. Changing the proto field to optional is a valid concern. However, from a schema point of view, having the ID as optional looks a little weird -- it should be the "primary key" of the policy in the table. When receiving a policy proto, it's helpful to verify it with the policy table for diagnosis. It's also a little awkward to have {{BlockStoragePolicy#id}} as required but EC policy ID as optional. I think the pre-release incompatibility is probably worth it. But LMK if you agree. bq. And if I understand the above discussion correctly, we will address the INodeFile constructor (and the toLong method) in a follow-on jira, right? Yes that's the plan. Actually I plan to work on the cleanup JIRA. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1518#comment-1518 ] Jing Zhao commented on HDFS-7866: - The patch also LGTM. Thanks for the great work, [~lirui]. My only comment is about the change in hdfs.proto: {code} 340 required uint32 id = 4; // Actually a byte - only 8 bits used {code} Although EC has not been officially released yet, some ppl may already adopt it in test/production cluster considering the feature has been in 3.0 for a while. Adding a required field is an incompatible change for them. Thus it's better to use optional here. And if I understand the above discussion correctly, we will address the INodeFile constructor (and the toLong method) in a follow-on jira, right? > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185526#comment-15185526 ] Zhe Zhang commented on HDFS-7866: - Sure Jing. Would be great to have your opinion. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185414#comment-15185414 ] Jing Zhao commented on HDFS-7866: - Thanks for the work and review, [~lirui], [~zhz] and [~walter.k.su]! Could you please hold on the commit today? I will also do a quick review of the patch. Thanks! > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185056#comment-15185056 ] Hadoop QA commented on HDFS-7866: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} hadoop-hdfs-project: patch generated 0 new + 152 unchanged - 10 fixed = 152 total (was 162) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 23s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 55s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 18s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 23s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | |
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184856#comment-15184856 ] Walter Su commented on HDFS-7866: - Sorry for the confusion, now the javadoc looks verbose. But thanks for trying. We can do some improvements in the follow-on JIRA. The last patch LGTM too. Thanks again, [~lirui]. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.13.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184300#comment-15184300 ] Rui Li commented on HDFS-7866: -- Thanks Walter for the suggestions. bq. separating the logic of manipulation of the 12-bits. 2 sets of set/get methods for them. It makes sense to have 2 separate sets of get/set methods. But if we do it here, we'll be just repeating the code. What do you think? bq. We are sure policyID is strictly <=7-bits right? I think 128 policies is enough for now, and we can just consider negative ID as invalid. If we have more policies in the future, we can support negative IDs which means we can have 256 different IDs. Then we can think about what to return if we call getErasureCodingPolicyID for non-EC files. Maybe throw an exception? [~zhz], please hold on the patch. The latest {{TestEditLog}} failure may be related. I'll look into it. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184267#comment-15184267 ] Walter Su commented on HDFS-7866: - 1. Not only javadoc, what I mean was separating the logic of manipulation of the 12-bits. 2 sets of set/get methods for them. Now the cuts are both (1,11), it's just the meanings of each part are different. But what if in the future the cut is different? You have the same concern about unifying set method: bq. 3. The biggest concern is INodeFile constructor – related to that, the toLong method. Currently when isStriped, we just interpret replication as the EC policy ID. This looks pretty hacky. But it looks pretty tricky to fix. By the way, if we're planning use unified cut (1,11) for both of them, why bother having one enum item BLOCK_LAYOUT_AND_REDUNDANCY(12-bits) and do the bit masking by myself, insead of 2 enum items as before which does the bit masking for us. Some nits: 1. {{LAYOUT_BIT_WIDTH}}, {{MAX_REDUNDANCY}} can be private inside {{HeaderFormat}}. 2. {code} /** * @return The ID of the erasure coding policy on the file. -1 represents no * EC policy. */ @VisibleForTesting @Override public byte getErasureCodingPolicyID() { if (isStriped()) { return (byte) HeaderFormat.getReplication(header); } return -1; } {code} {code} // check if the file has an EC policy ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp. getErasureCodingPolicy(fsd.getFSNamesystem(), existing); if (ecPolicy != null) { replication = ecPolicy.getId(); } {code} We are sure policyID is strictly <=7-bits right? Casting an ID with value >=128 to byte becomes negative, then the logic gets wild. Vice versa. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183969#comment-15183969 ] Zhe Zhang commented on HDFS-7866: - Thanks Rui for updating the patch. The latest version (v12) LGTM. I also like Walter's idea on the Javadoc. Interesting thought on EC with contiguous layout (phase 2). [~walter.k.su] Are you OK with committing the v12 patch and making the Javadoc change in the follow-on JIRA? As discussed above, we have a few cleanups to do, including adding a new constructor with policy ID. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183118#comment-15183118 ] Walter Su commented on HDFS-7866: - What do you think let it diverge instead of forcing unification? It takes more time to understand the new format from the code. I think add some javadoc would be nice. How about like this: {noformat} /** * Bit format: * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY] * [48-bit preferredBlockSize] * * BLOCK_LAYOUT_AND_REDUNDANCY format for replicated block: * 0 [11-bit replication] * * BLOCK_LAYOUT_AND_REDUNDANCY format for striped block: * 1 [11-bit ErasureCodingPolicy ID] * */ {noformat} I think getErasureCodingPolicyID() don't have to re-use getReplication(long header). Even though now they are both 11-bits. In the future, We might keep split 11-bit ec policy ID. The 2 methods keep diverging. I guess something like, {noformat} /** * Bit format: * [4-bit storagePolicyID][12-bit BLOCK_LAYOUT_AND_REDUNDANCY] * [48-bit preferredBlockSize] * * BLOCK_LAYOUT_AND_REDUNDANCY format for non-ec block: * 0 [11-bit replication] * * BLOCK_LAYOUT_AND_REDUNDANCY format for ec striped block: * 10 [4-bit replication][6-bit ErasureCodingPolicy ID] * * BLOCK_LAYOUT_AND_REDUNDANCY format for ec contiguous block: * 11 [4-bit replication][6-bit ErasureCodingPolicy ID] * */ {noformat} And I think we should reserve some high-value ID for custom policy. And reserve some for unknown policy which might intergrated(hard-coded) in the future. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.12.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182845#comment-15182845 ] Hadoop QA commented on HDFS-7866: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 22s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} hadoop-hdfs-project: patch generated 0 new + 152 unchanged - 10 fixed = 152 total (was 162) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_74. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 13s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 49s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | |
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180373#comment-15180373 ] Zhe Zhang commented on HDFS-7866: - bq. We're using 11 bits to store the policy ID, which means it could excess a byte (replication factor is also a short). What do you think? Right, on the {{INodeFile}} level we can't save space by switching to {{byte}}. But by constraining the ID to be a byte we are saving memory on block level. I don't think we will possibly support more than 256 EC policies in the foreseeable future; so maybe set it to byte now and consider changing to short when there's a clear need? Pinging [~drankye] and [~andrew.wang] for more insights here. bq. I also would like to know your opinions about randomly choosing a policy in the tests. Very interesting idea. I think we should do a follow-on {{test}} JIRA. And yes I think we are safe to switch back to RS-6-3 in the next rev. bq. I think essentially it's hacky because we allow user to specify the replication factor when creating a file but then overwrite it to store the EC policy ID. Fixing this requires changing the API, which I think is unacceptable. I don't think there's a compatibility issue here, because we are only changing the {{INodeFile}} constructor with the {{isStriped}} flag. I think we can just change the boolean to a byte, representing the EC policy ID. But given the size of the current patch I recommend we leave it as follow-on. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, HDFS-7866.7.patch, > HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179541#comment-15179541 ] Rui Li commented on HDFS-7866: -- Thanks Zhe for your comments. 1. We're using 11 bits to store the policy ID, which means it could excess a byte (replication factor is also a short). What do you think? 2. This is to make the tests use the new policy to find any potential issues. If there's no further comments, I'll revert it to RS-6-3 in next update. I also would like to know your opinions about randomly choosing a policy in the tests. You can refer to my previous [discussions|https://issues.apache.org/jira/browse/HDFS-7866?focusedCommentId=15171320=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15171320] with Kai. 3. I agree we can do this in a follow-on. I think essentially it's hacky because we allow user to specify the replication factor when creating a file but then overwrite it to store the EC policy ID. Fixing this requires changing the API, which I think is unacceptable. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, HDFS-7866.7.patch, > HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179513#comment-15179513 ] Zhe Zhang commented on HDFS-7866: - Thanks Rui! I think we are pretty close. # To save mem overhead we should make {{ErasureCodingPolicy#id}} itsef a byte. Then we can do a follow-on task to change {{BlockInfoStriped#ecPolicy}} to be the policy ID. # Below is a typo? Default should be the RS-6-3 policy? {code} public static ErasureCodingPolicy getSystemDefaultPolicy() { // make this configurable? return SYS_POLICY2; } {code} I think it's fine to use hard-coded default for the initial release of EC. It's always easy to make it configurable later, if we see a request. But I think we can keep the comment there as a TODO. # The biggest concern is {{INodeFile}} constructor -- related to that, the {{toLong}} method. Currently when {{isStriped}}, we just interpret {{replication}} as the EC policy ID. This looks pretty hacky. But it looks pretty tricky to fix. How about adding a TODO here and file a follow-on? I think what we need to do is to add some util methods to convert both 0 + {{replicationFactor}} and 1 + {{ecPolicy}} into a {{short}} typed variable, like {{blockLayoutRedudancy}}, and pass it to the constructor. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, HDFS-7866.7.patch, > HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177063#comment-15177063 ] Rui Li commented on HDFS-7866: -- Test failures cannot be reproduced. I'll fix the checkstyle in next update. (we need at least revert the default policy back to RS-6-3 and make sure the tests pass) > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, HDFS-7866.7.patch, > HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175448#comment-15175448 ] Hadoop QA commented on HDFS-7866: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s {color} | {color:red} hadoop-hdfs-project: patch generated 1 new + 153 unchanged - 10 fixed = 154 total (was 163) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 44s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 4s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color}
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175128#comment-15175128 ] Zhe Zhang commented on HDFS-7866: - And yes, {{RS_6_3_POLICY_ID}} sounds good. If we later support multiple cell sizes we can reflect that in naming as well. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175122#comment-15175122 ] Zhe Zhang commented on HDFS-7866: - Sorry for the confusion Rui. I meant some text-base illustration like below. I think it will be useful to illustrate the header format for both contiguous and striped blocks. If it's still confusing, feel free to skip it and I'll be happy to do it as a follow-on. {code} // StripedBlockUtil * | < Block Group > | <- Block Group: logical unit composing * | |striped HDFS files. * blk_0 blk_1 blk_2 <- Internal Blocks: each internal block *| | | represents a physically stored local *v v v block file * +--+ +--+ +--+ * |cell_0| |cell_1| |cell_2| <- {@link StripingCell} represents the * +--+ +--+ +--+ logical order that a Block Group should * |cell_3| |cell_4| |cell_5| be accessed: cell_0, cell_1, ... * +--+ +--+ +--+ * |cell_6| |cell_7| |cell_8| * +--+ +--+ +--+ * |cell_9| * +--+ <- A cell contains cellSize bytes of data {code} > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174861#comment-15174861 ] Rui Li commented on HDFS-7866: -- Thanks Zhe for the review and comments! Forgive my ignorance but what do you mean by ASCII art? As to the policy IDs, I think we can put them in {{HdfsConstants}} and name as {{RS_6_3_POLICY_ID}}. Sounds good? > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174741#comment-15174741 ] Zhe Zhang commented on HDFS-7866: - Thanks Rui, very nice work here! I only finished reviewing the {{INodeFile}} header encoding logic. The main logic LGTM overall. Just some recommendations on code structure: # The below is not very accurate. For striped blocks, the tail 11 bits do not only represent redundancy. E.g. there could be different coders like RS, HH. 3-2 coding and 6-4 have same level of redundancy but they have different trade-offs. I see the intention here is to unify the terminology for contiguous and striped blocks. But I think it is pretty hard. {code} /** * Number of bits used to encode block layout type. * Different types can be replica or EC */ public static final int LAYOUT_BIT_WIDTH = 1; /** * Number of bits used to encode block redundancy. * For replicated block, the redundancy is the replication factor; * for erasure coded block, the redundancy is the EC policy's ID. */ public static final int REDUNDANCY_BIT_WIDTH = 11; {code} # So instead of the above, I think we can keep the {{LAYOUT_BIT_WIDTH}}, and then explicitly parse the {{BLOCK_LAYOUT_AND_REDUNDANCY}} section for both striped and contiguous blocks. To avoid repeating code we can add a util method {{maskLayoutBit}}. Not sure if it's worth it, since the code is very simple. # In the "Bit format:" Javadoc we should also explain the {{BLOCK_LAYOUT_AND_REDUNDANCY}} section more clearly. Some ASCII art here will be really helpful. Also, {{SYS_POLICY1_ID}} and {{SYS_POLICY2_ID}} look a little hacky. Can we do something similar to {{BlockStoragePolicySuite#createDefaultSuite}} and create some constants? We can also make it a byte to save space. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171824#comment-15171824 ] Rui Li commented on HDFS-7866: -- The latest failures are not related. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171647#comment-15171647 ] Hadoop QA commented on HDFS-7866: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 33s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} hadoop-hdfs-project: patch generated 0 new + 154 unchanged - 9 fixed = 154 total (was 163) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 4s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 13s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 0s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 50s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | |
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171367#comment-15171367 ] Kai Zheng commented on HDFS-7866: - bq. The methods in INodeFileAttributes are all public so not sure why it's complaining the public is redundant. Methods defined in Java interface are public by default, that's why it complains. I guess you can remove all the modifier to fix it. bq. One way may be to make all the tests get the EC policy from StripedFileTestUtil, which in turn picks a policy randomly from all system policies. This can be done in a separate JIRA of course. Sounds good to update tests separately. We may also need to consider this per test: 1) the system default policy should be ensured to get tested in all cases; 2) some tests may consider multiple policies co-existing or co-working together. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171326#comment-15171326 ] Hadoop QA commented on HDFS-7866: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} HDFS-7866 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790395/HDFS-7866.9.patch | | JIRA Issue | HDFS-7866 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14648/console | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167653#comment-15167653 ] Hadoop QA commented on HDFS-7866: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 25s {color} | {color:red} hadoop-hdfs-project: patch generated 1 new + 160 unchanged - 3 fixed = 161 total (was 163) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 2s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 6s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} |
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129089#comment-15129089 ] Zhe Zhang commented on HDFS-7866: - [~lirui] [~drankye] Thanks for the discussions. bq. As to policy ID mapping, do you mean we should pre-define the IDs for system EC policies, like what we do in HdfsConstants for storage policies? Yes I think we should do this for the current phase. As Kai suggested above, and quoted Andrew's comment, I think full pluggability is still a pretty remote target. When we do implement customer-pluggable policies, we should revisit the issue together with block storage policies. bq. Maybe 7 bits for EC policies (up to 128 ones) and 4 bits for repl factor (up to 16 sounds enough for striped blocks). Well I think this decision can be left open now. Right now we can just use the 11 bits for EC policies, and then decide how many to squeeze for repl factor. I've seen some special cases where the repl factor of some contiguous blocks are tuned high so all DNs in the cluster will have them. But for striped blocks, maybe that's not a requirement anymore. So 16 might be OK. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127923#comment-15127923 ] Kai Zheng commented on HDFS-7866: - bq. For striped blocks, we will possibly support replication factor in the future. At that time we can have 4 bits for EC policy (16) and 7 for repl factor (128). Sounds good to support repl for striped blocks as well. Maybe 7 bits for EC policies (up to 128 ones) and 4 bits for repl factor (up to 16 sounds enough for striped blocks). Please help clarify if I misunderstood. Thanks. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127885#comment-15127885 ] Rui Li commented on HDFS-7866: -- Thanks [~zhz] for pointing me to the discussion and {{BlockStoragePolicySuite}}. I'll update the patch once we reach an agreement. As to policy ID mapping, do you mean we should pre-define the IDs for system EC policies, like what we do in {{HdfsConstants}} for storage policies? If we want to support custom policies later, we still need a way to generate unique IDs right? > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127930#comment-15127930 ] Kai Zheng commented on HDFS-7866: - bq. If we want to support custom policies later, we still need a way to generate unique IDs right? I agree we can do the simpler approach here to move on. Later (maybe in a long future) we can think about how to customize an EC policy where there is real system requirement. Also please ref. [the comment made by Andrew here|https://issues.apache.org/jira/browse/HDFS-7337?focusedCommentId=14632005=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14632005]. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127149#comment-15127149 ] Zhe Zhang commented on HDFS-7866: - Thanks Rui for the work. I think HDFS-9658 is almost ready. I think the below issues are worth more discussions: # Under HDFS-8833 we [discussed | https://issues.apache.org/jira/browse/HDFS-8833?focusedCommentId=14718396=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14718396] the idea of encoding {{replication}} and {{ecPolicy}} together. Let's revisit the discussion and reach an agreement here. [~jingzhao], [~andrew.wang], [~walter.k.su], would be great to have your opinions here. {code} // For erasure coded files, replication is used to store ec policy id ... public short getErasureCodingPolicyID() { if (isStriped()) { return HeaderFormat.getReplication(header); } return -1; } {code} Rather than directly reusing the {{REPLICATION}} section, I think we should combine the current {{REPLICATION}} and {{IS_STRIPED}} sections and give it a meaningful name. Something like {{BLOCK_LAYOUT_AND_REDUNDANCY}}. Then we have decoding methods to retrieve replication factor, EC policy, and so forth from it. I think 12 bits is enough to encode all possible layout/redundancy policies. For contiguous blocks, we have 11 bits for repl factor, up to 2048. For striped blocks, we will possibly support replication factor in the future. At that time we can have 4 bits for EC policy (16) and 7 for repl factor (128). # About policy IDs, should we simplify the mapping with a similar approach as {{BlockStoragePolicySuite}}? > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105209#comment-15105209 ] Rui Li commented on HDFS-7866: -- Thanks Kai for the suggestions. I just filed HDFS-9658 to break this task into small pieces. Will create more as we progress. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, > HDFS-7866.7.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)