[ https://issues.apache.org/jira/browse/HDFS-11382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852374#comment-15852374 ]
Andrew Wang commented on HDFS-11382: ------------------------------------ Thanks for working on this Manoj! Overall looks good, some review comments: * Noticed we now have a BlockTypeProto as well as erasureCodingPolicyID now in the INodeFile PB. The history here is that we used to have an {{isStriped}} boolean, which turned into the BlockTypeProto enum in HDFS-10759. Now that we have this optional field, it seems like BlockTypeProto is unnecessary. [~ehiggs], any thoughts on this? * Related, I'm wondering if we should make {{isStriped}} just be {{return getBlockType() == STRIPED}}, looks equivalent to me. * FSImageFormatPBINode, it'd be better to check if the policy is present with {{f.hasErasureCodingPolicyID}} rather than getting based on if the block type is striped. This way we don't accidentally get the default value of 0. * INodeFile, I think having the combined {{replicationOrEcPolicyId}} parameter is confusing from the caller's perspective. It'd be better to have separate parameters that pass null or -1 if not set, and combine as appropriate in INodeFile.HeaderFormat#toLong (with requisite Precondition checks). This relates to the comment in toLong about this being hacky, I dug up this comment from Zhe during the initial review: {quote} The biggest concern is INodeFile constructor – related to that, the toLong method. Currently when isStriped, we just interpret replication as the EC policy ID. This looks pretty hacky. But it looks pretty tricky to fix. How about adding a TODO here and file a follow-on? I think what we need to do is to add some util methods to convert both 0 + replicationFactor and 1 + ecPolicy into a short typed variable, like blockLayoutRedudancy, and pass it to the constructor. {quote} Seems like this JIRA is a good time to fix the hack :) * FSImageFormatPBInode#buildINodeFile, it looks like we unconditionally set the EC policy ID. Shouldn't we only set when there is a policy set, similar to the AclFeature and XAttrFeature? A unit test for this would be good. * INodeFile#getFileReplication, IIRC S3 returns a replication factor of 1. Could you check this and use the same setting for our striped files? IMO 1 is the most reasonable value here, since each data block has a repl factor of one. The javadoc also needs to be updated. We also should check any for callers that assume this is the EC policy ID, if you have't yet. * Is the PBXMLWriter change intended for this JIRA? > Persist Erasure Coding Policy ID in a new optional field in INodeFile in > FSImage > -------------------------------------------------------------------------------- > > Key: HDFS-11382 > URL: https://issues.apache.org/jira/browse/HDFS-11382 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs > Affects Versions: 3.0.0-alpha1 > Reporter: Manoj Govindassamy > Assignee: Manoj Govindassamy > Attachments: HDFS-11382.01.patch > > > For Erasure Coded files, replication field in INodeFile message is re-used > to store the EC Policy ID. > *FSDirWriteFileOp#addFile* > {noformat} > private static INodesInPath addFile( > FSDirectory fsd, INodesInPath existing, byte[] localName, > PermissionStatus permissions, short replication, long > preferredBlockSize, > String clientName, String clientMachine) > throws IOException { > .. .. .. > try { > ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp. > getErasureCodingPolicy(fsd.getFSNamesystem(), existing); > if (ecPolicy != null) { > replication = ecPolicy.getId(); <=== > } > final BlockType blockType = ecPolicy != null? > BlockType.STRIPED : BlockType.CONTIGUOUS; > INodeFile newNode = newINodeFile(fsd.allocateNewInodeId(), permissions, > modTime, modTime, replication, preferredBlockSize, blockType); > newNode.setLocalName(localName); > newNode.toUnderConstruction(clientName, clientMachine); > newiip = fsd.addINode(existing, newNode, permissions.getPermission()); > {noformat} > With HDFS-11268 fix, {{FSImageFormatPBINode#Loader#loadInodeFile}} is rightly > getting the EC ID from the replication field and then uses the right Policy > to construct the blocks. > *FSImageFormatPBINode#Loader#loadInodeFile* > {noformat} > ErasureCodingPolicy ecPolicy = (blockType == BlockType.STRIPED) ? > ErasureCodingPolicyManager.getPolicyByPolicyID((byte) replication) : > null; > {noformat} > The original intention was to re-use the replication field so the in-memory > representation would be compact. But, this isn't necessary for the on-disk > representation. replication is an optional field, and if we add another > optional field for the EC policy, it won't be any extra space. > Also, we need to make sure to have the appropriate asserts in place to make > sure both fields aren't set for the same INodeField. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org