[ 
https://issues.apache.org/jira/browse/HDFS-11382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888059#comment-15888059
 ] 

Ewan Higgs commented on HDFS-11382:
-----------------------------------

{quote}
Ewan Higgs, my concern is that encoding whether a file is erasure coded in both 
the EC policy and the BlockTypeProto fields opens us up to possible incongruity 
between the two fields. Since I'm not proposing we do away with BlockType 
entirely, I double checked the Precondition checks we have in this patch, and 
it looks okay.
{quote}
protobuf doesn't have sum types so the best we can do is tag a message with a 
type and group the fields which should be distinct in each type. 

So what we have now is:
{code}
  message INodeFile {                                                           
    optional uint32 replication = 1;                                            
    optional uint64 modificationTime = 2;                                       
    optional uint64 accessTime = 3;                                             
    optional uint64 preferredBlockSize = 4;                                     
    optional fixed64 permission = 5;                                            
    repeated BlockProto blocks = 6;                                             
    optional FileUnderConstructionFeature fileUC = 7;                           
    optional AclFeatureProto acl = 8;                                           
    optional XAttrFeatureProto xAttrs = 9;                                      
    optional uint32 storagePolicyID = 10;                                       
    optional BlockTypeProto blockType = 11;               
    optional int32 erasureCodingPolicyID = 12;                      
  }   
{code}
What we might want is:

{code}
  message INodeFile {      
    message StripedFields { 
      optional int32 erasureCodingPolicyID = 1;
    }
    message  ProvidedFields { 
      optional string someField = 1;
    }
                                                     
    optional uint32 replication = 1;                                            
    optional uint64 modificationTime = 2;                                       
    optional uint64 accessTime = 3;                                             
    optional uint64 preferredBlockSize = 4;                                     
    optional fixed64 permission = 5;                                            
    repeated BlockProto blocks = 6;                                             
    optional FileUnderConstructionFeature fileUC = 7;                           
    optional AclFeatureProto acl = 8;                                           
    optional XAttrFeatureProto xAttrs = 9;                                      
    optional uint32 storagePolicyID = 10;                                       
    optional BlockTypeProto blockType = 11;   
    optional StripedFields stripedFields = 12;
    optional ProvidedFields providedFields = 13;
{code}

This approach is how directories, files, and symlinks are encoded as INodes:
{code}
  message INode {                                                               
    enum Type {                                                                 
      FILE = 1;                                                                 
      DIRECTORY = 2;                                                            
      SYMLINK = 3;                                                              
    };                                                                          
    required Type type = 1;                                                     
    required uint64 id = 2;                                                     
    optional bytes name = 3;                                                    
                                                                                
    optional INodeFile file = 4;                                                
    optional INodeDirectory directory = 5;                                      
    optional INodeSymlink symlink = 6;                                          
  }
{code}

> Persist Erasure Coding Policy ID in a new optional field in INodeFile in 
> FSImage
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-11382
>                 URL: https://issues.apache.org/jira/browse/HDFS-11382
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>             Fix For: 3.0.0-alpha3
>
>         Attachments: HDFS-11382.01.patch, HDFS-11382.02.patch, 
> HDFS-11382.03.patch, HDFS-11382.04.patch, HDFS-11382.05.patch
>
>
> For Erasure Coded files, replication field in INodeFile message is re-used  
> to store the EC Policy ID. 
> *FSDirWriteFileOp#addFile*
> {noformat}
>   private static INodesInPath addFile(
>       FSDirectory fsd, INodesInPath existing, byte[] localName,
>       PermissionStatus permissions, short replication, long 
> preferredBlockSize,
>       String clientName, String clientMachine)
>       throws IOException {
>     .. .. ..
>     try {
>       ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp.
>           getErasureCodingPolicy(fsd.getFSNamesystem(), existing);
>       if (ecPolicy != null) {
>         replication = ecPolicy.getId();   <===
>       }
>       final BlockType blockType = ecPolicy != null?
>           BlockType.STRIPED : BlockType.CONTIGUOUS;
>       INodeFile newNode = newINodeFile(fsd.allocateNewInodeId(), permissions,
>           modTime, modTime, replication, preferredBlockSize, blockType);
>       newNode.setLocalName(localName);
>       newNode.toUnderConstruction(clientName, clientMachine);
>       newiip = fsd.addINode(existing, newNode, permissions.getPermission());
> {noformat}
> With HDFS-11268 fix, {{FSImageFormatPBINode#Loader#loadInodeFile}} is rightly 
> getting the EC ID from the replication field and then uses the right Policy 
> to construct the blocks.
> *FSImageFormatPBINode#Loader#loadInodeFile*
> {noformat}
>       ErasureCodingPolicy ecPolicy = (blockType == BlockType.STRIPED) ?
>           ErasureCodingPolicyManager.getPolicyByPolicyID((byte) replication) :
>           null;
> {noformat}
> The original intention was to re-use the replication field so the in-memory 
> representation would be compact. But, this isn't necessary for the on-disk 
> representation. replication is an optional field, and if we add another 
> optional field for the EC policy, it won't be any extra space.
> Also, we need to make sure to have the appropriate asserts in place to make 
> sure both fields aren't set for the same INodeField.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to