[ 
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891284#comment-13891284
 ] 

Todd Lipcon commented on HDFS-5698:
-----------------------------------

A few notes on the patch:

{code}
+        is = new FileInputStream(file);
+        if (is.read(magic) == magic.length
{code}

Should use IOUtils.readFully here

-----

Can we rename INodeSection and the other nested proto classes to end in "PB" or 
"Proto"? It's helpful when reading the code to distinguish the generated 
protobuf classes from the other structures, and given that these inner classes 
get imported, it's not always obvious.

----

Performance-wise, I think you can really improve things by re-using protobuf 
objects. In particular, rather than doing something like:

{code}
+      INodeSection.INodeReference ref = INodeSection.INodeReference
+          .parseDelimitedFrom(in);
+      return loadINodeReference(ref, dir);
{code}

you can make a thread-local INodeSection.INodeReference.Builder object (similar 
to how we use thread-local ops in the editlog loader code). Then use 
Builder.mergeDelimitedFrom instead of the static parseDelimitedFrom method. You 
can check isInitialized() after this to ensure that all of the required fields 
are present, and then use the builder itself to read the fields. This avoids 
repeated object allocation/deallocation costs without having to resort to 
manual parsing that you mention in the design doc.

The generated code also has a handy "FooProtoOrBuilder" interface that both the 
generated PB and its builder implement, with all of the appropriate getters. 
The code that actually handles constructing HDFS objects from PBs could easily 
take this interface.

----

For many of the repeated int64 fields, you should probably use the 
{{[packed=true]}} option in the protobuf definition. This saves a good amount 
of space and probably improves decoding performance as well.

----

One question: would existing ImageVisitor implementation classes continue to 
work against the PB-ified image? My reading of the patch is that they wouldn't, 
but would be nice to confirm.

----

I don't think any of the above needs to block the merge, but the 
format-breaking one (packed=true) should probably be done sooner rather than 
later.

> Use protobuf to serialize / deserialize FSImage
> -----------------------------------------------
>
>                 Key: HDFS-5698
>                 URL: https://issues.apache.org/jira/browse/HDFS-5698
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: HDFS-5698-design.pdf, HDFS-5698.000.patch, 
> HDFS-5698.001.patch, HDFS-5698.002.patch, HDFS-5698.003.patch
>
>
> Currently, the code serializes FSImage using in-house serialization 
> mechanisms. There are a couple disadvantages of the current approach:
> # Mixing the responsibility of reconstruction and serialization / 
> deserialization. The current code paths of serialization / deserialization 
> have spent a lot of effort on maintaining compatibility. What is worse is 
> that they are mixed with the complex logic of reconstructing the namespace, 
> making the code difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage 
> is practically defined by the implementation. An bug in implementation means 
> a bug in the specification. Furthermore, it also makes writing third-party 
> tools quite difficult.
> # Changing schemas is non-trivial. Adding a field in FSImage requires bumping 
> the layout version every time. Bumping out layout version requires (1) the 
> users to explicitly upgrade the clusters, and (2) putting new code to 
> maintain backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf has 
> been used to serialize / deserialize the RPC message in Hadoop.
> Protobuf addresses all the above problems. It clearly separates the 
> responsibility of serialization and reconstructing the namespace. The 
> protobuf files document the current format of the FSImage. The developers now 
> can add optional fields with ease, since the old code can always read the new 
> FSImage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to