[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983361#comment-13983361
 ] 

Haohui Mai commented on HDFS-6293:
----------------------------------

bq. Another issue is the complete change of format/content in OIV's XML output.

The XML format in both the legacy and the PB-based code is intended to match 
the physical layout of the FSImage for fast processing. The layout of the 
FSImage is entirely private, which means that there are very few compatibility 
guarantees that you can rely on. We should have clarified this earlier.

bq.  It does not provide readily usable directory/file information as it used 
to in pre-2.4/protobuf versions.

This is by design. A format based on records instead of a hierarchical 
structure is more robust (especially with snapshots), and it allows parallel 
processing. The rationale is laid out in the document attached to HDFS-5698.
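
To illustrate the idea, here is a minimal sketch of how directory/file paths 
can still be recovered from flat records. The record shapes (`inodes` keyed by 
id, `dir_entries` mapping a parent id to child ids) and the sample data are 
assumptions made for illustration only; they are not the actual FSImage 
protobuf schema.

```python
# Hypothetical flat records; NOT the real FSImage schema.
inodes = {1: "", 2: "user", 3: "alice", 4: "data.txt"}   # inode id -> name
dir_entries = {1: [2], 2: [3], 3: [4]}                   # parent id -> child ids


def build_parent_index(dir_entries):
    """Invert parent -> children records into a child -> parent lookup."""
    parent_of = {}
    for parent, children in dir_entries.items():
        for child in children:
            parent_of[child] = parent
    return parent_of


def full_path(inode_id, inodes, parent_of):
    """Walk parent links up to the root and join the names into a path."""
    parts = []
    current = inode_id
    while current in parent_of:
        parts.append(inodes[current])
        current = parent_of[current]
    return "/" + "/".join(reversed(parts))


parent_of = build_parent_index(dir_entries)
paths = {i: full_path(i, inodes, parent_of) for i in inodes}
```

Because each record is self-describing, the two record sets can be read 
independently and joined afterwards, which is what makes splitting the work 
across workers feasible.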

With an FSImage as big as yours, I suggest parsing the protobuf records 
directly and importing them into Hive / Pig for more efficient queries. This 
approach is discussed in HDFS-5952.
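
As a rough sketch of the export step only, the snippet below writes 
already-decoded records as tab-separated rows, a layout that Hive or Pig can 
load as delimited text. Decoding the protobuf sections is not shown, and the 
record fields (`id`, `type`, `name`, `size`) are hypothetical names chosen for 
the example, not the real FSImage fields.

```python
import csv
import io

# Hypothetical, already-decoded records; field names are illustrative only.
records = [
    {"id": 2, "type": "DIRECTORY", "name": "user", "size": 0},
    {"id": 4, "type": "FILE", "name": "data.txt", "size": 1024},
]


def records_to_tsv(records, out):
    """Write one tab-separated row per record to a file-like object."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for r in records:
        writer.writerow([r["id"], r["type"], r["name"], r["size"]])


buf = io.StringIO()
records_to_tsv(records, buf)
tsv_text = buf.getvalue()
```

Writing rows one at a time keeps memory use independent of the image size, 
since no in-memory tree of the namespace is needed.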

> Issues with OIV processing PB-based fsimages
> --------------------------------------------
>
>                 Key: HDFS-6293
>                 URL: https://issues.apache.org/jira/browse/HDFS-6293
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>         Attachments: Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
