[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990204#comment-13990204 ]
Haohui Mai commented on HDFS-6293:
----------------------------------

bq. There is existing apps that use a custom Visitor similar to lsr. It outputs directory entries with full path and list of blocks for files.

[~kihwal], can you please elaborate on this? If you're talking about use cases like hdfs-du, there is no need to construct the whole namespace from the bottom up. Scanning through the records would be sufficient.

bq. That was the first thing I thought about doing, but the processing time matters too.

It might not be as bad as you thought. I ran an experiment to see how much time is required to convert an fsimage to a LevelDB on an 8-core Xeon E5530 CPU @ 2.4GHz, 24G memory, 2TB SATA 3 drive @ 7200 rpm. The machine is running RHEL 6.2, Java 1.6. The numbers reported below are comparable to the numbers reported in HDFS-5698.

|Size in old format|512M|1G|2G|4G|8G|
|Size in PB format|469M|950M|1.9G|3.7G|7.0G|
|Converting to LevelDB (ms)|30505|56531|121579|373108|1047121|

The additional latency for an 8G fsimage is around 17 minutes (1047121 ms), which looks reasonable to me for the use cases of an offline tool.

> Issues with OIV processing PB-based fsimages
> --------------------------------------------
>
>                 Key: HDFS-6293
>                 URL: https://issues.apache.org/jira/browse/HDFS-6293
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Kihwal Lee
>            Assignee: Haohui Mai
>            Priority: Blocker
>         Attachments: HDFS-6293.000.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf.
> Due to the internal layout changes introduced by the protobuf-based fsimage,
> OIV consumes an excessive amount of memory. We have tested with an fsimage with
> about 140M files/directories. The peak heap usage when processing this image
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of
> heap (max new size was 1GB). It should be possible to process any image with
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.
> I also noticed that the secret manager section has no tokens while there were
> unexpired tokens in the original image (pre-2.4.0). I did not check whether
> they were also missing in the new pb fsimage.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
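
For readers who want to check the conversion timings in the comment's table, the following is a minimal Python sketch that converts each measurement to minutes and to an approximate throughput. The names `TIMINGS` and `summarize` are hypothetical, and the sketch assumes 1G = 1024M when converting the PB-format sizes; only the raw numbers come from the table.

```python
# Timings copied from the table in the comment.
# Key: fsimage size in the old format.
# Value: (size in PB format in MB, conversion time to LevelDB in ms).
# Assumption: sizes given in G are converted with 1G = 1024M.
TIMINGS = {
    "512M": (469.0, 30505),
    "1G": (950.0, 56531),
    "2G": (1.9 * 1024, 121579),
    "4G": (3.7 * 1024, 373108),
    "8G": (7.0 * 1024, 1047121),
}


def summarize(timings):
    """Return (label, minutes, MB per second) for each measurement."""
    rows = []
    for label, (pb_mb, ms) in timings.items():
        minutes = ms / 60000.0            # milliseconds to minutes
        mb_per_s = pb_mb / (ms / 1000.0)  # PB-format MB converted per second
        rows.append((label, round(minutes, 1), round(mb_per_s, 1)))
    return rows


if __name__ == "__main__":
    for label, minutes, mb_per_s in summarize(TIMINGS):
        print(f"{label:>5}: {minutes:5.1f} min, {mb_per_s:5.1f} MB/s")
```

Under these assumptions the 8G image takes roughly 17.5 minutes, and the per-megabyte conversion rate appears to fall as the image grows, so the cost looks somewhat worse than linear in image size on this machine.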