[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-08-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103861#comment-14103861
 ] 

Kihwal Lee commented on HDFS-6293:
--

[~fnie]  Please see the release note. This is available in release 2.5.0.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, HDFS-6293.v2.branch-2.patch, 
> HDFS-6293.v2.trunk.patch, HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-08-20 Thread Nie Gus (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103647#comment-14103647
 ] 

Nie Gus commented on HDFS-6293:
---

Here is one use case for processing OIV output: we export the fsimage to 
delimited text files containing the full pathname, file size, block count, and 
quota information, and then use Hive or Pig to analyze the data or produce 
du-style usage reports.

However, the PB-based oiv does not support this yet. We really need this 
functionality back, since there should be no complex technical problem in 
providing it.
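
For context, here is a minimal sketch of that workflow against an old-format image; the delimiter, column names, and file names below are illustrative assumptions rather than the authoritative Delimited schema:
{code}
# Dump an old-format fsimage to tab-delimited text with the legacy viewer
# ("hdfs oiv" before 2.4, "hdfs oiv_legacy" once this change is in).
hdfs oiv_legacy -p Delimited -i fsimage_0000000000000000042 -o fsimage.tsv

# Stage the dump in HDFS and expose it to Hive as an external table.
hdfs dfs -mkdir -p /analytics/fsimage
hdfs dfs -put fsimage.tsv /analytics/fsimage/

# Column list is illustrative only; match it to the actual Delimited output.
hive -e "CREATE EXTERNAL TABLE fsimage_dump (
           path STRING, replication INT, mtime STRING, atime STRING,
           block_size BIGINT, num_blocks INT, file_size BIGINT,
           ns_quota BIGINT, ds_quota BIGINT, perms STRING, owner STRING, grp STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
         LOCATION '/analytics/fsimage';"
{code}
From there a du-style report is a simple aggregation over path prefixes.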

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, HDFS-6293.v2.branch-2.patch, 
> HDFS-6293.v2.trunk.patch, HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992501#comment-13992501
 ] 

Haohui Mai commented on HDFS-6293:
--

bq. In the results above, the amount of memory on the machine is far larger 
than the image so everything happens in memory and seeks are free.

Reads on the fsimage are mostly sequential, so it really doesn't matter whether 
the whole fsimage fits into memory or not.

bq. Can you run an experiment with a large fsimage (25G or so) with a 
representative fs hierarchy (not totally flat) and then generate DB and convert 
to LSR on a smaller machine (16G or so)?

The fsimage that I've experimented with originates from a production cluster. 
It was in the old format, which requires a big machine to convert it to a 
PB-based fsimage, so I had to strip it down to fit it onto my machine. Please 
see HDFS-5698 for how the image is generated. If you can send me your PB-based 
fsimage, I can experiment with it.

Since the image comes from a production cluster, the fs hierarchy is definitely 
not flat. I generated the DB in a Java VM with a 22G heap.
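
(For anyone reproducing this: a sketch of handing the viewer a comparably large heap through the usual HADOOP_OPTS hook in bin/hdfs; the image name is a placeholder and the XML processor is used only as an example.)
{code}
# Sizes mirror the numbers quoted in this thread; they are not a recommendation.
export HADOOP_OPTS="-Xmx22g -XX:MaxNewSize=1g"
hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml
{code}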


> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999012#comment-13999012
 ] 

Hudson commented on HDFS-6293:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5605 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5605/])
HDFS-6293. Issues with OIV processing PB-based fsimages. Contributed by Kihwal 
Lee. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594439)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageSerialization.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiff.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/Snapshot.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/DelimitedImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/DepthCounter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FileDistributionVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/IndentedImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/NameDistributionVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TextWriterImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/XmlImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenod

[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998220#comment-13998220
 ] 

Hudson commented on HDFS-6293:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
HDFS-6293. Issues with OIV processing PB-based fsimages. Contributed by Kihwal 
Lee. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594439)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageSerialization.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiff.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/Snapshot.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/DelimitedImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/DepthCounter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FileDistributionVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/IndentedImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/NameDistributionVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TextWriterImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/XmlImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha

[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996839#comment-13996839
 ] 

Hadoop QA commented on HDFS-6293:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644653/HDFS-6293.trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1282 javac 
compiler warnings (more than the trunk's current 1275 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6894//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6894//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6894//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6894//console

This message is automatically generated.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996528#comment-13996528
 ] 

Kihwal Lee commented on HDFS-6293:
--

I have filed HDFS-6384 for further discussion and design of the alternative 
offline file system analysis support.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.trunk.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997584#comment-13997584
 ] 

Akira AJISAKA commented on HDFS-6293:
-

I found "hdfs oiv_legacy" cannot be executed.
{code}
[root@trunk ~]# hdfs oiv_legacy
Error: Could not find or load main class oiv_legacy
{code}
I filed HDFS-6400 and attached a patch.
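
For readers hitting the same error: it suggests bin/hdfs has no dispatch entry for the new subcommand, so the command name itself ends up being passed to java as a class name. A stand-alone sketch of the kind of mapping involved, with the real script structure simplified (see HDFS-6400 for the actual patch):
{code}
# Simplified illustration of the bin/hdfs dispatch for the two viewers.
COMMAND="$1"
if [ "$COMMAND" = "oiv" ]; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB
elif [ "$COMMAND" = "oiv_legacy" ]; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer
else
  # Without an oiv_legacy branch, the command name itself gets treated as the
  # main class, producing the "Could not find or load main class" error above.
  CLASS="$COMMAND"
fi
echo "$CLASS"
{code}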

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, HDFS-6293.v2.branch-2.patch, 
> HDFS-6293.v2.trunk.patch, HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998182#comment-13998182
 ] 

Hudson commented on HDFS-6293:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
HDFS-6293. Issues with OIV processing PB-based fsimages. Contributed by Kihwal 
Lee. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594439)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageSerialization.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiff.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/Snapshot.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/DelimitedImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/DepthCounter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/FileDistributionVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageLoaderCurrent.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/ImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/IndentedImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/NameDistributionVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TextWriterImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/XmlImageVisitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/n

[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-15 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995711#comment-13995711
 ] 

Haohui Mai commented on HDFS-6293:
--

bq. If we only think about this, we can deprecate OIV and NN's ability to save 
the old format, once we have a long term alternative solution.

My only concern is that the legacy fsimage / oiv are somewhat broken already: 
they are slightly out of date and contain incomplete information. For example, 
features like ACLs and rolling upgrades are in the new PB-based fsimage only. 
As the system continues to evolve down the road (e.g., HDFS-2006), the use 
cases of the legacy fsimage / oiv are limited to what we have right now, and 
it's unlikely that new use cases will be addressed.

Thanks to PB and the new format, I think we have a much better story on the 
compatibility of the fsimage. It might be worthwhile to consider keeping the 
format stable in 2.x and externalizing the responsibility of analyzing the 
fsimage.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-15 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993944#comment-13993944
 ] 

Haohui Mai commented on HDFS-6293:
--

Thanks [~kihwal], please feel free to modify the patch.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-14 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993859#comment-13993859
 ] 

Kihwal Lee commented on HDFS-6293:
--

Thanks for the patch, Haohui. 
I tested the output it generates, after modifying the patch for easier testing.  
It took 9 minutes for the NN to read in a 14GB PB-based fsimage with 140M files 
& directories.  Saving in the old format took a little less than 5 minutes.  
Adding a few minutes to the SBN's checkpointing time doesn't seem terrible. I 
was then able to process the image using the 2.3 OIV. 

I will post a patch with a few tweaks.

bq. Note that the v2 patch only intends to be committed to branch-2. For trunk 
it might make more sense to export the records in the fsimage directly to 
processing tools like Hive.

We now have more time to think about alternatives. Once we have a better 
design, we can introduce it in branch-2 and deprecate this approach.
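
For operators wanting to try this once it lands: as far as I recall the change wires this up through a new NameNode setting. The key name below is quoted from memory and should be verified against DFSConfigKeys in the committed patch, and the path is a placeholder.
{code}
<!-- hdfs-site.xml on the standby or secondary NameNode. When set, each
     checkpoint also writes an image in the pre-protobuf (pre-2.4) format
     into this directory, which the 2.3-era OIV can then process. -->
<property>
  <name>dfs.namenode.legacy-oiv-image.dir</name>
  <value>/data/hadoop/legacy-oiv-images</value>
</property>
{code}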



> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996968#comment-13996968
 ] 

Kihwal Lee commented on HDFS-6293:
--

Thanks for the review, Suresh. I will commit this after precommit comes back.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, HDFS-6293.v2.branch-2.patch, 
> HDFS-6293.v2.trunk.patch, HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996873#comment-13996873
 ] 

Kihwal Lee commented on HDFS-6293:
--

I will fix all warnings except the two coming from FSImage due to the use of 
deprecated old image saving methods. 

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997142#comment-13997142
 ] 

Kihwal Lee commented on HDFS-6293:
--

The failed test is unrelated to this JIRA. It will be fixed in HDFS-6257.
The two warnings (deprecation) are expected.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, HDFS-6293.v2.branch-2.patch, 
> HDFS-6293.v2.trunk.patch, HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997051#comment-13997051
 ] 

Hadoop QA commented on HDFS-6293:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12644696/HDFS-6293.v2.trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1277 javac 
compiler warnings (more than the trunk's current 1275 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6896//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6896//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6896//console

This message is automatically generated.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, HDFS-6293.v2.branch-2.patch, 
> HDFS-6293.v2.trunk.patch, HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996582#comment-13996582
 ] 

Hadoop QA commented on HDFS-6293:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644631/HDFS-6293.trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6892//console

This message is automatically generated.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.trunk.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996922#comment-13996922
 ] 

Suresh Srinivas commented on HDFS-6293:
---

+1 for the patch. Thanks [~hmai] and [~kihwal] for making these changes.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.branch-2.patch, 
> HDFS-6293.trunk.patch, HDFS-6293.trunk.patch, HDFS-6293.v2.branch-2.patch, 
> HDFS-6293.v2.trunk.patch, HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996649#comment-13996649
 ] 

Kihwal Lee commented on HDFS-6293:
--

Ouch. The patch applies to trunk but it introduces a duplicate import in the 
test case. I will upload separate patches.

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, HDFS-6293.trunk.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996442#comment-13996442
 ] 

Kihwal Lee commented on HDFS-6293:
--

Here is a recap.
* oiv_legacy is for debugging and inspecting old fsimages. As long as the NN is 
capable of reading and upgrading from the old format, we should keep it.
* NN's ability to save in the old fsimage format: this should go away once we 
have a more sustainable solution. Postprocessing ([~wheat9]'s proposal) will 
cover many use cases.  We may need something else; this is to be determined.
* Keep adding features to the old fsimage format: Please don't!
* Changing the layout of the new PB-based fsimage back to the old way: Heck no!

We will deprecate the NN's ability to save in the old fsimage format once we 
agree on what the long-term solution is. We should create a separate JIRA and 
continue the discussion there.

As [~sureshms] suggested, I will prepare the patch for trunk as well, with 
[~ajisakaa]'s comment incorporated.
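
To make the split above concrete, a usage sketch assuming the patches on this JIRA are applied (file names are placeholders; the legacy processor names follow the visitor classes this patch restores):
{code}
# PB-based image written by a 2.4+ NameNode: use the rewritten viewer.
hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml

# Pre-2.4 image, or one saved via the legacy-save path discussed above:
# use the resurrected viewer with its original processors (Ls, XML,
# Delimited, FileDistribution, ...).
hdfs oiv_legacy -p Ls -i fsimage_old -o fsimage.lsr
{code}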

> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995444#comment-13995444
 ] 

Kihwal Lee commented on HDFS-6293:
--

bq. can you please mark the classes that are brought back deprecated?
I am not sure deprecating it now is a good idea.  There are two aspects to this.
* The ability to read the old fsimage format in OIV. This has always been the 
case for OIV, and now it will be the combination of oiv and oiv_legacy that 
continues it.  As long as the NN can read/upgrade old images, there is no 
reason to hurry to remove this capability from oiv.
* Support for the use cases built around the old oiv.  Users won't care much 
about the image format as long as they can obtain what they want using a 
reasonable amount of resources, comparable to before.  If we only think about 
this, we can deprecate OIV and the NN's ability to save the old format once we 
have a long-term alternative solution.


> Issues with OIV processing PB-based fsimages
> 
>
> Key: HDFS-6293
> URL: https://issues.apache.org/jira/browse/HDFS-6293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Kihwal Lee
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-6293.000.patch, HDFS-6293.001.patch, 
> HDFS-6293.002-save-deprecated-fsimage.patch, 
> HDFS-6293_sbn_ckpt_retention.patch, 
> HDFS-6293_sbn_ckpt_retention_oiv_legacy.patch, Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf. 
> Due to the internal layout changes introduced by the protobuf-based fsimage, 
> OIV consumes excessive amount of memory.  We have tested with a fsimage with 
> about 140M files/directories. The peak heap usage when processing this image 
> in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
> the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
> heap (max new size was 1GB).  It should be possible to process any image with 
> the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output.  
> I also noticed that the secret manager section has no tokens while there were 
> unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
> they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-12 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995668#comment-13995668
 ] 

Suresh Srinivas commented on HDFS-6293:
---

I agree with [~kihwal] on deprecation. Is this also going into trunk? I think 
it should be committed to trunk first and then to branch-2. If we are in a 
position to remove it from trunk later, it can be removed at that point. Thoughts?

+1 for the patch with Jenkins +1 (if it is going to trunk as well).



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-12 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995480#comment-13995480
 ] 

Akira AJISAKA commented on HDFS-6293:
-

The patch looks good to me. Two comments about oiv_legacy:
# the -maxSize and -step options will fail (originally reported by HDFS-5866)
# a missing '\n' in the usage message (HDFS-5864)

I think the main purpose of this issue is to bring back the legacy code to keep 
backward compatibility, so we can fix these in a separate jira.
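
For context on what those options control, here is a toy sketch of the FileDistribution-style bucketing that -maxSize and -step drive; the bucketing details are assumptions for illustration, not code taken from the processor.

{code:java}
// Hedged sketch: bucket file sizes into bins of width "step" up to "maxSize",
// roughly the kind of histogram the FileDistribution processor reports.
public class FileDistributionSketch {
  static long[] histogram(long[] fileSizes, long maxSize, long step) {
    int buckets = (int) (maxSize / step) + 1;   // last bucket catches >= maxSize
    long[] counts = new long[buckets];
    for (long size : fileSizes) {
      int idx = (int) Math.min(size / step, buckets - 1);
      counts[idx]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] sizes = {100L, 2048L, 5000000L, 123L};
    // Analogous in spirit to running the processor with -maxSize 8000000 -step 1000000.
    long[] counts = histogram(sizes, 8000000L, 1000000L);
    for (int i = 0; i < counts.length; i++) {
      System.out.println("bucket " + i + ": " + counts[i]);
    }
  }
}
{code}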



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-12 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995335#comment-13995335
 ] 

Haohui Mai commented on HDFS-6293:
--

The patch looks good to me. One nit: can you please mark the classes that are 
brought back as deprecated?

+1 once it is addressed.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-11 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992502#comment-13992502
 ] 

Haohui Mai commented on HDFS-6293:
--

The v2 patch simply brings back the old saver of the fsimage, so that the SNN 
can save the fsimage in the -51 format, which allows external tools to continue 
to work.
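
For reference, a minimal sketch of how an operator would wire this up, assuming the configuration key name the feature eventually shipped with in 2.5.0 (dfs.namenode.legacy-oiv-image.dir); the exact key used in this particular patch revision may differ.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch, not taken from the patch: point the standby/secondary NN at a
// directory where an additional old-format (pre-PB) fsimage is written at
// checkpoint time. Leaving the key unset disables the extra image.
public class LegacyOivImageConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.namenode.legacy-oiv-image.dir", "/data/hadoop/oiv-legacy");
    System.out.println("legacy OIV image dir = "
        + conf.get("dfs.namenode.legacy-oiv-image.dir"));
    // The saved image is then processed offline with the oiv_legacy entry point,
    // using the same processors and flags as the pre-2.4 oiv.
  }
}
{code}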



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990204#comment-13990204
 ] 

Haohui Mai commented on HDFS-6293:
--

bq. There are existing apps that use a custom Visitor similar to lsr. It outputs 
directory entries with full path and a list of blocks for files.

[~kihwal], can you please elaborate? If you're talking about use cases like 
hdfs-du, there is no need to construct the whole namespace from the bottom up. 
Scanning through the records would be sufficient.
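
To make that point concrete, here is a toy sketch of a du-style aggregation that keeps only per-directory counters instead of the fully linked inode tree; the record type is a made-up placeholder, not the actual FsImageProto classes.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch: one pass over flat (parentId, fileSize) records, keeping
// O(#directories) state rather than materializing every inode in memory.
public class DuByParent {
  /** Placeholder for a file record scanned out of the image; not a Hadoop class. */
  static class FileRecord {
    final long parentId;
    final long fileSize;
    FileRecord(long parentId, long fileSize) {
      this.parentId = parentId;
      this.fileSize = fileSize;
    }
  }

  /** Sums file sizes per immediate parent directory in a single scan. */
  static Map<Long, Long> aggregate(Iterable<FileRecord> records) {
    Map<Long, Long> bytesPerDir = new HashMap<Long, Long>();
    for (FileRecord r : records) {
      Long current = bytesPerDir.get(r.parentId);
      bytesPerDir.put(r.parentId, (current == null ? 0L : current) + r.fileSize);
    }
    return bytesPerDir;
  }

  public static void main(String[] args) {
    List<FileRecord> records = Arrays.asList(
        new FileRecord(16386L, 1000L),
        new FileRecord(16386L, 2500L),
        new FileRecord(16400L, 42L));
    // e.g. {16400=42, 16386=3500} (map iteration order is unspecified)
    System.out.println(aggregate(records));
  }
}
{code}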

bq. That was the first thing I thought about doing, but the processing time 
matters too.

It might not be as bad as you thought. I ran an experiment to see how much time 
is required to convert an fsimage to a LevelDB on an 8-core Xeon E5530 CPU @ 
2.4GHz, 24G memory, 2TB SATA 3 drive @ 7200 rpm. The machine is running RHEL 
6.2, Java 1.6. The numbers reported below are comparable to the numbers reported 
in HDFS-5698.

|Size in old (pre-PB) format|512M|1G|2G|4G|8G| 
|Size in PB format|469M|950M|1.9G|3.7G|7.0G| 
|Time converting to LevelDB (ms)|30505|56531|121579|373108|1047121|

The additional latency for an 8G fsimage is around 15 minutes, which looks 
reasonable to me for the use cases of an offline tool.
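
For readers unfamiliar with the approach, a minimal sketch of the kind of LevelDB layout being discussed, assuming the pure-Java org.iq80.leveldb bindings; the key scheme is illustrative only, not the one used in the attached patch or in this experiment.

{code:java}
import static org.iq80.leveldb.impl.Iq80DBFactory.asString;
import static org.iq80.leveldb.impl.Iq80DBFactory.bytes;
import static org.iq80.leveldb.impl.Iq80DBFactory.factory;

import java.io.File;
import java.io.IOException;
import java.util.Map;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;
import org.iq80.leveldb.Options;

// Hedged sketch: spill (parent inode, child name) -> child metadata into LevelDB
// so a recursive listing becomes a series of prefix scans instead of an
// in-memory inode tree.
public class FsImageToLevelDb {
  public static void main(String[] args) throws IOException {
    DB db = factory.open(new File("/tmp/fsimage-index"),
        new Options().createIfMissing(true));
    try {
      // Toy records: key = "dir:<parentInodeId>/<childName>", value = child info.
      db.put(bytes("dir:16385/user"), bytes("DIRECTORY id=16386"));
      db.put(bytes("dir:16386/a.txt"), bytes("FILE id=16400 len=42"));
      db.put(bytes("dir:16386/b.txt"), bytes("FILE id=16401 len=2500"));

      // List the children of inode 16386 with a single prefix scan.
      String prefix = "dir:16386/";
      DBIterator it = db.iterator();
      try {
        for (it.seek(bytes(prefix)); it.hasNext(); ) {
          Map.Entry<byte[], byte[]> e = it.next();
          String key = asString(e.getKey());
          if (!key.startsWith(prefix)) {
            break;  // walked past this directory's key range
          }
          System.out.println(key.substring(prefix.length())
              + " -> " + asString(e.getValue()));
        }
      } finally {
        it.close();
      }
    } finally {
      db.close();
    }
  }
}
{code}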



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989540#comment-13989540
 ] 

Kihwal Lee commented on HDFS-6293:
--

bq. To demonstrate my points, I'm attaching a patch which stores the current 
PB-based fsimage into a LevelDB, and perform lsr on top of the LevelDB.

That was the first thing I thought about doing, but the processing time matters 
too. 



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989427#comment-13989427
 ] 

Hadoop QA commented on HDFS-6293:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643332/HDFS-6293.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6812//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6812//console

This message is automatically generated.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-05 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989348#comment-13989348
 ] 

Haohui Mai commented on HDFS-6293:
--

Just to recap:

# The requirement is to build an offline tool that can process the PB-based fsimage.
# The namespace is mostly a hierarchical structure with snapshots. That is the 
exact reason why the PB-based fsimage has moved towards record-based storage.
# It is useful to recover the hierarchical structure of the namespace in some use 
cases. Based on the design choices made in (2), any in-memory processing algorithm 
requires Theta(n) memory, where n is the number of inodes. That requires too many 
resources.
# Various solutions to leverage the resources on the SNN have been proposed.

Here are my two cents:

# Though the current set of tools does load the whole fsimage into memory and 
process it, there is no reason that an offline tool has to be implemented that 
way. For example, building an index can solve the above use cases.
# With snapshots, the namespace is no longer a tree. Forcing it into a 
hierarchical structure sometimes amounts to fitting square pegs into round holes.
# Note that the goal of the SBN / SNN is to improve the reliability of the system. 
The simpler the code is, the more likely it can be thoroughly reasoned about and 
become more reliable. Personally I don't like any solution that adds complexity 
to the SBN / SNN to solve a use case of the offline image viewer. It doesn't seem 
right to solve an offline problem using an online machine that is accountable for 
the reliability of the system.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988538#comment-13988538
 ] 

Suresh Srinivas commented on HDFS-6293:
---

bq. I created two subtasks for PB and an HTTP interface.
Please make them related jiras.

bq.  have looked at the PB fsimage also, even with JSON we'd need to do similar 
things to avoid having a giant array
It depends on how you write it. In fact I am okay with just writing it in the 
format that the old fsimage wrote: the complete directory path followed by file 
information. I am also okay with writing a plain text file in the first cut, 
since this is holding up rolling out the release onto a cluster and the valuable 
testing that comes out of it.

bq. we may write out the fsimage, copy it over, and then fail while writing out 
the second listing. If edit logs get cleaned up in the meantime, we might have 
a gap between the listing and the start of the edit logs.
I think the use case you are talking about is certainly different. All that I 
have seen is that people want to be able to process the namespace for reporting 
purposes. There is no guarantee that it must be created for every fsimage, etc.

bq. I'm not sure how to interpret this. I just feel that Marcelo or myself 
could have shared this feedback more immediately over the higher-bandwidth 
medium of a phone, and we clearly had an interest in this JIRA since Marcelo's 
been commenting since the beginning. I'm not sure why you'd be offended that I 
asked that the rest of us be included in future phone calls.
You are overreacting. This is no different from conversations you have with 
your colleagues or customers. What is important is that the information relevant 
to others in the community is shared so that they can participate in the 
discussion and comment, which you have had the opportunity to do.





[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-02 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988498#comment-13988498
 ] 

Andrew Wang commented on HDFS-6293:
---

I created two subtasks for PB and an HTTP interface. I have looked at the PB 
fsimage also, even with JSON we'd need to do similar things to avoid having a 
giant array. This requires a bit of custom parsing with Jackson or whatever, so 
it's still extra work.

bq. This report has the same state as fsimage, given it is done right after the 
checkpoint.

My concern is that in an HA environment we may write out the fsimage, copy it 
over, and then fail while writing out the second listing. If edit logs get 
cleaned up in the meantime, we might have a gap between the listing and the 
start of the edit logs.

bq. Certainly, where possible. But you have all the information in the jira and 
have an opportunity to discuss it, right?

I'm not sure how to interpret this. I just feel that Marcelo or myself could 
have shared this feedback more immediately over the higher-bandwidth medium of 
a phone, and we clearly had an interest in this JIRA since Marcelo's been 
commenting since the beginning. I'm not sure why you'd be offended that I asked 
that the rest of us be included in future phone calls.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988435#comment-13988435
 ] 

Suresh Srinivas commented on HDFS-6293:
---

bq. PB would be preferable to JSON. I'd be interested to hear your reasoning 
why JSON is significantly easier; I figured since we already have PB in the 
build and experience using it, it wouldn't be that much work.
A PB implementation for a large number of objects as an array has read-side 
issues and requires designing the protobuf more carefully, which takes time. 
[~wheat9] understands this better and can answer (or you can see the structure 
of the current fsimage proto, where this is considered). If you want to pursue 
that direction you are welcome to; you can add it behind a new configuration 
option for the output format. Doing it in JSON gives us a quick solution and is 
sufficient for the use cases we are looking at.

bq. Can we provide some kind of REST API for fetching this extra listing file? 
This is preferable to manually finding the file and doing scp.
Good idea. Let's do it in another jira. Please create a related jira.

bq. What kinds of atomicity guarantees are there between the fsimage and this 
listing? We'd like to be able to take the listing and replay the edit log on 
top. Including the txid in the listing is also important for this work.
Not sure what your question means. This report has the same state as fsimage, 
given it is done right after the checkpoint. The printed report would include 
transaction id information.

bq. Will this also be done by other saveNamespaces besides checkpointing (i.e. 
"-saveNamespace" as well as at startup)?
Not sure if that is necessary. If it is, we can certainly add it in another 
jira. One thing that I should have mentioned is that currently this file exists 
only on the standby and will not be shipped to the active.

bq. I'd also appreciate if you posted any further call-ins to this JIRA
Certainly, where possible. But you have all the information in the jira and 
have an opportunity to discuss it, right?



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-02 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988414#comment-13988414
 ] 

Andrew Wang commented on HDFS-6293:
---

Hey Suresh,

This plan sounds generally good to me, thanks for working this out. I talked to 
our internal users, and had a few questions/comments.

- PB would be preferable to JSON. I'd be interested to hear your reasoning why 
JSON is significantly easier; I figured since we already have PB in the build 
and experience using it, it wouldn't be that much work.
- Can we provide some kind of REST API for fetching this extra listing file? 
This is preferable to manually finding the file and doing scp.
- What kinds of atomicity guarantees are there between the fsimage and this 
listing? We'd like to be able to take the listing and replay the edit log on 
top. Including the txid in the listing is also important for this work.
- Will this also be done by other saveNamespaces besides checkpointing (i.e. 
"-saveNamespace" as well as at startup)?

I'd also appreciate if you posted any further call-ins to this JIRA, since we'd 
like to be included in the future. Thanks!



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-05-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988321#comment-13988321
 ] 

Suresh Srinivas commented on HDFS-6293:
---

Here is the summary of a quick call I had with [~nroberts], [~kihwal], and 
[~wheat9].

Requirements for the tool:
- It should be able to print consistent file system information. This rules out 
just doing ls -r from the standby (let's assume the standby supports reads); a 
directory should not appear twice due to renames.
- The tool should print hierarchical namespace information so that a process 
without a lot of memory can consume the information.

Here is the proposal:
- Add a flag (turned off by default) to print the hierarchical namespace after 
checkpointing completes, into a configurable directory location
- This information will only be printed on the standby namenode
- The last N (configurable) such namespace information files will be retained; 
a minimal sketch of that retention step follows below

We did consider printing this information as protobuf. But printing large 
hierarchical information is not straightforward and takes time. In the interest 
of time, we will print this in JSON or text (let me know what you think).

In the future, we can make the output format of the tool configurable, possibly 
to protobuf. This tool can nicely develop over time to include other stats 
related to the namespace. [~kihwal] and [~wheat9], let me know if I got this right.
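
A minimal sketch of the "retain the last N" part of the proposal, using plain java.io; the directory layout and file-name prefix are assumptions for illustration, not from any patch.

{code:java}
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
import java.util.Comparator;

// Hedged sketch: keep only the newest N namespace listing files in a directory.
public class ListingRetention {
  static void retainNewest(File dir, int n) {
    File[] files = dir.listFiles(new FilenameFilter() {
      @Override
      public boolean accept(File d, String name) {
        return name.startsWith("namespace_");   // assumed naming convention
      }
    });
    if (files == null || files.length <= n) {
      return;
    }
    // Sort newest first by modification time, then delete everything past n.
    Arrays.sort(files, new Comparator<File>() {
      @Override
      public int compare(File a, File b) {
        return Long.compare(b.lastModified(), a.lastModified());
      }
    });
    for (int i = n; i < files.length; i++) {
      files[i].delete();
    }
  }

  public static void main(String[] args) {
    retainNewest(new File("/data/hadoop/namespace-listings"), 3);
  }
}
{code}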



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985707#comment-13985707
 ] 

Marcelo Vanzin commented on HDFS-6293:
--

[~ajisakaa] my modified parser does not have an upper bound on memory use; 
because it still needs to load information about all inodes, it's still O(n) 
in the number of inodes in the image.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-30 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985395#comment-13985395
 ] 

Akira AJISAKA commented on HDFS-6293:
-

Created HDFS-6310 for output tokens, and attached a patch.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-30 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985368#comment-13985368
 ] 

Akira AJISAKA commented on HDFS-6293:
-

bq. It will be great if someone can come up with a standalone tool that allows 
dumping directory structure and content with, say, 1-2GB heap AND completes in 
comparable execution time.
That approach seems to be the best; however, I'm okay with using more memory 
(e.g. 10-20GB). I'm curious about [~vanzin]'s idea.
By the way,
bq. The 2.4.0 pb-fsimage does contain tokens, but OIV does not show any tokens.
I think that issue can be handled separately. I'll create a jira for tracking it.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984360#comment-13984360
 ] 

Kihwal Lee commented on HDFS-6293:
--

Another option may be to add a flag to checkpointing (2NN, SBN and the 
checkpoint command) to write out an image in a format that is readily 
consumable. This image would be written out in addition to the real fsimage and 
would only be meant to be used by OIV. 



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984353#comment-13984353
 ] 

Kihwal Lee commented on HDFS-6293:
--

bq. One solution we can do is to add an option to print directory tree 
information (along the lines of ls -r) that works against the fsimage.
It is still there in 2.4. It was removed by HDFS-6164 in 2.5. Even if we add 
it back, the excessive memory requirement makes it useless.  

bq. We could also consider either building a tool that works efficiently in 
memory or reorganizing the fsimage to make that possible.
It will be great if someone can come up with a standalone tool that allows 
dumping directory structure and content with, say, 1-2GB heap AND completes in 
comparable execution time. Otherwise, we will have to rearrange the internal 
layout of the fsimage to be similar to the previous format.

bq. Kihwal Lee, can you please provide the use cases you are using OIV for?
There are existing apps that use a custom Visitor similar to lsr. It outputs 
directory entries with full path and a list of blocks for files.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983734#comment-13983734
 ] 

Suresh Srinivas commented on HDFS-6293:
---

OfflineImageViewer just dumps the fsimage in a readable format. In the past, 
given the hierarchical nature of the fsimage, the information printed was 
consumable. Now it is no longer so.

One solution we can do is to add an option to print directory tree 
information (along the lines of ls -r) that works against the fsimage. Given 
that the information printed is no longer dependent on the fsimage structure 
itself, this can be a backward-compatible output (with the caveat of tools 
having to deal with extra information for newly added features such as ACLs). 
Once this is in place, we can have backward compatibility expectations on that. 
What do you guys think? We could also consider either building a tool that 
works efficiently in memory or reorganizing the fsimage to make that possible 
(I hope we do not have to change the fsimage, due to incompatibility issues).

[~kihwal], can you please provide the use cases you are using OIV for?




[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983412#comment-13983412
 ] 

Kihwal Lee commented on HDFS-6293:
--

bq. This is by design.
I understand that it has merits over the old way. But you cannot simply ignore 
existing use cases.  



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983362#comment-13983362
 ] 

Kihwal Lee commented on HDFS-6293:
--

[~va...@rededc.com.br]: Thanks for sharing your experience. That's certainly an 
improvement, but it's still too big, and 140M is not the largest namespace we 
have to deal with.  



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983361#comment-13983361
 ] 

Haohui Mai commented on HDFS-6293:
--

bq. Another issue is the complete change of format/content in OIV's XML output.

The XML format in both the legacy and the PB-based code is intended to match the 
physical layout of the FSImage for fast processing. The layout of the FSImage 
is totally private, which means that there are very few compatibility 
guarantees that you can rely on. We should have clarified this early on.

bq.  It does not provide readily usable directory/file information as it used 
to in pre-2.4/protobuf versions.

This is by design. A format based on records instead of a hierarchical structure 
is more robust (especially with snapshots), and it allows parallel processing. 
The rationale has been articulated in the document attached to HDFS-5698.

With an FSImage as big as yours, I suggest parsing the protobuf records 
directly and importing them into Hive / Pig for more efficient queries. This has 
been articulated in HDFS-5952.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983288#comment-13983288
 ] 

Marcelo Vanzin commented on HDFS-6293:
--

Hi Kihwal,

We have developed some code internally that mitigates (but does not eliminate) 
some of these problems. For an image with 140M entries it would need in the 
ballpark of 7-8GB of heap space, from my pencil-and-napkin calculations. Also, 
it does not generate entries in order like LsrPBImage does, and it's tailored 
for the use case of listing the contents of the file system (so it completely 
ignores things like snapshots).

(The reason it still requires a lot of memory is, as you note, that it needs to 
load information about all inodes in memory; our code is just a little smarter 
about what information it loads. I don't think it's possible to make it much 
better without changing the data in the fsimage itself.)

If people are ok with those limitations, we could clean up our code and post it 
as a patch.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983131#comment-13983131
 ] 

Kihwal Lee commented on HDFS-6293:
--

The 2.4.0 pb-fsimage does contain tokens, but OIV does not show any tokens.



[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983095#comment-13983095
 ] 

Kihwal Lee commented on HDFS-6293:
--

Outputting in the new XML format is fast and consumes little memory because it 
is essentially dumping what is in the image in order. It does not provide 
readily usable directory/file information as it used to in pre-2.4/protobuf 
versions. 

Using something like the "ls -l" format or any custom visitor for dumping the 
file system tree will require loading all inodes upfront and linking them 
afterwards. This requires a considerably larger amount of memory; the smallest 
footprint will be similar to the NN's without triplets. That is clearly 
unacceptable. Reducing memory consumption at the price of considerably longer 
processing time is also unacceptable.
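
To make the memory argument concrete, a toy illustration (not HDFS code) of what "loading all inodes upfront and linking them" amounts to: reconstructing full paths needs an id -> (parent id, name) map covering every inode, so the heap grows with the inode count.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: resolving full paths from flat inode records requires the whole
// id -> (parentId, name) map to be resident, hence O(n) memory in the inode count.
public class PathReconstruction {
  static class InodeRef {
    final long parentId;
    final String name;
    InodeRef(long parentId, String name) {
      this.parentId = parentId;
      this.name = name;
    }
  }

  static final long ROOT_ID = 16385L;  // illustrative root inode id

  static String fullPath(Map<Long, InodeRef> inodes, long id) {
    StringBuilder sb = new StringBuilder();
    while (id != ROOT_ID) {
      InodeRef ref = inodes.get(id);
      sb.insert(0, "/" + ref.name);
      id = ref.parentId;
    }
    return sb.length() == 0 ? "/" : sb.toString();
  }

  public static void main(String[] args) {
    Map<Long, InodeRef> inodes = new HashMap<Long, InodeRef>();
    inodes.put(16386L, new InodeRef(ROOT_ID, "user"));
    inodes.put(16400L, new InodeRef(16386L, "a.txt"));
    System.out.println(fullPath(inodes, 16400L));  // /user/a.txt
  }
}
{code}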
