[ https://issues.apache.org/jira/browse/HDFS-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969004#comment-14969004 ]

Brahma Reddy Battula commented on HDFS-7526:
--------------------------------------------

When listing a large directory from the command line with the default heap 
configuration, FsShell often runs out of memory. This is because the stats of 
all entries under the directory must be held in memory before any of them 
are printed.

Currently FsShell calls {{FileSystem.listStatus()}}, which materializes the 
full {{FileStatus[]}} array, and iterates over the directory itself in 
processPaths(). Instead, FsShell should use 
{{FileSystem.listStatusIterator()}}, which returns the entries incrementally.
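
A minimal sketch of the two listing styles (assumed demo code, not the actual 
FsShell change; the class name and the use of a default {{Configuration}} are 
made up for illustration):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical demo class, not part of any proposed patch.
public class ListLargeDir {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path(args[0]);

    // Eager style: the whole FileStatus[] for the directory is
    // built on the client before the first entry can be printed.
    // FileStatus[] all = fs.listStatus(dir);

    // Iterative style: entries are fetched in batches from the
    // NameNode (dfs.ls.limit entries per RPC for HDFS), so client
    // heap usage stays bounded regardless of directory size.
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    while (it.hasNext()) {
      FileStatus st = it.next();
      System.out.println(st.getPath());
    }
  }
}
{code}

With {{DistributedFileSystem}} the iterator is backed by partial-listing RPCs, 
so only one batch of statuses is resident at a time, which is what recursive 
commands like -setrep -R need to avoid the OOM below.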



> SetReplication OutOfMemoryError
> -------------------------------
>
>                 Key: HDFS-7526
>                 URL: https://issues.apache.org/jira/browse/HDFS-7526
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Philipp Schuegerl
>
> Setting the replication of an HDFS folder recursively can run out of memory, 
> e.g. with a large /var/log directory:
> hdfs dfs -setrep -R -w 1 /var/log
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>       at java.util.Arrays.copyOfRange(Arrays.java:2694)
>       at java.lang.String.<init>(String.java:203)
>       at java.lang.String.substring(String.java:1913)
>       at java.net.URI$Parser.substring(URI.java:2850)
>       at java.net.URI$Parser.parse(URI.java:3046)
>       at java.net.URI.<init>(URI.java:753)
>       at org.apache.hadoop.fs.Path.initialize(Path.java:203)
>       at org.apache.hadoop.fs.Path.<init>(Path.java:116)
>       at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>       at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
>       at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
>       at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
>       at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>       at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>       at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>       at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>       at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>       at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>       at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>       at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>       at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>       at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>       at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
>       at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
>       at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
>       at org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)


