[ https://issues.apache.org/jira/browse/HDFS-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969004#comment-14969004 ]
Brahma Reddy Battula commented on HDFS-7526:
--------------------------------------------

When listing a large directory from the command line with the default heap configuration, FsShell often runs out of memory, because the stats of all entries under the directory must be held in memory before they can be printed. Currently FsShell calls {{FileSystem.listStatus()}}, which returns the whole {{FileStatus[]}} array at once, and iterates over it in processPaths(). Instead, FsShell should use {{FileSystem.listStatusIterator()}}.

> SetReplication OutOfMemoryError
> -------------------------------
>
>                 Key: HDFS-7526
>                 URL: https://issues.apache.org/jira/browse/HDFS-7526
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Philipp Schuegerl
>
> Setting the replication of a HDFS folder recursively can run out of memory. E.g. with a large /var/log directory:
> hdfs dfs -setrep -R -w 1 /var/log
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
> 	at java.util.Arrays.copyOfRange(Arrays.java:2694)
> 	at java.lang.String.<init>(String.java:203)
> 	at java.lang.String.substring(String.java:1913)
> 	at java.net.URI$Parser.substring(URI.java:2850)
> 	at java.net.URI$Parser.parse(URI.java:3046)
> 	at java.net.URI.<init>(URI.java:753)
> 	at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:116)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:94)
> 	at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
> 	at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> 	at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
> 	at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> 	at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> 	at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> 	at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> 	at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> 	at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> 	at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> 	at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> 	at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> 	at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> 	at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
> 	at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
> 	at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
> 	at org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
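The eager-vs-lazy listing pattern behind the comment above can be sketched without any Hadoop dependency, using plain JDK I/O as a stand-in for the Hadoop API (the class and method names here are invented for illustration): {{FileSystem.listStatus()}} corresponds to materializing every child into one array before processing, while {{FileSystem.listStatusIterator()}} corresponds to pulling entries one at a time so memory use stays constant regardless of directory size.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyListingSketch {

    // Eager style: analogous to FileSystem.listStatus(), which returns a
    // FileStatus[] holding the stats of *all* children at once -- the
    // pattern that causes the OutOfMemoryError on very large directories.
    static int countEager(Path dir) {
        File[] children = dir.toFile().listFiles(); // whole array in memory
        return children == null ? 0 : children.length;
    }

    // Lazy style: analogous to FileSystem.listStatusIterator(), which hands
    // back entries one at a time; only a single entry is live per step.
    static int countLazy(Path dir) throws IOException {
        int n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                n++; // process one entry, then let it go
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory with 500 entries to list both ways.
        Path dir = Files.createTempDirectory("lazy-listing-sketch");
        for (int i = 0; i < 500; i++) {
            Files.createFile(dir.resolve("entry-" + i));
        }
        System.out.println("eager=" + countEager(dir));
        System.out.println("lazy=" + countLazy(dir));
    }
}
```

In FsShell itself the analogous change would sit in the shell's directory-walking code visible in the trace above (PathData.getDirectoryContents feeding Command.recursePath/processPaths), replacing the listStatus() call with listStatusIterator().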