[ https://issues.apache.org/jira/browse/HDFS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000099#comment-13000099 ]
Hairong Kuang commented on HDFS-1658:
-------------------------------------

Adding a depth parameter works. But an application still needs to call getFileInfo first. Once it figures out that the path is a directory, it then has to call getContentSummary to get the directory size. If we change FileStatus.length to represent the directory size, a single getFileInfo call is enough.

> A less expensive way to figure out directory size
> -------------------------------------------------
>
>                 Key: HDFS-1658
>                 URL: https://issues.apache.org/jira/browse/HDFS-1658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>
> Currently, in order to figure out a directory's size, we have to list the
> directory by calling RPC getListing and count its children. This is an
> expensive operation, especially when a directory has many children,
> because it may require multiple RPCs.
> On the other hand, when fetching the status of a path (i.e. calling RPC
> getFileInfo), the length field of FileStatus is set to 0 if the path is a
> directory.
> I am thinking of changing this field (FileStatus#length) to be the directory
> size when the path is a directory, so that we can call getFileInfo to get the
> directory size. This call is much less expensive and simpler than getListing.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
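
To make the RPC-count argument in the issue description concrete, here is a minimal sketch that compares the two approaches against a toy in-memory namenode. Everything in it is a hypothetical stand-in written for illustration: `ToyNamenode`, `Status`, the `LISTING_BATCH` value of 1000, and the `lengthIsDirSize` switch are assumptions, not Hadoop classes or settings, and "directory size" is taken to mean the number of children, as in the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; none of these types are real Hadoop classes.
class DirSizeSketch {

    /** Simplified stand-in for FileStatus: only the two fields discussed in the issue. */
    static final class Status {
        final boolean isDir;
        final long length;

        Status(boolean isDir, long length) {
            this.isDir = isDir;
            this.length = length;
        }
    }

    /** Toy namenode that counts how many RPCs a client issues. */
    static final class ToyNamenode {
        static final int LISTING_BATCH = 1000; // assumed batch size, for illustration only

        private final Map<String, List<String>> dirs = new HashMap<>();
        private final boolean lengthIsDirSize;  // true models the proposed behaviour
        int rpcCount = 0;

        ToyNamenode(boolean lengthIsDirSize) {
            this.lengthIsDirSize = lengthIsDirSize;
        }

        void addDir(String path, int numChildren) {
            List<String> children = new ArrayList<>();
            for (int i = 0; i < numChildren; i++) {
                children.add("child-" + i);
            }
            dirs.put(path, children);
        }

        /** One RPC. For a directory, length is 0 today, or the child count if proposed. */
        Status getFileInfo(String path) {
            rpcCount++;
            List<String> children = dirs.get(path);
            if (children == null) {
                return new Status(false, 0); // the sketch treats unknown paths as empty files
            }
            return new Status(true, lengthIsDirSize ? children.size() : 0);
        }

        /** One RPC per batch of at most LISTING_BATCH children of a known directory. */
        List<String> getListing(String path, int startIndex) {
            rpcCount++;
            List<String> children = dirs.get(path);
            int end = Math.min(children.size(), startIndex + LISTING_BATCH);
            return new ArrayList<>(children.subList(startIndex, end));
        }
    }

    /** Current approach: check the path type, then page through the listing and count. */
    static long dirSizeViaListing(ToyNamenode nn, String path) {
        Status st = nn.getFileInfo(path);
        if (!st.isDir) {
            return st.length;
        }
        long count = 0;
        List<String> batch;
        do {
            batch = nn.getListing(path, (int) count);
            count += batch.size();
        } while (batch.size() == ToyNamenode.LISTING_BATCH);
        return count;
    }

    /** Proposed approach: a single getFileInfo call whose length carries the size. */
    static long dirSizeViaFileInfo(ToyNamenode nn, String path) {
        return nn.getFileInfo(path).length;
    }

    public static void main(String[] args) {
        ToyNamenode current = new ToyNamenode(false);
        current.addDir("/data", 2500);
        long a = dirSizeViaListing(current, "/data");
        System.out.println("listing:  size=" + a + " rpcs=" + current.rpcCount);

        ToyNamenode proposed = new ToyNamenode(true);
        proposed.addDir("/data", 2500);
        long b = dirSizeViaFileInfo(proposed, "/data");
        System.out.println("fileinfo: size=" + b + " rpcs=" + proposed.rpcCount);
    }
}
```

In this model the listing-based path needs one RPC for the status check plus one per batch of children, so its cost grows with the directory, while the proposed path stays at a single RPC regardless of how many children there are.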