[ https://issues.apache.org/jira/browse/HADOOP-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853019#action_12853019 ]
Eli Collins commented on HADOOP-6678:
-------------------------------------

+1 HADOOP-6585 (Add FileStatus#isDirectory and isFile) (Replace usage of FileStatus#isDir())

> Propose some changes to FileContext
> -----------------------------------
>
>                 Key: HADOOP-6678
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6678
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0, 0.22.0
>
>
> # Add a method Iterator<FileStatus> listStatus(Path), which lets the HDFS
> client avoid holding the whole listing in memory and benefit more from the
> iterative listing added in HDFS-985. Move the current FileStatus[]
> listStatus(Path) to be a utility method.
> # Remove the methods isFile(Path), isDirectory(Path), and exists(Path).
> All of these methods are implemented by calling getFileStatus(Path), but most
> users are not aware of this. They write code like the following:
> {code}
> FileContext fc = ...;
> if (fc.exists(path)) {
>   if (fc.isFile(path)) {
>     ...
>   } else {
>     ...
>   }
> }
> {code}
> The above code issues two getFileInfo RPCs to the NameNode where one would
> suffice. In our production clusters, we often see several times as many
> getFileStatus calls as open calls. If we remove isFile, isDirectory, and
> exists from FileContext, users have to call getFileStatus explicitly first,
> which makes them more likely to write more efficient code like this:
> {code}
> FileContext fc = ...;
> // Throws FileNotFoundException if path does not exist.
> FileStatus fstatus = fc.getFileStatus(path);
> if (fstatus.isFile()) {
>   ...
> } else {
>   ...
> }
> {code}
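The cost argument in both proposals can be illustrated with a small, self-contained Java sketch. It does not use Hadoop at all: `FakeNameNode`, `Status`, `classifyNaive`, `classifyOnce`, and this `listStatus` helper are all hypothetical stand-ins that only count RPCs. The sketch assumes the NameNode answers a partial-listing request with at most `limit` entries sorted after a resume key, which is a simplified model of iterative listing, not the actual HDFS-985 protocol.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical model of the RPC cost argument; none of these types are Hadoop classes.
public class RpcCostSketch {

    /** Stand-in for FileStatus: a path plus a directory flag. */
    record Status(String path, boolean directory) {
        boolean isFile() { return !directory; }
    }

    /** Stand-in for the NameNode: a sorted namespace plus per-RPC counters. */
    static class FakeNameNode {
        final TreeMap<String, Boolean> entries = new TreeMap<>(); // path -> is directory
        int getFileInfoCalls = 0;
        int getListingCalls = 0;

        /** One getFileInfo RPC; returns null for a missing path. */
        Status getFileInfo(String path) {
            getFileInfoCalls++;
            Boolean dir = entries.get(path);
            return dir == null ? null : new Status(path, dir);
        }

        /** One partial-listing RPC: up to `limit` direct children of `dir` sorted after `startAfter`. */
        List<Status> getListing(String dir, String startAfter, int limit) {
            getListingCalls++;
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            List<Status> batch = new ArrayList<>();
            for (var e : entries.tailMap(startAfter, false).entrySet()) {
                String p = e.getKey();
                if (!p.startsWith(prefix)) break;                   // past this directory's key range
                if (p.indexOf('/', prefix.length()) >= 0) continue; // skip deeper descendants
                batch.add(new Status(p, e.getValue()));
                if (batch.size() == limit) break;
            }
            return batch;
        }
    }

    /** The pattern from the issue: exists() then isFile(), each its own RPC. */
    static String classifyNaive(FakeNameNode nn, String path) {
        if (nn.getFileInfo(path) != null) {      // like fc.exists(path)
            if (nn.getFileInfo(path).isFile()) { // like fc.isFile(path): a second RPC
                return "file";
            }
            return "dir";
        }
        return "missing";
    }

    /** The proposed pattern: one status lookup, then inspect the result locally. */
    static String classifyOnce(FakeNameNode nn, String path) {
        // The real FileContext#getFileStatus throws FileNotFoundException instead of returning null.
        Status st = nn.getFileInfo(path);
        if (st == null) return "missing";
        return st.isFile() ? "file" : "dir";
    }

    /** Lazy listing: holds at most one batch of `batchSize` entries in memory at a time. */
    static Iterator<Status> listStatus(FakeNameNode nn, String dir, int batchSize) {
        return new Iterator<>() {
            private List<Status> batch = List.of();
            private int pos = 0;
            private String last = dir.endsWith("/") ? dir : dir + "/"; // resume key for the next RPC
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                if (pos < batch.size()) return true;
                if (exhausted) return false;
                batch = nn.getListing(dir, last, batchSize); // fetch the next page only when needed
                pos = 0;
                exhausted = batch.size() < batchSize;        // a short page marks the end
                return !batch.isEmpty();
            }

            @Override
            public Status next() {
                if (!hasNext()) throw new NoSuchElementException();
                Status s = batch.get(pos++);
                last = s.path();
                return s;
            }
        };
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        nn.entries.put("/d", true);
        for (int i = 0; i < 5; i++) nn.entries.put("/d/f" + i, false);

        classifyNaive(nn, "/d/f0");
        System.out.println("exists+isFile RPCs: " + nn.getFileInfoCalls); // prints 2
        nn.getFileInfoCalls = 0;
        classifyOnce(nn, "/d/f0");
        System.out.println("getFileStatus RPCs: " + nn.getFileInfoCalls); // prints 1

        int listed = 0;
        for (Iterator<Status> it = listStatus(nn, "/d", 2); it.hasNext(); it.next()) listed++;
        System.out.println(listed + " entries in " + nn.getListingCalls + " listing RPCs"); // 5 entries in 3 listing RPCs
    }
}
```

With a batch size of 2, the five entries arrive in three pages (2, 2, 1), so the client never holds more than two entries of the listing at once; a real client would pick a much larger page size.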