[ https://issues.apache.org/jira/browse/HDFS-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
J.Andreina updated HDFS-8234:
-----------------------------
    Attachment: HDFS-8234.1.patch

Attached an initial patch. Please review.

> DistributedFileSystem and Globber should apply PathFilter early
> ---------------------------------------------------------------
>
>                 Key: HDFS-8234
>                 URL: https://issues.apache.org/jira/browse/HDFS-8234
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: J.Andreina
>              Labels: newbie
>         Attachments: HDFS-8234.1.patch
>
>
> HDFS-985 added partial listing in listStatus to avoid listing the entries of
> a large directory in one go. If a listStatus(Path p, PathFilter f) call is
> made, the filter is applied only after all the entries have been fetched,
> so a big list is constructed on the client side. It would be more efficient
> if DistributedFileSystem.listStatusInternal() applied the PathFilter itself.
> So DistributedFileSystem should override listStatus(Path f,
> PathFilter filter) and apply the PathFilter early.
> Globber.java also applies its filter after calling listStatus. It should call
> listStatus with the PathFilter instead.
> {code}
>     FileStatus[] children = listStatus(candidate.getPath());
> .........
>             for (FileStatus child : children) {
>               // Set the child path based on the parent path.
>               child.setPath(new Path(candidate.getPath(),
>                       child.getPath().getName()));
>               if (globFilter.accept(child.getPath())) {
>                 newCandidates.add(child);
>               }
>             }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
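
For illustration only, the early-filtering idea in the issue description can be sketched without any Hadoop classes. This is not the attached patch and not the DistributedFileSystem code: EarlyFilterSketch, fetchBatch and listFiltered are hypothetical names, a List<String> stands in for the directory on the NameNode, and java.util.function.Predicate stands in for PathFilter. The point is only that each partial listing is filtered as it arrives, so the client-side list never holds rejected entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the client side of a partial listing.
// No Hadoop types are used; names here are assumptions for illustration.
final class EarlyFilterSketch {

  // Simulates one partial-listing call: returns at most batchSize
  // entries of the directory, starting at index start.
  static List<String> fetchBatch(List<String> directory, int start, int batchSize) {
    int end = Math.min(start + batchSize, directory.size());
    return directory.subList(start, end);
  }

  // Applies the filter to each batch as it is fetched, so only
  // accepted entries are ever accumulated on the client.
  static List<String> listFiltered(List<String> directory, int batchSize,
                                   Predicate<String> filter) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<String> accepted = new ArrayList<>();
    for (int start = 0; start < directory.size(); start += batchSize) {
      for (String name : fetchBatch(directory, start, batchSize)) {
        if (filter.test(name)) {
          accepted.add(name);
        }
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    List<String> dir = Arrays.asList(
        "part-0000", "_SUCCESS", "part-0001", "_logs", "part-0002");
    // Keep only entries that do not start with an underscore.
    List<String> visible = listFiltered(dir, 2, name -> !name.startsWith("_"));
    System.out.println(visible); // prints [part-0000, part-0001, part-0002]
  }
}
```

In this sketch the result list grows only with accepted entries, whereas filtering after the full listing would first build a list as large as the directory.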