[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029920#comment-14029920 ]

Hadoop QA commented on HADOOP-8989:
-----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12650144/HADOOP-8989.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/4054//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/4054//console

This message is automatically generated.

> hadoop dfs -find feature
> ------------------------
>
>                 Key: HADOOP-8989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8989
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Marco Nicosia
>            Assignee: Jonathan Allen
>         Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch
>
>
> Both sysadmins and users make frequent use of the unix 'find' command, but
> Hadoop has no correlate. Without this, users are writing scripts which make
> heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs
> -lsr is somewhat taxing on the NameNode, and a really slow experience on the
> client side. Possibly an in-NameNode find operation would be only a bit more
> taxing on the NameNode, but significantly faster from the client's point of
> view?
> The minimum set of options I can think of which would make a Hadoop find
> command generally useful is (in priority order):
> * -type (file or directory, for now)
> * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
> * -print0 (for piping to xargs -0)
> * -depth
> * -owner/-group (and -nouser/-nogroup)
> * -name (allowing for shell pattern, or even regex?)
> * -perm
> * -size
> One possible special case, but could possibly be really cool if it ran from
> within the NameNode:
> * -delete
> The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.
> Lower priority, some people do use operators, mostly to execute -or searches
> such as:
> * find / \( -nouser -or -nogroup \)
> Finally, I thought I'd include a link to the [Posix spec for
> find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)