[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025845#comment-14025845 ]
Akira AJISAKA commented on HADOOP-8989:
---------------------------------------

Thanks for updating the patch. I compiled with the patch and tried some options.

1. When I ran with the -print0 option, newlines were included in the output, as follows:
{code}
# hdfs dfs -find /user/root -name '*.txt' -print0
abc.txt
def.txt
{code}
The -print0 output shouldn't include newlines.

2. In Find.java, spaces should be used instead of tabs.
{code}
    // Initialize the static variables.
    EXPRESSIONS = new Class[] {
      // Operator Expressions
      And.class,
      // Action Expressions
      Print.class,
      // Navigation Expressions
      // Matcher Expressions
      Name.class
    };
{code}

Minor nits:

3. In Expressions.java,
{code}
  /**
   * Returns the precendence of this expression
   * (only applicable to operators).
   */
{code}
"precendence" should be "precedence".

4. There is some trailing whitespace in Find.java.

5. In Print.java,
{code}
  private Print(boolean appendNull) {
    super();
    setUsage(USAGE);
    setHelp(HELP);
    setAppendNull(appendNull);
  }

  private void setAppendNull(boolean appendNull) {
    this.appendNull = appendNull;
  }
{code}
can be simplified to
{code}
  private Print(boolean appendNull) {
    super();
    setUsage(USAGE);
    setHelp(HELP);
    this.appendNull = appendNull;
  }
{code}

6.
{code}
  /**
   * Construct a Print {@link Expression} with an operational ASCII NUL
   * suffix.
   */
{code}
Should "operational" be "optional"?
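
To make point 1 concrete, here is a minimal sketch of the behavior I would expect from -print0: each path is followed by an ASCII NUL and nothing else, so the output can be piped to xargs -0. The class name PrintSketch and the method render are hypothetical and are not taken from the patch; this only illustrates the expected output format, not how Print.java should be structured.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the Print class from the patch.
final class PrintSketch {

    /**
     * Joins the given paths, terminating each one with an ASCII NUL when
     * appendNull is true and with a newline otherwise.
     */
    static String render(List<String> paths, boolean appendNull) {
        char terminator = appendNull ? '\0' : '\n';
        StringBuilder out = new StringBuilder();
        for (String path : paths) {
            out.append(path).append(terminator);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("abc.txt", "def.txt");
        // Use print rather than println so that no newline follows the NUL.
        System.out.print(render(paths, true));
    }
}
```

If the action currently writes each entry with something like println and then appends the NUL, switching to a plain print for the -print0 case would likely be enough, but that is a guess about the cause.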

> hadoop dfs -find feature
> ------------------------
>
>                 Key: HADOOP-8989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8989
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Marco Nicosia
>            Assignee: Jonathan Allen
>         Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch
>
>
> Both sysadmins and users make frequent use of the unix 'find' command, but
> Hadoop has no correlate. Without this, users are writing scripts which make
> heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs
> -lsr is somewhat taxing on the NameNode, and a really slow experience on the
> client side. Possibly an in-NameNode find operation would be only a bit more
> taxing on the NameNode, but significantly faster from the client's point of
> view?
> The minimum set of options I can think of which would make a Hadoop find
> command generally useful is (in priority order):
> * -type (file or directory, for now)
> * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
> * -print0 (for piping to xargs -0)
> * -depth
> * -owner/-group (and -nouser/-nogroup)
> * -name (allowing for shell pattern, or even regex?)
> * -perm
> * -size
> One possible special case, but could possibly be really cool if it ran from
> within the NameNode:
> * -delete
> The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.
> Lower priority, some people do use operators, mostly to execute -or searches
> such as:
> * find / \(-nouser -or -nogroup\)
> Finally, I thought I'd include a link to the [Posix spec for
> find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)