[jira] [Updated] (HADOOP-8989) hadoop dfs -find feature
[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Allen updated HADOOP-8989: --- Attachment: HADOOP-8989.patch I've removed all of the @inheritDoc references (misunderstanding on part about when they were needed, didn't realise javadoc automatically inherited). Updated the web site documentation to use the format. Having a more detailed look at the use of LinkedList in the Command method signatures. It seems to be used as a mixture of List and Deque in different subclasses and so it's not a straightforward replacement with an interface. hadoop dfs -find feature Key: HADOOP-8989 URL: https://issues.apache.org/jira/browse/HADOOP-8989 Project: Hadoop Common Issue Type: New Feature Reporter: Marco Nicosia Assignee: Jonathan Allen Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch Both sysadmins and users make frequent use of the unix 'find' command, but Hadoop has no correlate. Without this, users are writing scripts which make heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs -lsr is somewhat taxing on the NameNode, and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view? The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order): * -type (file or directory, for now) * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments) * -print0 (for piping to xargs -0) * -depth * -owner/-group (and -nouser/-nogroup) * -name (allowing for shell pattern, or even regex?) * -perm * -size One possible special case, but could possibly be really cool if it ran from within the NameNode: * -delete The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow. Lower priority, some people do use operators, mostly to execute -or searches such as: * find / \(-nouser -or -nogroup\) Finally, I thought I'd include a link to the [Posix spec for find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature
[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13580351#comment-13580351 ] Hadoop QA commented on HADOOP-8989: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12569735/HADOOP-8989.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 41 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2198//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2198//console This message is automatically generated. hadoop dfs -find feature Key: HADOOP-8989 URL: https://issues.apache.org/jira/browse/HADOOP-8989 Project: Hadoop Common Issue Type: New Feature Reporter: Marco Nicosia Assignee: Jonathan Allen Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch Both sysadmins and users make frequent use of the unix 'find' command, but Hadoop has no correlate. Without this, users are writing scripts which make heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs -lsr is somewhat taxing on the NameNode, and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view? The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order): * -type (file or directory, for now) * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments) * -print0 (for piping to xargs -0) * -depth * -owner/-group (and -nouser/-nogroup) * -name (allowing for shell pattern, or even regex?) * -perm * -size One possible special case, but could possibly be really cool if it ran from within the NameNode: * -delete The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow. Lower priority, some people do use operators, mostly to execute -or searches such as: * find / \(-nouser -or -nogroup\) Finally, I thought I'd include a link to the [Posix spec for find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9313) Remove spurious mkdir from hadoop-config.cmd
Ivan Mitic created HADOOP-9313: -- Summary: Remove spurious mkdir from hadoop-config.cmd Key: HADOOP-9313 URL: https://issues.apache.org/jira/browse/HADOOP-9313 Project: Hadoop Common Issue Type: Bug Reporter: Ivan Mitic The following mkdir seems to have been accidentally added to Windows cmd script and should be removed: {code} mkdir c:\tmp\dir1 {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9313) Remove spurious mkdir from hadoop-config.cmd
[ https://issues.apache.org/jira/browse/HADOOP-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated HADOOP-9313: --- Attachment: HADOOP-9313.branch-trunk-win.cmd.patch Attaching the patch. Remove spurious mkdir from hadoop-config.cmd Key: HADOOP-9313 URL: https://issues.apache.org/jira/browse/HADOOP-9313 Project: Hadoop Common Issue Type: Bug Reporter: Ivan Mitic Attachments: HADOOP-9313.branch-trunk-win.cmd.patch The following mkdir seems to have been accidentally added to Windows cmd script and should be removed: {code} mkdir c:\tmp\dir1 {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira