subject:"\[jira\] \[Commented\] \(HADOOP\-8989\) hadoop dfs \-find feature"

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-09-26 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14149072#comment-14149072
 ] 

Jonathan Allen commented on HADOOP-8989:


HADOOP-11142 created to change the FS docs to refer to {{hadoop fs}} rather 
than {{hdfs dfs}}.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-09-23 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145451#comment-14145451
 ] 

Allen Wittenauer commented on HADOOP-8989:
--

[~daryn], were you going to commit this or should I?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-09-09 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127128#comment-14127128
 ] 

Daryn Sharp commented on HADOOP-8989:
-

+1 I think it looks ok.  Please file a jira though for universally changing the 
docs to refer to hadoop fs instead of hdfs dfs.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-09-08 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125423#comment-14125423
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

Thanks [~jonallen] for the update. LGTM, +1(non-binding). The test failure is 
not related to the patch.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125101#comment-14125101
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667105/HADOOP-8989.patch
  against trunk revision a23144f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4668//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4668//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-09-05 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122494#comment-14122494
 ] 

Jonathan Allen commented on HADOOP-8989:


I should have time to update things this weekend.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-09-04 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122294#comment-14122294
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

Hi [~jonallen], what's going on this jira? If you don't have time to work on 
this issue, I'd like to update the patch based on the above comments.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-08-13 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095822#comment-14095822
 ] 

Allen Wittenauer commented on HADOOP-8989:
--

I've started to play with this patch.  At first glance, I do see a (minor) 
documentation bug:

{code}
+   Usage: hdfs -find path ... expression ... 
{code}

This should probably be:

{code}
+   Usage: hdfs dfs -find path ... expression ... 
{code}

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-08-13 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095891#comment-14095891
 ] 

Allen Wittenauer commented on HADOOP-8989:
--

Two things to fix:

# The hdfs -find doc bug
# Documentation should mention that the path parameter is optional and defaults 
to the user's home directory

+1 lgtm, pending fixes to those two items. 

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-08-13 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096162#comment-14096162
 ] 

Daryn Sharp commented on HADOOP-8989:
-

It should be hadoop fs -find.  FsShell is not hdfs specific.

We should have deprecated hdfs dfs long ago...  All it effectively does is 
call hadoop fs.  Does it make sense to hdfs dfs -ls file:/tmp?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-08-13 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096167#comment-14096167
 ] 

Daryn Sharp commented on HADOOP-8989:
-

Unless I missed them, add tests to ensure that relative paths are displayed as 
relative paths, and no path defaults to pwd (which is relative too) and I'm +1.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-08-11 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092562#comment-14092562
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

Thanks [~jonallen] for the update. +1 (non-binding).
Sorry for the late response.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047294#comment-14047294
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653073/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4188//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4188//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-24 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042538#comment-14042538
 ] 

Jonathan Allen commented on HADOOP-8989:


bq. If no paths are given, shouldn't it default to the pwd instead of aborting?

I've now checked the Linux implementation of find and this does default to pwd 
so I'll make the change to this (it's only a single line either way). The Posix 
find specifies path as mandatory but I think we tend to favour Linux - correct 
me if I'm wrong about this.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-24 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042627#comment-14042627
 ] 

Allen Wittenauer commented on HADOOP-8989:
--

We should work on trying to avoid GNU or Linux -isms, but in this particular 
case it seems reasonable to default to pwd.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042948#comment-14042948
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652301/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestCrcCorruption

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4166//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4166//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-23 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040817#comment-14040817
 ] 

Daryn Sharp commented on HADOOP-8989:
-

I think you can simplify the processing flow by having {{processOptions}} also 
parse out the expressions, leaving only the paths in the list.  The method is 
intended to strip all option related arguments.

If no paths are given, shouldn't it default to the pwd instead of aborting?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-23 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041245#comment-14041245
 ] 

Jonathan Allen commented on HADOOP-8989:


bq. I think you can simplify the processing flow by having processOptions also 
parse out the expressions, leaving only the paths in the list.

Yes, I think you're right about that. I hadn't thought about skipping past the 
path arguments and stripping the remaining expressions out of the list (though 
obvious now it's been suggested).

bq. If no paths are given, shouldn't it default to the pwd instead of aborting?

That doesn't seem to be the standard find behaviour (or at least not on my Mac)

{code}
$ find -print
find: illegal option -- p
usage: find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
   find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]
{code}

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040238#comment-14040238
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651882/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.cli.TestCLI
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4136//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4136//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040318#comment-14040318
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651889/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4137//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4137//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-18 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036295#comment-14036295
 ] 

Jonathan Allen commented on HADOOP-8989:


[~daryn]

bq. Do you actually need the changes to Command and CommandFactory to add a new 
command?

Yes, I think I do.

The change to Command#processPaths introduces a hook to allow recursion through 
something other than directories, specifically this allows Find to recurse 
through symbolic links when required. This could be implemented by overriding 
the whole processPaths method in Find but this seems likely to introduce 
maintenance problems going forwards (and in a [previous 
comment|https://issues.apache.org/jira/browse/HADOOP-8989?focusedCommentId=13585968page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13585968]
 you suggested that this shouldn't be done).

The CommandFactory is added into the Command so that it can be used by the 
-exec expression (it needs to know what class to pass the found files into). It 
seemed cleaner to do it this way than to call FsShell.run for each file.

They both seem fairly simple and safe changes that could be useful to other 
commands in the future.

bq. I'm not sure the path handling/expansion is consistent with the rest of the 
commands. Find is a bit odd in that paths come before all options, but I think 
there has to be a cleaner way to implement the parsing.

Possibly, but I couldn't think of one. If you can see something I've missed 
then feel free to suggest it.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036770#comment-14036770
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651267/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.cli.TestCLI
  org.apache.hadoop.cli.TestHDFSCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4099//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4099//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-17 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034491#comment-14034491
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

Thanks for updating the patch! Mostly looks good. I verified -print0 option and 
the following command worked well.
{code}
# hdfs dfs -find /user/root/ -name file* -print0 | xargs -0 hdfs dfs -rm
{code}

Minor comment:
{code}
  private static String[] HELP =
  { Finds all files that match the specified expression and 
  + applies selected actions to them. };
{code}
Would you change the code like other commands? i.e.
{code}
  private static String[] HELP =
  { Finds all files that match the specified expression and,
applies selected actions to them. };
{code}

In addition, can you please include a document in the patch for keeping 
consistency between source code and document? (i.e. Would you split 
HADOOP-10580?)

[~umamaheswararao], do you have any comments on this patch? I'll appreciate 
your reviews.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-17 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034650#comment-14034650
 ] 

Daryn Sharp commented on HADOOP-8989:
-

Do you actually need the changes to {{Command}} and {{CommandFactory}} to add a 
new command?

I'm not sure the path handling/expansion is consistent with the rest of the 
commands.  Find is a bit odd in that paths come before all options, but I think 
there has to be a cleaner way to implement the parsing.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029920#comment-14029920
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650144/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4054//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4054//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-09 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025444#comment-14025444
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

[~jonallen], sorry for late response. I'll review the patch by tomorrow.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-09 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025845#comment-14025845
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

Thanks for updating the patch. I compiled with the patch and tried some options.

1. When I executed -print0 option, newlines are included in the output as 
follows:
{code}
# hdfs dfs -find /user/root -name '*.txt' -print0
abc.txt
def.txt
{code}
Newlines shouldn't be included in -print0 option.

2. In Find.java, spaces should be used instead of tabs.
{code}
// Initialize the static variables.
EXPRESSIONS = new Class[] {
// Operator Expressions
And.class,
// Action Expressions
Print.class,
// Navigation Expressions
// Matcher Expressions
Name.class
};
{code}
Minor nits:

3. In Expressions.java,
{code}
  /**
   * Returns the precendence of this expression
   * (only applicable to operators).
   */
{code}
precendence should be precedence.

4. There are some trailing white spaces in Find.java.

5. In Print.java,
{code}
  private Print(boolean appendNull) {
super();
setUsage(USAGE);
setHelp(HELP);
setAppendNull(appendNull);
  }

  private void setAppendNull(boolean appendNull) {
this.appendNull = appendNull;
  }
{code}
can be simplified as
{code}
  private Print(boolean appendNull) {
super();
setUsage(USAGE);
setHelp(HELP);
this.appendNull = appendNull;
  }
{code}

6.
{code}
  /**
   * Construct a Print {@link Expression} with an operational ASCII NUL
   * suffix.
   */
{code}
operational should be optional?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-06-01 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014950#comment-14014950
 ] 

Jonathan Allen commented on HADOOP-8989:


[~ajisakaa] are you able to continue the review of this patch?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-05-01 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986396#comment-13986396
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

Thanks [~jonallen] for splitting the patch!
{quote}
I think you get the results as full paths
{code}
hdfs://127.0.0.1:59894/texttest
hdfs://127.0.0.1:59894/texttest/file..bz2
hdfs://127.0.0.1:59894/texttest/file.deflate
hdfs://127.0.0.1:59894/texttest/file.gz
{code}
This should be relative paths I think from where you are finding, no?
{quote}
Would you please reflect the above [~umamaheswararao]'s comment?

A new comment:
* The tests for -iname option is needed.

Two minor nits:
* Typo in TestAnd.java
{code}
// test both expressions failining
{code}
* There're some lines over 80 characters in TestFind.java

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13982438#comment-13982438
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12642147/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3860//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3860//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-18 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973912#comment-13973912
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

I think you can add a depends upon link to an issue to clarify the order.
ex.) If the patch 2 at issue 2 depends on the patch 1 at issue 1, it's better 
to add depends upon issue 1 link to issue 2.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-17 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973145#comment-13973145
 ] 

Jonathan Allen commented on HADOOP-8989:


OK, I should have time to do that over the weekend. How do I make sure the 
patches get applied in the correct order? (ie patch 2 depends on patch 1, etc).

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-15 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969272#comment-13969272
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

Thanks [~jonallen] for updating the patch.
bq. How about splitting this to small patches?
+1 (non-binding) for splitting. If you don't have a time to do this, I'm 
willing to help.

{code}
/**
 * Construct a Print {@Expression} with an operational ASCII NULL
 * suffix.
 */
  private Print(boolean appendNull) {
{code}
{{@Expression}} should be {{@link Expression}}.
{code}
public int apply(int current, int shift, int value) {
  return current | (value  shift);
};
{code}
In {{Perm.java}} some semicolons can be removed.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-15 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970026#comment-13970026
 ] 

Jonathan Allen commented on HADOOP-8989:


I'm happy to split into small patches but could do with advice on how to do 
this. Are you suggesting creating extra issues or multiple files attached to 
this one or something else?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-15 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970295#comment-13970295
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

I suggest to create extra issues. You can create sub-tasks and attach a small 
patch on each issue.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-09 Thread Uma Maheswara Rao G (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964584#comment-13964584
 ] 

Uma Maheswara Rao G commented on HADOOP-8989:
-

On quick look at your patch:
 I think you get the results as full paths
{noformat}
hdfs://127.0.0.1:59894/texttest
hdfs://127.0.0.1:59894/texttest/file..bz2
hdfs://127.0.0.1:59894/texttest/file.deflate
hdfs://127.0.0.1:59894/texttest/file.gz
{noformat}
This should be relative paths I think from where you are finding, no?

Overall your work looks to be great. I think having bulk patch may be had to 
review and many things may be overlooked. How about splitting this to small 
patches?

{code}
/**
 * Implements the -mindepth expression for the
 * {@link org.apache.hadoop.fs.shell.find.Find} command.
 */
final class MaxDepth extends BaseExpression {
{code}
javadoc should say maxdepth



 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-04-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962970#comment-13962970
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639169/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3756//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3756//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-03-31 Thread Akira AJISAKA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954992#comment-13954992
 ] 

Akira AJISAKA commented on HADOOP-8989:
---

The patch mostly looks good. I built with the patch and verified some 
expressions worked.
This feature is also very useful to search files/dirs in a fsimage with new 
OfflineImageViewer, and therefore I want this patch will be merged in.
Minor nit: There's some trailing whitespaces.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-01-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13878145#comment-13878145
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12624208/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3450//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3450//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-01-20 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876716#comment-13876716
 ] 

Mark Miller commented on HADOOP-8989:
-

Lot's of watchers, Hadoop QA was happy, anyone willing to tip this in?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-01-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876719#comment-13876719
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573783/HADOOP-8989.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3447//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2014-01-20 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876872#comment-13876872
 ] 

Jonathan Allen commented on HADOOP-8989:


It was waiting for a review, I know Daryn was part way through but needed to 
put it aside for a while and I don't know if he had a chance to get back to it. 
I'll sort out the problems it seems to have against the latest trunk and submit 
a new patch (hopefully won't require major changes).

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-03-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602935#comment-13602935
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573783/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2330//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2330//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-03-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598444#comment-13598444
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572993/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestDataDirs

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2304//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2304//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-25 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585968#comment-13585968
 ] 

Daryn Sharp commented on HADOOP-8989:
-

Initial comments:
* Why is {{setCommandFactory}} exposed, and why is {{FsShell}} setting it in 
the instance?
* Use of {{static}} blocks is discouraged because it leads to bizarre problems 
during class loading.  Is there an easy way to avoid them?
* recognised should be recognized
* You can retrieve {{options = cf.getOpts}} instead of individually querying 
each option and building a new set.
* Why is {{parsePathData}} checking for {{(arg == null)}}?  That shouldn't 
happen since {{isEmpty}} is already being checked?
* {{processPaths}} really shouldn't be overridden.  It was only exposed for 
some difficult to handle {{ls}} behavior.  The logic should be split into 
{{recursePath}} and {{processPath}}.
* I'd suggest against making the expression classes public.

I still need to study the expression handling.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-25 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586375#comment-13586375
 ] 

Jonathan Allen commented on HADOOP-8989:


Some quick answers to your questions ...

The CommandFactory is added into the Command so that it can be used by the 
-exec expression (it needs to know what class to pass the found files into). It 
seemed cleaner to do it this way than to call FsShell.run for each file. In 
hindsight I think the factory should be setting it rather than FsShell (in the 
same place it calls setConf).

Should be able to avoid most of the static blocks. As an aside, why are static 
blocks worse than initialising static variables? (or are static variables other 
than constants also bad?)

(arg == null) needs to be checked because there may be a null on the Deque, 
nothing to do with args being empty or not. I'm not sure how a null would have 
got on the Deque but the method shouldn't be making assumptions about that.

I needed to override processPaths so that I could implement the -depth option. 
processPaths always processes the current item before recursing the directory. 
It looks like somebody has added a postProcessPath method since I started this 
so that might solve the problem, I'll revisit.

I should get a chance to action the comments later this week.


 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13580351#comment-13580351
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12569735/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 41 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2198//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2198//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-15 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13579223#comment-13579223
 ] 

Daryn Sharp commented on HADOOP-8989:
-

I don't remember, but I think LinkedList was used in signatures because of 
methods unique to that class.  I'd recommend any large sweeping changes be 
considered on another jira.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-14 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578772#comment-13578772
 ] 

Jonathan Allen commented on HADOOP-8989:


Luke - a couple of questions about your comments:
1) where are you thinking of here? I can't find any @inheritDoc on an inner 
class so I feel I'm missing something? I notice inheritDoc isn't used very much 
throughout the code, is it generally frowned on?
2) completely agree with the thought but this isn't something I can fix without 
changing nearly all of the fs.shell classes and this jira seems the wrong place 
to do that; happy to be guided by you and Daryn if you think I should.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-11 Thread Luke Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575707#comment-13575707
 ] 

Luke Lu commented on HADOOP-8989:
-

Sorry Jonathan, I was hoping Daryn could take a look as he did the fs shell 
refactor :)

Anyway, a few nits after a quick pass (more later):
# get rid of the @inheritDoc comment for all the inner classes, which are 
implementations. 
# avoid using concrete classes in method signatures e.g. use ListString args 
instead of LinkedListString args. I think this is largely due to the refactor 
done in HADOOP-7202. I think we should fix this in the base classes as well. 
Daryn, what do you think?

I don't think HADOOP-9195 (though independently useful) would help the 
implementation here.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-11 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575819#comment-13575819
 ] 

Daryn Sharp commented on HADOOP-8989:
-

For some reason I thought this was committed so it fell off my radar.  I will 
try to squeeze out some time for reviewing.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-02-10 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575529#comment-13575529
 ] 

Jonathan Allen commented on HADOOP-8989:


Luke - did you get a chance to finish reviewing this?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2013-01-09 Thread Caleb Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549411#comment-13549411
 ] 

Caleb Jones commented on HADOOP-8989:
-

I've submitted a patch for HADOOP-9195 (DateTimePathFilter based on mtime) 
which may have some functionality similar to this.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-12-16 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533489#comment-13533489
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561197/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 41 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1889//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1889//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-12-15 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533056#comment-13533056
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561115/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 40 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.cli.TestHDFSCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1880//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1880//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-12-15 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533139#comment-13533139
 ] 

Hadoop QA commented on HADOOP-8989:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561136/HADOOP-8989.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 40 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1885//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1885//console

This message is automatically generated.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-12-04 Thread Parameswaran Raman (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510039#comment-13510039
 ] 

Parameswaran Raman commented on HADOOP-8989:


Is this feature available yet? If not, any timelines?
I ran into a use-case today where it would be easier to have 'find' support in 
hadoop dfs. hadoop fs -ls -R runs out of memory!


 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, 
 HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-11-28 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505509#comment-13505509
 ] 

Daryn Sharp commented on HADOOP-8989:
-

Sorry, this fell off my radar.  I've been swamped.

bq. Has anyone studied the impact on the NN yet?
W/o having reviewed the patch, I initially don't think this could be any worse 
than ls -R.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-11-28 Thread Jonathan Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506017#comment-13506017
 ] 

Jonathan Allen commented on HADOOP-8989:


Allen - I haven't done any specific studies of the NN impact but I would expect 
it to be the same as any of the existing recursive command (e.g. ls -R).  The 
code uses the existing recursion code and applies the expressions to the 
returned FileStatus on the client side.  The -exec option obviously impacts the 
NN but no more than if you ran the command directly (though you don't have the 
JVM start-up overhead so the rate of NN messages will be higher).

Wolfgang - I'll look at adding those option expressions in (I got to the point 
where it looked like I'd be adding expressions forever so I decided to stop and 
see what people asked for).

Daryn - if you do get a chance to review this then the main code is pretty much 
finished (though I need to sort out the formatting before submitting properly) 
but I've reworked the unit tests since uploading the last patch (I'll upload a 
new at the weekend).  I still need to update the documentation web page and add 
some stuff in the CLI test suites.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-11-27 Thread wolfgang hoschek (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505074#comment-13505074
 ] 

wolfgang hoschek commented on HADOOP-8989:
--

I gave it a try and this patch is awesome. Would be really useful to add the 
following options as defined here: http://linux.die.net/man/1/find 

-regextype java (no need for other regex flavours here, but should be 
possible to add classic flavours later)
-regex
-maxdepth
-mindepth
-printf



 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-11-27 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505080#comment-13505080
 ] 

Allen Wittenauer commented on HADOOP-8989:
--

Has anyone studied the impact on the NN yet?

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-11-27 Thread wolfgang hoschek (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505130#comment-13505130
 ] 

wolfgang hoschek commented on HADOOP-8989:
--

In addition, would be good to have a -starttime dateTime dateTimePattern 
option to express the time related options (e.g. -mmin, -mtime, -amin, -atime) 
to work relative to a specific absolute timestamp (say, give me all files 
modified since yesterday midnight) instead of relative to whatever now 
happens to be at the time of command execution. The dateTimePattern might be 
ISO 8601 by default, and could be set to any java.text.SimpleDateFormat format 
pattern. 

The FindOptions class already seems to foresee such usage. It's just a matter 
of exposing it at the command level.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature

2012-11-09 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494356#comment-13494356
 ] 

Daryn Sharp commented on HADOOP-8989:
-

Wow, nice complement of expressions.  I'll try to review soon.

 hadoop dfs -find feature
 

 Key: HADOOP-8989
 URL: https://issues.apache.org/jira/browse/HADOOP-8989
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Marco Nicosia
Assignee: Jonathan Allen
 Attachments: HADOOP-8989.patch, HADOOP-8989.patch


 Both sysadmins and users make frequent use of the unix 'find' command, but 
 Hadoop has no correlate. Without this, users are writing scripts which make 
 heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs 
 -lsr is somewhat taxing on the NameNode, and a really slow experience on the 
 client side. Possibly an in-NameNode find operation would be only a bit more 
 taxing on the NameNode, but significantly faster from the client's point of 
 view?
 The minimum set of options I can think of which would make a Hadoop find 
 command generally useful is (in priority order):
 * -type (file or directory, for now)
 * -atime/-ctime-mtime (... and -creationtime?) (both + and - arguments)
 * -print0 (for piping to xargs -0)
 * -depth
 * -owner/-group (and -nouser/-nogroup)
 * -name (allowing for shell pattern, or even regex?)
 * -perm
 * -size
 One possible special case, but could possibly be really cool if it ran from 
 within the NameNode:
 * -delete
 The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow.
 Lower priority, some people do use operators, mostly to execute -or searches 
 such as:
 * find / \(-nouser -or -nogroup\)
 Finally, I thought I'd include a link to the [Posix spec for 
 find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

63 matches

Mail list logo