[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841428#action_12841428 ] Hudson commented on MAPREDUCE-1501: --- Integrated in Hadoop-Mapreduce-trunk #248 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/]) . FileInputFormat supports multi-level, recursive directory listing. (Zheng Shao via dhruba)

FileInputFormat to support multi-level/recursive directory listing
--
Key: MAPREDUCE-1501
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: MAPREDUCE-1501.1.branch-0.20.patch, MAPREDUCE-1501.1.trunk.patch

As we have seen multiple times on the mailing list, users want to be able to get all the files out of a multi-level directory structure.
4/1/2008: http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e
2/3/2009: http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e
6/2/2009: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e
One workaround our users have adopted is to write a new FileInputFormat, but that means every existing FileInputFormat subclass would need to be changed to support this feature. We can easily add a JobConf option (defaulting to false) that makes {{FileInputFormat.listStatus(...)}} recurse into the directory structure.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
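The proposal boils down to a listing step that descends into subdirectories only when a flag is set, with the flag off by default. The following is a minimal stdlib-only sketch of that idea, not the committed patch: it uses `java.nio.file` in place of Hadoop's `FileSystem` API, and the class `RecursiveListing` and its `recursive` parameter are hypothetical stand-ins for `FileInputFormat.listStatus(...)` and the proposed JobConf option.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the proposed option: list an input directory and,
 * only when recursive is true, replace each subdirectory with the files beneath it.
 * java.nio.file stands in for Hadoop's FileSystem API.
 */
final class RecursiveListing {
    private RecursiveListing() {}

    /** Default behaviour: the option is off, so only immediate children are returned. */
    static List<Path> listStatus(Path inputDir) throws IOException {
        return listStatus(inputDir, false);
    }

    static List<Path> listStatus(Path inputDir, boolean recursive) throws IOException {
        List<Path> result = new ArrayList<>();
        addEntries(inputDir, recursive, result);
        Collections.sort(result); // deterministic order regardless of directory iteration order
        return result;
    }

    private static void addEntries(Path dir, boolean recursive, List<Path> out) throws IOException {
        List<Path> children;
        try (Stream<Path> stream = Files.list(dir)) { // Files.list does not return "." or ".." entries
            children = stream.collect(Collectors.toList());
        }
        for (Path child : children) {
            // NOFOLLOW_LINKS: a symlink to a directory is kept as an entry rather than
            // descended into, so link cycles cannot cause infinite recursion.
            if (recursive && Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                addEntries(child, true, out);
            } else {
                out.add(child);
            }
        }
    }
}
```

Because the one-argument overload keeps the flag off, existing callers get the same flat listing as before, which is the reason for defaulting the option to false.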
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841658#action_12841658 ] Chris Douglas commented on MAPREDUCE-1501: --
{noformat}
+import com.sun.org.apache.commons.logging.Log;
+import com.sun.org.apache.commons.logging.LogFactory;
{noformat}
Should these imports be {{org.apache.commons.logging}}, not {{com.sun...}}? Is there a reason this feature was only added to a deprecated class, instead of the {{FileInputFormat}} in the {{mapreduce}} package?
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840541#action_12840541 ] dhruba borthakur commented on MAPREDUCE-1501: - The failed unit test is TestMiniMRLocalFS.testWithLocal and is not related to this patch. I will commit this patch.
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841042#action_12841042 ] Hudson commented on MAPREDUCE-1501: --- Integrated in Hadoop-Mapreduce-trunk-Commit #257 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/257/]) . FileInputFormat supports multi-level, recursive directory listing. (Zheng Shao via dhruba)
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840122#action_12840122 ] Hadoop QA commented on MAPREDUCE-1501: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12436481/MAPREDUCE-1501.1.trunk.patch against trunk revision 916823.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/339/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/339/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/339/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/339/console
This message is automatically generated.
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836602#action_12836602 ] Hadoop QA commented on MAPREDUCE-1501: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12436481/MAPREDUCE-1501.1.trunk.patch against trunk revision 912471.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/469/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/469/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/469/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/469/console
This message is automatically generated.
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836936#action_12836936 ] Zheng Shao commented on MAPREDUCE-1501: --- Thanks for the feedback, Ian. I don't think FileSystem.listPath() returns "." or "..". If it did, I believe the current code in trunk would also break, and the new unit test would fail as well.
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836948#action_12836948 ] dhruba borthakur commented on MAPREDUCE-1501: - I think Ian meant that you could enhance this feature by allowing the user to register a set of PathFilters. That would let the job process only a selected subset of the subdirectories.
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836963#action_12836963 ] Zheng Shao commented on MAPREDUCE-1501: --- Thanks, Dhruba. I missed the part and other hidden directories. We do apply the PathFilter to subdirectories as well (see addInputPathRecursively(...)). Is that good enough, or do we want separate PathFilters for files and for directories?
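Zheng's point is that the PathFilter screening files is also applied to each subdirectory before the listing descends into it, so rejecting a directory prunes its whole subtree. The sketch below illustrates that behaviour alongside the split alternative he asks about, again using `java.nio.file` rather than Hadoop's API. It is a hypothetical illustration, not `addInputPathRecursively(...)` itself: `FilteredRecursiveListing` is an invented name, `java.util.function.Predicate<Path>` stands in for Hadoop's `PathFilter`, and `HIDDEN_FILTER` mimics FileInputFormat's rule of skipping names that begin with `_` or `.`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: recursive listing with filters applied to directories and files. */
final class FilteredRecursiveListing {
    private FilteredRecursiveListing() {}

    /** Rejects names beginning with "_" or ".", mimicking FileInputFormat's hidden-file rule. */
    static final Predicate<Path> HIDDEN_FILTER = p -> {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    };

    /** One filter for both subdirectories and files, as described in the comment above. */
    static List<Path> list(Path root, Predicate<Path> filter) throws IOException {
        return list(root, filter, filter);
    }

    /** Split alternative: separate filters for subdirectories and for files. */
    static List<Path> list(Path root, Predicate<Path> dirFilter, Predicate<Path> fileFilter)
            throws IOException {
        List<Path> out = new ArrayList<>();
        walk(root, dirFilter, fileFilter, out);
        Collections.sort(out); // deterministic order
        return out;
    }

    private static void walk(Path dir, Predicate<Path> dirFilter, Predicate<Path> fileFilter,
                             List<Path> out) throws IOException {
        List<Path> children;
        try (Stream<Path> stream = Files.list(dir)) {
            children = stream.collect(Collectors.toList());
        }
        for (Path child : children) {
            if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                // A rejected directory is skipped together with everything beneath it.
                if (dirFilter.test(child)) {
                    walk(child, dirFilter, fileFilter, out);
                }
            } else if (fileFilter.test(child)) {
                out.add(child);
            }
        }
    }
}
```

With one filter for both, a hidden directory and its contents disappear from the listing; with a permissive directory filter, only the file-level check remains, so ordinary files inside hidden directories are picked up. The single-filter form is the one the thread settles on.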
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837030#action_12837030 ] dhruba borthakur commented on MAPREDUCE-1501: - That should be good enough, unless Ian has some other ideas.