[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663210#comment-13663210 ] Sandy Ryza commented on MAPREDUCE-5038: --- To elaborate on that a little further, the relevant change on the MR side was: {code} - Path p = new Path(paths[i].toUri().getPath()); + FileSystem fs = paths[i].getFileSystem(conf); + Path p = fs.makeQualified(paths[i]); {code} so a path that was previously not examined now is, which makes me think that the change exposed a preexisting bug. Not having had success at running the Hive tests on top of a patched MR, I can't speak with 100% confidence, but if anyone knowledgeable about the Hive side wants to take a look, I'd be happy to help investigate on or offline. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662578#comment-13662578 ] Arun C Murthy commented on MAPREDUCE-5038: -- [~sandyr] Do you know why we are getting the wrong URLs? old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662608#comment-13662608 ] Sandy Ryza commented on MAPREDUCE-5038: --- From taking a deep look at the CombineFileInputFormat code, as well as copying this code into Hive and running it to see what happens, there doesn't appear to be anything on the MapReduce that could be modifying the authority in the URL that's passed in. So I think the wrong URLs must be coming from Hive. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662609#comment-13662609 ] Sandy Ryza commented on MAPREDUCE-5038: --- *on the MapReduce side old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656509#comment-13656509 ] Sandy Ryza commented on MAPREDUCE-5038: --- [~hagleitn], did you get a chance to look at this? old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13653132#comment-13653132 ] Sandy Ryza commented on MAPREDUCE-5038: --- I've spent a while attempting to run Hive 0.11.0 tests over a version of Hadoop that includes this patch, due to unfamiliarity with ant and the Hive build system, have been unable to figure out how to do it. I've tried copying into Hive the MapReduce code that touches the Paths in between Hive and where that exception is hit, and I verified that on my setup it produces the correct URL, with localhost instead of : in the authority. Is someone who's more familiar with Hive able to figure out where that : in the authority is coming from? old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652044#comment-13652044 ] Sandy Ryza commented on MAPREDUCE-5038: --- Taking a look at this today old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652333#comment-13652333 ] Sandy Ryza commented on MAPREDUCE-5038: --- Does the version of Hive that's encountering this include HIVE-3338? old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652443#comment-13652443 ] Arun C Murthy commented on MAPREDUCE-5038: -- I ran the test on hive-0.11, so yes, it includes HIVE-3338. Thanks for checking on this [~sandyr] old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651559#comment-13651559 ] Gunther Hagleitner commented on MAPREDUCE-5038: --- Here are the other hive tests that fail with/pass without this patch: combine2.q, sample_is_localmode_hook.q, stats_partscan_1.q old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651566#comment-13651566 ] Sandy Ryza commented on MAPREDUCE-5038: --- Before MAPREDUCE-1806, CombineFileInputFormat would not interpret har URIs as such. With the patch, it does, but it seems to be being passed an invalid har URI. Do we know where that URI is coming from? old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651619#comment-13651619 ] Arun C Murthy commented on MAPREDUCE-5038: -- [~hagleitn] Thanks for pointing this, along with the test. I verified Hive breaks only with this patch too ([~sandyr] see the test Gunther pointed to). [~sandyr] I'm mystified too - but Hive has used o.a.h.mapred.CombineFileInputFormat forever and that test (archive_multi.q) has been around since 2011 (HIVE-2278). Also, Hive has had support for HAR for a long while too (ARCHIVE syntax). Looks like we need to investigate this further to see why this broke Hive... can you please look? Since this is not in 1.2 (currently under vote) I'm not too worried, we can revert this patch if we don't come up with a fix quickly? old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605490#comment-13605490 ] Sandy Ryza commented on MAPREDUCE-5038: --- Attached patch that makes it so that MAPREDUCE-5049 doesn't need to be reverted as well. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605493#comment-13605493 ] Alejandro Abdelnur commented on MAPREDUCE-5038: --- Sandy, I've just reverted the commit, please upload a rebased patch (note I had to resolve a conflict with MAPREDUCE-5049) old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605499#comment-13605499 ] Hadoop QA commented on MAPREDUCE-5038: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574195/MAPREDUCE-5038-revised-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3426//console This message is automatically generated. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605550#comment-13605550 ] Hadoop QA commented on MAPREDUCE-5038: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574202/MAPREDUCE-5038-revised-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3427//console This message is automatically generated. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605662#comment-13605662 ] Alejandro Abdelnur commented on MAPREDUCE-5038: --- +1 old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605681#comment-13605681 ] Alejandro Abdelnur commented on MAPREDUCE-5038: --- I've just reverted this from branch-1.2 as well. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised-1.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604464#comment-13604464 ] Sandy Ryza commented on MAPREDUCE-5038: --- Posted a replacement patch old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604467#comment-13604467 ] Hadoop QA commented on MAPREDUCE-5038: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574029/MAPREDUCE-5038-revised.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3422//console This message is automatically generated. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch, MAPREDUCE-5038-revised.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603940#comment-13603940 ] Sandy Ryza commented on MAPREDUCE-5038: --- It looks like I messed this up and left out part of MAPREDUCE-1597. Working on a replacement patch. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596115#comment-13596115 ] Tom White commented on MAPREDUCE-5038: -- +1 old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596501#comment-13596501 ] Sangjin Lee commented on MAPREDUCE-5038: I think MAPREDUCE-5046 can be closed, as it is a subset of this patch. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0 Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13594002#comment-13594002 ] Sangjin Lee commented on MAPREDUCE-5038: I filed MAPREDUCE-5046 to backport MAPREDUCE-1423, then found this. I took a look at the patch here, but I'm not sure if it subsumes the changes contained in MAPREDUCE-1423. Specifically, rackToNodes seems still static, which is a thread-safety problem. Could you absorb the fix that's in MAPREDUCE-1423? I'd be happy to look at that if you want. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13594329#comment-13594329 ] Sandy Ryza commented on MAPREDUCE-5038: --- Filed MAPREDUCE-5049 to handle SplittableCompressionCodec. Uploaded a new patch that includes MAPREDUCE-1423. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13594334#comment-13594334 ] Hadoop QA commented on MAPREDUCE-5038: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12572246/MAPREDUCE-5038-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3387//console This message is automatically generated. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592433#comment-13592433 ] Tom White commented on MAPREDUCE-5038: -- I agree, it should test for {{codec instanceof SplittableCompressionCodec}} like the other FileInputFormats. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590992#comment-13590992 ] Sandy Ryza commented on MAPREDUCE-5038: --- Straightforward patch brings the code in mapred up to speed with the code in mapreduce. I ported the new tests as well and verified that they all pass. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591038#comment-13591038 ] Alejandro Abdelnur commented on MAPREDUCE-5038: --- The backport looks OK. Still, there is something that worries me {code} + @Override + protected boolean isSplitable(FileSystem fs, Path file) { +final CompressionCodec codec = + new CompressionCodecFactory(fs.getConf()).getCodec(file); +return codec == null; + } {code} We should take into account splittable codecs, trunk does take into account and so does 0.22. I wonder where/why this got drop in Hadoop 1. Any idea? old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591050#comment-13591050 ] Sandy Ryza commented on MAPREDUCE-5038: --- As some additional background, MAPREDUCE-1597, the original patch that added isSplitable to CombineFileInputFormat, took splittable codecs into account. The only place in any history I can find CombineFileInputFormat with an isSplitable method, but without taking splittable codecs into account, is in branch-1. old API CombineFileInputFormat missing fixes that are in new API - Key: MAPREDUCE-5038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5038.patch The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files MAPREDUCE-2021 solved returning duplicate hostnames in split locations MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS In trunk this is not an issue as the one in mapred extends the one in mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira