[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784463#comment-16784463 ]

David Mollitor commented on MAPREDUCE-207:
------------------------------------------

I recently came across a situation where a user had the LZO compression codec enabled and installed across the cluster. However, MR jobs that did not even require the codec were failing because the codec was not installed on the client node from which the jobs were submitted. As part of its role in calculating splits, the client loads the codec configuration and all of the associated codec implementations. This fails on external clients that do not have the codec installed. The user understandably did not want to install the LZO codec on every client node, but avoiding that came at the cost of maintaining separate hdfs-site files for different client hosts. Moving all of this work into the cluster removes this dependency from the clients.

> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
> Key: MAPREDUCE-207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: applicationmaster, mrv2
> Reporter: Philip Zeyliger
> Assignee: Gera Shegalov
> Priority: Major
> Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch,
> MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch,
> MAPREDUCE-207.v07.patch
>
> Instead of computing the input splits as part of job submission, Hadoop could
> have a separate "job task type" that computes the input splits, therefore
> allowing that computation to happen on the cluster.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
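The failure mode described in the comment above (a client resolving every configured codec, including ones the job never uses) can be illustrated with a small analogy. This is a sketch in Python, not Hadoop code: `load_codecs` and the module name `lzo_not_installed` are hypothetical, and standard-library classes stand in for codec implementations.

```python
import importlib


def load_codecs(codec_class_names):
    """Eagerly resolve every configured codec name.

    Mirrors the behaviour described in the thread: all configured codecs
    are loaded up front, so one missing implementation fails the whole
    step even if the job never reads data in that format.
    """
    codecs = []
    for name in codec_class_names:
        module_name, _, class_name = name.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            codecs.append(getattr(module, class_name))
        except (ImportError, AttributeError) as exc:
            raise RuntimeError(f"Compression codec {name} not found") from exc
    return codecs


# Cluster-style configuration: two codecs that are installed, plus one
# (hypothetical) codec that is missing on this client host.
configured = ["gzip.GzipFile", "bz2.BZ2File", "lzo_not_installed.LzoCodec"]

try:
    load_codecs(configured)
except RuntimeError as err:
    print(f"split calculation would fail on this client: {err}")
```

If the same resolution step ran inside the cluster, where every configured codec is installed, the client host would no longer need the codec at all.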
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469438#comment-16469438 ]

BELUGA BEHR commented on MAPREDUCE-207:
---------------------------------------

This feature would be interesting to the Hive server since the server could have many MapReduce clients running in a single instance, at the same time, on large data sets.
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308003#comment-14308003 ]

Hadoop QA commented on MAPREDUCE-207:
-------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12655331/MAPREDUCE-207.v07.patch
against trunk revision e1990ab.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5167//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059583#comment-14059583 ]

Hadoop QA commented on MAPREDUCE-207:
-------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12655331/MAPREDUCE-207.v07.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The test build failed in hadoop-tools/hadoop-gridmix

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4730//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4730//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048618#comment-14048618 ]

Gera Shegalov commented on MAPREDUCE-207:
-----------------------------------------

v06 does not yet address [~mingma]'s review (thank you). I assigned this JIRA to myself since nobody else seems to be working on it.
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048726#comment-14048726 ]

Hadoop QA commented on MAPREDUCE-207:
-------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12653342/MAPREDUCE-207.v06.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers

The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapred.pipes.TestPipeApplication

The test build failed in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4699//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4699//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046497#comment-14046497 ]

Ming Ma commented on MAPREDUCE-207:
-----------------------------------

Thanks, Gera. Nice work; this will be quite useful. Overall it looks good. Per offline discussion with Gera:

1. It is unclear whether there are any security-related implications, such as https://issues.apache.org/jira/browse/MAPREDUCE-5663.
2. Compatibility between a new MR client with this feature and a cluster running old MR. Because the new MR client won't compute the splits by default, the job will fail if the cluster still uses old MR. In this case, the new MR client needs to be configured to compute the splits. For the more general case, where a new MR client can talk to some clusters with old MR and some with new MR, it would be nice if the client could discover whether the cluster supports this feature.
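The compatibility concern in point 2 amounts to a small decision on the client side. The following is only a sketch of that decision, not anything from the patch: `ClusterInfo`, its `supports_am_splits` flag, and `should_client_compute_splits` are hypothetical names, and how a client would actually discover the capability is left open in the review.

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    name: str
    # Hypothetical capability flag: True if the cluster's MR version can
    # compute input splits itself.
    supports_am_splits: bool


def should_client_compute_splits(cluster, force_client_side=False):
    """Decide where splits are computed for one job submission.

    force_client_side models the explicit client configuration mentioned
    in the review; otherwise the client falls back to computing splits
    itself whenever the cluster cannot do it.
    """
    if force_client_side:
        return True
    return not cluster.supports_am_splits


old_cluster = ClusterInfo("old-mr", supports_am_splits=False)
new_cluster = ClusterInfo("new-mr", supports_am_splits=True)

for cluster in (old_cluster, new_cluster):
    where = "client" if should_client_compute_splits(cluster) else "cluster"
    print(f"{cluster.name}: compute splits on the {where}")
```

With discovery in place, one client configuration could serve both kinds of cluster instead of requiring a per-cluster override.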
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009533#comment-14009533 ]

Hadoop QA commented on MAPREDUCE-207:
-------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12646848/MAPREDUCE-207.v05.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapred.pipes.TestPipeApplication

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4624//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4624//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010259#comment-14010259 ]

Gera Shegalov commented on MAPREDUCE-207:
-----------------------------------------

Assuming that the TestPipeApplication timeout is MAPREDUCE-5868, v05 is ready for review. The code can be optimized further to avoid reading the splits back when they are written for the first time. We can incorporate that if the approach is accepted in general.

There is plenty of existing test coverage for job submission, and it helped shape the patch. Since this is merely a refactoring, no new functional tests are urgently needed.
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004332#comment-14004332 ]

Hadoop QA commented on MAPREDUCE-207:
-------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12645924/MAPREDUCE-207.v03.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}.
The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.app.TestJobEndNotifier org.apache.hadoop.mapreduce.v2.app.TestRecovery org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies org.apache.hadoop.mapreduce.v2.app.TestMRApp org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator org.apache.hadoop.mapreduce.v2.app.TestFail org.apache.hadoop.mapreduce.v2.app.TestFetchFailure org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM org.apache.hadoop.mapreduce.v2.app.TestMRClientService org.apache.hadoop.mapreduce.v2.app.TestAMInfos org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebApp org.apache.hadoop.mapreduce.v2.app.TestKill org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher org.apache.hadoop.mapred.pipes.TestPipeApplication org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//console This message is automatically generated. 
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997964#comment-13997964 ]

Gera Shegalov commented on MAPREDUCE-207:
-----------------------------------------

[~ste...@apache.org], thanks for your [comment|https://issues.apache.org/jira/browse/MAPREDUCE-5887?focusedCommentId=13997431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13997431] in MAPREDUCE-5887. Moving it here.

bq. One test to try there is what happens when the blocksize is reported as very, very small (you can configure this in swiftfs). In the client this will cause the submitting process to OOM and fail. Presumably the same outcome in the AM is the simplest to implement - we just need to make sure that YARN recognises this as a failure and only tries a couple of times

OOMs, like any other AM failure, are treated as an application attempt failure ({{yarn.resourcemanager.am.max-attempts}}). We've experienced such issues in production, and they are usually only indirectly related to splits, i.e., the job state comprising all map and reduce attempts is too big for the default MR-AM container size. Before working on moving split calculation to the MR-AM, I was actually thinking about auto-tuning {{yarn.app.mapreduce.am.resource.mb}} and the Xmx opts in JobSubmitter. However, even if the split calculation happens in the AM, we can come up with an AM-RM RPC along the lines of "start a new attempt with the new settings".
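To see why a very small reported block size can exhaust memory wherever the splits are computed, it helps to count the splits. The sketch below assumes the split size follows the `max(minSize, min(maxSize, blockSize))` rule used by `FileInputFormat`, and it simplifies the count to a ceiling division (the real implementation also tolerates a slightly oversized final split). The function names and the example sizes are illustrative only.

```python
def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size rule: the block size clamped to [min_size, max_size]."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_length, split_size):
    """Approximate number of splits for one file (ceiling division)."""
    if file_length == 0:
        return 0
    return -(-file_length // split_size)


one_gib = 2**30

# A typical block size versus a "very, very small" reported block size.
for block_size in (128 * 2**20, 1024):
    split_size = compute_split_size(block_size)
    print(f"block size {block_size} B -> {count_splits(one_gib, split_size)} splits")
```

Each split is held as an object while the list is built, so moving from thousands to millions of splits per gigabyte is what pushes the computing process, client or AM, toward an OOM.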
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995583#comment-13995583 ]

Hadoop QA commented on MAPREDUCE-207:
-------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12644428/MAPREDUCE-207.v02.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}.
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapred.lib.aggregate.TestAggregates org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat org.apache.hadoop.mapred.TestFieldSelection org.apache.hadoop.mapred.TestOldCombinerGrouping org.apache.hadoop.mapreduce.TestLocalRunner org.apache.hadoop.mapred.TestUserDefinedCounters org.apache.hadoop.mapreduce.TestMROutputFormat org.apache.hadoop.mapreduce.lib.fieldsel.TestMRFieldSelection org.apache.hadoop.mapred.TestLocalMRNotification org.apache.hadoop.mapred.TestLineRecordReaderJobs org.apache.hadoop.mapreduce.lib.map.TestMultithreadedMapper org.apache.hadoop.mapreduce.TestNewCombinerGrouping org.apache.hadoop.mapred.lib.TestChainMapReduce org.apache.hadoop.mapreduce.TestMapReduce org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge org.apache.hadoop.mapred.lib.TestKeyFieldBasedComparator org.apache.hadoop.mapred.lib.TestMultithreadedMapRunner org.apache.hadoop.mapreduce.TestMapperReducerCleanup org.apache.hadoop.mapred.lib.TestMultipleOutputs org.apache.hadoop.mapred.TestJavaSerialization org.apache.hadoop.mapreduce.lib.output.TestMRMultipleOutputs org.apache.hadoop.mapred.TestCollect org.apache.hadoop.mapred.join.TestDatamerge org.apache.hadoop.mapreduce.TestMapCollection org.apache.hadoop.mapreduce.lib.aggregate.TestMapReduceAggregates org.apache.hadoop.mapred.TestMapRed org.apache.hadoop.mapred.TestFileOutputFormat org.apache.hadoop.mapreduce.TestValueIterReset org.apache.hadoop.mapred.TestMapOutputType org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.conf.TestNoDefaultsJobConf org.apache.hadoop.mapred.TestReporter org.apache.hadoop.mapreduce.lib.partition.TestMRKeyFieldBasedComparator org.apache.hadoop.mapreduce.lib.chain.TestChainErrors 
org.apache.hadoop.mapreduce.lib.chain.TestSingleElementChain org.apache.hadoop.mapreduce.lib.input.TestMultipleInputs org.apache.hadoop.mapred.TestComparators org.apache.hadoop.mapreduce.lib.input.TestLineRecordReaderJobs org.apache.hadoop.mapreduce.lib.chain.TestMapReduceChain org.apache.hadoop.mapred.jobcontrol.TestLocalJobControl org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596692#comment-13596692 ]

Sandy Ryza commented on MAPREDUCE-207:
--------------------------------------

Arun, are you still planning on working on this? If not, do you mind if I pick it up?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447584#comment-13447584 ]

Johannes Zillmann commented on MAPREDUCE-207:
---------------------------------------------

Currently, in our Hadoop applications, we calculate the splits before we submit the job to the client (the client then simply looks up the existing splits). We do that mainly to influence the reducer count based on the number of splits/map tasks.

If Hadoop does the splitting on the cluster (which makes sense), it would be nice to have a hook to influence the configuration. Sometimes it also makes sense for us to decide on the map-reduce assembly after we know the splits (different join strategies for different data constellations).

Just dumping some ideas here...
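The kind of hook requested above could look roughly like the following. This is a hypothetical sketch, not an existing Hadoop API: `on_splits_computed`, `reducers_for`, and the ratio and cap values are made up for illustration, and only the property name `mapreduce.job.reduces` is a real MapReduce setting.

```python
import math


def reducers_for(num_splits, maps_per_reducer=10, max_reducers=100):
    """Pick a reducer count proportional to the number of map tasks."""
    return max(1, min(max_reducers, math.ceil(num_splits / maps_per_reducer)))


def on_splits_computed(num_splits, conf):
    """Hypothetical hook, called once the cluster has computed the splits.

    Returns an updated copy of the job configuration; the input mapping
    is left unmodified.
    """
    updated = dict(conf)
    updated["mapreduce.job.reduces"] = reducers_for(num_splits)
    return updated


job_conf = {"mapreduce.job.name": "example-join"}
print(on_splits_computed(25, job_conf))
```

The same callback would also be the natural place to choose between join strategies once the number and size of the splits are known.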