[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2019-03-05 Thread David Mollitor (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784463#comment-16784463
 ] 

David Mollitor commented on MAPREDUCE-207:
--

I recently came across a situation where a user had the LZO compression codec 
enabled and installed across the cluster.  However, MR jobs that did not even 
require the codec were failing because the codec was not installed on the 
client node from which the jobs were submitted.  As part of its role in 
calculating splits, the client loads the codec configuration and all of the 
associated codec implementations.  This fails on external clients that do not 
have the codec installed.  The user understandably did not want to install the 
LZO codec on every client node, but avoiding that meant maintaining separate 
hdfs-site files for different client hosts.

Moving all of this work into the cluster removes this dependency from the 
clients.
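
The failure mode described in this comment can be illustrated without Hadoop on 
the classpath. The following is a simplified sketch, not the actual Hadoop 
code: the class CodecLoadSketch and its method are hypothetical names, and the 
comma-separated list stands in for a codec setting such as Hadoop's 
io.compression.codecs. It mimics a client that eagerly resolves every 
configured codec class, so a single class missing on the client fails the whole 
step even when the job never uses that codec.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a client that eagerly loads every configured codec class. */
final class CodecLoadSketch {

  /**
   * Resolves every class named in a comma-separated list.
   * Throws IllegalArgumentException on the first class that is not on the classpath.
   */
  static List<Class<?>> loadAll(String commaSeparatedClassNames) {
    List<Class<?>> loaded = new ArrayList<>();
    for (String name : commaSeparatedClassNames.split(",")) {
      String trimmed = name.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas and whitespace
      }
      try {
        loaded.add(Class.forName(trimmed));
      } catch (ClassNotFoundException e) {
        // One missing class aborts the whole load, regardless of which codec the job needs
        throw new IllegalArgumentException("Codec class " + trimmed + " not found", e);
      }
    }
    return loaded;
  }
}
```

Calling loadAll with a list that contains one class absent from the local 
classpath throws, which corresponds to the submission failure on a client node 
without the codec. If the same loading ran only inside the cluster, the client 
classpath would no longer matter.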

> Computing Input Splits on the MR Cluster
> 
>
> Key: MAPREDUCE-207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: applicationmaster, mrv2
>Reporter: Philip Zeyliger
>Assignee: Gera Shegalov
>Priority: Major
> Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, 
> MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, 
> MAPREDUCE-207.v07.patch
>
>
> Instead of computing the input splits as part of job submission, Hadoop could 
> have a separate "job task type" that computes the input splits, therefore 
> allowing that computation to happen on the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2018-05-09 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469438#comment-16469438
 ] 

BELUGA BEHR commented on MAPREDUCE-207:
---

This feature would be interesting to the Hive server since the server could 
have many MapReduce clients running in a single instance, at the same time, on 
large data sets.




[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2015-02-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308003#comment-14308003
 ] 

Hadoop QA commented on MAPREDUCE-207:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12655331/MAPREDUCE-207.v07.patch
  against trunk revision e1990ab.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5167//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-07-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059583#comment-14059583
 ] 

Hadoop QA commented on MAPREDUCE-207:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12655331/MAPREDUCE-207.v07.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-tools/hadoop-gridmix 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4730//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4730//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-07-01 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048618#comment-14048618
 ] 

Gera Shegalov commented on MAPREDUCE-207:
-

v06 does not yet address [~mingma]'s review (thank you). I assigned this jira 
to myself as nobody else seems to be working on it.

 Computing Input Splits on the MR Cluster
 

 Key: MAPREDUCE-207
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, 
 MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch


 Instead of computing the input splits as part of job submission, Hadoop could 
 have a separate job task type that computes the input splits, therefore 
 allowing that computation to happen on the cluster.





[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-07-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048726#comment-14048726
 ] 

Hadoop QA commented on MAPREDUCE-207:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653342/MAPREDUCE-207.v06.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapred.pipes.TestPipeApplication

  The test build failed in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4699//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4699//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-06-27 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046497#comment-14046497
 ] 

Ming Ma commented on MAPREDUCE-207:
---

Thanks, Gera. Nice work, and this will be quite useful. Overall it looks good. 
Two points from an offline discussion with Gera:

1. It is unclear whether there are any security-related implications, such as 
https://issues.apache.org/jira/browse/MAPREDUCE-5663.
2. Compatibility between a new MR client with this feature and a cluster 
running the old MR. Given that the new MR client won't compute the splits by 
default, the job will fail if the cluster still uses the old MR, so in this 
case the new MR client needs to be configured to compute the splits itself. For 
the more general case, where a new MR client talks to some clusters with the 
old MR and some with the new MR, it would be nice if the client could discover 
whether a cluster supports this feature.
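
The compatibility concern in point 2 can be sketched as a small decision 
function. This is an illustration only: SplitPlacementSketch, both enums, and 
the method are hypothetical names, and how a client would actually learn 
whether a cluster supports the feature is not specified anywhere in this 
thread. The sketch falls back to client-side splits unless the cluster is 
known to support cluster-side computation.

```java
/** Hypothetical sketch: decide where input splits are computed for one job submission. */
final class SplitPlacementSketch {

  /** Where the splits end up being computed. */
  enum SplitLocation { CLIENT, CLUSTER }

  /** What the client knows about the target cluster; the discovery mechanism is left open. */
  enum ClusterSupport { SUPPORTED, UNSUPPORTED, UNKNOWN }

  static SplitLocation choose(boolean clientWantsClusterSplits, ClusterSupport support) {
    if (!clientWantsClusterSplits) {
      // Client explicitly configured to keep computing splits itself
      return SplitLocation.CLIENT;
    }
    // Fall back to client-side splits unless the cluster is known to support the feature
    return support == ClusterSupport.SUPPORTED ? SplitLocation.CLUSTER : SplitLocation.CLIENT;
  }
}
```

Treating UNKNOWN like UNSUPPORTED is a conservative choice: a job submitted to 
an old cluster still runs, at the price of computing splits on the client.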

 Computing Input Splits on the MR Cluster
 

 Key: MAPREDUCE-207
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, 
 MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch


 Instead of computing the input splits as part of job submission, Hadoop could 
 have a separate job task type that computes the input splits, therefore 
 allowing that computation to happen on the cluster.





[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009533#comment-14009533
 ] 

Hadoop QA commented on MAPREDUCE-207:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12646848/MAPREDUCE-207.v05.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapred.pipes.TestPipeApplication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4624//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4624//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-05-27 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010259#comment-14010259
 ] 

Gera Shegalov commented on MAPREDUCE-207:
-

Assuming that the TestPipeApplication timeout is MAPREDUCE-5868, v05 is ready 
for review. The code can be optimized further to avoid reading splits back when 
they are written for the first time. We can incorporate that if the approach is 
accepted in general. There is plenty of coverage for job submission that helped 
shape the patch. Since it's a mere refactoring, no new functional tests are 
urgently needed.



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004332#comment-14004332
 ] 

Hadoop QA commented on MAPREDUCE-207:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645924/MAPREDUCE-207.v03.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapreduce.v2.app.TestJobEndNotifier
org.apache.hadoop.mapreduce.v2.app.TestRecovery
org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies
org.apache.hadoop.mapreduce.v2.app.TestMRApp
org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
org.apache.hadoop.mapreduce.v2.app.TestFail
org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM
org.apache.hadoop.mapreduce.v2.app.TestMRClientService
org.apache.hadoop.mapreduce.v2.app.TestAMInfos
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebApp
org.apache.hadoop.mapreduce.v2.app.TestKill
org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher
org.apache.hadoop.mapred.pipes.TestPipeApplication
org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//console

This message is automatically generated.

 Computing Input Splits on the MR Cluster
 

 Key: MAPREDUCE-207
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, 
 MAPREDUCE-207.v03.patch


 Instead of computing the input splits as part of job submission, Hadoop could 
 have a separate job task type that computes the input splits, therefore 
 allowing that computation to happen on the cluster.





[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-05-15 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997964#comment-13997964
 ] 

Gera Shegalov commented on MAPREDUCE-207:
-

[~ste...@apache.org], thanks for your 
[comment|https://issues.apache.org/jira/browse/MAPREDUCE-5887?focusedCommentId=13997431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13997431]
 in MAPREDUCE-5887. Moving the discussion here.

bq. One test to try there is what happens when the blocksize is reported as 
very, very small (you can configure this in swiftfs). in the client this will 
cause the submitting process to OOM and fail. Presumably the same outcome in 
the AM is the simplest to implement -we just need to make sure that YARN 
recognises this as a failure and only tries a couple of times

OOMs, like any other AM failure, are treated as an application attempt failure 
({{yarn.resourcemanager.am.max-attempts}}). We've experienced such issues in 
production, and they are usually only indirectly related to splits, i.e. the 
job state comprising all map and reduce attempts is too big for the default 
MR-AM container size. 

Before starting the work on moving split calculation to the MR-AM, I was 
actually thinking about auto-tuning {{yarn.app.mapreduce.am.resource.mb}} and 
the Xmx opts in JobSubmitter. However, even if the split calculation happens in 
the AM, we can come up with an AM-RM RPC along the lines of "start a new 
attempt with the new settings".
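
The auto-tuning idea mentioned in this comment might look roughly like the 
following. This is a sketch under stated assumptions, not code from any of the 
attached patches: the class AmMemoryRetrySketch, the doubling policy, the cap, 
and the 80% heap ratio are all illustrative choices. It plans a container size 
for each AM attempt, growing the size after a failure, and derives an Xmx 
option from the container size.

```java
/** Hypothetical sketch: grow the AM container size across application attempts. */
final class AmMemoryRetrySketch {

  /**
   * Returns the container size in MB to request for each attempt.
   * Assumed policy: start at initialMb, double after each failed attempt, never exceed capMb.
   */
  static int[] containerMbPerAttempt(int initialMb, int maxAttempts, int capMb) {
    if (initialMb <= 0 || maxAttempts <= 0 || capMb <= 0) {
      throw new IllegalArgumentException("all arguments must be positive");
    }
    int[] plan = new int[maxAttempts];
    int mb = Math.min(initialMb, capMb);
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      plan[attempt] = mb;
      // Use long arithmetic so that doubling cannot overflow before the cap is applied
      mb = (int) Math.min((long) capMb, 2L * mb);
    }
    return plan;
  }

  /** Derives a JVM heap option from the container size, assuming 80% of it goes to the heap. */
  static String heapOpt(int containerMb) {
    return "-Xmx" + (containerMb * 8L / 10) + "m";
  }
}
```

In this picture, maxAttempts would correspond to 
yarn.resourcemanager.am.max-attempts and the planned sizes to successive 
values of yarn.app.mapreduce.am.resource.mb; applying new settings between 
attempts is the part that would need the AM-RM RPC proposed above.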



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2014-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995583#comment-13995583
 ] 

Hadoop QA commented on MAPREDUCE-207:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12644428/MAPREDUCE-207.v02.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapred.lib.aggregate.TestAggregates
  org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat
  org.apache.hadoop.mapred.TestFieldSelection
  org.apache.hadoop.mapred.TestOldCombinerGrouping
  org.apache.hadoop.mapreduce.TestLocalRunner
  org.apache.hadoop.mapred.TestUserDefinedCounters
  org.apache.hadoop.mapreduce.TestMROutputFormat
  org.apache.hadoop.mapreduce.lib.fieldsel.TestMRFieldSelection
  org.apache.hadoop.mapred.TestLocalMRNotification
  org.apache.hadoop.mapred.TestLineRecordReaderJobs
  org.apache.hadoop.mapreduce.lib.map.TestMultithreadedMapper
  org.apache.hadoop.mapreduce.TestNewCombinerGrouping
  org.apache.hadoop.mapred.lib.TestChainMapReduce
  org.apache.hadoop.mapreduce.TestMapReduce
  org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge
  org.apache.hadoop.mapred.lib.TestKeyFieldBasedComparator
  org.apache.hadoop.mapred.lib.TestMultithreadedMapRunner
  org.apache.hadoop.mapreduce.TestMapperReducerCleanup
  org.apache.hadoop.mapred.lib.TestMultipleOutputs
  org.apache.hadoop.mapred.TestJavaSerialization
  org.apache.hadoop.mapreduce.lib.output.TestMRMultipleOutputs
  org.apache.hadoop.mapred.TestCollect
  org.apache.hadoop.mapred.join.TestDatamerge
  org.apache.hadoop.mapreduce.TestMapCollection
  org.apache.hadoop.mapreduce.lib.aggregate.TestMapReduceAggregates
  org.apache.hadoop.mapred.TestMapRed
  org.apache.hadoop.mapred.TestFileOutputFormat
  org.apache.hadoop.mapreduce.TestValueIterReset
  org.apache.hadoop.mapred.TestMapOutputType
  org.apache.hadoop.mapred.TestJobCounters
  org.apache.hadoop.conf.TestNoDefaultsJobConf
  org.apache.hadoop.mapred.TestReporter
  org.apache.hadoop.mapreduce.lib.partition.TestMRKeyFieldBasedComparator
  org.apache.hadoop.mapreduce.lib.chain.TestChainErrors
  org.apache.hadoop.mapreduce.lib.chain.TestSingleElementChain
  org.apache.hadoop.mapreduce.lib.input.TestMultipleInputs
  org.apache.hadoop.mapred.TestComparators
  org.apache.hadoop.mapreduce.lib.input.TestLineRecordReaderJobs
  org.apache.hadoop.mapreduce.lib.chain.TestMapReduceChain
  org.apache.hadoop.mapred.jobcontrol.TestLocalJobControl
  org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM

[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2013-03-07 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596692#comment-13596692
 ] 

Sandy Ryza commented on MAPREDUCE-207:
--

Arun, are you still planning on working on this?  If not, do you mind if I pick 
it up?

 Computing Input Splits on the MR Cluster
 

 Key: MAPREDUCE-207
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-207.patch


 Instead of computing the input splits as part of job submission, Hadoop could 
 have a separate job task type that computes the input splits, therefore 
 allowing that computation to happen on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2012-09-04 Thread Johannes Zillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447584#comment-13447584
 ] 

Johannes Zillmann commented on MAPREDUCE-207:
-

Currently, in our Hadoop applications, we calculate the splits before we submit 
the job to the client (the client then simply looks up the existing splits). We 
do that mainly to influence the reducer count based on the number of 
splits/map-tasks.
If Hadoop does the splitting on the cluster (which makes sense), it would be 
nice to have a hook to influence the configuration!
Sometimes it also makes sense for us to decide on the map-reduce assembly after 
we know the splits (different join strategies for different data 
constellations).

Just dumping some ideas here...
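
A hook of the kind asked for in this comment could be sketched as follows. 
Nothing here comes from the attached patches: the interface 
SplitsComputedHook, the class ReducersFromSplits, and the maps-per-reducer 
ratio are hypothetical, and a plain Map stands in for the job configuration. 
Only the property name mapreduce.job.reduces is an existing MapReduce setting. 
The idea is that cluster-side code would invoke the hook once the number of 
splits is known, before reducers are scheduled.

```java
import java.util.Map;

/** Hypothetical callback invoked after input splits have been computed on the cluster. */
interface SplitsComputedHook {
  void onSplitsComputed(int numSplits, Map<String, String> jobConf);
}

/** Example hook: derive the reducer count from the number of splits (map tasks). */
final class ReducersFromSplits implements SplitsComputedHook {
  private final int mapsPerReducer;

  ReducersFromSplits(int mapsPerReducer) {
    if (mapsPerReducer <= 0) {
      throw new IllegalArgumentException("mapsPerReducer must be positive");
    }
    this.mapsPerReducer = mapsPerReducer;
  }

  @Override
  public void onSplitsComputed(int numSplits, Map<String, String> jobConf) {
    // Ceiling division, with at least one reducer even for an empty input
    int reducers = Math.max(1, (numSplits + mapsPerReducer - 1) / mapsPerReducer);
    jobConf.put("mapreduce.job.reduces", Integer.toString(reducers));
  }
}
```

The same callback shape could carry richer split information if the hook also 
had to pick a join strategy, as mentioned above.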

