[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556000#comment-13556000 ] Avner BenHanoch commented on MAPREDUCE-4049: Hi Alejandro - thanks for your thorough and fast review! Regarding {quote} ReducerCopier class should be made public static in order to be able to be created via ReflectionUtils.newInstance() {quote} ... cool! I actually went in this direction in my very first patch, and I am happy to return to it. (Note that it will introduce changes in all places where ReduceCopier currently uses members of the enclosing ReduceTask object directly - but I believe this is the correct thing to do.) Regarding: {quote} I've just noticed that your ShuffleConsumerPlugin API does not respect the API of the ReduceCopier; the createKVIterator() method has a different signature. The parameters being passed to it, in your patch, are already available in the Context, except for the FileSystem, but you could create the FileSystem (and obtain the raw one) within your plugin impl using the conf received in the context. {quote} I think this comment is wrong. Please clarify! Regarding {quote} I'm not thrilled about the TT loading the default shuffle provider (which is not implementing the new shuffle provider interface) and, in addition, one extra custom shuffle provider. Instead, I'd say the current shuffle provider logic should be refactored into a shuffle provider implementation, and this one loaded by default. And if, as you indicated before, you want to load different impls simultaneously, then a shuffle plugin multiplexor implementation could be used. This increases the scope of the changes, which is why I'd like to do this in a separate JIRA and keep this JIRA for the consumer (reducer) side. {quote} Actually, I wrote above: _my intuition is that supporting 1 external shuffle service (in addition to the built-in shuffle service) is the 'keep it simple' solution. 
I feel that the use case of N providers is theoretical. Hence, I prefer to keep the conf and code simple_. This clarifies why I wrote my patch this way instead of introducing a big feature with a shuffle plugin multiplexor... in hadoop-1. *Again, this JIRA issue - since its creation - has focused on _Support generic shuffle service as a set of two plugins: ShuffleProvider and ShuffleConsumer_. It has no value for me if it deals with the consumer only.* *I am fine with all the rest of your comments. Please let me know if I can continue according to this!* Avner plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Assignee: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: 3.0.0 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch Support generic shuffle service as a set of two plugins: ShuffleProvider and ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example, we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance. # Satisfy MAPREDUCE-3060 - a generic shuffle service that avoids a hidden dependency of the NodeManager on a specific version of the mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. 
Weikuan Yu of Auburn University and others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch) # I am providing a link for downloading UDA - Mellanox's open source plugin that implements a generic shuffle service using RDMA and levitated merge. Note: at this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks.) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
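The ReflectionUtils.newInstance() pattern discussed in the review only works when the target class is instantiable by name through a no-arg constructor, which is why ReduceCopier is asked to become public static. A minimal plain-Java sketch of the idea; the ShufflePlugin interface and class names are illustrative, not the actual MAPREDUCE-4049 API:

```java
import java.lang.reflect.Constructor;

public class PluginLoader {
    /** Hypothetical stand-in for the ShuffleConsumerPlugin interface. */
    public interface ShufflePlugin {
        String name();
    }

    /** Default implementation, used when no custom class is configured. */
    public static class DefaultShuffle implements ShufflePlugin {
        public String name() { return "default-http-shuffle"; }
    }

    /**
     * Instantiate the configured class via its no-arg constructor -- the same
     * pattern ReflectionUtils.newInstance() uses. This only works for a static
     * nested class; a non-static inner class has a hidden outer-instance
     * constructor parameter, so getDeclaredConstructor() would fail.
     */
    public static ShufflePlugin newInstance(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        Constructor<?> ctor = clazz.getDeclaredConstructor();
        ctor.setAccessible(true);
        return (ShufflePlugin) ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(newInstance("PluginLoader$DefaultShuffle").name());
    }
}
```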
[jira] [Resolved] (MAPREDUCE-4077) Issues while using Hadoop Streaming job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved MAPREDUCE-4077. -- Resolution: Not A Problem Issues while using Hadoop Streaming job --- Key: MAPREDUCE-4077 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4077 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Devaraj K Assignee: Devaraj K When we use the -file option, it warns that the option is deprecated and suggests the generic option -files. {code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -file hadoop 02/02/19 10:55:51 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead. {code} But when we use the -files option, it says the option is unrecognized. {code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -files hadoop 02/02/19 10:56:42 ERROR streaming.StreamJob: Unrecognized option: -files Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options] {code} When we use the -archives option, it also says the option is unrecognized. {code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -archives testarchive.rar 02/02/19 11:05:43 ERROR streaming.StreamJob: Unrecognized option: -archives Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options] {code} But the options help does display the usage of -archives. 
{code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -archives testarchive.rar 02/02/19 11:05:43 ERROR streaming.StreamJob: Unrecognized option: -archives Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options] .. .. -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. {code}
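The "Not A Problem" resolution is consistent with -files and -archives being *generic* options: they are consumed by a separate first-pass parser and are only recognized when they precede the streaming-specific options. A toy sketch of that two-phase parsing; the names are illustrative, not the actual GenericOptionsParser API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ArgOrder {
    // Generic options handled by the first-pass parser (illustrative subset).
    static final Set<String> GENERIC =
        new HashSet<>(Arrays.asList("-files", "-archives", "-libjars"));

    /** Consume generic options from the front; everything after the first
     *  non-generic token is left for the tool (here: streaming) to parse. */
    public static List<String> stripGeneric(String[] args, Map<String, String> generic) {
        int i = 0;
        while (i + 1 < args.length && GENERIC.contains(args[i])) {
            generic.put(args[i], args[i + 1]);
            i += 2;
        }
        return Arrays.asList(args).subList(i, args.length);
    }

    public static void main(String[] args) {
        Map<String, String> g = new HashMap<>();
        // Placed first, -files is consumed as a generic option.
        System.out.println(stripGeneric(
            new String[] {"-files", "hadoop", "-mapper", "cat"}, g) + " generic=" + g);
        g.clear();
        // Placed after tool options, -files reaches the tool unparsed, and the
        // tool then reports "Unrecognized option: -files".
        System.out.println(stripGeneric(
            new String[] {"-mapper", "cat", "-files", "hadoop"}, g) + " generic=" + g);
    }
}
```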
[jira] [Updated] (MAPREDUCE-2309) While querying the Job Statics from the command-line, if we give wrong status name then there is no warning or response.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-2309: - Resolution: Won't Fix Status: Resolved (was: Patch Available) While querying the Job Statics from the command-line, if we give wrong status name then there is no warning or response. Key: MAPREDUCE-2309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2309 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Devaraj K Assignee: Devaraj K Priority: Minor Fix For: 0.22.1 Attachments: MAPREDUCE-2309-0.20.patch, MAPREDUCE-2309-trunk.patch If we try to get job information by giving a wrong status name from the command-line interface, it does not give any warning or response.
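The requested behavior -- warn on an unknown status name instead of returning silently -- amounts to a validation step before the query runs. A sketch; the state names and method are illustrative, not the actual JobTracker CLI code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class JobStatusQuery {
    // Job states as commonly exposed by the JobTracker CLI; illustrative list.
    static final Set<String> VALID =
        new HashSet<>(Arrays.asList("PREP", "RUNNING", "SUCCEEDED", "FAILED", "KILLED"));

    /** Warn on an unknown state instead of returning an empty response. */
    public static String query(String state) {
        String s = state.toUpperCase(Locale.ROOT);
        if (!VALID.contains(s)) {
            return "Unknown job state '" + state + "'. Valid states: " + new TreeSet<>(VALID);
        }
        return "listing jobs in state " + s;
    }

    public static void main(String[] args) {
        System.out.println(query("runing"));   // misspelled -> warning, not silence
        System.out.println(query("running"));
    }
}
```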
[jira] [Updated] (MAPREDUCE-2548) Log improvements in DBOutputFormat.java and CounterGroup.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-2548: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Log improvements in DBOutputFormat.java and CounterGroup.java - Key: MAPREDUCE-2548 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2548 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-2548-1.patch, MAPREDUCE-2548.patch 1. Instead of printing the stack trace on the console, it can be logged. {code:title=DBOutputFormat.java|borderStyle=solid} public void write(K key, V value) throws IOException { try { key.write(statement); statement.addBatch(); } catch (SQLException e) { e.printStackTrace(); } } {code} 2. Missing resource information can be logged. {code:title=CounterGroup.java|borderStyle=solid} protected CounterGroup(String name) { this.name = name; try { bundle = getResourceBundle(name); } catch (MissingResourceException neverMind) { } displayName = localize(CounterGroupName, name); } private String localize(String key, String defaultValue) { String result = defaultValue; if (bundle != null) { try { result = bundle.getString(key); } catch (MissingResourceException mre) { } } return result; } {code}
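Suggestion 1 above can be sketched as follows. Hadoop itself uses commons-logging, so java.util.logging stands in here only to keep the sketch self-contained; the Batch interface is a hypothetical stand-in for the JDBC statement so the example needs no database:

```java
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DbWrite {
    private static final Logger LOG = Logger.getLogger(DbWrite.class.getName());

    /** Stand-in for statement.addBatch() so the sketch needs no database. */
    public interface Batch {
        void addBatch() throws SQLException;
    }

    /**
     * The suggested change: log the SQLException (message plus stack trace)
     * through the logger instead of e.printStackTrace(), and report the
     * failure to the caller rather than swallowing it silently.
     */
    public static boolean write(Batch batch) {
        try {
            batch.addBatch();
            return true;
        } catch (SQLException e) {
            LOG.log(Level.WARNING, "Failed to add record to batch", e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(write(() -> { throw new SQLException("connection closed"); }));
        System.out.println(write(() -> { /* success path */ }));
    }
}
```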
[jira] [Updated] (MAPREDUCE-2562) NullPointerException in Jobtracker when it is started without Name Node
[ https://issues.apache.org/jira/browse/MAPREDUCE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-2562: - Resolution: Won't Fix Status: Resolved (was: Patch Available) NullPointerException in Jobtracker when it is started without Name Node --- Key: MAPREDUCE-2562 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2562 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Devaraj K Assignee: Devaraj K Fix For: 0.22.1 Attachments: MAPREDUCE-2562.patch The JobTracker logs a NullPointerException when it is started without the NameNode. {code:borderStyle=solid} 2011-06-03 01:50:04,304 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.18.52.225:9000. Already tried 7 time(s). 2011-06-03 01:50:05,307 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.18.52.225:9000. Already tried 8 time(s). 2011-06-03 01:50:06,310 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.18.52.225:9000. Already tried 9 time(s). 2011-06-03 01:50:21,243 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:1635) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:287) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:279) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:274) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4312) {code}
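The FATAL NullPointerException above gives no hint that the NameNode is simply unreachable. One common fix for this class of failure is a null guard that fails fast with an actionable message; a sketch with illustrative names, not JobTracker's actual fields or methods:

```java
public class StartupGuard {
    /**
     * Fail fast with an actionable message instead of letting a null
     * filesystem reference surface later as a bare NullPointerException.
     */
    public static String requireFilesystem(Object fs) {
        if (fs == null) {
            throw new IllegalStateException(
                "Cannot initialize: filesystem is unavailable. "
              + "Check that the NameNode is running and reachable.");
        }
        return fs.toString();
    }

    public static void main(String[] args) {
        try {
            requireFilesystem(null);  // simulates startup with no NameNode
        } catch (IllegalStateException e) {
            // The operator sees the cause, not an anonymous NPE line number.
            System.out.println(e.getMessage());
        }
    }
}
```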
[jira] [Assigned] (MAPREDUCE-3207) TestMRCLI failing on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned MAPREDUCE-3207: Assignee: (was: Devaraj K) TestMRCLI failing on trunk Key: MAPREDUCE-3207 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3207 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2 Affects Versions: 0.23.0 Reporter: Hitesh Shah Priority: Minor Fix For: 0.24.0 Attachments: TEST-org.apache.hadoop.cli.TestMRCLI.txt Failing tests: 7: Archive: Deleting a file in archive 8: Archive: Renaming a file in archive
[jira] [Assigned] (MAPREDUCE-3222) ant test TestTaskContext failing on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned MAPREDUCE-3222: Assignee: (was: Devaraj K) ant test TestTaskContext failing on trunk - Key: MAPREDUCE-3222 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3222 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2 Affects Versions: 0.23.0 Reporter: Hitesh Shah Priority: Minor Fix For: 0.24.0 Testcase: testContextStatus took 29.977 sec FAILED null expected:<map[ > sort]> but was:<map[]> junit.framework.ComparisonFailure: null expected:<map[ > sort]> but was:<map[]> at org.apache.hadoop.mapreduce.TestTaskContext.testContextStatus(TestTaskContext.java:120) Testcase: testMapContextProgress took 17.371 sec Testcase: testReduceContextProgress took 16.267 sec
[jira] [Updated] (MAPREDUCE-4743) Job is marking as FAILED and also throwing the Transition exception instead of KILLED when issues a KILL command
[ https://issues.apache.org/jira/browse/MAPREDUCE-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-4743: - Summary: Job is marking as FAILED and also throwing the Transition exception instead of KILLED when issues a KILL command (was: Job is marking as FAILED and also throwing thhe Transition exception instead of KILLED when issues a KILL command) Job is marking as FAILED and also throwing the Transition exception instead of KILLED when issues a KILL command Key: MAPREDUCE-4743 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4743 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Devaraj K Assignee: Devaraj K {code:xml} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_KILL at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:605) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:903) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:897) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code}
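The trace shows T_KILL arriving after the task has already reached SUCCEEDED, which the state machine treats as an invalid transition. The usual fix for this class of bug is to register the kill event in terminal states as an explicit no-op. A sketch with illustrative state and event names, not the actual yarn.state API:

```java
import java.util.EnumMap;
import java.util.Map;

public class TaskStates {
    public enum State { RUNNING, SUCCEEDED, FAILED, KILLED }
    public enum Event { T_ATTEMPT_SUCCEEDED, T_KILL }

    // Transition table; a missing entry means the event is invalid in that state.
    static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    static {
        Map<Event, State> running = new EnumMap<>(Event.class);
        running.put(Event.T_ATTEMPT_SUCCEEDED, State.SUCCEEDED);
        running.put(Event.T_KILL, State.KILLED);
        TABLE.put(State.RUNNING, running);

        // The fix: a late T_KILL in a terminal state is an explicit no-op
        // (self-transition) instead of an invalid event.
        Map<Event, State> succeeded = new EnumMap<>(Event.class);
        succeeded.put(Event.T_KILL, State.SUCCEEDED);
        TABLE.put(State.SUCCEEDED, succeeded);
    }

    public static State transition(State s, Event e) {
        Map<Event, State> row = TABLE.get(s);
        if (row == null || !row.containsKey(e)) {
            throw new IllegalStateException("Invalid event: " + e + " at " + s);
        }
        return row.get(e);
    }

    public static void main(String[] args) {
        State s = transition(State.RUNNING, Event.T_ATTEMPT_SUCCEEDED);
        // Without the no-op entry this would throw, as in the reported trace.
        System.out.println(transition(s, Event.T_KILL));
    }
}
```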
[jira] [Assigned] (MAPREDUCE-3841) Broken Server metrics and Local logs link under the tools menu
[ https://issues.apache.org/jira/browse/MAPREDUCE-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned MAPREDUCE-3841: Assignee: (was: Devaraj K) Broken Server metrics and Local logs link under the tools menu -- Key: MAPREDUCE-3841 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3841 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Ramya Sunil The Local logs link redirects to the cluster page, and Server metrics opens an empty page on the RM/JHS homepage. So do the links from the NodeManager UI.
[jira] [Updated] (MAPREDUCE-3556) Resource Leaks in key flows
[ https://issues.apache.org/jira/browse/MAPREDUCE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-3556: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Resource Leaks in key flows --- Key: MAPREDUCE-3556 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3556 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.0 Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-3556.patch The key flows below have potential resource leaks: in each case close() is only reached on the success path, so an exception mid-write leaks the handle. {code:title=MapTask.java|borderStyle=solid} if (combinerRunner == null || numSpills < minSpillsForCombine) { Merger.writeFile(kvIter, writer, reporter, job); } else { combineCollector.setWriter(writer); combinerRunner.combine(kvIter, combineCollector); } //close writer.close(); {code} {code:title=InputSampler.java|borderStyle=solid} for(int i = 1; i < numPartitions; ++i) { int k = Math.round(stepSize * i); while (last >= k && comparator.compare(samples[last], samples[k]) == 0) { ++k; } writer.append(samples[k], nullValue); last = k; } writer.close(); {code} {code:title=JobSplitWriter.java|borderStyle=solid} SplitMetaInfo[] info = writeNewSplits(conf, splits, out); out.close(); SplitMetaInfo[] info = writeOldSplits(splits, out); out.close(); {code}
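All three snippets share the same shape, and the standard fix is try-with-resources, which closes the stream on every path. A self-contained illustration of both shapes; TrackingWriter is a test double, not Hadoop code:

```java
import java.io.IOException;
import java.io.StringWriter;

public class SafeClose {

    /** StringWriter that records whether close() was ever called. */
    static class TrackingWriter extends StringWriter {
        boolean closed = false;
        @Override public void close() throws IOException { closed = true; super.close(); }
    }

    /** The reported shape: close() sits after the writes, so a failure skips it. */
    public static boolean leakyWrite(String[] records, boolean failMidway) {
        TrackingWriter w = new TrackingWriter();
        try {
            for (int i = 0; i < records.length; i++) {
                if (failMidway && i == 1) throw new IOException("disk full");
                w.write(records[i]);
            }
            w.close();                 // never reached on the failure path
        } catch (IOException swallowed) {
            // mirrors the leak: no cleanup on error
        }
        return w.closed;
    }

    /** The fix: try-with-resources closes the writer on every path. */
    public static boolean safeWrite(String[] records, boolean failMidway) {
        TrackingWriter w = new TrackingWriter();
        try (TrackingWriter writer = w) {
            for (int i = 0; i < records.length; i++) {
                if (failMidway && i == 1) throw new IOException("disk full");
                writer.write(records[i]);
            }
        } catch (IOException swallowed) {
            // real code would rethrow; swallowed here to keep the sketch small
        }
        return w.closed;
    }

    public static void main(String[] args) {
        String[] recs = {"a", "b", "c"};
        System.out.println("leaky closed on failure? " + leakyWrite(recs, true));
        System.out.println("safe closed on failure?  " + safeWrite(recs, true));
    }
}
```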
[jira] [Resolved] (MAPREDUCE-3232) AM should handle reboot from Resource Manager
[ https://issues.apache.org/jira/browse/MAPREDUCE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved MAPREDUCE-3232. -- Resolution: Not A Problem AM should handle reboot from Resource Manager -- Key: MAPREDUCE-3232 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3232 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.24.0 Reporter: Devaraj K Assignee: Devaraj K When the RM doesn't have the last response id for an app attempt (or the request response id is less than the last response id), the RM sends a reboot response, but the AM doesn't handle this.
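The missing handling can be sketched as follows, with illustrative names rather than the actual AMRMProtocol types: the RM answers an out-of-sync responseId with a reboot command, and the AM must translate that command into an action instead of dropping it:

```java
public class HeartbeatHandler {
    public enum Command { NORMAL, REBOOT }

    /** RM side: an out-of-sync responseId is answered with a reboot command. */
    public static Command respond(int lastResponseId, int requestResponseId) {
        return requestResponseId < lastResponseId ? Command.REBOOT : Command.NORMAL;
    }

    /** AM side (the missing piece): act on the command instead of ignoring it. */
    public static String handle(Command c) {
        switch (c) {
            case REBOOT: return "resync-and-shutdown";
            default:     return "continue";
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(respond(7, 3)));  // stale id -> reboot path
        System.out.println(handle(respond(7, 7)));  // in sync -> continue
    }
}
```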
[jira] [Updated] (MAPREDUCE-4286) TestClientProtocolProviderImpls passes on failure conditions also
[ https://issues.apache.org/jira/browse/MAPREDUCE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-4286: - Affects Version/s: (was: 2.0.1-alpha) (was: 2.0.0-alpha) 2.0.3-alpha 0.23.5 TestClientProtocolProviderImpls passes on failure conditions also - Key: MAPREDUCE-4286 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4286 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.2-alpha, 2.0.3-alpha, 0.23.5 Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4286.patch, MAPREDUCE-4286.patch
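A test typically "passes on failure conditions" when the assertion sits inside a try block whose catch swallows the exception. A minimal illustration of the broken and fixed shapes; check() is a hypothetical stand-in, not the actual TestClientProtocolProviderImpls code:

```java
public class SwallowedFailure {
    /** Broken pattern: returns "passed" even though check() threw. */
    public static String brokenTest() {
        try {
            check();             // throws
            assertTrue(false);   // never reached -- and neither is any assert
        } catch (Exception e) {
            // empty catch: the failure disappears
        }
        return "passed";
    }

    /** Fixed pattern: fail when no exception occurs, and assert inside the
     *  catch that the failure is the expected one. */
    public static String fixedTest() {
        try {
            check();
            return "failed: expected exception";
        } catch (IllegalStateException e) {
            return "passed: " + e.getMessage();
        }
    }

    static void check() { throw new IllegalStateException("wrong provider"); }
    static void assertTrue(boolean b) { if (!b) throw new AssertionError(); }

    public static void main(String[] args) {
        System.out.println(brokenTest());
        System.out.println(fixedTest());
    }
}
```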
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556213#comment-13556213 ] Alejandro Abdelnur commented on MAPREDUCE-4049: --- Regarding ...ShuffleConsumerPlugin API does not respect the API of the ReduceCopier... I think this comment is wrong. Please clarify!... You are right, please disregard that comment. After integrating my comments into the consumer side, I think it (the consumer) is ready to go in. Regarding the producer changes, I think that the default producer implementation should implement the producer plugin interface as well. Once we have that, the multiplexor plugin would be trivial; I'd be happy to help with that. We can do the producer plugin as a subtask of this JIRA.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556247#comment-13556247 ] Arun C Murthy commented on MAPREDUCE-4808: -- bq. The goal is to be able to write alternate implementations of the Shuffle Alejandro - it seems like you understand something about the use-case that I don't. Maybe you and Asokan have had a private chat? What are the use-cases for alternate implementations of the Shuffle? As Chris also mentioned, with MAPREDUCE-4049 we already allow alternate implementations of the Shuffle; is this redundant, then? bq. While some of this logic replacement could be done at the Merge level as you suggested, other parts, like MapOutput allocation, cannot be done there as this is driven by the MergeManager. So, a combination of a MapOutput re-factor and a Merger interface should suffice? In any case, what are the use-cases for alternate implementations of MapOutput? Or is the MapOutput re-factor merely a code-hygiene issue? I'm not trying to be difficult here, but I feel like I just don't understand the use-case. So, I'd appreciate it if we could focus on concrete use-cases for the plugin. I admit I still have a hard time understanding why we need this complexity. Thanks. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.2-alpha Reporter: Arun C Murthy Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. 
This would come with the strong caveat that these classes are LimitedPrivate and Unstable.
[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-4808: - Fix Version/s: (was: 2.0.3-alpha)
[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-4808: - Affects Version/s: (was: 2.0.2-alpha)
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556262#comment-13556262 ] Avner BenHanoch commented on MAPREDUCE-4049: Thanks. So, we agreed upon the consumer details. Now, for the producer details: - Again, throughout the lifetime of this JIRA issue, the consumer and producer come together, since they are the two sides of the shuffle service. *This JIRA issue has no value if it has one without the other.* Hence, they should be kept together! Additionally, I want it to be one patch that can be ported to any hadoop-1.x.y version at once. - The default producer implementation already implements the producer plugin interface! (though it is still loaded via the HttpServlet interface) As I said, I went for a 'keep it simple' solution, in which I only support 1 extra provider (with simple code and simple conf). Please clarify whether this is enough, or whether you are asking me to support N providers. I don't want to write a new feature and then have someone say that we have a problem introducing a new feature in hadoop-1.
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556344#comment-13556344 ] Avner BenHanoch commented on MAPREDUCE-4049: Additionally, as I wrote once, currently, there is no request and no use case for N providers. Hence, do we really want that? plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Assignee: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: 3.0.0 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch Support generic shuffle service as set of two plugins: ShuffleProvider ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges. Hence, getting much better performance. # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with suggested Top Level Design for both plugins (currently, based on 1.0 branch) # I am providing link for downloading UDA - Mellanox's open source plugin that implements generic shuffle service using RDMA and levitated merge. 
Note: At this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks.) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
[jira] [Created] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
Jason Lowe created MAPREDUCE-4946: - Summary: Type conversion of map completion events leads to performance problems with large jobs Key: MAPREDUCE-4946 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4946 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 0.23.5, 2.0.2-alpha Reporter: Jason Lowe Priority: Critical We've seen issues with large jobs (e.g.: 13,000 maps and 3,500 reduces) where reducers fail to connect back to the AM after being launched, due to connection timeout. Looking at stack traces of the AM during this time, we see a lot of IPC server threads stuck waiting for a lock to get the application ID while type-converting the map completion events. What's odd is that normally getting the application ID should be very cheap, but in this case we're type-converting thousands of map completion events for *each* reducer connecting. That means we end up type-converting the map completion events over 45 million times during the lifetime of the example job (13,000 * 3,500). We either need to make the type conversion much cheaper (i.e.: lockless or at least read-write locked) or, even better, store the completion events in a form that does not require type conversion when serving them up to reducers.
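The second fix suggested above (storing the events in a form that needs no per-request conversion) can be sketched in plain Java. This is an illustrative model only; `CompletionEventCache`, `YarnEvent`, and `MapredEvent` are stand-ins and not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: convert the completion events once and serve the cached,
// already-converted list, instead of re-converting for every reducer.
public class CompletionEventCache {
    // stand-ins for the YARN-side and mapred-side event types
    static class YarnEvent { final int id; YarnEvent(int id) { this.id = id; } }
    static class MapredEvent { final int id; MapredEvent(int id) { this.id = id; } }

    private final List<YarnEvent> yarnEvents;
    private final AtomicReference<List<MapredEvent>> converted = new AtomicReference<>();
    long conversions = 0;  // instrumentation for the sketch

    CompletionEventCache(List<YarnEvent> yarnEvents) { this.yarnEvents = yarnEvents; }

    List<MapredEvent> getEvents() {
        List<MapredEvent> cached = converted.get();
        if (cached == null) {
            List<MapredEvent> fresh = new ArrayList<>();
            for (YarnEvent e : yarnEvents) { fresh.add(new MapredEvent(e.id)); conversions++; }
            // the first caller pays the conversion cost; later callers reuse it
            converted.compareAndSet(null, fresh);
            cached = converted.get();
        }
        return cached;
    }

    public static void main(String[] args) {
        List<YarnEvent> src = new ArrayList<>();
        for (int i = 0; i < 13000; i++) src.add(new YarnEvent(i));     // 13,000 maps
        CompletionEventCache cache = new CompletionEventCache(src);
        for (int r = 0; r < 3500; r++) cache.getEvents();              // 3,500 reducers
        System.out.println(cache.conversions);
    }
}
```

With the cache, the example job performs 13,000 conversions instead of 45.5 million, and the lock contention on the application ID disappears from the serving path.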
[jira] [Commented] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556391#comment-13556391 ] Jason Lowe commented on MAPREDUCE-4946: --- This performance problem prevents the AM from reliably supporting very large jobs (i.e.: tens of thousands of maps and thousands of reducers) because it can take too long to serve up requests, and other clients end up being ignored and time out. If the same task times out enough attempts, the whole job fails.
[jira] [Commented] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556394#comment-13556394 ] Jason Lowe commented on MAPREDUCE-4946: --- Sample stack trace from one of the many IPC server threads waiting for a lock during type conversion of the map completion events:
{noformat}
"IPC Server handler 9 on 45874" daemon prio=10 tid=0x08f76800 nid=0x1c27 waiting for monitor entry [0x10583000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.getAppId(JobIdPBImpl.java:78)
    - waiting to lock <0x21e729b8> (a org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:65)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:119)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:211)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:185)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:178)
    at org.apache.hadoop.mapred.TaskAttemptListenerImpl.getMapCompletionEvents(TaskAttemptListenerImpl.java:284)
    at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:394)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1530)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1526)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1221)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1524)
{noformat}
[jira] [Updated] (MAPREDUCE-4907) TrackerDistributedCacheManager issues too many getFileStatus calls
[ https://issues.apache.org/jira/browse/MAPREDUCE-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated MAPREDUCE-4907: - Fix Version/s: 0.23.7 merged this to branch-0.23 TrackerDistributedCacheManager issues too many getFileStatus calls -- Key: MAPREDUCE-4907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, tasktracker Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0, 2.0.3-alpha, 0.23.7 Attachments: MAPREDUCE-4907.patch, MAPREDUCE-4907-trunk-1.patch, MAPREDUCE-4907-trunk-1.patch, MAPREDUCE-4907-trunk-1.patch, MAPREDUCE-4907-trunk.patch TrackerDistributedCacheManager issues a number of redundant getFileStatus calls when determining the timestamps and visibilities of files in the distributed cache. 300 distributed cache files deep in the directory structure can hammer HDFS with a couple of thousand requests. A couple of optimizations can reduce this load: 1. determineTimestamps and determineCacheVisibilities both call getFileStatus on every file. We could cache the results of the former and use them for the latter. 2. determineCacheVisibilities needs to check that all ancestor directories of each file have execute permissions for everyone. This currently entails a getFileStatus on each ancestor directory for each file. The results of these getFileStatus calls could be cached as well.
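Both optimizations above amount to memoizing the expensive namenode lookup. A minimal sketch in plain Java, with `StatusCache` as a hypothetical helper and a `String` standing in for Hadoop's `FileStatus`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: cache getFileStatus results so the timestamps pass and the
// visibility pass (and repeated ancestor-directory checks) share one
// lookup per distinct path instead of issuing a fresh RPC each time.
public class StatusCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> fetch;  // the expensive namenode call
    long fetches = 0;                              // RPC counter for the sketch

    StatusCache(Function<String, String> fetch) { this.fetch = fetch; }

    String getFileStatus(String path) {
        return cache.computeIfAbsent(path, p -> { fetches++; return fetch.apply(p); });
    }

    public static void main(String[] args) {
        StatusCache c = new StatusCache(p -> "status:" + p);
        String[] files = {"/cache/a.jar", "/cache/b.jar", "/cache/a.jar"};
        for (String f : files) c.getFileStatus(f);   // timestamps pass
        for (String f : files) c.getFileStatus(f);   // visibility pass reuses the cache
        System.out.println(c.fetches);
    }
}
```

Six lookups collapse to two RPCs (one per distinct path); with 300 cache files sharing deep ancestor directories, the savings compound.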
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556405#comment-13556405 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Arun, MAPREDUCE-4049 expects the plugin implementer to implement the shuffle from scratch. With the default implementation of HTTP shuffle being robust and secure, it is possible to reuse it in the majority of situations. The alternate implementation of MapOutput can be left to the plugin implementer. For example, it can be optimized to use less JVM memory and minimize Java garbage collection. Some of the concrete use cases for the plugin are: hash aggregation, hash join, limit-N query, etc. Thanks. -- Asokan Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable.
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556524#comment-13556524 ] Alejandro Abdelnur commented on MAPREDUCE-4049: --- Regarding 'new features in Hadoop-1': small or big, this is a new feature and it should be treated as such. I'm all for having this in Hadoop 1. If you want, I can start the discussion in common-dev@. Regarding "Again, throughout the lifetime of this JIRA issue, ...", I see different ways this can be done: * Keep everything in the same JIRA (as it is now) and wait till the whole patch is ready * Break the JIRA in 2 subtasks, consumer and producer side ** Do it in branch-1 directly ** Do it in a dev branch (seems an overkill) I'm OK with any approach, your call. Regarding "the default implementation already implements the producer API": ahh, missed that because the initialize method is not used. With some minor tweaks to your patch I think we could get things done in a simple way: * Add to the TT a 'public Server getHttpServer()' method * In the TT constructor, where the MapOutputServlet is added to the HttpServer 'server', remove that line and instead discover, instantiate and initialize the provider plugin. * Don't make MapOutputServlet extend the provider interface. * The default provider should be a class that simply adds the MapOutputServlet to the server via the TT.getHttpServer() method. * Remove the logic to instantiate a custom single provider plugin.
A provider multiplexor would be a very simple class, something along the following lines:
{code}
public class MultiShuffleProviderPlugin implements ShuffleProviderPlugin {
  public static final String PLUGIN_CLASSES = "hadoop.mapreduce.multi.shuffle.provider.classes";

  private ShuffleProviderPlugin[] plugins;

  public void initialize(TaskTracker tt) {
    Configuration conf = tt.getJobConf();
    Class[] klasses = conf.getClasses(PLUGIN_CLASSES, DefaultShuffleProviderPlugin.class);
    // LOG INFO list of plugin classes
    plugins = new ShuffleProviderPlugin[klasses.length];
    for (int i = 0; i < klasses.length; i++) {
      plugins[i] = (ShuffleProviderPlugin) ReflectionUtils.newInstance(klasses[i], conf);
    }
    for (ShuffleProviderPlugin plugin : plugins) {
      plugin.initialize(tt);
    }
  }

  public void destroy() {
    if (plugins != null) {
      for (ShuffleProviderPlugin plugin : plugins) {
        try {
          plugin.destroy();
        } catch (Throwable ex) {
          // LOG WARN and ignore exception
        }
      }
    }
  }
}
{code}
And the default provider class would be:
{code}
public static class DefaultShuffleProviderPlugin implements ShuffleProviderPlugin {
  public void initialize(TaskTracker tt) {
    tt.getHttpServer().addInternalServlet("mapOutput", "/mapOutput", MapOutputServlet.class);
  }

  public void destroy() {
  }
}
{code}
[jira] [Updated] (MAPREDUCE-4278) cannot run two local jobs in parallel from the same gateway.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated MAPREDUCE-4278: - Fix Version/s: 0.23.7 merged to branch-0.23 cannot run two local jobs in parallel from the same gateway. Key: MAPREDUCE-4278 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4278 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Araceli Henley Assignee: Sandy Ryza Fix For: 1.2.0, 2.0.3-alpha, 0.23.7 Attachments: MAPREDUCE-4278-2-branch1.patch, MAPREDUCE-4278-3-branch1.patch, MAPREDUCE-4278-branch1.patch, MAPREDUCE-4278-trunk.patch, MAPREDUCE-4278-trunk.patch I cannot run two local mode jobs from Pig in parallel from the same gateway; this is a typical use case. If I re-run the tests sequentially, then the tests pass. This seems to be a problem in Hadoop. Additionally, the Pig harness expects to be able to run Pig-version-undertest against Pig-version-stable from the same gateway. To replicate the error: I have two clusters running from the same gateway. If I run the Pig regression suite nightly.conf in local mode in parallel - once on each cluster - conflicts in M/R local mode result in failures in the tests.
ERROR1:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/file.out in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
    at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
    at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
    at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
    at org.apache.hadoop.mapred.Task.done(Task.java:875)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)
---
ERROR2:
2012-05-17 20:25:36,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2012-05-17 20:25:36,778 [Thread-3] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@ffa490e
2012-05-17 20:25:36,837 [Thread-3] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:106)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:489)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2012-05-17 20:25:41,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556536#comment-13556536 ] Chris Douglas commented on MAPREDUCE-4808: -- Asokan, the concern is that breaking an API, even if it's marked unstable, is an incompatible change. Since the pluggable shuffle is particularly useful for frameworks, breaking this contract could require patching/validation/rewrite of plugin and optimizer code in projects that invest in it (Hive, Pig, etc.). Moreover, if we wanted to change the default {{Shuffle}} to a different implementation, then user/framework code would perform badly, or break, unless we exposed this implementation-specific mechanism in the _new_ impl. So it's fair to press for use cases, to ensure it's _sufficient_ and that the abstraction could apply to most {{Shuffle}} implementations. Personally, I'm ambivalent about exposing this as an API, and am +1 on the patch overall (mostly because I like the {{MapOutput}} refactoring). The user can always configure the current {{Shuffle}}, which is exactly how frameworks would handle this until they port/specialize their efficient {{MergeManager}} plugin. As a compromise, would it make sense to just add a protected {{createMergeManager}} method to the {{Shuffle}}? The user still needs to configure their custom {{Shuffle}} impl now, but that's better than the inevitable future where they configure both. It also makes its tie to this implementation explicit.
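The compromise above is the classic factory-method pattern. A self-contained sketch (class and method names here are illustrative stand-ins, not the actual Hadoop classes):

```java
// Sketch: Shuffle exposes a protected factory method, so a custom Shuffle
// subclass can swap the merge strategy without a separate config knob.
public class FactoryMethodSketch {
    interface MergeManager { String name(); }

    static class Shuffle {
        // subclasses override this to supply an alternate merger
        protected MergeManager createMergeManager() {
            return () -> "default";            // the stock in-memory/on-disk merger
        }
        String run() { return createMergeManager().name(); }
    }

    static class CustomShuffle extends Shuffle {
        @Override protected MergeManager createMergeManager() {
            return () -> "custom";             // e.g. a hash-aggregation merger
        }
    }

    public static void main(String[] args) {
        System.out.println(new Shuffle().run() + "," + new CustomShuffle().run());
    }
}
```

The tie between the merger and this particular `Shuffle` implementation stays explicit: only code that already subclasses `Shuffle` can reach the hook, and there is no second configuration key to keep in sync.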
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556580#comment-13556580 ] Alejandro Abdelnur commented on MAPREDUCE-4808: --- Chris, are you suggesting to: * remove the MergeManagerPlugin interface * introduce a protected createMergeManager() method in the Shuffle class to instantiate (via new) and initialize the existing MergeManager?
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556595#comment-13556595 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Arun, I will think about your suggestion to make the Merger class pluggable and post my findings for different use cases. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556600#comment-13556600 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Chris, I will work on creating a real working plugin for the use cases to show that the proposed API is sufficient to handle them. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556602#comment-13556602 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Alejandro, If the MergeManagerPlugin is to be removed, it should be possible to extend the framework's MergeManager with an external implementation. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556606#comment-13556606 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Alejandro, I meant to ask whether it is okay to make the existing MergeManager extendable. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556705#comment-13556705 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Arun, I will try to explain a simple use case of an external implementation of merge on the reduce side. Let us say this merge implementation has some fixed area of memory (a Java byte array) allocated to store the shuffled data. This may be done to avoid frequent garbage collection by the JVM, or for better processor cache efficiency. Looking at the methods in the {{Merge}} class, they accept input to the merge either as disk files (an array of {{Path}} objects) or as memory segments (a list of {{Segment}} objects). The former is not suitable since merge is done in memory first, and any intermediate merged output file is under the control of the plugin implementation. The latter is not suitable because memory for the shuffled data is not under the control of the plugin implementation. Ideally, if an {{InputStream}} object is available, the external implementation can read shuffled data from the stream into the fixed area of memory at a specific offset in the byte array. With the {{MergeManagerPlugin}}, the external implementation will get the HTTP connection's {{InputStream}} object via the {{shuffle()}} method in the {{MapOutput}} object. In addition, if merge goes through multiple passes because the memory area is limited in size, there should be some way for the {{Shuffle}} to wait until memory is released by a merge pass. There is no method in {{Merge}} for that either. I find that it is possible to define the interaction points between the current {{Shuffle}} and {{MergeManager}} using the {{MergeManagerPlugin}} interface. The plugin interface has only three methods, and it allows the external plugin a lot of freedom in its implementation. As a side effect, the {{MapOutput}} is also refactored. Hope I explained this well. If you have any questions, please let me know. -- Asokan
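The fixed-memory use case described above can be sketched in plain Java. `FixedAreaShuffle` is a hypothetical stand-in, not part of the proposed API; it only illustrates reading from a connection's `InputStream` into a preallocated array at an offset:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: shuffled bytes are read straight from the connection's InputStream
// into a fixed, preallocated byte array, avoiding per-segment allocations
// and the GC pressure they cause.
public class FixedAreaShuffle {
    private final byte[] area;   // the fixed merge memory area
    private int used = 0;        // next free offset

    FixedAreaShuffle(int size) { area = new byte[size]; }

    // Returns the offset the segment was stored at, or -1 if it doesn't fit
    // (the caller must then wait for a merge pass to release memory).
    int shuffle(InputStream in, int len) throws IOException {
        if (used + len > area.length) return -1;
        int off = used, read = 0;
        while (read < len) {
            int n = in.read(area, off + read, len - read);
            if (n < 0) throw new IOException("premature end of stream");
            read += n;
        }
        used += len;
        return off;
    }

    public static void main(String[] args) throws IOException {
        FixedAreaShuffle s = new FixedAreaShuffle(8);
        int off = s.shuffle(new ByteArrayInputStream("abcde".getBytes()), 5);
        int off2 = s.shuffle(new ByteArrayInputStream("xyzzy".getBytes()), 5);
        System.out.println(off + "," + off2);
    }
}
```

The second segment does not fit and is refused, which is exactly the point where the plugin would need a way to block the `Shuffle` until a merge pass frees memory.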
[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4808: -- Attachment: MR-4808.patch I've taken the liberty to tweak the patch a bit based on the last comments in the JIRA. * removed pluggability via config of the MergeManager * Shuffle has a protected createMergeManager() method * MergeManager is annotated as Private * kept the MergeManagerPlugin interface * removed MergeManagerPlugin.Context * MergeManagerPlugin interface annotated as Private These changes avoid having an extra knob (the MergeManager class) in the config and keep the MergeManager owned by the Shuffle class. The interface still allows alternate implementations, such as Jerry's and Asokan's. Asokan, Arun, Chris? Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable.
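The protected-factory approach described above can be sketched roughly as follows. The class shapes are simplified assumptions, not the exact Hadoop signatures: the point is that the default MergeManager stays owned by Shuffle (no config knob), while an alternate shuffle swaps it by overriding one protected method.

```java
// Hypothetical sketch of the refactoring: Shuffle exposes a protected
// factory method, and an alternate implementation overrides it.
public class ShuffleFactorySketch {
    public interface MergeManagerPlugin { String name(); }

    public static class DefaultMergeManager implements MergeManagerPlugin {
        public String name() { return "default"; }
    }

    // Simplified Shuffle: owns its MergeManager, but lets subclasses swap it.
    public static class Shuffle {
        protected MergeManagerPlugin createMergeManager() {
            return new DefaultMergeManager();   // default impl, no config knob
        }
        public String mergeManagerName() { return createMergeManager().name(); }
    }

    // An alternate shuffle (e.g. a vendor plugin) overrides the factory.
    public static class CustomShuffle extends Shuffle {
        @Override
        protected MergeManagerPlugin createMergeManager() {
            return () -> "custom";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Shuffle().mergeManagerName());       // default
        System.out.println(new CustomShuffle().mergeManagerName()); // custom
    }
}
```

The design trade-off being debated: a config key lets operators swap the MergeManager without subclassing, while the protected method keeps the surface area private and forces alternate merge logic to come bundled with an alternate Shuffle.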
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556738#comment-13556738 ] Chris Douglas commented on MAPREDUCE-4808: -- +1 Looked through it; the latest patch lgtm. Asokan, is that sufficient for your use cases? Arun? _Very_ minor, optional nit: {{s/MergeManager/MergeManagerImpl/}} and {{s/MergeManagerPlugin/MergeManager/}}. There's an argument to be made for doing the same with the {{ShuffleScheduler}} while we're at it, but neither of these are blocking, IMO. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556749#comment-13556749 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Chris, Thanks for your quick feedback. I looked at the patch. It has one minor nit. The {{createMergeManager}} method should take {{ShuffleConsumerPlugin.Context}} object. I will go over it one more time, work out the change, run tests, and post the patch shortly. Thanks. -- Asokan Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4923) Add toString method to TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556772#comment-13556772 ] Alejandro Abdelnur commented on MAPREDUCE-4923: --- +1. Please open another jira for the annotation changes, thx Add toString method to TaggedInputSplit --- Key: MAPREDUCE-4923 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4923 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 1.1.1, 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor Attachments: MAPREDUCE-4923-branch-1.patch, MAPREDUCE-4923.patch Per MAPREDUCE-3678, map task logs now contain information about the input split being processed. Because TaggedInputSplit has no overridden toString method, nothing useful gets printed out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
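The fix amounts to delegating toString() to the wrapped split, so task logs print the underlying split instead of the default Object identity string. A minimal sketch with simplified stand-in classes (not the real TaggedInputSplit):

```java
// Stand-ins for the real classes; only the toString() delegation is the point.
public class TaggedSplitToStringSketch {
    public static class FileSplitStub {
        private final String path;
        private final long start, length;
        FileSplitStub(String path, long start, long length) {
            this.path = path; this.start = start; this.length = length;
        }
        @Override public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    public static class TaggedSplit {
        private final FileSplitStub delegate;
        TaggedSplit(FileSplitStub delegate) { this.delegate = delegate; }
        // Without this override, logs would show something like
        // "TaggedSplit@1b6d3586" instead of the input path and range.
        @Override public String toString() { return delegate.toString(); }
    }

    public static void main(String[] args) {
        TaggedSplit s = new TaggedSplit(
            new FileSplitStub("hdfs://nn/data/part-00000", 0, 1048576));
        System.out.println(s);
    }
}
```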
[jira] [Commented] (MAPREDUCE-4929) mapreduce.task.timeout is ignored
[ https://issues.apache.org/jira/browse/MAPREDUCE-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556774#comment-13556774 ] Alejandro Abdelnur commented on MAPREDUCE-4929: --- It looks good to me, but before committing it: what is the precedence behavior in trunk? I want to make sure we have the same behavior. mapreduce.task.timeout is ignored - Key: MAPREDUCE-4929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4929 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-4929-branch-1.patch In MR1, only mapred.task.timeout works. Both should be made to work.
[jira] [Commented] (MAPREDUCE-4929) mapreduce.task.timeout is ignored
[ https://issues.apache.org/jira/browse/MAPREDUCE-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556795#comment-13556795 ] Sandy Ryza commented on MAPREDUCE-4929: --- It doesn't exactly appear that there is precedence behavior in trunk. When Configuration#set() is called for a config key with deprecations, all the corresponding keys are set. So if we came across mapred.task.timeout first in a config file, both mapred.task.timeout and mapreduce.task.timeout would get set. Then if we came across mapreduce.task.timeout afterwards, both would get overridden. mapreduce.task.timeout is ignored - Key: MAPREDUCE-4929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4929 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-4929-branch-1.patch In MR1, only mapred.task.timeout works. Both should be made to work.
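The deprecation behavior Sandy describes can be illustrated with a toy model: setting either the deprecated key or its replacement writes both, so the last key read wins regardless of which spelling it used. This is a simplified stand-in, not Hadoop's actual Configuration class.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Configuration deprecation: each set() updates the key and
// its deprecation alias together, so neither name can shadow the other.
public class DeprecationSketch {
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("mapred.task.timeout", "mapreduce.task.timeout");
        ALIASES.put("mapreduce.task.timeout", "mapred.task.timeout");
    }

    private final Map<String, String> props = new HashMap<>();

    public void set(String key, String value) {
        props.put(key, value);
        String alias = ALIASES.get(key);
        if (alias != null) props.put(alias, value);  // both names updated
    }

    public String get(String key) { return props.get(key); }

    public static void main(String[] args) {
        DeprecationSketch conf = new DeprecationSketch();
        conf.set("mapred.task.timeout", "600000");     // old name, read first
        conf.set("mapreduce.task.timeout", "300000");  // new name, read later
        // Last set wins under both names.
        System.out.println(conf.get("mapred.task.timeout"));
        System.out.println(conf.get("mapreduce.task.timeout"));
    }
}
```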
[jira] [Commented] (MAPREDUCE-4923) Add toString method to TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556822#comment-13556822 ] Hudson commented on MAPREDUCE-4923: --- Integrated in Hadoop-trunk-Commit #3258 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3258/]) MAPREDUCE-4923. Add toString method to TaggedInputSplit. (sandyr via tucu) (Revision 1434993) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1434993 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/TaggedInputSplit.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/TaggedInputSplit.java Add toString method to TaggedInputSplit --- Key: MAPREDUCE-4923 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4923 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 1.1.1, 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor Fix For: 1.2.0, 2.0.3-alpha Attachments: MAPREDUCE-4923-branch-1.patch, MAPREDUCE-4923.patch Per MAPREDUCE-3678, map task logs now contain information about the input split being processed. Because TaggedInputSplit has no overridden toString method, nothing useful gets printed out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4923) Add toString method to TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4923: -- Resolution: Fixed Fix Version/s: 2.0.3-alpha 1.2.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Sandy. Committed to trunk, branch-1 and branch-2. Add toString method to TaggedInputSplit --- Key: MAPREDUCE-4923 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4923 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 1.1.1, 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor Fix For: 1.2.0, 2.0.3-alpha Attachments: MAPREDUCE-4923-branch-1.patch, MAPREDUCE-4923.patch Per MAPREDUCE-3678, map task logs now contain information about the input split being processed. Because TaggedInputSplit has no overridden toString method, nothing useful gets printed out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4911: -- Attachment: MAPREDUCE-4911.patch This patch adds configuration for node-level aggregation to JobConf. It also includes tests for the changes. Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds a node-level aggregation flag feature (setLocalAggregation(boolean)) to JobConf. This task is a subtask of MAPREDUCE-4502.
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4911: -- Target Version/s: trunk Affects Version/s: trunk Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4911 started by Tsuyoshi OZAWA. Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4911: -- Status: Patch Available (was: In Progress) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (MAPREDUCE-4863) Adding aggregationWaitMap for node-level combiner.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4863 started by Tsuyoshi OZAWA. Adding aggregationWaitMap for node-level combiner. -- Key: MAPREDUCE-4863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4863 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0002-Adding-aggregationWaitMap-for-node-level-combiner.patch To manage node/rack-level combining, MRAppMaster needs to keep management information about the outputs of completed MapTasks to be aggregated. The AggregationWaitMap is used so that MRAppMaster can decide whether or not MapTasks should start combining local MapOutputFiles. AggregationWaitMap is an abstraction over ConcurrentHashMap<String, ArrayList<TaskAttemptCompletionEvent>>. These events identify the candidate files to be aggregated. When MapTasks complete, MRAppMaster buffers their TaskAttemptCompletionEvents in the AggregationWaitMap to delay the reducers' fetching of outputs from mappers until node-level aggregation is finished. After node-level aggregation, MRAppMaster writes back the mapAttemptCompletionEvents to restart the reducers' fetching of outputs from mappers.
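A minimal Java sketch of the AggregationWaitMap described above. The map type follows the description (ConcurrentHashMap keyed by node, holding lists of completion events), but the event class and method names are stand-ins, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an AggregationWaitMap: buffers TaskAttemptCompletionEvents per
// node until node-level aggregation finishes, then drains them so reducers
// can resume fetching. Names are illustrative, not from the real patch.
public class AggregationWaitMapSketch {
    public static class TaskAttemptCompletionEvent {
        final String attemptId;
        public TaskAttemptCompletionEvent(String attemptId) { this.attemptId = attemptId; }
    }

    private final ConcurrentHashMap<String, ArrayList<TaskAttemptCompletionEvent>> waiting =
        new ConcurrentHashMap<>();

    // Buffer a completed map's event under its node until aggregation runs.
    public void buffer(String node, TaskAttemptCompletionEvent event) {
        waiting.computeIfAbsent(node, k -> new ArrayList<>()).add(event);
    }

    // After node-level aggregation, drain the node's events so the AM can
    // write them back and restart the reducers' fetching.
    public List<TaskAttemptCompletionEvent> drain(String node) {
        ArrayList<TaskAttemptCompletionEvent> events = waiting.remove(node);
        return events == null ? new ArrayList<>() : events;
    }

    public static void main(String[] args) {
        AggregationWaitMapSketch map = new AggregationWaitMapSketch();
        map.buffer("node1", new TaskAttemptCompletionEvent("attempt_1_m_0"));
        map.buffer("node1", new TaskAttemptCompletionEvent("attempt_1_m_1"));
        System.out.println(map.drain("node1").size() + " events released for node1");
    }
}
```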
[jira] [Work started] (MAPREDUCE-4864) Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4864 started by Tsuyoshi OZAWA. Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner. -- Key: MAPREDUCE-4864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4864 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mrv2, tasktracker Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg-20130116.patch, 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg.patch MapTasks need to know whether or not they should start the node-level combiner against the mapper outputs on their node. The new umbilical RPC, getAggregationTargets(), is used to get the outputs to be aggregated on the node. The definition is as follows: AggregationTarget getAggregationTargets(TaskAttemptID aggregator) throws IOException; AggregationTarget is an abstraction class for the array of TaskAttemptIDs to be aggregated.
[jira] [Updated] (MAPREDUCE-4864) Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4864: -- Affects Version/s: trunk Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner. -- Key: MAPREDUCE-4864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4864 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mrv2, tasktracker Affects Versions: 3.0.0, trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg-20130116.patch, 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg.patch MapTasks need to know whether or not they should start the node-level combiner against the mapper outputs on their node. The new umbilical RPC, getAggregationTargets(), is used to get the outputs to be aggregated on the node. The definition is as follows: AggregationTarget getAggregationTargets(TaskAttemptID aggregator) throws IOException; AggregationTarget is an abstraction class for the array of TaskAttemptIDs to be aggregated.
[jira] [Work started] (MAPREDUCE-4910) Adding AggregationWaitMap to some components(MRAppMaster, TaskAttemptListener, JobImpl, MapTaskImpl).
[ https://issues.apache.org/jira/browse/MAPREDUCE-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4910 started by Tsuyoshi OZAWA. Adding AggregationWaitMap to some components(MRAppMaster, TaskAttemptListener, JobImpl, MapTaskImpl). - Key: MAPREDUCE-4910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4910 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mrv2, task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0003-Adding-AggregationWaitMap-to-some-components-MRAppMa.patch, 0004-Add-AggregationWaitMap-to-some-components-MRAppMaste.patch To implement MR-4502, the AggregationWaitMap needs to be used by several components (MRAppMaster, TaskAttemptListener, JobImpl, MapTaskImpl).
[jira] [Work started] (MAPREDUCE-4865) Launching node-level combiner at the end stage of MapTask and ignoring aggregated inputs at ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4865 started by Tsuyoshi OZAWA. Launching node-level combiner at the end stage of MapTask and ignoring aggregated inputs at ReduceTask -- Key: MAPREDUCE-4865 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4865 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0003-Changed-Mappers-and-Reducers-to-support-Node-level-aggregation-20130116.patch, 0004-Changed-Mappers-and-Reducers-to-support-Node-level-a.patch MapTask needs to start node-level aggregation against local outputs at the end stage of MapTask after calling getAggregationTargets(). This feature is implemented with Merger and CombinerRunner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4502 started by Tsuyoshi OZAWA. Multi-level aggregation with combining the result of maps per node/rack --- Key: MAPREDUCE-4502 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: design_v2.pdf, MAPREDUCE-4525-pof.diff, speculative_draft.pdf Shuffle in Hadoop is expensive in spite of the combiner, because the scope of combining is limited to a single MapTask. To solve this problem, a good approach is to aggregate the results of maps per node/rack by launching a combiner. This JIRA is to implement the multi-level aggregation infrastructure, including combining per container (MAPREDUCE-3902 is related) and coordinating containers through the application master without breaking the fault tolerance of jobs.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556986#comment-13556986 ] Andrew Purtell commented on MAPREDUCE-4495: --- For the benefit of those coming to this issue now, was this moved to the Oozie project? Was the yapp proposal submitted to the incubator? What is the current status of this? Is the code/design on this issue orphaned/dead? Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf, yapp_proposal.txt It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Less number of consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher systems like Pig and hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557002#comment-13557002 ] Bo Wang commented on MAPREDUCE-4495: Hi Andrew, Thanks for looking at this issue. Currently this issue hasn't been moved to Oozie, and I don't think the yapp proposal has been submitted to the incubator either. In terms of implementation, a prototype based on the v2 design in the document is finished. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf, yapp_proposal.txt It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM, and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Fewer consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher systems like Pig and Hive to provide an optimized way of running their workflows.
[jira] [Commented] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557003#comment-13557003 ] Hadoop QA commented on MAPREDUCE-4911: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12565437/MAPREDUCE-4911.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 12 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapred.TestYARNRunner {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3247//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3247//console This message is automatically generated. 
Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4944) Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated MAPREDUCE-4944: - Target Version/s: 1.2.0 Affects Version/s: 1.1.1 Hadoop Flags: Incompatible change Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool - Key: MAPREDUCE-4944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4944 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 1.1.1 Reporter: Binglin Chang Priority: Minor support listClusterNodes and printNodeStatus in command line tool is useful for admin to create certain automation tools, this can also used by MAPREDUCE-4900 to get TaskTracker name so can set TT's slot dynamically -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4944) Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557014#comment-13557014 ] Binglin Chang commented on MAPREDUCE-4944: -- I looked into the code and found some issues with this backport: hadoop-1.x has similar commands, -list-active-trackers and -list-blacklisted-trackers, which just print tracker names. They use JobSubmissionProtocol to talk to the JobTracker, and there is no more information we can expose except by adding another protocol method to JobSubmissionProtocol; this will break compatibility, which I think is unacceptable for JobSubmissionProtocol, since it is used by normal clients. Another option is to add this to AdminOperationsProtocol (like MAPREDUCE-4900); as an admin protocol it has weaker compatibility requirements, but I still don't think it's worth breaking compatibility. Another option is JMX. I haven't found a proper way to write a command line tool using JMX: it needs to connect to a specific port on the JobTracker host, and that information is hard to get because it is not in mapred-site.xml but depends on JVM config in hadoop-env.sh. Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool - Key: MAPREDUCE-4944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4944 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 1.1.1 Reporter: Binglin Chang Priority: Minor Supporting listClusterNodes and printNodeStatus in the command line tool is useful for admins creating automation tools; it can also be used by MAPREDUCE-4900 to get the TaskTracker name so a TT's slots can be set dynamically.
[jira] [Commented] (MAPREDUCE-4944) Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557031#comment-13557031 ] Junping Du commented on MAPREDUCE-4944: --- I would prefer to break the admin operations protocol if we have to break compatibility. But another option could be to add a separate protocol for slot get/set. I think manipulating map/reduce slots on a TT is a very useful feature, especially in a sharing environment (like an HBase region server living with a TT), so maybe it deserves its own protocol? Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool - Key: MAPREDUCE-4944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4944 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 1.1.1 Reporter: Binglin Chang Priority: Minor Supporting listClusterNodes and printNodeStatus in the command line tool is useful for admins creating automation tools; it can also be used by MAPREDUCE-4900 to get the TaskTracker name so a TT's slots can be set dynamically.