[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556000#comment-13556000 ] Avner BenHanoch commented on MAPREDUCE-4049: Hi Alejandro - thanks for your thorough and fast review! Regarding {quote} ReducerCopier class should be made public static in order to be able to be created via ReflectionUtils.newInstance() {quote} ... cool! I actually went in this direction in my very first patch, and I am happy to return to it. (Note that it will introduce changes in all places where ReduceCopier currently uses members of the enclosing ReduceTask object directly - but I believe this is the correct thing to do.) Regarding: {quote} I've just noticed that your ShuffleConsumerPlugin API does not respect the API of the ReduceCopier; the createKVIterator() method has a different signature. The parameters being passed to it, in your patch, are already available in the Context, except for the FileSystem, but you could create the FileSystem (and obtain the raw one) within your plugin impl using the conf received in the context. {quote} I think this comment is wrong. Please clarify! Regarding {quote} I'm not thrilled about the TT loading the default shuffle provider (which is not implementing the new shuffle provider interface) and, in addition, one extra custom shuffle provider. Instead, I'd say the current shuffle provider logic should be refactored into a shuffle provider implementation, and this one loaded by default. And if, as you indicated before, you want to load different impls simultaneously, then a shuffle plugin multiplexor implementation could be used. This increases the scope of the changes, which is why I'd like to do this in a separate JIRA and keep this JIRA for the consumer (reducer) side. {quote} Actually, I wrote above: _my intuition is that supporting 1 external shuffle service (in addition to the built-in shuffle service) is the 'keep it simple' solution. 
I feel that the use case of N providers is theoretical. Hence, I prefer to keep the conf and code simple_. This clarifies why I wrote my patch this way instead of introducing a big feature with a shuffle plugin multiplexor... in hadoop-1. *Again, this JIRA issue - since its creation - has focused on _Support generic shuffle service as a set of two plugins: ShuffleProvider and ShuffleConsumer_. It has no value for me if it deals with the consumer only.* *I am fine with all the rest of your comments. Please let me know if I can continue according to this!* Avner plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Assignee: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: 3.0.0 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch Support generic shuffle service as a set of two plugins: ShuffleProvider and ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example, we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance. # Satisfy MAPREDUCE-3060 - a generic shuffle service that avoids a hidden dependency of the NodeManager on a specific version of the mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. 
Weikuan Yu of Auburn University and others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch) # I am providing a link for downloading UDA - Mellanox's open source plugin that implements a generic shuffle service using RDMA and levitated merge. Note: at this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks.) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
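The ReflectionUtils.newInstance() pattern discussed in the review only works when the target class is instantiable by name through a no-arg constructor, which is why ReduceCopier is asked to become public static. A minimal plain-Java sketch of the idea; the ShufflePlugin interface and class names are illustrative, not the actual MAPREDUCE-4049 API:

```java
import java.lang.reflect.Constructor;

public class PluginLoader {
    /** Hypothetical stand-in for the ShuffleConsumerPlugin interface. */
    public interface ShufflePlugin {
        String name();
    }

    /** Default implementation, used when no custom class is configured. */
    public static class DefaultShuffle implements ShufflePlugin {
        public String name() { return "default-http-shuffle"; }
    }

    /**
     * Instantiate the configured class via its no-arg constructor -- the same
     * pattern ReflectionUtils.newInstance() uses. This only works for a static
     * nested class; a non-static inner class has a hidden outer-instance
     * constructor parameter, so getDeclaredConstructor() would fail.
     */
    public static ShufflePlugin newInstance(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        Constructor<?> ctor = clazz.getDeclaredConstructor();
        ctor.setAccessible(true);
        return (ShufflePlugin) ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(newInstance("PluginLoader$DefaultShuffle").name());
    }
}
```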
[jira] [Resolved] (MAPREDUCE-4077) Issues while using Hadoop Streaming job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved MAPREDUCE-4077. -- Resolution: Not A Problem Issues while using Hadoop Streaming job --- Key: MAPREDUCE-4077 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4077 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Devaraj K Assignee: Devaraj K When we use the -file option, it warns that the option is deprecated and suggests the generic option -files. {code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -file hadoop 02/02/19 10:55:51 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead. {code} But when we use the -files option, it says the option is unrecognized. {code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -files hadoop 02/02/19 10:56:42 ERROR streaming.StreamJob: Unrecognized option: -files Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options] {code} When we use the -archives option, it also says the option is unrecognized. {code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -archives testarchive.rar 02/02/19 11:05:43 ERROR streaming.StreamJob: Unrecognized option: -archives Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options] {code} But the options help does display the usage of -archives. 
{code:xml} linux-f330:/home/devaraj/hadoop/trunk/hadoop-0.24.0-SNAPSHOT/bin # ./hadoop jar ../share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -input /hadoop -output /test/output/3 -mapper cat -reducer wc -archives testarchive.rar 02/02/19 11:05:43 ERROR streaming.StreamJob: Unrecognized option: -archives Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options] .. .. -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. {code}
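The "Not A Problem" resolution is consistent with -files and -archives being *generic* options: they are consumed by a separate first-pass parser and are only recognized when they precede the streaming-specific options. A toy sketch of that two-phase parsing; the names are illustrative, not the actual GenericOptionsParser API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ArgOrder {
    // Generic options handled by the first-pass parser (illustrative subset).
    static final Set<String> GENERIC =
        new HashSet<>(Arrays.asList("-files", "-archives", "-libjars"));

    /** Consume generic options from the front; everything after the first
     *  non-generic token is left for the tool (here: streaming) to parse. */
    public static List<String> stripGeneric(String[] args, Map<String, String> generic) {
        int i = 0;
        while (i + 1 < args.length && GENERIC.contains(args[i])) {
            generic.put(args[i], args[i + 1]);
            i += 2;
        }
        return Arrays.asList(args).subList(i, args.length);
    }

    public static void main(String[] args) {
        Map<String, String> g = new HashMap<>();
        // Placed first, -files is consumed as a generic option.
        System.out.println(stripGeneric(
            new String[] {"-files", "hadoop", "-mapper", "cat"}, g) + " generic=" + g);
        g.clear();
        // Placed after tool options, -files reaches the tool unparsed, and the
        // tool then reports "Unrecognized option: -files".
        System.out.println(stripGeneric(
            new String[] {"-mapper", "cat", "-files", "hadoop"}, g) + " generic=" + g);
    }
}
```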
[jira] [Updated] (MAPREDUCE-2309) While querying the Job Statics from the command-line, if we give wrong status name then there is no warning or response.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-2309: - Resolution: Won't Fix Status: Resolved (was: Patch Available) While querying the Job Statics from the command-line, if we give wrong status name then there is no warning or response. Key: MAPREDUCE-2309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2309 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Devaraj K Assignee: Devaraj K Priority: Minor Fix For: 0.22.1 Attachments: MAPREDUCE-2309-0.20.patch, MAPREDUCE-2309-trunk.patch If we try to get job information by giving a wrong status name from the command-line interface, it does not give any warning or response.
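The requested behavior -- warn on an unknown status name instead of returning silently -- amounts to a validation step before the query runs. A sketch; the state names and method are illustrative, not the actual JobTracker CLI code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class JobStatusQuery {
    // Job states as commonly exposed by the JobTracker CLI; illustrative list.
    static final Set<String> VALID =
        new HashSet<>(Arrays.asList("PREP", "RUNNING", "SUCCEEDED", "FAILED", "KILLED"));

    /** Warn on an unknown state instead of returning an empty response. */
    public static String query(String state) {
        String s = state.toUpperCase(Locale.ROOT);
        if (!VALID.contains(s)) {
            return "Unknown job state '" + state + "'. Valid states: " + new TreeSet<>(VALID);
        }
        return "listing jobs in state " + s;
    }

    public static void main(String[] args) {
        System.out.println(query("runing"));   // misspelled -> warning, not silence
        System.out.println(query("running"));
    }
}
```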
[jira] [Updated] (MAPREDUCE-2548) Log improvements in DBOutputFormat.java and CounterGroup.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-2548: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Log improvements in DBOutputFormat.java and CounterGroup.java - Key: MAPREDUCE-2548 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2548 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-2548-1.patch, MAPREDUCE-2548.patch 1. Instead of printing the stack trace on the console, it can be logged. {code:title=DBOutputFormat.java|borderStyle=solid} public void write(K key, V value) throws IOException { try { key.write(statement); statement.addBatch(); } catch (SQLException e) { e.printStackTrace(); } } {code} 2. Missing resource information can be logged. {code:title=CounterGroup.java|borderStyle=solid} protected CounterGroup(String name) { this.name = name; try { bundle = getResourceBundle(name); } catch (MissingResourceException neverMind) { } displayName = localize(CounterGroupName, name); } private String localize(String key, String defaultValue) { String result = defaultValue; if (bundle != null) { try { result = bundle.getString(key); } catch (MissingResourceException mre) { } } return result; } {code}
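Suggestion 1 above can be sketched as follows. Hadoop itself uses commons-logging, so java.util.logging stands in here only to keep the sketch self-contained; the Batch interface is a hypothetical stand-in for the JDBC statement so the example needs no database:

```java
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DbWrite {
    private static final Logger LOG = Logger.getLogger(DbWrite.class.getName());

    /** Stand-in for statement.addBatch() so the sketch needs no database. */
    public interface Batch {
        void addBatch() throws SQLException;
    }

    /**
     * The suggested change: log the SQLException (message plus stack trace)
     * through the logger instead of e.printStackTrace(), and report the
     * failure to the caller rather than swallowing it silently.
     */
    public static boolean write(Batch batch) {
        try {
            batch.addBatch();
            return true;
        } catch (SQLException e) {
            LOG.log(Level.WARNING, "Failed to add record to batch", e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(write(() -> { throw new SQLException("connection closed"); }));
        System.out.println(write(() -> { /* success path */ }));
    }
}
```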
[jira] [Updated] (MAPREDUCE-2562) NullPointerException in Jobtracker when it is started without Name Node
[ https://issues.apache.org/jira/browse/MAPREDUCE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-2562: - Resolution: Won't Fix Status: Resolved (was: Patch Available) NullPointerException in Jobtracker when it is started without Name Node --- Key: MAPREDUCE-2562 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2562 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Devaraj K Assignee: Devaraj K Fix For: 0.22.1 Attachments: MAPREDUCE-2562.patch The JobTracker logs a NullPointerException when it is started without the NameNode. {code:borderStyle=solid} 2011-06-03 01:50:04,304 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.18.52.225:9000. Already tried 7 time(s). 2011-06-03 01:50:05,307 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.18.52.225:9000. Already tried 8 time(s). 2011-06-03 01:50:06,310 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.18.52.225:9000. Already tried 9 time(s). 2011-06-03 01:50:21,243 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:1635) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:287) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:279) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:274) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4312) {code}
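The FATAL NullPointerException above gives no hint that the NameNode is simply unreachable. One common fix for this class of failure is a null guard that fails fast with an actionable message; a sketch with illustrative names, not JobTracker's actual fields or methods:

```java
public class StartupGuard {
    /**
     * Fail fast with an actionable message instead of letting a null
     * filesystem reference surface later as a bare NullPointerException.
     */
    public static String requireFilesystem(Object fs) {
        if (fs == null) {
            throw new IllegalStateException(
                "Cannot initialize: filesystem is unavailable. "
              + "Check that the NameNode is running and reachable.");
        }
        return fs.toString();
    }

    public static void main(String[] args) {
        try {
            requireFilesystem(null);  // simulates startup with no NameNode
        } catch (IllegalStateException e) {
            // The operator sees the cause, not an anonymous NPE line number.
            System.out.println(e.getMessage());
        }
    }
}
```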
[jira] [Assigned] (MAPREDUCE-3207) TestMRCLI failing on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned MAPREDUCE-3207: Assignee: (was: Devaraj K) TestMRCLI failing on trunk Key: MAPREDUCE-3207 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3207 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2 Affects Versions: 0.23.0 Reporter: Hitesh Shah Priority: Minor Fix For: 0.24.0 Attachments: TEST-org.apache.hadoop.cli.TestMRCLI.txt Failing tests: 7: Archive: Deleting a file in archive 8: Archive: Renaming a file in archive
[jira] [Assigned] (MAPREDUCE-3222) ant test TestTaskContext failing on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned MAPREDUCE-3222: Assignee: (was: Devaraj K) ant test TestTaskContext failing on trunk - Key: MAPREDUCE-3222 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3222 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2 Affects Versions: 0.23.0 Reporter: Hitesh Shah Priority: Minor Fix For: 0.24.0 Testcase: testContextStatus took 29.977 sec FAILED null expected:<map[ > sort]> but was:<map[]> junit.framework.ComparisonFailure: null expected:<map[ > sort]> but was:<map[]> at org.apache.hadoop.mapreduce.TestTaskContext.testContextStatus(TestTaskContext.java:120) Testcase: testMapContextProgress took 17.371 sec Testcase: testReduceContextProgress took 16.267 sec
[jira] [Updated] (MAPREDUCE-4743) Job is marking as FAILED and also throwing the Transition exception instead of KILLED when issues a KILL command
[ https://issues.apache.org/jira/browse/MAPREDUCE-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-4743: - Summary: Job is marking as FAILED and also throwing the Transition exception instead of KILLED when issues a KILL command (was: Job is marking as FAILED and also throwing thhe Transition exception instead of KILLED when issues a KILL command) Job is marking as FAILED and also throwing the Transition exception instead of KILLED when issues a KILL command Key: MAPREDUCE-4743 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4743 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Devaraj K Assignee: Devaraj K {code:xml} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_KILL at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:605) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:903) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:897) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code}
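The trace shows T_KILL arriving after the task has already reached SUCCEEDED, which the state machine treats as an invalid transition. The usual fix for this class of bug is to register the kill event in terminal states as an explicit no-op. A sketch with illustrative state and event names, not the actual yarn.state API:

```java
import java.util.EnumMap;
import java.util.Map;

public class TaskStates {
    public enum State { RUNNING, SUCCEEDED, FAILED, KILLED }
    public enum Event { T_ATTEMPT_SUCCEEDED, T_KILL }

    // Transition table; a missing entry means the event is invalid in that state.
    static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    static {
        Map<Event, State> running = new EnumMap<>(Event.class);
        running.put(Event.T_ATTEMPT_SUCCEEDED, State.SUCCEEDED);
        running.put(Event.T_KILL, State.KILLED);
        TABLE.put(State.RUNNING, running);

        // The fix: a late T_KILL in a terminal state is an explicit no-op
        // (self-transition) instead of an invalid event.
        Map<Event, State> succeeded = new EnumMap<>(Event.class);
        succeeded.put(Event.T_KILL, State.SUCCEEDED);
        TABLE.put(State.SUCCEEDED, succeeded);
    }

    public static State transition(State s, Event e) {
        Map<Event, State> row = TABLE.get(s);
        if (row == null || !row.containsKey(e)) {
            throw new IllegalStateException("Invalid event: " + e + " at " + s);
        }
        return row.get(e);
    }

    public static void main(String[] args) {
        State s = transition(State.RUNNING, Event.T_ATTEMPT_SUCCEEDED);
        // Without the no-op entry this would throw, as in the reported trace.
        System.out.println(transition(s, Event.T_KILL));
    }
}
```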
[jira] [Assigned] (MAPREDUCE-3841) Broken Server metrics and Local logs link under the tools menu
[ https://issues.apache.org/jira/browse/MAPREDUCE-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned MAPREDUCE-3841: Assignee: (was: Devaraj K) Broken Server metrics and Local logs link under the tools menu -- Key: MAPREDUCE-3841 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3841 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Ramya Sunil The Local logs link redirects to the cluster page, and Server metrics opens an empty page on the RM/JHS homepage. So do the links from the NodeManager UI.
[jira] [Updated] (MAPREDUCE-3556) Resource Leaks in key flows
[ https://issues.apache.org/jira/browse/MAPREDUCE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-3556: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Resource Leaks in key flows --- Key: MAPREDUCE-3556 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3556 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.0 Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-3556.patch The key flows below have potential resource leaks: in each case close() is only reached on the success path, so an exception mid-write leaks the handle. {code:title=MapTask.java|borderStyle=solid} if (combinerRunner == null || numSpills < minSpillsForCombine) { Merger.writeFile(kvIter, writer, reporter, job); } else { combineCollector.setWriter(writer); combinerRunner.combine(kvIter, combineCollector); } //close writer.close(); {code} {code:title=InputSampler.java|borderStyle=solid} for(int i = 1; i < numPartitions; ++i) { int k = Math.round(stepSize * i); while (last >= k && comparator.compare(samples[last], samples[k]) == 0) { ++k; } writer.append(samples[k], nullValue); last = k; } writer.close(); {code} {code:title=JobSplitWriter.java|borderStyle=solid} SplitMetaInfo[] info = writeNewSplits(conf, splits, out); out.close(); SplitMetaInfo[] info = writeOldSplits(splits, out); out.close(); {code}
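All three snippets share the same shape, and the standard fix is try-with-resources, which closes the stream on every path. A self-contained illustration of both shapes; TrackingWriter is a test double, not Hadoop code:

```java
import java.io.IOException;
import java.io.StringWriter;

public class SafeClose {

    /** StringWriter that records whether close() was ever called. */
    static class TrackingWriter extends StringWriter {
        boolean closed = false;
        @Override public void close() throws IOException { closed = true; super.close(); }
    }

    /** The reported shape: close() sits after the writes, so a failure skips it. */
    public static boolean leakyWrite(String[] records, boolean failMidway) {
        TrackingWriter w = new TrackingWriter();
        try {
            for (int i = 0; i < records.length; i++) {
                if (failMidway && i == 1) throw new IOException("disk full");
                w.write(records[i]);
            }
            w.close();                 // never reached on the failure path
        } catch (IOException swallowed) {
            // mirrors the leak: no cleanup on error
        }
        return w.closed;
    }

    /** The fix: try-with-resources closes the writer on every path. */
    public static boolean safeWrite(String[] records, boolean failMidway) {
        TrackingWriter w = new TrackingWriter();
        try (TrackingWriter writer = w) {
            for (int i = 0; i < records.length; i++) {
                if (failMidway && i == 1) throw new IOException("disk full");
                writer.write(records[i]);
            }
        } catch (IOException swallowed) {
            // real code would rethrow; swallowed here to keep the sketch small
        }
        return w.closed;
    }

    public static void main(String[] args) {
        String[] recs = {"a", "b", "c"};
        System.out.println("leaky closed on failure? " + leakyWrite(recs, true));
        System.out.println("safe closed on failure?  " + safeWrite(recs, true));
    }
}
```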
[jira] [Resolved] (MAPREDUCE-3232) AM should handle reboot from Resource Manager
[ https://issues.apache.org/jira/browse/MAPREDUCE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved MAPREDUCE-3232. -- Resolution: Not A Problem AM should handle reboot from Resource Manager -- Key: MAPREDUCE-3232 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3232 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.24.0 Reporter: Devaraj K Assignee: Devaraj K When the RM doesn't have the last response id for an app attempt (or the request response id is less than the last response id), the RM sends a reboot response, but the AM doesn't handle this.
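The missing handling can be sketched as follows, with illustrative names rather than the actual AMRMProtocol types: the RM answers an out-of-sync responseId with a reboot command, and the AM must translate that command into an action instead of dropping it:

```java
public class HeartbeatHandler {
    public enum Command { NORMAL, REBOOT }

    /** RM side: an out-of-sync responseId is answered with a reboot command. */
    public static Command respond(int lastResponseId, int requestResponseId) {
        return requestResponseId < lastResponseId ? Command.REBOOT : Command.NORMAL;
    }

    /** AM side (the missing piece): act on the command instead of ignoring it. */
    public static String handle(Command c) {
        switch (c) {
            case REBOOT: return "resync-and-shutdown";
            default:     return "continue";
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(respond(7, 3)));  // stale id -> reboot path
        System.out.println(handle(respond(7, 7)));  // in sync -> continue
    }
}
```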
[jira] [Updated] (MAPREDUCE-4286) TestClientProtocolProviderImpls passes on failure conditions also
[ https://issues.apache.org/jira/browse/MAPREDUCE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-4286: - Affects Version/s: (was: 2.0.1-alpha) (was: 2.0.0-alpha) 2.0.3-alpha 0.23.5 TestClientProtocolProviderImpls passes on failure conditions also - Key: MAPREDUCE-4286 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4286 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.2-alpha, 2.0.3-alpha, 0.23.5 Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4286.patch, MAPREDUCE-4286.patch
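A test typically "passes on failure conditions" when the assertion sits inside a try block whose catch swallows the exception. A minimal illustration of the broken and fixed shapes; check() is a hypothetical stand-in, not the actual TestClientProtocolProviderImpls code:

```java
public class SwallowedFailure {
    /** Broken pattern: returns "passed" even though check() threw. */
    public static String brokenTest() {
        try {
            check();             // throws
            assertTrue(false);   // never reached -- and neither is any assert
        } catch (Exception e) {
            // empty catch: the failure disappears
        }
        return "passed";
    }

    /** Fixed pattern: fail when no exception occurs, and assert inside the
     *  catch that the failure is the expected one. */
    public static String fixedTest() {
        try {
            check();
            return "failed: expected exception";
        } catch (IllegalStateException e) {
            return "passed: " + e.getMessage();
        }
    }

    static void check() { throw new IllegalStateException("wrong provider"); }
    static void assertTrue(boolean b) { if (!b) throw new AssertionError(); }

    public static void main(String[] args) {
        System.out.println(brokenTest());
        System.out.println(fixedTest());
    }
}
```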
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556213#comment-13556213 ] Alejandro Abdelnur commented on MAPREDUCE-4049: --- Regarding ...ShuffleConsumerPlugin API does not respect the API of the ReduceCopier... I think this comment is wrong. Please clarify!... You are right, please disregard that comment. After integrating my comments into the consumer side, I think it (the consumer) is ready to go in. Regarding the producer changes, I think that the default producer implementation should implement the producer plugin interface as well. Once we have that, the multiplexor plugin would be trivial; I'd be happy to help with that. We can do the producer plugin as a subtask of this JIRA.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556247#comment-13556247 ] Arun C Murthy commented on MAPREDUCE-4808: -- bq. The goal is to be able to write alternate implementations of the Shuffle Alejandro - it seems like you understand something about the use-case that I don't. Maybe you and Asokan have had a private chat? What are the use-cases for alternate implementations of the Shuffle? As Chris also mentioned, with MAPREDUCE-4049 we already allow alternate implementations of the Shuffle; is this redundant, then? bq. While some of this logic replacement could be done at the Merge level as you suggested, other parts, like MapOutput allocation, cannot be done there as this is driven by the MergeManager. So, a combination of a MapOutput re-factor and a Merger interface should suffice? In any case, what are the use-cases for alternate implementations of MapOutput? Or is the MapOutput re-factor merely a code-hygiene issue? I'm not trying to be difficult here, but I feel like I just don't understand the use-case. So, I'd appreciate it if we could focus on concrete use-cases for the plugin. I admit I still have a hard time understanding why we need this complexity. Thanks. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.2-alpha Reporter: Arun C Murthy Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. 
This would come with the strong caveat that these classes are LimitedPrivate and Unstable.
[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-4808: - Fix Version/s: (was: 2.0.3-alpha)
[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-4808: - Affects Version/s: (was: 2.0.2-alpha)
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556262#comment-13556262 ] Avner BenHanoch commented on MAPREDUCE-4049: Thanks. So, we agreed upon the consumer details. Now, for the producer details: - Again, throughout the lifetime of this JIRA issue, the consumer and producer come together, since they are the two sides of the shuffle service. *This JIRA issue has no value if it has one without the other.* Hence, they should be kept together! Additionally, I want it to be one patch that can be ported to any hadoop-1.x.y version at once. - The default producer implementation already implements the producer plugin interface! (though it is still loaded via the HttpServlet interface) As I said, I went for a 'keep it simple' solution, in which I only support 1 extra provider (with simple code and simple conf). Please clarify whether this is enough, or whether you are asking me to support N providers. I don't want to write a new feature and then have someone say that we have a problem introducing a new feature in hadoop-1.
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556344#comment-13556344 ] Avner BenHanoch commented on MAPREDUCE-4049: Additionally, as I wrote once, currently, there is no request and no use case for N providers. Hence, do we really want that? plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Assignee: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: 3.0.0 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch Support generic shuffle service as set of two plugins: ShuffleProvider ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges. Hence, getting much better performance. # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager with a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with suggested Top Level Design for both plugins (currently, based on 1.0 branch) # I am providing link for downloading UDA - Mellanox's open source plugin that implements generic shuffle service using RDMA and levitated merge. 
Note: At this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks.) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
[jira] [Created] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
Jason Lowe created MAPREDUCE-4946: - Summary: Type conversion of map completion events leads to performance problems with large jobs Key: MAPREDUCE-4946 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4946 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 0.23.5, 2.0.2-alpha Reporter: Jason Lowe Priority: Critical We've seen issues with large jobs (e.g.: 13,000 maps and 3,500 reduces) where reducers fail to connect back to the AM after being launched, due to connection timeout. Looking at stack traces of the AM during this time, we see a lot of IPC server threads stuck waiting for a lock to get the application ID while type-converting the map completion events. What's odd is that normally getting the application ID should be very cheap, but in this case we're type-converting thousands of map completion events for *each* reducer connecting. That means we end up type-converting the map completion events over 45 million times during the lifetime of the example job (13,000 * 3,500). We either need to make the type conversion much cheaper (i.e.: lockless or at least read-write locked) or, even better, store the completion events in a form that does not require type conversion when serving them up to reducers.
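The second fix suggested above (storing the events in a form that needs no per-request conversion) can be sketched in plain Java. This is an illustrative model only; `CompletionEventCache`, `YarnEvent`, and `MapredEvent` are stand-ins and not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: convert the completion events once and serve the cached,
// already-converted list, instead of re-converting for every reducer.
public class CompletionEventCache {
    // stand-ins for the YARN-side and mapred-side event types
    static class YarnEvent { final int id; YarnEvent(int id) { this.id = id; } }
    static class MapredEvent { final int id; MapredEvent(int id) { this.id = id; } }

    private final List<YarnEvent> yarnEvents;
    private final AtomicReference<List<MapredEvent>> converted = new AtomicReference<>();
    long conversions = 0;  // instrumentation for the sketch

    CompletionEventCache(List<YarnEvent> yarnEvents) { this.yarnEvents = yarnEvents; }

    List<MapredEvent> getEvents() {
        List<MapredEvent> cached = converted.get();
        if (cached == null) {
            List<MapredEvent> fresh = new ArrayList<>();
            for (YarnEvent e : yarnEvents) { fresh.add(new MapredEvent(e.id)); conversions++; }
            // the first caller pays the conversion cost; later callers reuse it
            converted.compareAndSet(null, fresh);
            cached = converted.get();
        }
        return cached;
    }

    public static void main(String[] args) {
        List<YarnEvent> src = new ArrayList<>();
        for (int i = 0; i < 13000; i++) src.add(new YarnEvent(i));     // 13,000 maps
        CompletionEventCache cache = new CompletionEventCache(src);
        for (int r = 0; r < 3500; r++) cache.getEvents();              // 3,500 reducers
        System.out.println(cache.conversions);
    }
}
```

With the cache, the example job performs 13,000 conversions instead of 45.5 million, and the lock contention on the application ID disappears from the serving path.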
[jira] [Commented] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556391#comment-13556391 ] Jason Lowe commented on MAPREDUCE-4946: --- This performance problem prevents the AM from reliably supporting very large jobs (i.e.: tens of thousands of maps and thousands of reducers) because it can take too long to serve up requests, and other clients end up being ignored and time out. If the same task times out enough attempts, the whole job fails.
[jira] [Commented] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556394#comment-13556394 ] Jason Lowe commented on MAPREDUCE-4946: --- Sample stack trace from one of the many IPC server threads waiting for a lock during type conversion of the map completion events:
{noformat}
"IPC Server handler 9 on 45874" daemon prio=10 tid=0x08f76800 nid=0x1c27 waiting for monitor entry [0x10583000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.getAppId(JobIdPBImpl.java:78)
    - waiting to lock <0x21e729b8> (a org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:65)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:119)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:211)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:185)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:178)
    at org.apache.hadoop.mapred.TaskAttemptListenerImpl.getMapCompletionEvents(TaskAttemptListenerImpl.java:284)
    at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:394)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1530)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1526)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1221)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1524)
{noformat}
[jira] [Updated] (MAPREDUCE-4907) TrackerDistributedCacheManager issues too many getFileStatus calls
[ https://issues.apache.org/jira/browse/MAPREDUCE-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated MAPREDUCE-4907: - Fix Version/s: 0.23.7 merged this to branch-0.23 TrackerDistributedCacheManager issues too many getFileStatus calls -- Key: MAPREDUCE-4907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, tasktracker Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.2.0, 2.0.3-alpha, 0.23.7 Attachments: MAPREDUCE-4907.patch, MAPREDUCE-4907-trunk-1.patch, MAPREDUCE-4907-trunk-1.patch, MAPREDUCE-4907-trunk-1.patch, MAPREDUCE-4907-trunk.patch TrackerDistributedCacheManager issues a number of redundant getFileStatus calls when determining the timestamps and visibilities of files in the distributed cache. 300 distributed cache files deep in the directory structure can hammer HDFS with a couple of thousand requests. A couple of optimizations can reduce this load: 1. determineTimestamps and determineCacheVisibilities both call getFileStatus on every file. We could cache the results of the former and use them for the latter. 2. determineCacheVisibilities needs to check that all ancestor directories of each file have execute permissions for everyone. This currently entails a getFileStatus on each ancestor directory for each file. The results of these getFileStatus calls could be cached as well.
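Both optimizations above amount to memoizing the expensive namenode lookup. A minimal sketch in plain Java, with `StatusCache` as a hypothetical helper and a `String` standing in for Hadoop's `FileStatus`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: cache getFileStatus results so the timestamps pass and the
// visibility pass (and repeated ancestor-directory checks) share one
// lookup per distinct path instead of issuing a fresh RPC each time.
public class StatusCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> fetch;  // the expensive namenode call
    long fetches = 0;                              // RPC counter for the sketch

    StatusCache(Function<String, String> fetch) { this.fetch = fetch; }

    String getFileStatus(String path) {
        return cache.computeIfAbsent(path, p -> { fetches++; return fetch.apply(p); });
    }

    public static void main(String[] args) {
        StatusCache c = new StatusCache(p -> "status:" + p);
        String[] files = {"/cache/a.jar", "/cache/b.jar", "/cache/a.jar"};
        for (String f : files) c.getFileStatus(f);   // timestamps pass
        for (String f : files) c.getFileStatus(f);   // visibility pass reuses the cache
        System.out.println(c.fetches);
    }
}
```

Six lookups collapse to two RPCs (one per distinct path); with 300 cache files sharing deep ancestor directories, the savings compound.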
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556405#comment-13556405 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Arun, MAPREDUCE-4049 expects the plugin implementer to implement the shuffle from scratch. With the default implementation of HTTP shuffle being robust and secure, it is possible to reuse it in the majority of situations. The alternate implementation of MapOutput can be left to the plugin implementer. For example, it can be optimized to use less JVM memory and minimize Java garbage collection. Some of the concrete use cases for the plugin are: hash aggregation, hash join, limit-N query, etc. Thanks. -- Asokan Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable.
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556524#comment-13556524 ] Alejandro Abdelnur commented on MAPREDUCE-4049: --- Regarding 'new features in Hadoop-1': small or big, this is a new feature and it should be treated as such. I'm all for having this in Hadoop 1. If you want, I can start the discussion in common-dev@. Regarding "Again, throughout the lifetime of this JIRA issue, ...", I see different ways this can be done: * Keep everything in the same JIRA (as it is now) and wait till the whole patch is ready * Break the JIRA in 2 subtasks, consumer and producer side ** Do it in branch-1 directly ** Do it in a dev branch (seems an overkill) I'm OK with any approach, your call. Regarding "the default implementation already implements the producer API": ahh, missed that because the initialize method is not used. With some minor tweaks to your patch I think we could get things done in a simple way: * Add to the TT a 'public Server getHttpServer()' method * In the TT constructor, where the MapOutputServlet is added to the HttpServer 'server', remove that line and instead discover, instantiate and initialize the provider plugin. * Don't make MapOutputServlet extend the provider interface. * The default provider should be a class that simply adds the MapOutputServlet to the server via the TT.getHttpServer() method. * Remove the logic to instantiate a custom single provider plugin.
A provider multiplexor would be a very simple class, something along the following lines:
{code}
public class MultiShuffleProviderPlugin implements ShuffleProviderPlugin {
  public static final String PLUGIN_CLASSES = "hadoop.mapreduce.multi.shuffle.provider.classes";

  private ShuffleProviderPlugin[] plugins;

  public void initialize(TaskTracker tt) {
    Configuration conf = tt.getJobConf();
    Class[] klasses = conf.getClasses(PLUGIN_CLASSES, DefaultShuffleProviderPlugin.class);
    // LOG INFO list of plugin classes
    plugins = new ShuffleProviderPlugin[klasses.length];
    for (int i = 0; i < klasses.length; i++) {
      plugins[i] = (ShuffleProviderPlugin) ReflectionUtils.newInstance(klasses[i], conf);
    }
    for (ShuffleProviderPlugin plugin : plugins) {
      plugin.initialize(tt);
    }
  }

  public void destroy() {
    if (plugins != null) {
      for (ShuffleProviderPlugin plugin : plugins) {
        try {
          plugin.destroy();
        } catch (Throwable ex) {
          // LOG WARN and ignore exception
        }
      }
    }
  }
}
{code}
And the default provider class would be:
{code}
public static class DefaultShuffleProviderPlugin implements ShuffleProviderPlugin {
  public void initialize(TaskTracker tt) {
    tt.getHttpServer().addInternalServlet("mapOutput", "/mapOutput", MapOutputServlet.class);
  }

  public void destroy() {
  }
}
{code}
[jira] [Updated] (MAPREDUCE-4278) cannot run two local jobs in parallel from the same gateway.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated MAPREDUCE-4278: - Fix Version/s: 0.23.7 merged to branch-0.23 cannot run two local jobs in parallel from the same gateway. Key: MAPREDUCE-4278 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4278 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Araceli Henley Assignee: Sandy Ryza Fix For: 1.2.0, 2.0.3-alpha, 0.23.7 Attachments: MAPREDUCE-4278-2-branch1.patch, MAPREDUCE-4278-3-branch1.patch, MAPREDUCE-4278-branch1.patch, MAPREDUCE-4278-trunk.patch, MAPREDUCE-4278-trunk.patch I cannot run two local mode jobs from Pig in parallel from the same gateway; this is a typical use case. If I re-run the tests sequentially, then the tests pass. This seems to be a problem in Hadoop. Additionally, the Pig harness expects to be able to run Pig-version-undertest against Pig-version-stable from the same gateway. To replicate the error: I have two clusters running from the same gateway. If I run the Pig regression suite nightly.conf in local mode in parallel - once on each cluster - conflicts in M/R local mode result in failures in the tests.
ERROR1:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/file.out in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
    at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
    at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
    at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
    at org.apache.hadoop.mapred.Task.done(Task.java:875)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)
---
ERROR2:
2012-05-17 20:25:36,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2012-05-17 20:25:36,778 [Thread-3] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@ffa490e
2012-05-17 20:25:36,837 [Thread-3] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:106)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:489)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2012-05-17 20:25:41,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556536#comment-13556536 ] Chris Douglas commented on MAPREDUCE-4808: -- Asokan, the concern is that breaking an API, even if it's marked unstable, is an incompatible change. Since the pluggable shuffle is particularly useful for frameworks, breaking this contract could require patching/validation/rewrite of plugin and optimizer code in projects that invest in it (Hive, Pig, etc.). Moreover, if we wanted to change the default {{Shuffle}} to a different implementation, then user/framework code would perform badly, or break, unless we exposed this implementation-specific mechanism in the _new_ impl. So it's fair to press for use cases, to ensure it's _sufficient_ and that the abstraction could apply to most {{Shuffle}} implementations. Personally, I'm ambivalent about exposing this as an API, and am +1 on the patch overall (mostly because I like the {{MapOutput}} refactoring). The user can always configure the current {{Shuffle}}, which is exactly how frameworks would handle this until they port/specialize their efficient {{MergeManager}} plugin. As a compromise, would it make sense to just add a protected {{createMergeManager}} method to the {{Shuffle}}? The user still needs to configure their custom {{Shuffle}} impl now, but that's better than the inevitable future where they configure both. It also makes its tie to this implementation explicit.
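The compromise above is the classic factory-method pattern. A self-contained sketch (class and method names here are illustrative stand-ins, not the actual Hadoop classes):

```java
// Sketch: Shuffle exposes a protected factory method, so a custom Shuffle
// subclass can swap the merge strategy without a separate config knob.
public class FactoryMethodSketch {
    interface MergeManager { String name(); }

    static class Shuffle {
        // subclasses override this to supply an alternate merger
        protected MergeManager createMergeManager() {
            return () -> "default";            // the stock in-memory/on-disk merger
        }
        String run() { return createMergeManager().name(); }
    }

    static class CustomShuffle extends Shuffle {
        @Override protected MergeManager createMergeManager() {
            return () -> "custom";             // e.g. a hash-aggregation merger
        }
    }

    public static void main(String[] args) {
        System.out.println(new Shuffle().run() + "," + new CustomShuffle().run());
    }
}
```

The tie between the merger and this particular `Shuffle` implementation stays explicit: only code that already subclasses `Shuffle` can reach the hook, and there is no second configuration key to keep in sync.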
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556580#comment-13556580 ] Alejandro Abdelnur commented on MAPREDUCE-4808: --- Chris, are you suggesting to: * remove the MergeManagerPlugin interface * introduce a protected createMergeManager() method in the Shuffle class to instantiate (via new) and initialize the existing MergeManager?
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556595#comment-13556595 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Arun, I will think about your suggestion to make the Merger class pluggable and post my findings for different use cases. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556600#comment-13556600 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Chris, I will work on creating a real working plugin for the use cases to show that the proposed API is sufficient to handle them. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556602#comment-13556602 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Alejandro, If the MergeManagerPlugin is to be removed, it should be possible to extend the framework's MergeManager with an external implementation. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556606#comment-13556606 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Alejandro, I meant to ask whether it is okay to make the existing MergeManager extendable. -- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556705#comment-13556705 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Arun, I will try to explain a simple use case of an external implementation of merge on the reduce side. Let us say this merge implementation has some fixed area of memory (a Java byte array) allocated to store the shuffled data. This may be done to avoid frequent garbage collection by the JVM, or for better processor cache efficiency. Looking at the methods in the {{Merge}} class, they accept input to the merge either as disk files (an array of {{Path}} objects) or as memory segments (a list of {{Segment}} objects). The former is not suitable since merge is done in memory first, and any intermediate merged output file is under the control of the plugin implementation. The latter is not suitable because memory for the shuffled data is not under the control of the plugin implementation. Ideally, if an {{InputStream}} object is available, the external implementation can read shuffled data from the stream into the fixed area of memory at a specific offset in the byte array. With the {{MergeManagerPlugin}}, the external implementation will get the HTTP connection's {{InputStream}} object via the {{shuffle()}} method in the {{MapOutput}} object. In addition, if merge goes through multiple passes because the memory area is limited in size, there should be some way for the {{Shuffle}} to wait until memory is released by a merge pass. There is no method in {{Merge}} for that either. I find that it is possible to define the interaction points between the current {{Shuffle}} and {{MergeManager}} using the {{MergeManagerPlugin}} interface. The plugin interface has only three methods, and it allows the external plugin a lot of freedom in its implementation. As a side effect, the {{MapOutput}} is also refactored. Hope I explained this well. If you have any questions, please let me know. -- Asokan
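The fixed-memory use case described above can be sketched in plain Java. `FixedAreaShuffle` is a hypothetical stand-in, not part of the proposed API; it only illustrates reading from a connection's `InputStream` into a preallocated array at an offset:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: shuffled bytes are read straight from the connection's InputStream
// into a fixed, preallocated byte array, avoiding per-segment allocations
// and the GC pressure they cause.
public class FixedAreaShuffle {
    private final byte[] area;   // the fixed merge memory area
    private int used = 0;        // next free offset

    FixedAreaShuffle(int size) { area = new byte[size]; }

    // Returns the offset the segment was stored at, or -1 if it doesn't fit
    // (the caller must then wait for a merge pass to release memory).
    int shuffle(InputStream in, int len) throws IOException {
        if (used + len > area.length) return -1;
        int off = used, read = 0;
        while (read < len) {
            int n = in.read(area, off + read, len - read);
            if (n < 0) throw new IOException("premature end of stream");
            read += n;
        }
        used += len;
        return off;
    }

    public static void main(String[] args) throws IOException {
        FixedAreaShuffle s = new FixedAreaShuffle(8);
        int off = s.shuffle(new ByteArrayInputStream("abcde".getBytes()), 5);
        int off2 = s.shuffle(new ByteArrayInputStream("xyzzy".getBytes()), 5);
        System.out.println(off + "," + off2);
    }
}
```

The second segment does not fit and is refused, which is exactly the point where the plugin would need a way to block the `Shuffle` until a merge pass frees memory.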
[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4808: -- Attachment: MR-4808.patch I've taken the liberty to tweak the patch a bit based on the last comments in the JIRA. * removed pluggability via config of the MergeManager * Shuffle has a protected createMergeManager() method * MergeManager is annotated as Private * kept the MergeManagerPlugin interface * removed MergeManagerPlugin.Context * MergeManagerPlugin interface annotated as Private These changes avoid having an extra knob (the MergeManager class) in the config and keep the MergeManager owned by the Shuffle class. The interface still allows alternate implementations, such as Jerry's and Asokan's. Asokan, Arun, Chris? Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable.
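The protected-factory approach described above can be sketched roughly as follows. The class shapes are simplified assumptions, not the exact Hadoop signatures: the point is that the default MergeManager stays owned by Shuffle (no config knob), while an alternate shuffle swaps it by overriding one protected method.

```java
// Hypothetical sketch of the refactoring: Shuffle exposes a protected
// factory method, and an alternate implementation overrides it.
public class ShuffleFactorySketch {
    public interface MergeManagerPlugin { String name(); }

    public static class DefaultMergeManager implements MergeManagerPlugin {
        public String name() { return "default"; }
    }

    // Simplified Shuffle: owns its MergeManager, but lets subclasses swap it.
    public static class Shuffle {
        protected MergeManagerPlugin createMergeManager() {
            return new DefaultMergeManager();   // default impl, no config knob
        }
        public String mergeManagerName() { return createMergeManager().name(); }
    }

    // An alternate shuffle (e.g. a vendor plugin) overrides the factory.
    public static class CustomShuffle extends Shuffle {
        @Override
        protected MergeManagerPlugin createMergeManager() {
            return () -> "custom";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Shuffle().mergeManagerName());       // default
        System.out.println(new CustomShuffle().mergeManagerName()); // custom
    }
}
```

The design trade-off being debated: a config key lets operators swap the MergeManager without subclassing, while the protected method keeps the surface area private and forces alternate merge logic to come bundled with an alternate Shuffle.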
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556738#comment-13556738 ] Chris Douglas commented on MAPREDUCE-4808: -- +1 Looked through it; the latest patch lgtm. Asokan, is that sufficient for your use cases? Arun? _Very_ minor, optional nit: {{s/MergeManager/MergeManagerImpl/}} and {{s/MergeManagerPlugin/MergeManager/}}. There's an argument to be made for doing the same with the {{ShuffleScheduler}} while we're at it, but neither of these are blocking, IMO. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556749#comment-13556749 ] Mariappan Asokan commented on MAPREDUCE-4808: - Hi Chris, Thanks for your quick feedback. I looked at the patch. It has one minor nit. The {{createMergeManager}} method should take {{ShuffleConsumerPlugin.Context}} object. I will go over it one more time, work out the change, run tests, and post the patch shortly. Thanks. -- Asokan Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Arun C Murthy Assignee: Mariappan Asokan Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4923) Add toString method to TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556772#comment-13556772 ] Alejandro Abdelnur commented on MAPREDUCE-4923: --- +1. Please open another jira for the annotation changes, thx Add toString method to TaggedInputSplit --- Key: MAPREDUCE-4923 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4923 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 1.1.1, 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor Attachments: MAPREDUCE-4923-branch-1.patch, MAPREDUCE-4923.patch Per MAPREDUCE-3678, map task logs now contain information about the input split being processed. Because TaggedInputSplit has no overridden toString method, nothing useful gets printed out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
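The fix amounts to delegating toString() to the wrapped split, so task logs print the underlying split instead of the default Object identity string. A minimal sketch with simplified stand-in classes (not the real TaggedInputSplit):

```java
// Stand-ins for the real classes; only the toString() delegation is the point.
public class TaggedSplitToStringSketch {
    public static class FileSplitStub {
        private final String path;
        private final long start, length;
        FileSplitStub(String path, long start, long length) {
            this.path = path; this.start = start; this.length = length;
        }
        @Override public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    public static class TaggedSplit {
        private final FileSplitStub delegate;
        TaggedSplit(FileSplitStub delegate) { this.delegate = delegate; }
        // Without this override, logs would show something like
        // "TaggedSplit@1b6d3586" instead of the input path and range.
        @Override public String toString() { return delegate.toString(); }
    }

    public static void main(String[] args) {
        TaggedSplit s = new TaggedSplit(
            new FileSplitStub("hdfs://nn/data/part-00000", 0, 1048576));
        System.out.println(s);
    }
}
```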
[jira] [Commented] (MAPREDUCE-4929) mapreduce.task.timeout is ignored
[ https://issues.apache.org/jira/browse/MAPREDUCE-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556774#comment-13556774 ] Alejandro Abdelnur commented on MAPREDUCE-4929: --- It looks good to me, but before committing it: what is the precedence behavior in trunk? I want to make sure we have the same behavior. mapreduce.task.timeout is ignored - Key: MAPREDUCE-4929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4929 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-4929-branch-1.patch In MR1, only mapred.task.timeout works. Both should be made to work.
[jira] [Commented] (MAPREDUCE-4929) mapreduce.task.timeout is ignored
[ https://issues.apache.org/jira/browse/MAPREDUCE-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556795#comment-13556795 ] Sandy Ryza commented on MAPREDUCE-4929: --- It doesn't exactly appear that there is precedence behavior in trunk. When Configuration#set() is called for a config key with deprecations, all the corresponding keys are set. So if we came across mapred.task.timeout first in a config file, both mapred.task.timeout and mapreduce.task.timeout would get set. Then if we came across mapreduce.task.timeout afterwards, both would get overridden. mapreduce.task.timeout is ignored - Key: MAPREDUCE-4929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4929 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-4929-branch-1.patch In MR1, only mapred.task.timeout works. Both should be made to work.
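The deprecation behavior Sandy describes can be illustrated with a toy model: setting either the deprecated key or its replacement writes both, so the last key read wins regardless of which spelling it used. This is a simplified stand-in, not Hadoop's actual Configuration class.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Configuration deprecation: each set() updates the key and
// its deprecation alias together, so neither name can shadow the other.
public class DeprecationSketch {
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("mapred.task.timeout", "mapreduce.task.timeout");
        ALIASES.put("mapreduce.task.timeout", "mapred.task.timeout");
    }

    private final Map<String, String> props = new HashMap<>();

    public void set(String key, String value) {
        props.put(key, value);
        String alias = ALIASES.get(key);
        if (alias != null) props.put(alias, value);  // both names updated
    }

    public String get(String key) { return props.get(key); }

    public static void main(String[] args) {
        DeprecationSketch conf = new DeprecationSketch();
        conf.set("mapred.task.timeout", "600000");     // old name, read first
        conf.set("mapreduce.task.timeout", "300000");  // new name, read later
        // Last set wins under both names.
        System.out.println(conf.get("mapred.task.timeout"));
        System.out.println(conf.get("mapreduce.task.timeout"));
    }
}
```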
[jira] [Commented] (MAPREDUCE-4923) Add toString method to TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556822#comment-13556822 ] Hudson commented on MAPREDUCE-4923: --- Integrated in Hadoop-trunk-Commit #3258 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3258/]) MAPREDUCE-4923. Add toString method to TaggedInputSplit. (sandyr via tucu) (Revision 1434993) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1434993 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/TaggedInputSplit.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/TaggedInputSplit.java Add toString method to TaggedInputSplit --- Key: MAPREDUCE-4923 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4923 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 1.1.1, 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor Fix For: 1.2.0, 2.0.3-alpha Attachments: MAPREDUCE-4923-branch-1.patch, MAPREDUCE-4923.patch Per MAPREDUCE-3678, map task logs now contain information about the input split being processed. Because TaggedInputSplit has no overridden toString method, nothing useful gets printed out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4923) Add toString method to TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4923: -- Resolution: Fixed Fix Version/s: 2.0.3-alpha 1.2.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Sandy. Committed to trunk, branch-1 and branch-2. Add toString method to TaggedInputSplit --- Key: MAPREDUCE-4923 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4923 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 1.1.1, 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor Fix For: 1.2.0, 2.0.3-alpha Attachments: MAPREDUCE-4923-branch-1.patch, MAPREDUCE-4923.patch Per MAPREDUCE-3678, map task logs now contain information about the input split being processed. Because TaggedInputSplit has no overridden toString method, nothing useful gets printed out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4911: -- Attachment: MAPREDUCE-4911.patch This patch adds configuration for node-level aggregation to JobConf. It also includes tests for the changes. Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds a node-level aggregation flag feature (setLocalAggregation(boolean)) to JobConf. This task is a subtask of MAPREDUCE-4502.
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4911: -- Target Version/s: trunk Affects Version/s: trunk Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4911 started by Tsuyoshi OZAWA. Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4911: -- Status: Patch Available (was: In Progress) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (MAPREDUCE-4863) Adding aggregationWaitMap for node-level combiner.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4863 started by Tsuyoshi OZAWA. Adding aggregationWaitMap for node-level combiner. -- Key: MAPREDUCE-4863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4863 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0002-Adding-aggregationWaitMap-for-node-level-combiner.patch To manage node/rack-level combining, MRAppMaster needs to keep management information about the outputs of completed MapTasks to be aggregated. The AggregationWaitMap is used so that MRAppMaster can decide whether or not MapTasks should start combining local MapOutputFiles. AggregationWaitMap is an abstraction over ConcurrentHashMap<String, ArrayList<TaskAttemptCompletionEvent>>. These events identify the candidate files to be aggregated. When MapTasks complete, MRAppMaster buffers their TaskAttemptCompletionEvents in the AggregationWaitMap to delay the reducers' fetching of outputs from mappers until node-level aggregation is finished. After node-level aggregation, MRAppMaster writes back the mapAttemptCompletionEvents to restart the reducers' fetching of outputs from mappers.
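A minimal Java sketch of the AggregationWaitMap described above. The map type follows the description (ConcurrentHashMap keyed by node, holding lists of completion events), but the event class and method names are stand-ins, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an AggregationWaitMap: buffers TaskAttemptCompletionEvents per
// node until node-level aggregation finishes, then drains them so reducers
// can resume fetching. Names are illustrative, not from the real patch.
public class AggregationWaitMapSketch {
    public static class TaskAttemptCompletionEvent {
        final String attemptId;
        public TaskAttemptCompletionEvent(String attemptId) { this.attemptId = attemptId; }
    }

    private final ConcurrentHashMap<String, ArrayList<TaskAttemptCompletionEvent>> waiting =
        new ConcurrentHashMap<>();

    // Buffer a completed map's event under its node until aggregation runs.
    public void buffer(String node, TaskAttemptCompletionEvent event) {
        waiting.computeIfAbsent(node, k -> new ArrayList<>()).add(event);
    }

    // After node-level aggregation, drain the node's events so the AM can
    // write them back and restart the reducers' fetching.
    public List<TaskAttemptCompletionEvent> drain(String node) {
        ArrayList<TaskAttemptCompletionEvent> events = waiting.remove(node);
        return events == null ? new ArrayList<>() : events;
    }

    public static void main(String[] args) {
        AggregationWaitMapSketch map = new AggregationWaitMapSketch();
        map.buffer("node1", new TaskAttemptCompletionEvent("attempt_1_m_0"));
        map.buffer("node1", new TaskAttemptCompletionEvent("attempt_1_m_1"));
        System.out.println(map.drain("node1").size() + " events released for node1");
    }
}
```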
[jira] [Work started] (MAPREDUCE-4864) Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4864 started by Tsuyoshi OZAWA. Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner. -- Key: MAPREDUCE-4864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4864 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mrv2, tasktracker Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg-20130116.patch, 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg.patch MapTasks need to know whether or not they should start the node-level combiner against the mapper outputs on their node. The new umbilical RPC, getAggregationTargets(), is used to get the outputs to be aggregated on the node. The definition is as follows: AggregationTarget getAggregationTargets(TaskAttemptID aggregator) throws IOException; AggregationTarget is an abstraction class for the array of TaskAttemptIDs to be aggregated.
[jira] [Updated] (MAPREDUCE-4864) Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4864: -- Affects Version/s: trunk Adding new umbilical protocol RPC, getAggregationTargets(), for node-level combiner. -- Key: MAPREDUCE-4864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4864 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mrv2, tasktracker Affects Versions: 3.0.0, trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg-20130116.patch, 0001-Adding-new-umbilical-protocol-RPC-getAggregationTarg.patch MapTasks need to know whether or not they should start the node-level combiner against the mapper outputs on their node. The new umbilical RPC, getAggregationTargets(), is used to get the outputs to be aggregated on the node. The definition is as follows: AggregationTarget getAggregationTargets(TaskAttemptID aggregator) throws IOException; AggregationTarget is an abstraction class for the array of TaskAttemptIDs to be aggregated.
[jira] [Work started] (MAPREDUCE-4910) Adding AggregationWaitMap to some components(MRAppMaster, TaskAttemptListener, JobImpl, MapTaskImpl).
[ https://issues.apache.org/jira/browse/MAPREDUCE-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4910 started by Tsuyoshi OZAWA. Adding AggregationWaitMap to some components(MRAppMaster, TaskAttemptListener, JobImpl, MapTaskImpl). - Key: MAPREDUCE-4910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4910 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mrv2, task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0003-Adding-AggregationWaitMap-to-some-components-MRAppMa.patch, 0004-Add-AggregationWaitMap-to-some-components-MRAppMaste.patch To implement MR-4502, the AggregationWaitMap needs to be used by several components (MRAppMaster, TaskAttemptListener, JobImpl, MapTaskImpl).
[jira] [Work started] (MAPREDUCE-4865) Launching node-level combiner at the end stage of MapTask and ignoring aggregated inputs at ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4865 started by Tsuyoshi OZAWA. Launching node-level combiner at the end stage of MapTask and ignoring aggregated inputs at ReduceTask -- Key: MAPREDUCE-4865 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4865 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: 0003-Changed-Mappers-and-Reducers-to-support-Node-level-aggregation-20130116.patch, 0004-Changed-Mappers-and-Reducers-to-support-Node-level-a.patch MapTask needs to start node-level aggregation against local outputs at the end stage of MapTask after calling getAggregationTargets(). This feature is implemented with Merger and CombinerRunner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-4502 started by Tsuyoshi OZAWA. Multi-level aggregation with combining the result of maps per node/rack --- Key: MAPREDUCE-4502 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: design_v2.pdf, MAPREDUCE-4525-pof.diff, speculative_draft.pdf Shuffle in Hadoop is expensive in spite of the combiner, because the scope of combining is limited to a single MapTask. To solve this problem, a good approach is to aggregate the results of maps per node/rack by launching a combiner. This JIRA is to implement the multi-level aggregation infrastructure, including combining per container (MAPREDUCE-3902 is related) and coordinating containers through the application master without breaking the fault tolerance of jobs.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556986#comment-13556986 ] Andrew Purtell commented on MAPREDUCE-4495: --- For the benefit of those coming to this issue now, was this moved to the Oozie project? Was the yapp proposal submitted to the incubator? What is the current status of this? Is the code/design on this issue orphaned/dead? Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf, yapp_proposal.txt It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Less number of consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher systems like Pig and hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557002#comment-13557002 ] Bo Wang commented on MAPREDUCE-4495: Hi Andrew, Thanks for looking at this issue. Currently this issue hasn't been moved to Oozie, and I don't think the yapp proposal has been submitted to the incubator either. In terms of implementation, a prototype based on the v2 design in the document is finished. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf, yapp_proposal.txt It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM, and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Fewer consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher systems like Pig and Hive to provide an optimized way of running their workflows.
[jira] [Commented] (MAPREDUCE-4911) Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557003#comment-13557003 ] Hadoop QA commented on MAPREDUCE-4911: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12565437/MAPREDUCE-4911.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 12 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapred.TestYARNRunner {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3247//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3247//console This message is automatically generated. 
Add node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Affects Versions: trunk Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-4911.patch This JIRA adds node-level aggregation flag feature(setLocalAggregation(boolean)) to JobConf. This task is subtask of MAPREDUCE-4502. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4944) Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated MAPREDUCE-4944: - Target Version/s: 1.2.0 Affects Version/s: 1.1.1 Hadoop Flags: Incompatible change Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool - Key: MAPREDUCE-4944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4944 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 1.1.1 Reporter: Binglin Chang Priority: Minor support listClusterNodes and printNodeStatus in command line tool is useful for admin to create certain automation tools, this can also used by MAPREDUCE-4900 to get TaskTracker name so can set TT's slot dynamically -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4944) Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557014#comment-13557014 ] Binglin Chang commented on MAPREDUCE-4944: -- I looked into the code and found some issues with this backport: hadoop-1.x has similar commands, -list-active-trackers and -list-blacklisted-trackers, which just print tracker names. They use JobSubmissionProtocol to talk to the JobTracker, and there is no more information we can expose except by adding another protocol method to JobSubmissionProtocol; this will break compatibility, which I think is unacceptable for JobSubmissionProtocol, since it is used by normal clients. Another option is to add this to AdminOperationsProtocol (like MAPREDUCE-4900); as an admin protocol it has weaker compatibility requirements, but I still don't think it's worth breaking compatibility. Another option is JMX. I haven't found a proper way to write a command line tool using JMX: it needs to connect to a specific port on the JobTracker host, and that information is hard to get because it is not in mapred-site.xml but depends on JVM config in hadoop-env.sh. Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool - Key: MAPREDUCE-4944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4944 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 1.1.1 Reporter: Binglin Chang Priority: Minor Supporting listClusterNodes and printNodeStatus in the command line tool is useful for admins creating automation tools; it can also be used by MAPREDUCE-4900 to get the TaskTracker name so a TT's slots can be set dynamically.
[jira] [Commented] (MAPREDUCE-4944) Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557031#comment-13557031 ] Junping Du commented on MAPREDUCE-4944: --- I would prefer to break the admin operations protocol if we have to break compatibility. But another option could be to add a separate protocol for slot get/set. I think manipulating map/reduce slots on a TT is a very useful feature, especially in a sharing environment (like an HBase region server living with a TT), so maybe it deserves its own protocol? Backport YARN-40 to support listClusterNodes and printNodeStatus in command line tool - Key: MAPREDUCE-4944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4944 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 1.1.1 Reporter: Binglin Chang Priority: Minor Supporting listClusterNodes and printNodeStatus in the command line tool is useful for admins creating automation tools; it can also be used by MAPREDUCE-4900 to get the TaskTracker name so a TT's slots can be set dynamically.