[jira] [Commented] (MAPREDUCE-577) Duplicate Mapper input when using StreamXmlRecordReader

2012-09-04 Thread Ming Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447568#comment-13447568
 ] 

Ming Jin commented on MAPREDUCE-577:


Hi everyone,

I found the exact same issue in Hadoop v1.0.3
(http://fossies.org/dox/hadoop-1.0.3/StreamXmlRecordReader_8java_source.html).

Is there any plan to fix it in v1.0.3?

 Duplicate Mapper input when using StreamXmlRecordReader
 ---

 Key: MAPREDUCE-577
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-577
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
 Environment: HADOOP 0.17.0, Java 6.0
Reporter: David Campbell
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 0001-test-to-demonstrate-HADOOP-3484.patch, 
 0002-patch-for-HADOOP-3484.patch, 577.20S.patch, 577.patch, 577.v1.patch, 
 577.v2.patch, 577.v3.patch, 577.v4.patch, HADOOP-3484.combined.patch, 
 HADOOP-3484.try3.patch


 I have an XML file with 93626 rows.  A row is marked by <row>...</row>.
 I've confirmed this with grep and the Grep example program included with 
 HADOOP.
 Here is the grep example output.  93626   <row>
 I've set up my job configuration as follows:   
 conf.set("stream.recordreader.class", 
     "org.apache.hadoop.streaming.StreamXmlRecordReader");
 conf.set("stream.recordreader.begin", "<row>");
 conf.set("stream.recordreader.end", "</row>");
 conf.setInputFormat(StreamInputFormat.class);
 I have a fairly simple test Mapper.
 Here's the map method.
   public void map(Text key, Text value, OutputCollector<Text, IntWritable> 
       output, Reporter reporter) throws IOException {
     try {
       output.collect(totalWord, one);
       if (key != null && key.toString().indexOf("01852") != -1) {
         output.collect(new Text("01852"), one);
       }
     } catch (Exception ex) {
       Logger.getLogger(TestMapper.class.getName()).log(Level.SEVERE, 
           null, ex);
       System.out.println(value);
     }
   }
 For totalWord (TOTAL), I get:
 TOTAL 140850
 and for 01852 I get.
 01852 86
 There are 43 instances of 01852 in the file.
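 One way to sanity-check counts like these outside Hadoop is to count the 
 begin marker directly in a sample of the input. The sketch below is a 
 standalone illustration, not part of the reporter's job; the class name 
 MarkerCount and its method are hypothetical.

```java
public class MarkerCount {
    /** Counts non-overlapping occurrences of marker in text. */
    public static int count(String text, String marker) {
        if (marker.isEmpty()) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        int n = 0;
        int from = 0;
        // indexOf returns -1 once no further occurrence exists.
        while ((from = text.indexOf(marker, from)) != -1) {
            n++;
            from += marker.length();
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical sample; a real check would read the input file instead.
        String sample = "<rows><row>01852</row><row>90210</row></rows>";
        System.out.println(count(sample, "<row>"));
    }
}
```

 Comparing such a count with the mapper's TOTAL value shows how many records 
 the record reader delivered more than once.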
 I have the following setting in my config.  
conf.setNumMapTasks(1);
 I have a total of six machines in my cluster.
 If I run without this, the result is 12x the actual value, not 2x.
 Here's some info from the cluster web page.
 Maps  Reduces  Total Submissions  Nodes  Map Task Capacity  Reduce Task Capacity  Avg. Tasks/Node
 0     0        1                  6      12                 12                    4.00
 I've also noticed something really strange in the job's output.  It looks 
 like it's starting over or redoing things.
 This was run using all six nodes and no limitations on map or reduce tasks.  
 I haven't seen this behavior in any other case.
 08/06/03 10:50:35 INFO mapred.FileInputFormat: Total input paths to process : 1
 08/06/03 10:50:36 INFO mapred.JobClient: Running job: job_200806030916_0018
 08/06/03 10:50:37 INFO mapred.JobClient:  map 0% reduce 0%
 08/06/03 10:50:42 INFO mapred.JobClient:  map 2% reduce 0%
 08/06/03 10:50:45 INFO mapred.JobClient:  map 12% reduce 0%
 08/06/03 10:50:47 INFO mapred.JobClient:  map 31% reduce 0%
 08/06/03 10:50:48 INFO mapred.JobClient:  map 49% reduce 0%
 08/06/03 10:50:49 INFO mapred.JobClient:  map 68% reduce 0%
 08/06/03 10:50:50 INFO mapred.JobClient:  map 100% reduce 0%
 08/06/03 10:50:54 INFO mapred.JobClient:  map 87% reduce 0%
 08/06/03 10:50:55 INFO mapred.JobClient:  map 100% reduce 0%
 08/06/03 10:50:56 INFO mapred.JobClient:  map 0% reduce 0%
 08/06/03 10:51:00 INFO mapred.JobClient:  map 0% reduce 1%
 08/06/03 10:51:05 INFO mapred.JobClient:  map 28% reduce 2%
 08/06/03 10:51:07 INFO mapred.JobClient:  map 80% reduce 4%
 08/06/03 10:51:08 INFO mapred.JobClient:  map 100% reduce 4%
 08/06/03 10:51:09 INFO mapred.JobClient:  map 100% reduce 7%
 08/06/03 10:51:10 INFO mapred.JobClient:  map 90% reduce 9%
 08/06/03 10:51:11 INFO mapred.JobClient:  map 100% reduce 9%
 08/06/03 10:51:12 INFO mapred.JobClient:  map 100% reduce 11%
 08/06/03 10:51:13 INFO mapred.JobClient:  map 90% reduce 11%
 08/06/03 10:51:14 INFO mapred.JobClient:  map 97% reduce 11%
 08/06/03 10:51:15 INFO mapred.JobClient:  map 63% reduce 11%
 08/06/03 10:51:16 INFO mapred.JobClient:  map 48% reduce 11%
 08/06/03 10:51:17 INFO mapred.JobClient:  map 21% reduce 11%
 08/06/03 10:51:19 INFO mapred.JobClient:  map 0% reduce 11%
 08/06/03 10:51:20 INFO mapred.JobClient:  map 15% reduce 12%
 08/06/03 10:51:21 INFO mapred.JobClient:  map 27% reduce 13%
 08/06/03 10:51:22 INFO mapred.JobClient:  map 67% reduce 13%
 08/06/03 10:51:24 INFO mapred.JobClient:  map 22% reduce 16%
 08/06/03 10:51:25 INFO mapred.JobClient:  map 46% reduce 16%
 08/06/03 10:51:26 INFO mapred.JobClient:  map 70% reduce 16%
 08/06/03 

[jira] [Created] (MAPREDUCE-4631) Duplicate Mapper input when using StreamXmlRecordReader

2012-09-04 Thread Ming Jin (JIRA)
Ming Jin created MAPREDUCE-4631:
---

 Summary: Duplicate Mapper input when using StreamXmlRecordReader
 Key: MAPREDUCE-4631
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4631
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 1.0.3
Reporter: Ming Jin


This is the same defect as https://issues.apache.org/jira/browse/MAPREDUCE-577, 
which was fixed in v0.22.0.

So I'm wondering whether there is a plan to fix it in v1.0.3 as well? Or shall 
I move to v2.0.x?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4631) Duplicate Mapper input when using StreamXmlRecordReader

2012-09-04 Thread Ming Jin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Jin updated MAPREDUCE-4631:


Environment: Hadoop v1.0.3, JDK 6

 Duplicate Mapper input when using StreamXmlRecordReader
 ---

 Key: MAPREDUCE-4631
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4631
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 1.0.3
 Environment: Hadoop v1.0.3, JDK 6
Reporter: Ming Jin

 This is the same defect as 
 https://issues.apache.org/jira/browse/MAPREDUCE-577, which was fixed in 
 v0.22.0.
 So I'm wondering whether there is a plan to fix it in v1.0.3 as well? Or 
 shall I move to v2.0.x?



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-09-04 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447581#comment-13447581
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:



_Asokan,_
My design has no conflict with your design. Below is a comment *I wrote you 4 
months ago*:
{quote}
_Your patch for the trunk is good enough for my needs. I can write my RDMA 
shuffle plugin based on either your patch or based on my patch. Hence, I am not 
planning to submit additional patch for the trunk on top of your patch. (I will 
only submit patch for 1.x)_
{quote}
(I have only now submitted a patch for the trunk, because of Arun's/Todd's 
comment on my 1.x patch)

The academic paper I pointed to as a *Reference* should not be confused with my 
plugin (personally, I consider code in academic research a POC, not a 
product).   The two relevant conclusions I take from this academic research are:
  1) Hadoop can benefit from RDMA shuffle and shuffle plugin-ability
  2) With fast shuffle, Hadoop can benefit from *additional* merge algorithms 
that are not practical with slow shuffle.  
That's all!  There is no request for Hadoop to keep its coupling of shuffle 
with merge.  
Again, I encourage your decoupling!  Once your patch is accepted into 
trunk, I will adjust future versions of my plugin to follow your decoupling.

*My design should not disturb you in any way!*
When reviewing my design from the ReduceTask.java point of view, *if you merely 
rename: ShuffleConsumerPlugin -> ReduceFeederPlugin, then you could easily see 
that your decoupling design can peacefully come after my design.*
I believe the thing that disturbs you is that currently Hadoop uses 'shuffle' 
which invokes 'merge' while you want the opposite direction.  However, this is 
outside the scope of my patch.  Hence, you are welcome to build your patch on 
top of mine.  It is not really different than building your patch on top of the 
current trunk.
I will be more than happy to assist you with anything you might need, and I'd 
appreciate it if you gave me your blessing for my commit :-)


 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Consumer Plugin 
 TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, mapred-site.xml, 
 mapreduce-4049.patch, mapreduce-4049.patch


 Support generic shuffle service as set of two plugins: ShuffleProvider & 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)



[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2012-09-04 Thread Johannes Zillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447584#comment-13447584
 ] 

Johannes Zillmann commented on MAPREDUCE-207:
-

Currently in our hadoop applications we calculate the splits before we submit 
it to the client (then the client simply looks up the existing splits). We do 
that mainly to influence the reducer count based on the number of 
splits/map-tasks.
In case hadoop does the splitting on the cluster (which makes sense), it would 
be nice to have a hook to influence configuration!
Sometimes it also makes sense for us to decide on the map-reduce assembly after 
we know the splits (different join strategies for different data 
constellations).

Just dumping some ideas here...
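
The idea of deriving the reducer count from the number of splits can be sketched as below. This is only an illustration under assumptions: the class, the method, and the splitsPerReducer and maxReducers parameters are hypothetical and do not come from the comment or from any Hadoop API.

```java
public class ReducerCountSketch {
    /**
     * Picks a reducer count from the number of input splits (map tasks).
     * Hypothetical policy: one reducer per splitsPerReducer splits,
     * at least 1 and at most maxReducers.
     */
    public static int reducersFor(int numSplits, int splitsPerReducer, int maxReducers) {
        if (numSplits < 0 || splitsPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Ceiling division without floating point.
        int wanted = (numSplits + splitsPerReducer - 1) / splitsPerReducer;
        return Math.max(1, Math.min(maxReducers, wanted));
    }

    public static void main(String[] args) {
        System.out.println(reducersFor(12, 4, 10));
    }
}
```

If splits were computed on the cluster, a hook of the kind requested above would be the place where a function like this could run before the reducer count is fixed.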


 Computing Input Splits on the MR Cluster
 

 Key: MAPREDUCE-207
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-207.patch


 Instead of computing the input splits as part of job submission, Hadoop could 
 have a separate job task type that computes the input splits, therefore 
 allowing that computation to happen on the cluster.



[jira] [Commented] (MAPREDUCE-2786) TestDFSIO should also test compression reading/writing from command-line.

2012-09-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447668#comment-13447668
 ] 

Hudson commented on MAPREDUCE-2786:
---

Integrated in Hadoop-Hdfs-trunk #1155 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1155/])
MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen 
Jeliazkov. (Revision 1380310)

 Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380310
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java


 TestDFSIO should also test compression reading/writing from command-line.
 -

 Key: MAPREDUCE-2786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2786
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: benchmarks
Affects Versions: 2.0.0-alpha
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Minor
  Labels: newbie
 Fix For: 2.2.0-alpha

 Attachments: MAPREDUCE_2786.patch, MAPREDUCE_2786.patch, 
 MAPREDUCE_2786.patch, MAPREDUCE-2786.patch

   Original Estimate: 36h
  Remaining Estimate: 36h

 I thought it might be beneficial to simply alter the code of TestDFSIO to 
 accept any compression codec class and allow testing for compression by a 
 command line argument instead of having to change the config file every time. 
 Something like -compression would do.



[jira] [Commented] (MAPREDUCE-2786) TestDFSIO should also test compression reading/writing from command-line.

2012-09-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447699#comment-13447699
 ] 

Hudson commented on MAPREDUCE-2786:
---

Integrated in Hadoop-Mapreduce-trunk #1186 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1186/])
MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen 
Jeliazkov. (Revision 1380310)

 Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380310
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java





[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-09-04 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447718#comment-13447718
 ] 

Mariappan Asokan commented on MAPREDUCE-4049:
-

Hi Avner,
  Thanks for the clarification.  I am also back-porting MAPREDUCE-2454 to 
Hadoop 1.1 (please see MAPREDUCE-4482).  There, too, I am trying to decouple 
merge-related code from the {{ReduceCopier}} class in {{ReduceTask.java}}.  I 
started my work initially on the trunk version because Arun asked me to do so.  
Once I finish refactoring {{ReduceCopier}} I will post a patch in 
MAPREDUCE-4482.  Please take a look at it when I am done.

Thanks for your offer to help me.

-- Asokan  




[jira] [Updated] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-09-04 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1700:
-

Attachment: MAPREDUCE-1700.patch

New patch with a unit test. The test isn't integrated into the build yet, so 
you have to build the class-isolation-example module manually first. I've also 
removed the fictitious libs and instead used Guava as an example of an 
incompatibility.

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
 Attachments: MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.
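
 The "specialized classloader" idea can be sketched as a child-first loader 
 that looks in the user's JARs before delegating to its parent. This is a 
 generic illustration, not the attached patch: the class name and the list of 
 prefixes treated as system classes are assumptions.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    /** Assumed policy: JDK and Hadoop classes always come from the parent. */
    public static boolean isSystemClass(String name) {
        return name.startsWith("java.")
            || name.startsWith("javax.")
            || name.startsWith("org.apache.hadoop.");
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Look in the user-supplied JARs first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the user JARs; fall back to the parent below.
                }
            }
            if (c == null) {
                // Standard parent-first delegation.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

 With such a loader, a user's own version of a library such as Avro would be 
 found in the user JARs first, while framework classes would still be shared 
 with the parent loader.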



[jira] [Created] (MAPREDUCE-4632) Make sure MapReduce declares correct set of dependencies

2012-09-04 Thread Tom White (JIRA)
Tom White created MAPREDUCE-4632:


 Summary: Make sure MapReduce declares correct set of dependencies
 Key: MAPREDUCE-4632
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4632
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Tom White


This is the equivalent of HADOOP-8278 for MapReduce.



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-09-04 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447876#comment-13447876
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:


Cool.  I wish you good luck with your issues.  I am watching them to stay 
in the picture and to answer any questions you may have.  
I understand you no longer have an objection to my commit.  Right?
(Please let me know if you want the rename: ShuffleConsumerPlugin -> 
ReduceFeederPlugin, or any other name)




[jira] [Commented] (MAPREDUCE-4582) [MAPREDUCE-3902] ScheduledRequests#remove should remove the elements from mapsHostMapping and mapsRackMapping

2012-09-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447882#comment-13447882
 ] 

Siddharth Seth commented on MAPREDUCE-4582:
---

bq. removing entries from attemptToLaunchRequestMap needs to happen after the 
ScheduledRequests.remove call.
handleTaStopRequest removes the task attempt from attemptToLaunchRequestMap 
before it attempts to remove the attempt from ScheduledRequests. By the 
previous comment, I meant this order of removal needs to be fixed.

 [MAPREDUCE-3902] ScheduledRequests#remove should remove the elements from 
 mapsHostMapping and mapsRackMapping
 -

 Key: MAPREDUCE-4582
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4582
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: MR-3902
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Attachments: MAPREDUCE-4582.2.patch, MAPREDUCE-4582.3.patch, 
 MAPREDUCE-4582.patch


 ScheduledRequests#remove only removes the specified TaskAttemptId from maps.
 It's inefficient, and the method should also remove the elements from 
 mapsHostMapping and mapsRackMapping.
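
 The cleanup described above can be sketched as follows. The field names 
 maps, mapsHostMapping and mapsRackMapping come from the issue text; their 
 types, the Request holder and the String ids are assumptions made for this 
 illustration and do not reflect the actual classes on the MR-3902 branch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ScheduledRequestsSketch {
    /** Hypothetical holder: the hosts and racks one attempt asked for. */
    static final class Request {
        final List<String> hosts;
        final List<String> racks;

        Request(List<String> hosts, List<String> racks) {
            this.hosts = hosts;
            this.racks = racks;
        }
    }

    private final Map<String, Request> maps = new HashMap<>();
    private final Map<String, List<String>> mapsHostMapping = new HashMap<>();
    private final Map<String, List<String>> mapsRackMapping = new HashMap<>();

    public void add(String attemptId, List<String> hosts, List<String> racks) {
        maps.put(attemptId, new Request(hosts, racks));
        for (String host : hosts) {
            mapsHostMapping.computeIfAbsent(host, k -> new ArrayList<>()).add(attemptId);
        }
        for (String rack : racks) {
            mapsRackMapping.computeIfAbsent(rack, k -> new ArrayList<>()).add(attemptId);
        }
    }

    /** Removes the attempt from maps and from both locality indexes. */
    public boolean remove(String attemptId) {
        Request req = maps.remove(attemptId);
        if (req == null) {
            return false;
        }
        for (String host : req.hosts) {
            unlink(mapsHostMapping, host, attemptId);
        }
        for (String rack : req.racks) {
            unlink(mapsRackMapping, rack, attemptId);
        }
        return true;
    }

    private static void unlink(Map<String, List<String>> index, String key, String attemptId) {
        List<String> attempts = index.get(key);
        if (attempts == null) {
            return;
        }
        attempts.remove(attemptId);
        // Drop empty entries so the index does not accumulate stale keys.
        if (attempts.isEmpty()) {
            index.remove(key);
        }
    }

    public Set<String> trackedHosts() {
        return mapsHostMapping.keySet();
    }

    public Set<String> trackedRacks() {
        return mapsRackMapping.keySet();
    }
}
```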



[jira] [Commented] (MAPREDUCE-4629) In DEBUG_MODE, JobHistory#directoryTime() returns incorrect time

2012-09-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447886#comment-13447886
 ] 

Karthik Kambatla commented on MAPREDUCE-4629:
-

From my conversation with Alejandro offline:

More context:
- The regular mode cleans up the history files after a month. The history 
filenames use /mm/dd.
- The debug mode cleans up the history files after 20 minutes. The history 
filenames (currently) use /hour/min.

DEBUG_MODE currently overloads the regular mode. A better approach seems to 
be to append to the regular mode instead of overloading it.

Also, the config parameter that turns DEBUG_MODE on/off, 
mapreduce.jobhistory.debug.mode, has no default in mapred-default.xml; one 
needs to be added.

 In DEBUG_MODE, JobHistory#directoryTime() returns incorrect time
 

 Key: MAPREDUCE-4629
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4629
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: MR-4629.patch


 The helper methods in JobHistory - directoryTime() and 
 timestampDirectoryComponent() - adjust the month field for readability (Jan 
 is 0 as per Calendar, 1 for us)
 In DEBUG_MODE, JobHistory uses hour instead of month. However, the adjustment 
 is still applied.
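
 The off-by-one described above can be illustrated with java.util.Calendar, 
 whose MONTH field is zero-based while HOUR_OF_DAY is not. This is a minimal 
 sketch, not the JobHistory code: the class, the method, its flags and the 
 output format are hypothetical.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DirectoryTimeSketch {
    /**
     * Hypothetical stand-in for the helper: builds a two-level directory
     * component, month/day in regular mode and hour/minute in debug mode.
     * The buggy flag reproduces the +1 being applied to the hour as well.
     */
    public static String component(Calendar cal, boolean debugMode, boolean buggy) {
        if (!debugMode) {
            // Calendar.MONTH is zero-based (January is 0), so add 1.
            return String.format(Locale.ROOT, "%02d/%02d",
                cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH));
        }
        int hour = cal.get(Calendar.HOUR_OF_DAY);
        if (buggy) {
            // The month adjustment wrongly applied to the hour field.
            hour += 1;
        }
        return String.format(Locale.ROOT, "%02d/%02d", hour, cal.get(Calendar.MINUTE));
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(2012, Calendar.SEPTEMBER, 4, 10, 50);
        System.out.println(component(cal, false, false));
        System.out.println(component(cal, true, true));
        System.out.println(component(cal, true, false));
    }
}
```

 In this sketch the adjustment is applied only on the month branch, which is 
 the behaviour the issue asks for.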



[jira] [Updated] (MAPREDUCE-4629) In DEBUG_MODE, JobHistory#directoryTime() returns incorrect time

2012-09-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-4629:


Status: Open  (was: Patch Available)




[jira] [Commented] (MAPREDUCE-4619) [MAPREDUCE-3902] Change AMContainerMap to extend AbstractService

2012-09-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447891#comment-13447891
 ] 

Siddharth Seth commented on MAPREDUCE-4619:
---

Looks good. +1, committing to branch MR-3902. Thanks Tsuyoshi

 [MAPREDUCE-3902] Change AMContainerMap to extend AbstractService
 

 Key: MAPREDUCE-4619
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4619
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: MR-3902
Reporter: Siddharth Seth
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-4619.patch






[jira] [Resolved] (MAPREDUCE-4619) [MAPREDUCE-3902] Change AMContainerMap to extend AbstractService

2012-09-04 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved MAPREDUCE-4619.
---

   Resolution: Fixed
Fix Version/s: MR-3902
 Hadoop Flags: Reviewed

 [MAPREDUCE-3902] Change AMContainerMap to extend AbstractService
 

 Key: MAPREDUCE-4619
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4619
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: MR-3902
Reporter: Siddharth Seth
Assignee: Tsuyoshi OZAWA
 Fix For: MR-3902

 Attachments: MAPREDUCE-4619.patch






[jira] [Assigned] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-09-04 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-1700:


Assignee: Arun C Murthy

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-1700.patch, MAPREDUCE-1700.patch





[jira] [Assigned] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-09-04 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-1700:


Assignee: Tom White  (was: Arun C Murthy)

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700.patch, MAPREDUCE-1700.patch





[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars

2012-09-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447908#comment-13447908
 ] 

Arun C Murthy commented on MAPREDUCE-4421:
--

Tucu - I think we are close, but I don't want MR AM or DistShell AM configs in 
yarn-site.xml. They belong in mapred-site.xml or distshell-site.xml etc. Makes 
sense?

 Remove dependency on deployed MR jars
 -

 Key: MAPREDUCE-4421
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 Currently MR AM depends on MR jars being deployed on all nodes via implicit 
 dependency on YARN_APPLICATION_CLASSPATH. 
 We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, 
 probably, just rely on adding a shaded MR jar along with job.jar to the 
 dist-cache.



[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-09-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447911#comment-13447911
 ] 

Arun C Murthy commented on MAPREDUCE-1700:
--

Tom, I don't understand specific advantages of OSGI or Felix, so please pardon 
some of my questions.

However, with MR being an application in YARN (see MAPREDUCE-4421) we can just 
add user jars at the front of the classpath for the tasks (we already allow it). 
This isn't the same as the MR1 problem, where the Map/Reduce child inherits the 
TT classpath (actually, even in MR1 you have been able to put child jars ahead 
in the classpath for a long while now). Given this, do we need to bring in OSGI 
or Felix? What else do they provide? Thanks.

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-09-04 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447916#comment-13447916
 ] 

Avner BenHanoch commented on MAPREDUCE-4049:


[correcting typo:]

Cool. I wish you good luck with your issues. I am watching them to stay in 
the picture and to answer any questions you may have. 
I understand you no longer have any objection to my commit. Right?
(Please let me know if you want the rename: ShuffleConsumerPlugin -> 
ReduceFeederPlugin, or any other name)

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Consumer Plugin 
 TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, mapred-site.xml, 
 mapreduce-4049.patch, mapreduce-4049.patch


 Support a generic shuffle service as a set of two plugins: ShuffleProvider and 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)



[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-09-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447943#comment-13447943
 ] 

Siddharth Seth commented on MAPREDUCE-3902:
---

Thanks for the help with this JIRA.
bq. because MRAppMaster in container-reuse implementation has the feature to 
monitor whether the running tasks on the containers are the last task at a 
machine or not, for the purpose of exiting JVMs on containers, as you know.
That will definitely be simpler to achieve with the container-reuse AM, with 
nodes already tracking container information. The last task on a node can be 
figured out relatively easily by the scheduler. It is, however, also possible 
with the current AM, and several bits, like the decision on when to run the 
combiner, should be a straightforward port to the reuse-AM. In any case, it'll 
be good to get the re-use AM into trunk fast. Looking forward to the updates on 
4502 and 4525. 


 MR AM should reuse containers for map tasks, there-by allowing fine-grained 
 control on num-maps for users without need for CombineFileInputFormat etc.
 --

 Key: MAPREDUCE-3902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster, mrv2
Reporter: Arun C Murthy
Assignee: Siddharth Seth
 Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch


 The MR AM is now in a great position to reuse containers across (map) tasks. 
 This is something similar to JVM re-use we had in 0.20.x, but in a 
 significantly better manner:
 # Consider data-locality when re-using containers
 # Consider the new shuffle - ensure that reduces fetch output of the whole 
 container at once (i.e. all maps)  : MAPREDUCE-4525 



[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection

2012-09-04 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447981#comment-13447981
 ] 

Plamen Jeliazkov commented on MAPREDUCE-4491:
-

Great work, Benoy!

This looks like a very neat feature to add. I am all in support. I like your 
similarity with the compressor / decompressor interfaces and the ease of the 
implementation to plug-in any keystores.

I am in the midst of applying your patches and doing a small test locally and 
will reply back with any results I find.

 Encryption and Key Protection
 -

 Key: MAPREDUCE-4491
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: documentation, security, task-controller, tasktracker
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf


 When dealing with sensitive data, it is required to keep the data encrypted 
 wherever it is stored. A common use case is to pull encrypted data out of a 
 datasource and store it in HDFS for analysis. The keys are stored in an external 
 keystore. 
 The feature adds a customizable framework to integrate different types of 
 keystores, support for Java KeyStore, read keys from keystores, and transport 
 keys from JobClient to Tasks.
 The feature adds PGP encryption as a codec and additional utilities to 
 perform encryption related steps.
 The design document is attached. It explains the requirement, design and use 
 cases.
 Kindly review and comment. Collaboration is very much welcome.
 I have a tested patch for this for 1.1 and will upload it soon as an initial 
 work for further refinement.
 Update: The patches are uploaded to subtasks. 



[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-09-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447996#comment-13447996
 ] 

Steve Loughran commented on MAPREDUCE-1700:
---

Arun, 

I see where Tom is coming from. Irrespective of how the Hadoop services are 
deployed, you need to be able to do things like submit jobs from OSGi 
containers (e.g. Spring and others), which is what this patch appears to offer. 
And if Oracle finally commits to OSGi now that Java 8 is being redefined, it'd 
be good for all clients.

I would like to see a way to support this which doesn't put an OSGi JAR on the 
classpath of everything.

Tom, is there a way to abstract away OSGi support so that it's optional, even 
if it's a subclass of JobSubmitter? An 
{{org.apache.hadoop.mapreduce.osgi.OSGiJobSubmitter}} could override some new 
specific protected methods to enable this.
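
As a rough illustration of that suggestion, the sketch below keeps the base class free of any OSGi types and lets a subclass contribute extra entries through one protected hook. All names here (BasicSubmitter, BundleAwareSubmitter, extraClasspathEntries) are hypothetical and are not taken from JobSubmitter or from the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical base submitter: fixed steps plus one overridable hook. */
class BasicSubmitter {
    /** Builds the classpath for a job; the overall sequence is fixed here. */
    final List<String> buildClasspath(String jobJar) {
        List<String> classpath = new ArrayList<>();
        classpath.add(jobJar);
        classpath.addAll(extraClasspathEntries(jobJar));
        return classpath;
    }

    /** Hook for subclasses. The default adds nothing, so no OSGi JAR is needed. */
    protected List<String> extraClasspathEntries(String jobJar) {
        return Collections.emptyList();
    }
}

/** Hypothetical optional subclass that would live in a separate, OSGi-aware module. */
final class BundleAwareSubmitter extends BasicSubmitter {
    private final List<String> bundleJars;

    BundleAwareSubmitter(List<String> bundleJars) {
        this.bundleJars = new ArrayList<>(bundleJars);
    }

    @Override
    protected List<String> extraClasspathEntries(String jobJar) {
        // A real implementation would resolve these from the bundle's wiring.
        return bundleJars;
    }
}
```

Only code that instantiates the subclass would need the extra dependency; callers that use BasicSubmitter see the same behaviour as before.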

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-09-04 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448021#comment-13448021
 ] 

Mariappan Asokan commented on MAPREDUCE-4049:
-

Hi Avner,
  You agree that {{ShuffleConsumerPlugin}} should be decoupled from merge.  In 
that case, it should have nothing to do with {{RawKeyValueIterator}}.  However, 
in its current form the {{run()}} method in {{ShuffleConsumerPlugin}} returns 
{{RawKeyValueIterator}}.  From a design point of view, my objection is that the 
abstraction {{ShuffleConsumerPlugin}} is not capturing the concept it is 
intended for, namely moving just raw bytes from map hosts to reduce hosts and 
nothing more.

I would ask you to go back to my original suggestion to make {{ShuffleRunner}} 
pluggable.  We can add an {{initialize()}} method there.

-- Asokan


 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Consumer Plugin 
 TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, mapred-site.xml, 
 mapreduce-4049.patch, mapreduce-4049.patch


 Support a generic shuffle service as a set of two plugins: ShuffleProvider and 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)



[jira] [Created] (MAPREDUCE-4633) history server doesn't set permissions on all subdirs

2012-09-04 Thread Thomas Graves (JIRA)
Thomas Graves created MAPREDUCE-4633:


 Summary: history server doesn't set permissions on all subdirs 
 Key: MAPREDUCE-4633
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4633
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Thomas Graves


The job history server creates a bunch of subdirectories under the done 
directory, for example 2012/09/03/00.  It only sets the permissions on 
the last one, i.e. 00, to 770.  The permissions on 2012/09/03 aren't explicitly 
set, so if the umask is more restrictive, they won't be what the server expects.
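
One way to avoid depending on the umask is to set the permissions explicitly on every level as it is created. The sketch below shows the idea with plain java.nio on a POSIX file system; it is not the history server's code, and the class and method names are hypothetical. The Hadoop FileSystem API would need the equivalent calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper, assuming a POSIX file system. */
final class DoneDirPermissions {
    private DoneDirPermissions() {}

    /**
     * Creates each component of 'relative' under 'base' and sets 'perms'
     * (for example "rwxrwx---" for 770) on every level, not only the leaf.
     */
    static Path mkdirsWithPermissions(Path base, Path relative, String perms) throws IOException {
        Set<PosixFilePermission> wanted = PosixFilePermissions.fromString(perms);
        Path current = base;
        for (Path component : relative) {
            current = current.resolve(component);
            // No-op if the directory already exists.
            Files.createDirectories(current);
            // An explicit set call is not filtered by the process umask.
            Files.setPosixFilePermissions(current, wanted);
        }
        return current;
    }
}
```

Calling mkdirsWithPermissions(done, Paths.get("2012", "09", "03", "00"), "rwxrwx---") would then leave 2012, 09, 03 and 00 all at 770. Note that this sketch also resets the permissions of levels that already exist, which may or may not be wanted.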



[jira] [Commented] (MAPREDUCE-4628) Use Builder to get RPC server in Map/Reduce

2012-09-04 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448112#comment-13448112
 ] 

Suresh Srinivas commented on MAPREDUCE-4628:


+1 for the patch.

 Use Builder to get RPC server in Map/Reduce
 ---

 Key: MAPREDUCE-4628
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4628
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
Priority: Minor
 Attachments: MAPREDUCE-4628.patch


 In HADOOP-8736, a Builder is introduced to replace all the getServer() 
 variants. This JIRA is the change in Map/Reduce.
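
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of how a builder can stand in for a family of overloaded factory methods. ServerSpec and its setters are hypothetical names chosen for illustration; this is not the class added by HADOOP-8736.

```java
/** Hypothetical immutable description of a server, built step by step. */
final class ServerSpec {
    final String bindAddress;
    final int port;
    final int numHandlers;
    final boolean verbose;

    private ServerSpec(Builder b) {
        this.bindAddress = b.bindAddress;
        this.port = b.port;
        this.numHandlers = b.numHandlers;
        this.verbose = b.verbose;
    }

    /** Each optional argument gets a setter with a default instead of a new overload. */
    static final class Builder {
        private String bindAddress = "0.0.0.0";
        private int port = 0;          // 0 means "pick any free port"
        private int numHandlers = 1;
        private boolean verbose = false;

        Builder setBindAddress(String bindAddress) { this.bindAddress = bindAddress; return this; }
        Builder setPort(int port) { this.port = port; return this; }
        Builder setNumHandlers(int numHandlers) { this.numHandlers = numHandlers; return this; }
        Builder setVerbose(boolean verbose) { this.verbose = verbose; return this; }

        /** Validates the combination once, in one place, then builds. */
        ServerSpec build() {
            if (port < 0 || port > 65535) {
                throw new IllegalArgumentException("port out of range: " + port);
            }
            if (numHandlers < 1) {
                throw new IllegalArgumentException("need at least one handler");
            }
            return new ServerSpec(this);
        }
    }
}
```

A caller names only what differs from the defaults, e.g. new ServerSpec.Builder().setPort(8020).setNumHandlers(10).build(), so adding a new option does not multiply the number of factory-method signatures.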



[jira] [Assigned] (MAPREDUCE-4414) Add main methods to JobConf and YarnConfiguration, for debug purposes

2012-09-04 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov reassigned MAPREDUCE-4414:
---

Assignee: Plamen Jeliazkov  (was: Linden Hillenbrand)

 Add main methods to JobConf and YarnConfiguration, for debug purposes
 -

 Key: MAPREDUCE-4414
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4414
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Plamen Jeliazkov
  Labels: newbie

 Just like Configuration has a main() func that dumps XML out for debug 
 purposes, we should have a similar function under the JobConf and 
 YarnConfiguration classes that do the same. This is useful in testing out app 
 classpath setups at times.



[jira] [Updated] (MAPREDUCE-4414) Add main methods to JobConf and YarnConfiguration, for debug purposes

2012-09-04 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated MAPREDUCE-4414:


Attachment: MAPREDUCE-4144.patch

 Add main methods to JobConf and YarnConfiguration, for debug purposes
 -

 Key: MAPREDUCE-4414
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4414
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Plamen Jeliazkov
  Labels: newbie
 Attachments: MAPREDUCE-4144.patch


 Just like Configuration has a main() func that dumps XML out for debug 
 purposes, we should have a similar function under the JobConf and 
 YarnConfiguration classes that do the same. This is useful in testing out app 
 classpath setups at times.



[jira] [Updated] (MAPREDUCE-4414) Add main methods to JobConf and YarnConfiguration, for debug purposes

2012-09-04 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated MAPREDUCE-4414:


Status: Patch Available  (was: Open)

 Add main methods to JobConf and YarnConfiguration, for debug purposes
 -

 Key: MAPREDUCE-4414
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4414
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Plamen Jeliazkov
  Labels: newbie
 Attachments: MAPREDUCE-4144.patch


 Just like Configuration has a main() func that dumps XML out for debug 
 purposes, we should have a similar function under the JobConf and 
 YarnConfiguration classes that do the same. This is useful in testing out app 
 classpath setups at times.



[jira] [Created] (MAPREDUCE-4634) Change TestUmbilicalProtocolWithJobToken to use RPC builder

2012-09-04 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created MAPREDUCE-4634:
--

 Summary: Change TestUmbilicalProtocolWithJobToken to use RPC 
builder
 Key: MAPREDUCE-4634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4634
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Brandon Li
Priority: Minor


In HADOOP-8736, a Builder is introduced to replace all the getServer() 
variants. This JIRA is the change in MapReduce.



[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-09-04 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448178#comment-13448178
 ] 

Scott Carey commented on MAPREDUCE-1700:


Putting user jars before/after the application dependencies doesn't actually 
solve the problem.  

* The conflict might require a user jar that is not compatible with one needed 
by the framework; either order breaks something.
* The user might override a system jar and alter functionality in a way that 
breaks the framework, or subverts security.

Both the host container and the user code need to be able to be certain of what 
code they are executing without stepping on each other's toes.  This is _not 
possible_ with one classpath.
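
To make the alternative concrete, the sketch below shows one common form of the "specialized classloader" mentioned in the issue description: a child-first loader that looks in the user's jars before delegating, except for packages that must always come from the parent. It is only an illustration under those assumptions. The class name and the prefix list are hypothetical, and it is not the loader from the attached patch.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first loader for user code. */
final class ChildFirstClassLoader extends URLClassLoader {
    private final String[] parentFirstPrefixes;

    ChildFirstClassLoader(URL[] userJars, ClassLoader parent, String... parentFirstPrefixes) {
        super(userJars, parent);
        this.parentFirstPrefixes = parentFirstPrefixes.clone();
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Look in the user's jars before asking the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the user jars; fall through to the parent.
                }
            }
            if (c == null) {
                // Standard parent-first delegation for everything else.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Framework and JDK packages are never taken from the user jars. */
    private boolean isParentFirst(String name) {
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With prefixes such as "java." and "org.apache.hadoop.", the framework keeps its own classes while a user's copy of a third-party library is loaded from the user jars in a separate namespace. Classes that cross the boundary between the two still have to come from a single loader, which is the hard part that this sketch does not address.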

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.



[jira] [Commented] (MAPREDUCE-4414) Add main methods to JobConf and YarnConfiguration, for debug purposes

2012-09-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448190#comment-13448190
 ] 

Hadoop QA commented on MAPREDUCE-4414:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543758/MAPREDUCE-4144.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2812//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2812//console

This message is automatically generated.

 Add main methods to JobConf and YarnConfiguration, for debug purposes
 -

 Key: MAPREDUCE-4414
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4414
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Plamen Jeliazkov
  Labels: newbie
 Attachments: MAPREDUCE-4144.patch


 Just like Configuration has a main() func that dumps XML out for debug 
 purposes, we should have a similar function under the JobConf and 
 YarnConfiguration classes that do the same. This is useful in testing out app 
 classpath setups at times.



[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-09-04 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448205#comment-13448205
 ] 

Scott Carey commented on MAPREDUCE-1700:


If we are lucky, Project Jigsaw will be pulled back into Java 8.  According 
to http://mreinhold.org/blog/late-for-the-train-qa it has not yet been decided.

If it is brought back in, then perhaps we can wait until Java has a module 
system 1 to 1.5 years from now.  If not, I do not think Hadoop can wait until 
Java 9, sometime 2015 to 2016 ish.

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.



[jira] [Updated] (MAPREDUCE-4634) Change TestUmbilicalProtocolWithJobToken to use RPC builder

2012-09-04 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated MAPREDUCE-4634:
--

Attachment: MAPREDUCE-4634.patch

 Change TestUmbilicalProtocolWithJobToken to use RPC builder
 ---

 Key: MAPREDUCE-4634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4634
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Brandon Li
Priority: Minor
 Attachments: MAPREDUCE-4634.patch


 In HADOOP-8736, a Builder is introduced to replace all the getServer() 
 variants. This JIRA is the change in MapReduce.



[jira] [Commented] (MAPREDUCE-4634) Change TestUmbilicalProtocolWithJobToken to use RPC builder

2012-09-04 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448223#comment-13448223
 ] 

Brandon Li commented on MAPREDUCE-4634:
---

The patch is part of the patch in YARN-84. I still uploaded it here just to keep 
a record of the MapReduce change in the JIRA system.

 Change TestUmbilicalProtocolWithJobToken to use RPC builder
 ---

 Key: MAPREDUCE-4634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4634
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Brandon Li
Priority: Minor
 Attachments: MAPREDUCE-4634.patch


 In HADOOP-8736, a Builder is introduced to replace all the getServer() 
 variants. This JIRA is the change in MapReduce.



[jira] [Resolved] (MAPREDUCE-4634) Change TestUmbilicalProtocolWithJobToken to use RPC builder

2012-09-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-4634.


Resolution: Duplicate

Thanks Brandon. Closing this as duplicate.

 Change TestUmbilicalProtocolWithJobToken to use RPC builder
 ---

 Key: MAPREDUCE-4634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4634
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Brandon Li
Priority: Minor
 Attachments: MAPREDUCE-4634.patch


 In HADOOP-8736, a Builder is introduced to replace all the getServer() 
 variants. This JIRA is the change in MapReduce.



[jira] [Created] (MAPREDUCE-4635) MR side of YARN-83. Changing package of YarnClient

2012-09-04 Thread Bikas Saha (JIRA)
Bikas Saha created MAPREDUCE-4635:
-

 Summary: MR side of YARN-83. Changing package of YarnClient
 Key: MAPREDUCE-4635
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4635
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha






[jira] [Updated] (MAPREDUCE-4635) MR side of YARN-83. Changing package of YarnClient

2012-09-04 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated MAPREDUCE-4635:
--

Attachment: YARN-83.3.MR.patch

Attaching MR patch.

 MR side of YARN-83. Changing package of YarnClient
 --

 Key: MAPREDUCE-4635
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4635
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: YARN-83.3.MR.patch



