[jira] [Commented] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493720#comment-13493720
 ] 

Hadoop QA commented on MAPREDUCE-4783:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552762/mapreduce-4783.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3002//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3002//console

This message is automatically generated.

> data_join mavenization broke the mr1 build
> --
>
> Key: MAPREDUCE-4783
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: mapreduce-4783.txt
>
>
> MR-4238 didn't update build.xml and forgot to nuke the old data_join 
> directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-08 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-4783:
---

Attachment: mapreduce-4783.txt

Patch attached.

> data_join mavenization broke the mr1 build
> --
>
> Key: MAPREDUCE-4783
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: mapreduce-4783.txt
>
>
> MR-4238 didn't update build.xml and forgot to nuke the old data_join 
> directory.



[jira] [Updated] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-08 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-4783:
---

Status: Patch Available  (was: Open)

> data_join mavenization broke the mr1 build
> --
>
> Key: MAPREDUCE-4783
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: mapreduce-4783.txt
>
>
> MR-4238 didn't update build.xml and forgot to nuke the old data_join 
> directory.



[jira] [Created] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-08 Thread Eli Collins (JIRA)
Eli Collins created MAPREDUCE-4783:
--

 Summary: data_join mavenization broke the mr1 build
 Key: MAPREDUCE-4783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


MR-4238 didn't update build.xml and forgot to nuke the old data_join directory.



[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:
---

Attachment: MAPREDUCE-4751-20121108.txt

Here's a first attempt at the patch. It's very raw, with no tests yet; I want to 
be sure that I am understanding your comments correctly.

Bobby/Ravi/Jason, can you please have a quick look at it? Thanks.

I have a feeling we need to do something similar in Job as well, even though 
that would not address the current bug, assuming it is TaskImpl itself that is 
stuck today.

> AM stuck in KILL_WAIT for days
> --
>
> Key: MAPREDUCE-4751
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.23.3, 2.0.2-alpha
>Reporter: Ravi Prakash
>Assignee: Vinod Kumar Vavilapalli
>     Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg
>
>
> We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them 
> as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a 
> few maps running. All these maps were scheduled on nodes which are now in the 
> RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.



[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-4782:
--

Target Version/s: 1.1.1, 0.23.5  (was: 0.23.5)

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
> Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493669#comment-13493669
 ] 

Matt Foley commented on MAPREDUCE-4782:
---

Nasty.  Could you please port to branch-1 and I'll include it in the next 
release?

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
> Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493658#comment-13493658
 ] 

Alejandro Abdelnur commented on MAPREDUCE-2454:
---

Asokan, if I understood you correctly, you were working on a new test case. This 
is not in the latest patch, correct? When you upload the patch with the new 
test case, please fix the following nit:

*Fetcher.java has 2 unused imports*

import java.io.InputStream;
import java.io.OutputStream;

Then, IMO we are good to go.

> Allow external sorter plugin for MR
> ---
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>Priority: Minor
>  Labels: features, performance, plugin, sort
> Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
> KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
> ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493584#comment-13493584
 ] 

Robert Joseph Evans commented on MAPREDUCE-4782:


The patch looks good to me and I am +1 on it, but since I added the test, I 
would appreciate it if someone else could take a look.

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
> Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server

2012-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493579#comment-13493579
 ] 

Hadoop QA commented on MAPREDUCE-4666:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552712/MAPREDUCE-4666.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3001//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3001//console

This message is automatically generated.

> JVM metrics for history server
> --
>
> Key: MAPREDUCE-4666
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.0.2-alpha
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-4666.patch
>
>
> It would be nice if the job history server provided the same JVM metrics via 
> metrics2 that other Hadoop daemons are already providing.



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493571#comment-13493571
 ] 

Hadoop QA commented on MAPREDUCE-4782:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552709/MR-4782.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapred.TestClusterMRNotification

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//console

This message is automatically generated.

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
> Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Updated] (MAPREDUCE-4666) JVM metrics for history server

2012-11-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4666:
--

Assignee: Jason Lowe
Target Version/s: 2.0.3-alpha, 0.23.5
  Status: Patch Available  (was: Open)

> JVM metrics for history server
> --
>
> Key: MAPREDUCE-4666
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.0.2-alpha
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-4666.patch
>
>
> It would be nice if the job history server provided the same JVM metrics via 
> metrics2 that other Hadoop daemons are already providing.



[jira] [Updated] (MAPREDUCE-4666) JVM metrics for history server

2012-11-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4666:
--

Attachment: MAPREDUCE-4666.patch

Patch to add metrics2 JvmMetrics to the history server.  I manually tested this 
to verify the metrics could be used via a sink configured in 
hadoop-metrics2.properties.  I also verified the JvmMetrics bean shows up via 
the JMX web service.
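
For context on what these JVM metrics cover: the metrics2 JvmMetrics source samples the JVM's platform MXBeans (memory, garbage collectors, threads). The hypothetical snippet below reads a few comparable values directly through java.lang.management, without Hadoop on the classpath, so it is only an illustration of the kind of data involved. The class name and map keys are made up for this sketch and are not the metric names JvmMetrics publishes.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: reads JVM statistics from the platform MXBeans,
// the same family of beans that metrics2 JvmMetrics samples. Key names are
// illustrative only, not JvmMetrics' published metric names.
public class JvmSnapshot {

    static Map<String, Long> snapshot() {
        Map<String, Long> m = new LinkedHashMap<>();

        // Heap usage from the MemoryMXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        m.put("heapUsedBytes", heap.getUsed());
        m.put("heapCommittedBytes", heap.getCommitted());

        // Sum collection counts and times over all collectors; a bean reports -1
        // when a value is undefined, so clamp each contribution at zero.
        long gcCount = 0;
        long gcTimeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMillis += Math.max(0, gc.getCollectionTime());
        }
        m.put("gcCount", gcCount);
        m.put("gcTimeMillis", gcTimeMillis);

        // Current live thread count from the ThreadMXBean.
        m.put("threadCount", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        return m;
    }

    public static void main(String[] args) {
        snapshot().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In the actual daemon these values would instead be published through metrics2 sinks and the JMX web service, as described in the comment above.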

> JVM metrics for history server
> --
>
> Key: MAPREDUCE-4666
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.0.2-alpha
>Reporter: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-4666.patch
>
>
> It would be nice if the job history server provided the same JVM metrics via 
> metrics2 that other Hadoop daemons are already providing.



[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

 Target Version/s: 0.23.5
Affects Version/s: 0.23.0
   1.0.0
   2.0.0-alpha
   Status: Patch Available  (was: Open)

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.0-alpha, 1.0.0, 0.23.0, 0.22.0, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
> Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Attachment: MR-4782.txt

I was able to reproduce the issue, and I have updated the test case to 
reproduce it as well. The original test case did not check the last split; I 
don't know why. I also found that the bug exists in branch-1 as well.

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
> Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Mark Fuhs (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Fuhs updated MAPREDUCE-4782:
-

Attachment: MAPREDUCE-4782.patch

I confess I'm not terribly familiar with git, so this is just a "git diff".

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
> Attachments: MAPREDUCE-4782.patch
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Priority: Critical  (was: Major)

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493465#comment-13493465
 ] 

Robert Joseph Evans commented on MAPREDUCE-4782:


Marked this as critical, since data loss is serious. Mark, can you post your patch?

> NLineInputFormat skips first line of last InputSplit
> 
>
> Key: MAPREDUCE-4782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.22.0, trunk
> Environment: job.setMapperClass(Mapper.class);  // just pass text 
> lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>Reporter: Mark Fuhs
>Priority: Critical
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
> generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
> begin and length fields of the FileSplit are constructed differently for the 
> first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, 
> but the creation does not respect the difference of how the first vs. the 
> rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've 
> created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Created] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Mark Fuhs (JIRA)
Mark Fuhs created MAPREDUCE-4782:


 Summary: NLineInputFormat skips first line of last InputSplit
 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
lines through to output
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 100);
NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs


NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
begin and length fields of the FileSplit are constructed differently for the 
first FileSplit vs. the rest.

After looping through all lines of a file, the final FileSplit is created, but 
it does not receive the same first-vs.-rest adjustment as the other 
FileSplits.

This results in the first line of the final InputSplit being skipped. I've 
created a patch to NLineInputFormat, and this fixes the problem.
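To make the off-by-one concrete, here is a small stdlib-only Java model. The class and method names are hypothetical (not Hadoop APIs), and the LineRecordReader "idiosyncrasy" is modelled as an assumption: a reader whose split starts at a nonzero offset first skips to just past the next newline, and it keeps reading whole lines while the next line starts at or before the split's end offset. Under those assumptions, leaving the final split unadjusted drops its first line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of NLineInputFormat-style splits read by a LineRecordReader-style
 * reader. Offsets are byte positions; every line ends in '\n' and is at least 1 byte.
 * Illustration only, not Hadoop's code.
 */
final class NLineSplitModel {

  /** A (start, length) byte range, playing the role of a FileSplit. */
  static final class Split {
    final long start;
    final long length;

    Split(long start, long length) {
      this.start = start;
      this.length = length;
    }
  }

  /** First split ends one byte early; later splits start one byte early, on the previous newline. */
  static Split adjusted(long begin, long length) {
    return begin == 0 ? new Split(begin, length - 1) : new Split(begin - 1, length);
  }

  /** Groups lines into splits of n lines; fixLastSplit also adjusts the leftover split. */
  static List<Split> computeSplits(int[] lineLengths, int n, boolean fixLastSplit) {
    List<Split> splits = new ArrayList<>();
    long begin = 0;
    long length = 0;
    int numLines = 0;
    for (int len : lineLengths) {
      numLines++;
      length += len;
      if (numLines == n) {
        splits.add(adjusted(begin, length));
        begin += length;
        length = 0;
        numLines = 0;
      }
    }
    if (numLines != 0) {
      // The reported bug: the leftover split was created without the adjustment.
      splits.add(fixLastSplit ? adjusted(begin, length) : new Split(begin, length));
    }
    return splits;
  }

  /** Counts the lines the modelled reader emits for one split. */
  static int linesRead(int[] lineLengths, Split split) {
    long[] lineStarts = new long[lineLengths.length + 1];
    for (int i = 0; i < lineLengths.length; i++) {
      lineStarts[i + 1] = lineStarts[i] + lineLengths[i];
    }
    long fileLength = lineStarts[lineLengths.length];
    long end = split.start + split.length;
    long pos = split.start;
    if (pos != 0) {
      // Discard everything up to and including the next newline.
      pos = nextLineStart(lineStarts, pos);
    }
    int count = 0;
    // Read whole lines while the next line starts at or before the split end.
    while (pos <= end && pos < fileLength) {
      pos = nextLineStart(lineStarts, pos);
      count++;
    }
    return count;
  }

  /** Total lines read across all splits of the file. */
  static int totalLinesRead(int[] lineLengths, int n, boolean fixLastSplit) {
    int total = 0;
    for (Split s : computeSplits(lineLengths, n, fixLastSplit)) {
      total += linesRead(lineLengths, s);
    }
    return total;
  }

  /** Offset just past the newline that ends the line containing byte pos. */
  private static long nextLineStart(long[] lineStarts, long pos) {
    for (long s : lineStarts) {
      if (s > pos) {
        return s;
      }
    }
    return lineStarts[lineStarts.length - 1];
  }
}
```

For five 4-byte lines and N = 2, `totalLinesRead(new int[] {4, 4, 4, 4, 4}, 2, false)` yields 4 (the last split loses its only line), while passing `true` yields all 5.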



[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

2012-11-08 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493443#comment-13493443
 ] 

Philip Zeyliger commented on MAPREDUCE-4469:


If you're looking for the resource usage of a process and its children, look at 
{{man getrusage}}, which includes a flag (RUSAGE_CHILDREN) to get the CPU usage 
of the children.  Mind you, you'd need native code to get at it.

> Resource calculation in child tasks is CPU-heavy
> 
>
> Key: MAPREDUCE-4469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: performance, task
>Affects Versions: 1.0.3
>Reporter: Todd Lipcon
>Assignee: Ahmed Radwan
> Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, 
> MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch
>
>
> In doing some benchmarking on a hadoop-1 derived codebase, I noticed that 
> each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed 
> that it's spending a lot of time looping through all the files in /proc to 
> calculate resource usage.
> As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
> within the tasks. On a CPU-bound 500G-sort workload, this improved total job 
> runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).



[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493406#comment-13493406
 ] 

Hadoop QA commented on MAPREDUCE-2454:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552689/mapreduce-2454.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2999//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2999//console

This message is automatically generated.

> Allow external sorter plugin for MR
> ---
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>Priority: Minor
>  Labels: features, performance, plugin, sort
> Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
> KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
> ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.
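One way to picture "interfaces ... to facilitate external sorter plugins" is a single plugin contract plus a default implementation, with the framework instantiating whichever class the job names. The sketch below is a hypothetical, stdlib-only illustration: `MapOutputSortPlugin`, `InMemorySortPlugin` and `SortPluginLoader` are invented names, not the interfaces in the attached patches.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical plugin contract for sorting map output (not the patch's interface). */
interface MapOutputSortPlugin {
  void collect(String key, String value);

  /** Returns all collected records ordered by key. */
  List<Map.Entry<String, String>> sortedOutput();
}

/** Default implementation: buffer records in memory and sort them by key. */
final class InMemorySortPlugin implements MapOutputSortPlugin {
  private final List<Map.Entry<String, String>> records = new ArrayList<>();

  public InMemorySortPlugin() {}

  @Override
  public void collect(String key, String value) {
    records.add(new AbstractMap.SimpleEntry<>(key, value));
  }

  @Override
  public List<Map.Entry<String, String>> sortedOutput() {
    List<Map.Entry<String, String>> copy = new ArrayList<>(records);
    // List.sort is stable, so records with equal keys keep their arrival order.
    copy.sort(Map.Entry.comparingByKey());
    return copy;
  }
}

/** Instantiates whichever plugin class the job was configured with. */
final class SortPluginLoader {
  static MapOutputSortPlugin newInstance(Class<? extends MapOutputSortPlugin> pluginClass) {
    try {
      return pluginClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + pluginClass.getName(), e);
    }
  }
}
```

In Hadoop itself, the plugin class would more plausibly be looked up from the job's `Configuration` and created with `ReflectionUtils.newInstance`; any configuration key for that is not specified here.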



[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Mariappan Asokan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Patch Available  (was: Open)

> Allow external sorter plugin for MR
> ---
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>Priority: Minor
>  Labels: features, performance, plugin, sort
> Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
> KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
> ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Mariappan Asokan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Attachment: mapreduce-2454.patch

Hi Alejandro,
  Thanks for the feedback.  I changed {{MapOutputCollector}} to 
{{PostMapProcessor}} so that there is only one interface.  I also made the 
other changes you suggested.  I am uploading the new patch.  Please take a look 
at it.

-- Asokan

> Allow external sorter plugin for MR
> ---
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>Priority: Minor
>  Labels: features, performance, plugin, sort
> Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
> KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
> ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Mariappan Asokan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Open  (was: Patch Available)

> Allow external sorter plugin for MR
> ---
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>Priority: Minor
>  Labels: features, performance, plugin, sort
> Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
> KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, 
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
> ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Commented] (MAPREDUCE-4774) repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR

2012-11-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493279#comment-13493279
 ] 

Jason Lowe commented on MAPREDUCE-4774:
---

Thanks for the analysis, Ivan!  JobImpl's state machine is missing a number of 
events in the FAILED state.  Due to the asynchronous nature of the job, task, 
and task attempt state machines, it is possible for tasks and task attempts to 
complete even though the job overall has decided to fail for other reasons.  
Therefore we need to ignore the following additional events in the FAILED 
state, to prevent their asynchronous arrival from knocking us out of the FAILED 
state and into the ERROR state:

JOB_TASK_COMPLETED
JOB_TASK_ATTEMPT_COMPLETED
JOB_MAP_TASK_RESCHEDULED
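The idea of "ignoring" late events can be shown with a toy transition table: registering a FAILED-to-FAILED self-transition for each late event keeps the job in FAILED, whereas an unregistered event is treated as invalid. In JobImpl this would presumably be done through the `StateMachineFactory` transition registrations; the stdlib-only sketch below avoids Hadoop dependencies, and its enums, the `FAIL_JOB` trigger, and the `ERROR` state are hypothetical simplifications.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

/** Toy job states; ERROR stands in for the state an invalid transition leads to. */
enum ToyJobState { RUNNING, FAILED, ERROR }

/** The three late events from the comment, plus a hypothetical FAIL_JOB trigger. */
enum ToyJobEvent { JOB_TASK_COMPLETED, JOB_TASK_ATTEMPT_COMPLETED, JOB_MAP_TASK_RESCHEDULED, FAIL_JOB }

final class ToyJobStateMachine {
  // (current state, event) -> next state; a missing entry is an invalid transition.
  private final Map<ToyJobState, Map<ToyJobEvent, ToyJobState>> table = new EnumMap<>(ToyJobState.class);
  private ToyJobState state = ToyJobState.RUNNING;

  ToyJobStateMachine(boolean ignoreLateEventsWhenFailed) {
    for (ToyJobState s : ToyJobState.values()) {
      table.put(s, new EnumMap<>(ToyJobEvent.class));
    }
    Map<ToyJobEvent, ToyJobState> running = table.get(ToyJobState.RUNNING);
    running.put(ToyJobEvent.JOB_TASK_COMPLETED, ToyJobState.RUNNING);
    running.put(ToyJobEvent.JOB_TASK_ATTEMPT_COMPLETED, ToyJobState.RUNNING);
    running.put(ToyJobEvent.JOB_MAP_TASK_RESCHEDULED, ToyJobState.RUNNING);
    running.put(ToyJobEvent.FAIL_JOB, ToyJobState.FAILED);
    if (ignoreLateEventsWhenFailed) {
      // The fix: late task events become self-transitions in FAILED, i.e. they are ignored.
      for (ToyJobEvent e : EnumSet.of(ToyJobEvent.JOB_TASK_COMPLETED,
          ToyJobEvent.JOB_TASK_ATTEMPT_COMPLETED, ToyJobEvent.JOB_MAP_TASK_RESCHEDULED)) {
        table.get(ToyJobState.FAILED).put(e, ToyJobState.FAILED);
      }
    }
  }

  ToyJobState handle(ToyJobEvent event) {
    ToyJobState next = table.get(state).get(event);
    // No registered transition: the toy machine moves to ERROR, mirroring
    // the "Invalid event ... at FAILED" path described in the report.
    state = (next == null) ? ToyJobState.ERROR : next;
    return state;
  }
}
```

With the self-transitions registered, a `JOB_TASK_ATTEMPT_COMPLETED` arriving after the job has failed leaves it in FAILED; without them, the same event drives it to ERROR.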


> repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR
> -
>
> Key: MAPREDUCE-4774
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ivan A. Veselovsky
>
> The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
>  fails in mapred build (e.g. see 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
>  , or 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
> The test aims to check job status notifications received through an HTTP 
> servlet. It runs 3 jobs: successful, killed, and failed. 
> The test expects the servlet to receive some expected notifications in some 
> expected order. It also tries to test the retry-on-failure notification 
> functionality, so on each 1st notification the servlet answers "400 forcing 
> error", and on each 2nd notification attempt it answers "ok". 
> In general, the test fails because the actual number and/or type of the 
> notifications differ from what the test expects.
> Investigation shows that the actual root cause of the problem is an incorrect 
> job state transition: a task of the 3rd job fails (via an intentionally thrown 
> RuntimeException; see UtilsForTests#runJobFail()), and the task's state 
> changes from RUNNING to FAILED.
> At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in 
> method 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, TaskAttemptCompletionEventStatus)), 
> and this event is processed by the AsyncDispatcher, but the event transition 
> map defines no transition for it in the FAILED state (see 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
> This causes the following exception to be thrown while the event is processed:
> 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_TASK_ATTEMPT_COMPLETED at FAILED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
> at java.lang.Thread.run(Thread.java:662) 
> As a result, the job enters the INTERNAL_ERROR state, and a job-end 
> notification like this is sent:
> http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
> (note the "ERROR" status instead of "FAILED").
> After that, the notification servlet receives either only an "ERROR" 
> notification, or an extra "ERROR" notification after "FAILED", which 
> ultimately causes the test to fail. (The exact behavior varies because of race 
> conditions among the many asynchronous steps involved; in fact, the test is 
> flaky.)
> In any case, the root cause of the problem appears to be the forbidden 
> transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED". 
> Need expert advice on how

[jira] [Updated] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted

2012-11-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4772:
---

   Resolution: Fixed
Fix Version/s: 0.23.5
   2.0.3-alpha
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the review, Jon.

I checked the code into trunk, branch-2, and branch-0.23.

> Fetch failures can take way too long for a map to be restarted
> --
>
> Key: MAPREDUCE-4772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.4
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
> Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
> Attachments: MR-4772-0.23.txt, MR-4772-trunk.txt
>
>
> In one particular case we saw a NM go down at just the right time, such that 
> most of the reducers got the output of the map tasks, but not all of them.
> The ones that failed to get the output reported to the AM rather quickly that 
> they could not fetch from the NM, but because the other reducers were still 
> running, the AM would not relaunch the map task: no more than 50% of the 
> running reducers had reported fetch failures.  Then, because of the 
> exponential back-off for fetches on the reducers, it took 1 hour 45 minutes 
> for the reduce tasks to hit another 10 fetch failures and report in again. At 
> that point the other reducers had finished and the job relaunched the map 
> task.  If the reducers had still been running at 1:45, I have no idea how long 
> it would have taken for each of the tasks to get to 30 fetch failures.
> We need to trigger the map relaunch based on the percentage of reducers 
> shuffling, not the percentage of reducers running. We also need a maximum 
> limit on the back-off, so that a reducer never waits for days to retry 
> fetching map output.  
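Both proposed changes are easy to state in code. The sketch below is a hypothetical illustration of the idea, not the committed patch: `FetchFailurePolicy`, its methods, and the sample base delay and cap are invented; only the 50% threshold comes from the report. The back-off doubles per failure but is clamped to a maximum, and the relaunch decision divides by the number of reducers still shuffling rather than all running reducers.

```java
/**
 * Hypothetical helpers illustrating the two changes proposed in the report.
 * Names and parameters are illustrative, not Hadoop configuration keys.
 */
final class FetchFailurePolicy {

  /** Exponential back-off: baseDelayMs doubled once per prior failure, never above maxDelayMs. */
  static long retryDelayMs(long baseDelayMs, int failures, long maxDelayMs) {
    long delay = baseDelayMs;
    // Stop doubling once the cap is reached; this also prevents overflow.
    for (int i = 0; i < failures && delay < maxDelayMs; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxDelayMs);
  }

  /**
   * Relaunch the map once the fraction of still-shuffling reducers that have
   * reported fetch failures reaches maxFailureFraction.
   */
  static boolean shouldRelaunchMap(int reducersReportingFailures, int shufflingReducers,
                                   double maxFailureFraction) {
    if (reducersReportingFailures == 0) {
      return false;
    }
    // A reporting reducer is itself still shuffling, so never divide by fewer than the reporters.
    int denominator = Math.max(shufflingReducers, reducersReportingFailures);
    return (double) reducersReportingFailures / denominator >= maxFailureFraction;
  }
}
```

In a scenario like the one above, with 10 reducers running but only the 3 failing ones still shuffling, `shouldRelaunchMap(3, 3, 0.5)` returns true, whereas counting running reducers, `shouldRelaunchMap(3, 10, 0.5)`, returns false. Likewise `retryDelayMs(1000, 20, 60000)` stops at 60000 ms instead of growing without bound.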



[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted

2012-11-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493253#comment-13493253
 ] 

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-trunk-Commit #2979 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2979/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java


> Fetch failures can take way too long for a map to be restarted
> --
>
> Key: MAPREDUCE-4772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.4
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
> Attachments: MR-4772-0.23.txt, MR-4772-trunk.txt
>
>
> In one particular case we saw a NM go down at just the right time, such that 
> most of the reducers got the output of the map tasks, but not all of them.
> The ones that failed to get the output reported to the AM rather quickly that 
> they could not fetch from the NM, but because the other reducers were still 
> running, the AM would not relaunch the map task: no more than 50% of the 
> running reducers had reported fetch failures.  Then, because of the 
> exponential back-off for fetches on the reducers, it took 1 hour 45 minutes 
> for the reduce tasks to hit another 10 fetch failures and report in again. At 
> that point the other reducers had finished and the job relaunched the map 
> task.  If the reducers had still been running at 1:45, I have no idea how long 
> it would have taken for each of the tasks to get to 30 fetch failures.
> We need to trigger the map relaunch based on the percentage of reducers 
> shuffling, not the percentage of reducers running. We also need a maximum 
> limit on the back-off, so that a reducer never waits for days to retry 
> fetching map output.  



[jira] [Commented] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+

2012-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493237#comment-13493237
 ] 

Hadoop QA commented on MAPREDUCE-4781:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12552663/MAPREDUCE-4781-branch-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2998//console

This message is automatically generated.

> Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+
> -
>
> Key: MAPREDUCE-4781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4781
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17_64 on x86
>Reporter: Amir Sanjar
> Attachments: MAPREDUCE-4781-branch-1.patch
>
>
> Problem:
> The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
> Migrate the test case to JUnit 4.
> How:
> Remove "extends TestCase" and annotate each test method with @Test.
> SetUp and TearDown methods
> @Override
> protected void setUp() throws Exception { }
> replaced by:
> @Before
> public void setUp() throws Exception { }
> Same for tearDown():
> @Override
> protected void tearDown() throws Exception { }
> replaced by
> @After
> public void tearDown() throws Exception { }
> Imports
> The imports have to be reorganized:
> Remove import junit.framework.TestCase;
> Add import org.junit.*; or import org.junit.After; import org.junit.Before; 
> import org.junit.Ignore; import org.junit.Test;



[jira] [Updated] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4781:
---

Status: Patch Available  (was: Open)

> Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+
> -
>
> Key: MAPREDUCE-4781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4781
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17_64 on x86
>Reporter: Amir Sanjar
> Attachments: MAPREDUCE-4781-branch-1.patch
>
>
> Problem:
> The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
> Migrate the test case to JUnit 4.
> How:
> Remove "extends TestCase" and annotate each test method with @Test.
> SetUp and TearDown methods
> @Override
> protected void setUp() throws Exception { }
> replaced by:
> @Before
> public void setUp() throws Exception { }
> Same for tearDown():
> @Override
> protected void tearDown() throws Exception { }
> replaced by
> @After
> public void tearDown() throws Exception { }
> Imports
> The imports have to be reorganized:
> Remove import junit.framework.TestCase;
> Add import org.junit.*; or import org.junit.After; import org.junit.Before; 
> import org.junit.Ignore; import org.junit.Test;



[jira] [Updated] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4781:
---

Attachment: MAPREDUCE-4781-branch-1.patch

> Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+
> -
>
> Key: MAPREDUCE-4781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4781
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17_64 on x86
>Reporter: Amir Sanjar
> Attachments: MAPREDUCE-4781-branch-1.patch
>
>
> Problem:
> The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
> Migrate the test case to JUnit 4.
> How:
> Remove "extends TestCase" and annotate each test method with @Test.
> SetUp and TearDown methods
> @Override
> protected void setUp() throws Exception { }
> replaced by:
> @Before
> public void setUp() throws Exception { }
> Same for tearDown():
> @Override
> protected void tearDown() throws Exception { }
> replaced by
> @After
> public void tearDown() throws Exception { }
> Imports
> The imports have to be reorganized:
> Remove import junit.framework.TestCase;
> Add import org.junit.*; or import org.junit.After; import org.junit.Before; 
> import org.junit.Ignore; import org.junit.Test;



[jira] [Commented] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+

2012-11-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493221#comment-13493221
 ] 

Hadoop QA commented on MAPREDUCE-4779:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12552662/MAPREDUCE-4779-branch-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2997//console

This message is automatically generated.

> Unit test TestJobTrackerSafeMode fails with  ant 1.8.3+
> ---
>
> Key: MAPREDUCE-4779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17 on x86_64
>Reporter: Amir Sanjar
>Assignee: Amir Sanjar
> Attachments: MAPREDUCE-4779-branch-1.patch
>
>
> Problem:
>   The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
>  Migrate the test case to JUnit 4.



[jira] [Created] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)
Amir Sanjar created MAPREDUCE-4781:
--

 Summary: Unit test TestKerberosAuthenticationHandler fails with 
ant 1.8.3+
 Key: MAPREDUCE-4781
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4781
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.3
 Environment: Fedora 17_64 on x86
Reporter: Amir Sanjar


Problem:
The JUnit @Ignore annotation is not recognized because the test case is written for JUnit 3, not JUnit 4.
Solution:
Migrate the test case to JUnit 4.
How:

Remove "extends TestCase" and annotate each test method with @Test.

SetUp and TearDown methods

@Override
protected void setUp() throws Exception { }

replaced by:

@Before
public void setUp() throws Exception { }

Same for tearDown():

@Override
protected void tearDown() throws Exception { }

replaced by

@After
public void tearDown() throws Exception { }

Imports

The imports have to be reorganized:
Remove import junit.framework.TestCase;
Add import org.junit.*; or import org.junit.After; import org.junit.Before; 
import org.junit.Ignore; import org.junit.Test;
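Why @Ignore has no effect on a JUnit 3 test case comes down to how tests are discovered: JUnit 3 runs every public, no-argument `void` method whose name starts with `test`, without looking at annotations, while JUnit 4 runs methods annotated with `@Test` and skips those also marked `@Ignore`. The stdlib-only model below illustrates the two rules with reflection; `SketchTest` and `SketchIgnore` are hypothetical stand-ins for `org.junit.Test` and `org.junit.Ignore` so that no JUnit jar is needed, and the model ignores other details (such as JUnit 3 requiring a `TestCase` subclass).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Stand-in for org.junit.Test so the sketch needs no JUnit jar. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchTest {}

/** Stand-in for org.junit.Ignore. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchIgnore {}

final class TestDiscoverySketch {
  /** JUnit 3 rule: public void no-arg methods named test*; annotations are never consulted. */
  static List<String> junit3Style(Class<?> testClass) {
    List<String> names = new ArrayList<>();
    for (Method m : testClass.getMethods()) {
      if (m.getName().startsWith("test") && m.getParameterCount() == 0
          && m.getReturnType() == void.class) {
        names.add(m.getName());
      }
    }
    Collections.sort(names); // getMethods() order is unspecified
    return names;
  }

  /** JUnit 4 rule: annotated test methods, minus those marked as ignored. */
  static List<String> junit4Style(Class<?> testClass) {
    List<String> names = new ArrayList<>();
    for (Method m : testClass.getMethods()) {
      if (m.isAnnotationPresent(SketchTest.class) && !m.isAnnotationPresent(SketchIgnore.class)) {
        names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }
}

/** Hypothetical test class with one test that is meant to be skipped. */
class SampleTestCase {
  @SketchTest
  public void testHandlerInit() {}

  @SketchTest
  @SketchIgnore
  public void testNeedsKerberosSetup() {}
}
```

Under the JUnit 3 rule both methods of `SampleTestCase` are selected, so the "ignored" test still runs; under the JUnit 4 rule only `testHandlerInit` is.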




[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Attachment: MAPREDUCE-4779-branch-1.patch

> Unit test TestJobTrackerSafeMode fails with  ant 1.8.3+
> ---
>
> Key: MAPREDUCE-4779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17 on x86_64
>Reporter: Amir Sanjar
>Assignee: Amir Sanjar
> Attachments: MAPREDUCE-4779-branch-1.patch
>
>
> Problem:
>   The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
>  Migrate the test case to JUnit 4.



[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Status: Patch Available  (was: Open)

> Unit test TestJobTrackerSafeMode fails with  ant 1.8.3+
> ---
>
> Key: MAPREDUCE-4779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17 on x86_64
>Reporter: Amir Sanjar
>Assignee: Amir Sanjar
> Attachments: MAPREDUCE-4779-branch-1.patch
>
>
> Problem:
>   The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
>  Migrate the test case to JUnit 4.



[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Attachment: (was: MAPREDUCE-4779.patch)

> Unit test TestJobTrackerSafeMode fails with  ant 1.8.3+
> ---
>
> Key: MAPREDUCE-4779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17 on x86_64
>Reporter: Amir Sanjar
>Assignee: Amir Sanjar
> Attachments: MAPREDUCE-4779-branch-1.patch
>
>
> Problem:
>   The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
>  Migrate the test case to JUnit 4.



[jira] [Commented] (MAPREDUCE-4777) In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec

2012-11-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493181#comment-13493181
 ] 

Hudson commented on MAPREDUCE-4777:
---

Integrated in Hadoop-Mapreduce-trunk #1250 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1250/])
MAPREDUCE-4777. In TestIFile, testIFileReaderWithCodec relies on 
testIFileWriterWithCodec. Contributed by Sandy Ryza (Revision 1406645)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406645
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFile.java


> In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec
> -
>
> Key: MAPREDUCE-4777
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4777
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Minor
> Fix For: 2.0.3-alpha
>
> Attachments: HADOOP-8969.patch
>
>
> The file used to test reading is expected to have already been created by 
> the test for writing.
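The usual remedy for this kind of coupling is to have each test create the data it needs, since JUnit does not guarantee the order in which test methods run. The sketch below is a hypothetical, self-contained Java illustration of that pattern, not the committed TestIFile change: GZIP from java.util.zip stands in for Hadoop's IFile writer/reader and compression codec, and all class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class IndependentCodecTests {
    static final String RECORD = "key\tvalue\n";

    // GZIP stands in for the IFile writer plus a compression codec.
    static void writeCompressed(Path file, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    static String readCompressed(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Writer test: uses its own temporary file and removes it afterwards.
    static boolean testWriterWithCodec() {
        try {
            Path file = Files.createTempFile("writer-test", ".gz");
            try {
                writeCompressed(file, RECORD);
                return Files.size(file) > 0;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader test: writes its own fixture first, so it passes regardless of test order.
    static boolean testReaderWithCodec() {
        try {
            Path file = Files.createTempFile("reader-test", ".gz");
            try {
                writeCompressed(file, RECORD);
                return readCompressed(file).equals(RECORD);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sharing the writing helper rather than the written file keeps the two tests independent while still exercising the same write path.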


[jira] [Updated] (MAPREDUCE-3235) Improve CPU cache behavior in map side sort

2012-11-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated MAPREDUCE-3235:
---

Attachment: (was: hashed-sort-MAPREDUCE-3235.patch)

> Improve CPU cache behavior in map side sort
> ---
>
> Key: MAPREDUCE-3235
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3235
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance, task
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hashed-sort-MAPREDUCE-3235.patch, map_sort_perf.diff, 
> mr-3235-poc.txt
>
>
> When running oprofile on a terasort workload, I noticed that a large amount 
> of CPU usage was going to MapTask$MapOutputBuffer.compare. Disassembling 
> this and looking at cycle counters showed that most of the cycles were going 
> to memory loads dereferencing into the array of key-value data, implying 
> expensive cache misses. This can be avoided as follows:
> - rather than simply swapping indexes into the kv array, swap the entire meta 
> entries in the meta array. Swapping 16 bytes is only negligibly slower than 
> swapping 4 bytes. This requires adding the value-length into the meta array, 
> since we used to rely on the previous-in-the-array meta entry to determine 
> this. So we replace INDEX with VALUELEN and avoid one layer of indirection.
> - introduce an interface which allows key types to provide a 4-byte 
> comparison proxy. For string keys, this can simply be the first 4 bytes of 
> the string. The idea is that, if stringCompare(key1.proxy(), key2.proxy()) != 
> 0, then compare(key1, key2) should have the same result. If the proxies are 
> equal, the normal comparison method is used. We then include the 4-byte proxy 
> as part of the metadata entry, so that for many cases the indirection into 
> the data buffer can be avoided.
> On a terasort benchmark, these optimizations plus an optimization to 
> WritableComparator.compareBytes reduced the aggregate map-side CPU time (CPU 
> millis) by 40%, and the compare() routine mostly dropped off the oprofile 
> results.
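The two techniques described above can be sketched in plain Java. This is a hypothetical illustration, not the code in any of the attached patches: the meta layout (KEYSTART, KEYLEN, VALLEN, PREFIX) is invented for the example, values are omitted, and an insertion sort stands in for the real sort. Each record's metadata is four ints (16 bytes) swapped as a unit, and each entry carries the first 4 key bytes as a big-endian comparison proxy, so the key bytes in data[] are read only when two proxies tie.

```java
import java.nio.charset.StandardCharsets;

class MetaSortSketch {
    // Hypothetical layout: 4 ints (16 bytes) of metadata per record.
    static final int KEYSTART = 0, KEYLEN = 1, VALLEN = 2, PREFIX = 3, NMETA = 4;

    final byte[] data;     // serialized keys, back to back
    final int[] meta;      // NMETA ints per record
    int fullCompares = 0;  // comparisons that had to dereference into data[]

    MetaSortSketch(String... keys) {
        int total = 0;
        for (String k : keys) total += k.getBytes(StandardCharsets.UTF_8).length;
        data = new byte[total];
        meta = new int[keys.length * NMETA];
        int pos = 0;
        for (int r = 0; r < keys.length; r++) {
            byte[] kb = keys[r].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(kb, 0, data, pos, kb.length);
            int m = r * NMETA;
            meta[m + KEYSTART] = pos;
            meta[m + KEYLEN] = kb.length;
            meta[m + VALLEN] = 0;  // values omitted in this sketch
            meta[m + PREFIX] = prefix(data, pos, kb.length);
            pos += kb.length;
        }
    }

    // First 4 key bytes as an unsigned big-endian int, zero-padded for short keys.
    static int prefix(byte[] buf, int start, int len) {
        int p = 0;
        for (int i = 0; i < 4; i++) {
            p = (p << 8) | (i < len ? (buf[start + i] & 0xFF) : 0);
        }
        return p;
    }

    // Lexicographic comparison of unsigned bytes; on a common prefix the shorter key sorts first.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int d = (b1[s1 + i] & 0xFF) - (b2[s2 + i] & 0xFF);
            if (d != 0) return d;
        }
        return l1 - l2;
    }

    int compare(int a, int b) {
        int ma = a * NMETA, mb = b * NMETA;
        // Different proxies already decide the order, so data[] is not touched.
        int c = Integer.compareUnsigned(meta[ma + PREFIX], meta[mb + PREFIX]);
        if (c != 0) return c;
        fullCompares++;
        return compareBytes(data, meta[ma + KEYSTART], meta[ma + KEYLEN],
                            data, meta[mb + KEYSTART], meta[mb + KEYLEN]);
    }

    // Swap the whole 16-byte entry instead of a 4-byte index into it.
    void swap(int a, int b) {
        for (int i = 0; i < NMETA; i++) {
            int t = meta[a * NMETA + i];
            meta[a * NMETA + i] = meta[b * NMETA + i];
            meta[b * NMETA + i] = t;
        }
    }

    // Insertion sort keeps the sketch short.
    void sort() {
        int n = meta.length / NMETA;
        for (int i = 1; i < n; i++) {
            for (int j = i; j > 0 && compare(j - 1, j) > 0; j--) {
                swap(j - 1, j);
            }
        }
    }

    String keyAt(int r) {
        int m = r * NMETA;
        return new String(data, meta[m + KEYSTART], meta[m + KEYLEN], StandardCharsets.UTF_8);
    }
}
```

For the keys "pear", "apple", "applesauce" and "fig", sort() yields apple, applesauce, fig, pear, and only one comparison reads the key bytes: apple versus applesauce, the only pair whose 4-byte proxies are equal.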


[jira] [Updated] (MAPREDUCE-3235) Improve CPU cache behavior in map side sort

2012-11-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated MAPREDUCE-3235:
---

Attachment: hashed-sort-MAPREDUCE-3235.patch

Update BinaryComparable.getPrefix() to always generate positive integers.
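For background on why the sign of a prefix matters: when the first four key bytes are packed into a Java int, any key whose first byte is 0x80 or above produces a negative value, so a plain signed comparison would sort it before keys that start with smaller bytes. The patch's actual getPrefix() change is not shown in this message; the sketch below is a hypothetical illustration of one way to obtain a non-negative prefix (dropping the lowest bit of the unsigned value), and the class and method names are invented.

```java
class PrefixSign {
    // First 4 key bytes packed big-endian into an int; shorter keys are zero-padded.
    static int rawPrefix(byte[] key) {
        int p = 0;
        for (int i = 0; i < 4; i++) {
            p = (p << 8) | (i < key.length ? (key[i] & 0xFF) : 0);
        }
        return p;
    }

    // The unsigned shift clears the sign bit and drops the lowest bit, so the result is
    // always >= 0 and never orders two keys opposite to their unsigned byte order.
    static int nonNegativePrefix(byte[] key) {
        return rawPrefix(key) >>> 1;
    }
}
```

Dropping a bit can only turn some unequal prefixes into ties, which fall back to the full comparison, so correctness is kept. Comparing the raw value with Integer.compareUnsigned is an alternative that keeps all 32 bits.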


[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Status: Open  (was: Patch Available)

> Unit test TestJobTrackerSafeMode fails with ant 1.8.3+
> ---
>
> Key: MAPREDUCE-4779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Fedora 17 on x86_64
>Reporter: Amir Sanjar
>Assignee: Amir Sanjar
> Attachments: MAPREDUCE-4779.patch
>
>
> Problem:
>   The JUnit @Ignore annotation is not recognized because the test case is 
> written for JUnit 3, not JUnit 4.
> Solution:
>  Migrate the test case to JUnit 4.


[jira] [Commented] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+

2012-11-08 Thread Amir Sanjar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493164#comment-13493164
 ] 

Amir Sanjar commented on MAPREDUCE-4779:


My bad: the test case is no longer in trunk. Fixing the patch naming for 
release 1.0.3.


[jira] [Commented] (MAPREDUCE-4777) In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec

2012-11-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493150#comment-13493150
 ] 

Hudson commented on MAPREDUCE-4777:
---

Integrated in Hadoop-Hdfs-trunk #1220 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1220/])
MAPREDUCE-4777. In TestIFile, testIFileReaderWithCodec relies on 
testIFileWriterWithCodec. Contributed by Sandy Ryza (Revision 1406645)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406645
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFile.java


[jira] [Commented] (MAPREDUCE-4777) In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec

2012-11-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493100#comment-13493100
 ] 

Hudson commented on MAPREDUCE-4777:
---

Integrated in Hadoop-Yarn-trunk #30 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/30/])
MAPREDUCE-4777. In TestIFile, testIFileReaderWithCodec relies on 
testIFileWriterWithCodec. Contributed by Sandy Ryza (Revision 1406645)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406645
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFile.java

