date:20121108


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493100#comment-13493100
 ] 

Hudson commented on MAPREDUCE-4777:
---

Integrated in Hadoop-Yarn-trunk #30 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/30/])
MAPREDUCE-4777. In TestIFile, testIFileReaderWithCodec relies on 
testIFileWriterWithCodec. Contributed by Sandy Ryza (Revision 1406645)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406645
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFile.java


 In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec
 -

 Key: MAPREDUCE-4777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Minor
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-8969.patch


 The file used to test reading is expected to have been created by the file 
 used to test writing

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4777) In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493150#comment-13493150
 ] 

Hudson commented on MAPREDUCE-4777:
---

Integrated in Hadoop-Hdfs-trunk #1220 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1220/])
MAPREDUCE-4777. In TestIFile, testIFileReaderWithCodec relies on 
testIFileWriterWithCodec. Contributed by Sandy Ryza (Revision 1406645)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406645
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFile.java



[jira] [Commented] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493164#comment-13493164
 ] 

Amir Sanjar commented on MAPREDUCE-4779:


My bad, the test case is no longer in trunk. Fixing the patch naming for
release 1.0.3.

 Unit test TestJobTrackerSafeMode fails with  ant 1.8.3+
 ---

 Key: MAPREDUCE-4779
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4779
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.3
 Environment: Fedora 17 on x86_64
Reporter: Amir Sanjar
Assignee: Amir Sanjar
 Attachments: MAPREDUCE-4779.patch


 Problem:
   The JUnit @Ignore annotation is not recognized because the test case is
   written for JUnit 3, not JUnit 4.
 Solution:
   Migrate the test case to JUnit 4.


[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Status: Open  (was: Patch Available)


[jira] [Updated] (MAPREDUCE-3235) Improve CPU cache behavior in map side sort

2012-11-08 Thread Gopal V (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gopal V updated MAPREDUCE-3235:
---

Attachment: hashed-sort-MAPREDUCE-3235.patch

Update BinaryComparable.getPrefix() to always generate positive integers.

Improve CPU cache behavior in map side sort
---

Key: MAPREDUCE-3235
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3235
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: performance, task
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Attachments: hashed-sort-MAPREDUCE-3235.patch, map_sort_perf.diff,
mr-3235-poc.txt

When running oprofile on a terasort workload, I noticed that a large amount
of CPU usage was going to MapTask$MapOutputBuffer.compare. Upon disassembling
this and looking at cycle counters, most of the cycles were going to memory
loads dereferencing into the array of key-value data -- implying expensive
cache misses. This can be avoided as follows:
- rather than simply swapping indexes into the kv array, swap the entire meta
entries in the meta array. Swapping 16 bytes is only negligibly slower than
swapping 4 bytes. This requires adding the value-length into the meta array,
since we used to rely on the previous-in-the-array meta entry to determine
this. So we replace INDEX with VALUELEN and avoid one layer of indirection.
- introduce an interface which allows key types to provide a 4-byte
comparison proxy. For string keys, this can simply be the first 4 bytes of
the string. The idea is that, if stringCompare(key1.proxy(), key2.proxy()) !=
0, then compare(key1, key2) should have the same result. If the proxies are
equal, the normal comparison method is used. We then include the 4-byte proxy
as part of the metadata entry, so that for many cases the indirection into
the data buffer can be avoided.
On a terasort benchmark, these optimizations plus an optimization to
WritableComparator.compareBytes dropped the aggregate mapside CPU millis by
40%, and the compare() routine mostly dropped off the oprofile results.
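
The comparison-proxy idea described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code from the attached patches: the class PrefixSortSketch, its Entry type, and the methods prefixOf and compare are names invented for this example. The sketch assumes the 4-byte proxy is the first four key bytes (zero-padded), packed big-endian into an int and compared as an unsigned value; the full byte-wise comparison runs only when two proxies are equal.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a 4-byte comparison proxy; none of these names are Hadoop APIs. */
class PrefixSortSketch {

    /** One sort-metadata entry: the proxy is stored next to the key reference. */
    static final class Entry {
        final int prefix;   // 4-byte comparison proxy, kept in the metadata
        final byte[] key;   // stands in for the indirection into the data buffer

        Entry(byte[] key) {
            this.key = key;
            this.prefix = prefixOf(key);
        }
    }

    /** Packs the first four key bytes (zero-padded if shorter) into an int, big-endian. */
    static int prefixOf(byte[] key) {
        int p = 0;
        for (int i = 0; i < 4; i++) {
            int b = i < key.length ? (key[i] & 0xff) : 0;
            p = (p << 8) | b;
        }
        return p;
    }

    /** Full lexicographic comparison over unsigned bytes (the expensive path). */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Compares proxies first; touches the key bytes only when the proxies tie. */
    static int compare(Entry a, Entry b) {
        int c = Integer.compareUnsigned(a.prefix, b.prefix);
        return c != 0 ? c : compareBytes(a.key, b.key);
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        for (String s : new String[] {"pear", "apple", "peach", "app"}) {
            entries.add(new Entry(s.getBytes(StandardCharsets.UTF_8)));
        }
        entries.sort(PrefixSortSketch::compare);
        for (Entry e : entries) {
            System.out.println(new String(e.key, StandardCharsets.UTF_8));
        }
    }
}
```

Because the proxy sits in the same entry that the sort swaps, most comparisons in this sketch are decided without dereferencing the key at all, which is the cache-miss saving the description is after.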


[jira] [Updated] (MAPREDUCE-3235) Improve CPU cache behavior in map side sort

2012-11-08 Thread Gopal V (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gopal V updated MAPREDUCE-3235:
---

Attachment: (was: hashed-sort-MAPREDUCE-3235.patch)

Improve CPU cache behavior in map side sort
---

[jira] [Commented] (MAPREDUCE-4777) In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493181#comment-13493181
 ] 

Hudson commented on MAPREDUCE-4777:
---

Integrated in Hadoop-Mapreduce-trunk #1250 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1250/])
MAPREDUCE-4777. In TestIFile, testIFileReaderWithCodec relies on 
testIFileWriterWithCodec. Contributed by Sandy Ryza (Revision 1406645)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406645
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFile.java



[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Status: Patch Available  (was: Open)

 Unit test TestJobTrackerSafeMode fails with  ant 1.8.3+
 ---

 Key: MAPREDUCE-4779
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4779
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.3
 Environment: Fedora 17 on x86_64
Reporter: Amir Sanjar
Assignee: Amir Sanjar
 Attachments: MAPREDUCE-4779-branch-1.patch



[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Attachment: (was: MAPREDUCE-4779.patch)


[jira] [Updated] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4779:
---

Attachment: MAPREDUCE-4779-branch-1.patch


[jira] [Created] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+

Amir Sanjar created MAPREDUCE-4781:
--

 Summary: Unit test TestKerberosAuthenticationHandler fails with 
ant 1.8.3+
 Key: MAPREDUCE-4781
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4781
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.3
 Environment: Fedora 17_64 on x86
Reporter: Amir Sanjar


Problem:
The JUnit @Ignore annotation is not recognized because the test case is
written for JUnit 3, not JUnit 4.

Solution:
Migrate the test case to JUnit 4.

How:

Remove "extends TestCase" from the class declaration.

setUp and tearDown methods

@Override
protected void setUp() throws Exception { }

is replaced by:

@Before
public void setUp() throws Exception { }

The same applies to tearDown():

@Override
protected void tearDown() throws Exception { }

is replaced by:

@After
public void tearDown() throws Exception { }

Imports

The imports have to be reorganized:
Remove: import junit.framework.TestCase;
Add: import org.junit.*; or, individually, import org.junit.After;
import org.junit.Before; import org.junit.Test;
(org.junit.Test is needed because JUnit 4 only runs methods annotated with
@Test.)



[jira] [Commented] (MAPREDUCE-4779) Unit test TestJobTrackerSafeMode fails with ant 1.8.3+


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493221#comment-13493221
 ] 

Hadoop QA commented on MAPREDUCE-4779:
--

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12552662/MAPREDUCE-4779-branch-1.patch
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2997//console

This message is automatically generated.


[jira] [Updated] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4781:
---

Attachment: MAPREDUCE-4781-branch-1.patch

 Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+
 -

 Key: MAPREDUCE-4781
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4781
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.3
 Environment: Fedora 17_64 on x86
Reporter: Amir Sanjar
 Attachments: MAPREDUCE-4781-branch-1.patch



[jira] [Updated] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4781:
---

Status: Patch Available  (was: Open)


[jira] [Commented] (MAPREDUCE-4781) Unit test TestKerberosAuthenticationHandler fails with ant 1.8.3+


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493237#comment-13493237
 ] 

Hadoop QA commented on MAPREDUCE-4781:
--

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12552663/MAPREDUCE-4781-branch-1.patch
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2998//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493253#comment-13493253
 ] 

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-trunk-Commit #2979 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2979/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java


 Fetch failures can take way too long for a map to be restarted
 --

 Key: MAPREDUCE-4772
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical
 Attachments: MR-4772-0.23.txt, MR-4772-trunk.txt


 In one particular case we saw an NM go down at just the right time, such
 that most of the reducers had fetched the output of the map tasks, but not
 all of them.
 The reducers that failed to get the output reported to the AM rather quickly
 that they could not fetch from the NM. Because the other reducers were still
 running, however, the AM would not relaunch the map task: not more than 50%
 of the running reducers had reported fetch failures.  Then, because of the
 exponential back-off for fetches on the reducers, it took until 1 hour 45
 minutes for the reduce tasks to hit another 10 fetch failures and report in
 again. At that point the other reducers had finished and the job relaunched
 the map task.  If the other reducers had still been running at 1:45, I have
 no idea how long it would have taken for each of the tasks to get to 30
 fetch failures.
 We need to trigger the map relaunch based on the percentage of reducers
 shuffling, not the percentage of reducers running. We also need a maximum
 limit on the back-off, so that a reducer never waits for days to try to
 fetch map output.
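
The two changes proposed above can be sketched as follows. This is a hedged illustration only, not the committed patch: the class FetchFailurePolicySketch, its method names, the doubling back-off rule, and the constants used in main are all assumptions made for the example.

```java
/** Hypothetical sketch; names, back-off rule, and constants are illustrative, not Hadoop's. */
class FetchFailurePolicySketch {

    /**
     * Exponential back-off with an upper bound: the delay doubles with each
     * consecutive failure but never exceeds maxDelayMs.
     * Assumes baseDelayMs and maxDelayMs are positive.
     */
    static long backoffMs(int failures, long baseDelayMs, long maxDelayMs) {
        if (failures <= 0) {
            return 0L;
        }
        long delay = Math.min(baseDelayMs, maxDelayMs);
        for (int i = 1; i < failures; i++) {
            if (delay > maxDelayMs / 2) {
                return maxDelayMs; // doubling would pass the cap (or overflow)
            }
            delay *= 2;
        }
        return delay;
    }

    /**
     * Relaunch decision based on reducers that are still shuffling rather
     * than on all running reducers.
     */
    static boolean shouldRelaunchMap(int reducersReportingFailure,
                                     int reducersShuffling,
                                     double threshold) {
        if (reducersShuffling <= 0) {
            return false;
        }
        return (double) reducersReportingFailure / reducersShuffling > threshold;
    }

    public static void main(String[] args) {
        // Back-off grows 1s, 2s, 4s, ... and then stays at the 60s cap.
        for (int failures = 1; failures <= 10; failures++) {
            System.out.println(failures + " failures -> "
                + backoffMs(failures, 1_000L, 60_000L) + " ms");
        }
        // 3 failing reducers out of 100 running would not trigger a relaunch,
        // but 3 out of the 4 that are still shuffling would.
        System.out.println(shouldRelaunchMap(3, 100, 0.5));
        System.out.println(shouldRelaunchMap(3, 4, 0.5));
    }
}
```

With the denominator restricted to shuffling reducers, reducers that have already fetched everything no longer dilute the failure ratio, and the cap keeps the reporting interval bounded.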


[jira] [Updated] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4772:
---

   Resolution: Fixed
Fix Version/s: 0.23.5
               2.0.3-alpha
               3.0.0
       Status: Resolved  (was: Patch Available)

Thanks for the review, Jon.

I checked the code into trunk, branch-2, and branch-0.23.

 Fetch failures can take way too long for a map to be restarted
 --

 Key: MAPREDUCE-4772
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MR-4772-0.23.txt, MR-4772-trunk.txt



[jira] [Commented] (MAPREDUCE-4774) repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR

2012-11-08 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493279#comment-13493279
 ] 

Jason Lowe commented on MAPREDUCE-4774:
---

Thanks for the analysis, Ivan!  JobImpl's state machine is missing a number of
events in the FAILED state.  Because the job, task, and task attempt state
machines run asynchronously, tasks and task attempts can still complete even
though the job as a whole has already decided to fail for other reasons.  We
therefore need to ignore the following additional events in the FAILED state,
so that their asynchronous arrival does not knock the job out of the FAILED
state and into the ERROR state:

JOB_TASK_COMPLETED
JOB_TASK_ATTEMPT_COMPLETED
JOB_MAP_TASK_RESCHEDULED
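
The idea above amounts to treating those events as no-ops while the job is in the FAILED state. A minimal, self-contained sketch of that idea follows. It deliberately does not use YARN's StateMachineFactory, and the class JobStateSketch, its reduced set of states, and the JOB_FAIL_REQUESTED event are assumptions invented for this illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified job state machine; not the JobImpl implementation. */
class JobStateSketch {

    enum State { RUNNING, FAILED, ERROR }

    enum Event {
        JOB_TASK_COMPLETED,
        JOB_TASK_ATTEMPT_COMPLETED,
        JOB_MAP_TASK_RESCHEDULED,
        JOB_FAIL_REQUESTED // hypothetical event that moves a running job to FAILED
    }

    // Events that are accepted in a given state without changing it.
    private static final Map<State, Set<Event>> IGNORED = new EnumMap<>(State.class);
    static {
        IGNORED.put(State.FAILED, EnumSet.of(
            Event.JOB_TASK_COMPLETED,
            Event.JOB_TASK_ATTEMPT_COMPLETED,
            Event.JOB_MAP_TASK_RESCHEDULED));
    }

    private State state = State.RUNNING;

    /** Applies one event and returns the resulting state. */
    State handle(Event event) {
        Set<Event> ignored = IGNORED.getOrDefault(state, EnumSet.noneOf(Event.class));
        if (ignored.contains(event)) {
            return state; // late arrival from a task or attempt: stay where we are
        }
        if (state == State.RUNNING) {
            if (event == Event.JOB_FAIL_REQUESTED) {
                state = State.FAILED;
            }
            return state; // other events keep the job RUNNING in this simplified model
        }
        state = State.ERROR; // no transition registered for this event in this state
        return state;
    }

    public static void main(String[] args) {
        JobStateSketch job = new JobStateSketch();
        System.out.println(job.handle(Event.JOB_FAIL_REQUESTED));
        System.out.println(job.handle(Event.JOB_TASK_ATTEMPT_COMPLETED));
    }
}
```

Without the IGNORED entry for FAILED, the second event in main would fall through to the ERROR branch, which mirrors the invalid-transition failure reported in this issue.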


 repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR
 -

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ivan A. Veselovsky

 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
  fails in mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
  , or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test aims to check job status notifications received through an HTTP
 servlet. It runs 3 jobs: successful, killed, and failed.
 The test expects the servlet to receive specific notifications in a specific
 order. It also tries to test the retry-on-failure notification
 functionality, so on each 1st notification attempt the servlet answers 400
 to force an error, and on each 2nd notification attempt it answers OK.
 In general, the test fails because the actual number and/or type of the
 notifications differs from what is expected.
 Investigation shows that the actual root cause of the problem is an
 incorrect job state transition: the task of the 3rd job fails (via an
 intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and
 the state of the task changes from RUNNING to FAILED.
 At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted
 (in method
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, TaskAttemptCompletionEventStatus)),
 and this event gets processed in AsyncDispatcher, but this transition is
 impossible according to the event transition map (see
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory).
 This causes the following exception to be thrown upon the event processing:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
     at java.lang.Thread.run(Thread.java:662)
 So the job gets into the INTERNAL_ERROR state, and a job end notification
 like this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (here we can see the ERROR status instead of FAILED)
 After that the notification servlet receives either only an ERROR
 notification, or one more ERROR notification after FAILED, which finally
 causes the test to fail. (There is some variation in the test behavior
 caused by race conditions, because much of the processing there is
 asynchronous; the test is, in fact, flaky.)
 In any case, it looks like the root cause of the problem is the possibility
 of the forbidden transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at
 FAILED".
 Expert advice is needed on how this should be fixed.


[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Mariappan Asokan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Open  (was: Patch Available)

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, 
 mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
 ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.


[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Mariappan Asokan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Attachment: mapreduce-2454.patch

Hi Alejandro,
  Thanks for the feedback.  I changed {{MapOutputCollector}} to 
{{PostMapProcessor}} so that there is only one interface.  I also made the 
other changes you suggested.  I am uploading the new patch.  Please take a look 
at it.

-- Asokan

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
 ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.


[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Mariappan Asokan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Patch Available  (was: Open)

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
 ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.


[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493406#comment-13493406
 ] 

Hadoop QA commented on MAPREDUCE-2454:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552689/mapreduce-2454.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2999//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2999//console

This message is automatically generated.

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
 ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.


[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

2012-11-08 Thread Philip Zeyliger (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493443#comment-13493443
 ] 

Philip Zeyliger commented on MAPREDUCE-4469:


If you're looking for the resource usage of a process and its children, look at 
{{man getrusage}}, which includes a flag to get the CPU usage of the children.  
Mind you, you'd need native code to get at it.

 Resource calculation in child tasks is CPU-heavy
 

 Key: MAPREDUCE-4469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance, task
Affects Versions: 1.0.3
Reporter: Todd Lipcon
Assignee: Ahmed Radwan
 Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, 
 MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch


 In doing some benchmarking on a hadoop-1 derived codebase, I noticed that 
 each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed 
 that it's spending a lot of time looping through all the files in /proc to 
 calculate resource usage.
 As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
 within the tasks. On a CPU-bound 500G-sort workload, this improved total job 
 runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).


[jira] [Created] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Mark Fuhs (JIRA)

Mark Fuhs created MAPREDUCE-4782:


 Summary: NLineInputFormat skips first line of last InputSplit
 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
lines through to output
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 100);
NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs


NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
begin and length fields of the FileSplit are constructed differently for the 
first FileSplit vs. the rest.

After looping through all lines of a file, the final FileSplit is created, but 
the creation does not respect the difference of how the first vs. the rest of 
the FileSplits are created.

This results in the first line of the final InputSplit being skipped. I've 
created a patch to NLineInputFormat, and this fixes the problem.
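
The off-by-one described above can be reproduced with a small, self-contained 
model. This is a sketch under stated assumptions, not the Hadoop source or the 
attached patch: the class NLineSplitSketch, its Split type and all method names 
are hypothetical, and readSplit only imitates the one reader behavior that 
matters here (a reader whose split does not start at offset 0 discards 
everything up to and including the first newline). Splits that start one byte 
early lose only the previous line's newline; a final split that starts exactly 
on a line boundary loses its whole first line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reported behavior; not Hadoop code.
class NLineSplitSketch {
    // Stand-in for a file split: byte offset and byte length.
    static final class Split {
        final long start, length;
        Split(long start, long length) { this.start = start; this.length = length; }
    }

    // Imitates the reader: skip to just past the first newline unless start == 0,
    // then read whole lines while the line start is at or before the split end.
    static List<String> readSplit(byte[] data, Split s) {
        List<String> lines = new ArrayList<>();
        int pos = (int) s.start;
        long end = s.start + s.length;
        if (pos != 0) {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;
        }
        while (pos <= end && pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') eol++;
            lines.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return lines;
    }

    // The first split is shortened by one byte; later splits start one byte early.
    static Split adjusted(long begin, long length) {
        return begin == 0 ? new Split(begin, length - 1) : new Split(begin - 1, length);
    }

    // Builds splits of linesPerSplit lines. With fixLastSplit == false the trailing
    // split is created without the adjustment, which models the reported bug.
    static List<Split> makeSplits(byte[] data, int linesPerSplit, boolean fixLastSplit) {
        List<Split> splits = new ArrayList<>();
        long begin = 0, length = 0;
        int numLines = 0;
        for (int i = 0; i < data.length; i++) {
            length++;
            if (data[i] == '\n' || i == data.length - 1) {
                numLines++;
                if (numLines == linesPerSplit) {
                    splits.add(adjusted(begin, length));
                    begin += length;
                    length = 0;
                    numLines = 0;
                }
            }
        }
        if (numLines != 0) {
            splits.add(fixLastSplit ? adjusted(begin, length) : new Split(begin, length));
        }
        return splits;
    }

    // Reads every split in order and concatenates the lines that were seen.
    static List<String> readAll(byte[] data, int linesPerSplit, boolean fixLastSplit) {
        List<String> all = new ArrayList<>();
        for (Split s : makeSplits(data, linesPerSplit, fixLastSplit)) {
            all.addAll(readSplit(data, s));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "a\nb\nc\nd\ne\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("unadjusted last split: " + readAll(data, 2, false));
        System.out.println("adjusted last split:   " + readAll(data, 2, true));
    }
}
```

Comparing the two readAll results on a file whose line count is not a multiple 
of the split size shows which line goes missing when the last split is left 
unadjusted.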


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Priority: Critical (was: Major)

NLineInputFormat skips first line of last InputSplit

Key: MAPREDUCE-4782
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 0.22.0, trunk
Environment: job.setMapperClass(Mapper.class); // just pass text
lines through to output
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 100);
NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Priority: Critical

NLineInputFormat creates FileSplits that are then used by LineRecordReader to
generate Text values. To deal with an idiosyncrasy of LineRecordReader, the
begin and length fields of the FileSplit are constructed differently for the
first FileSplit vs. the rest.
After looping through all lines of a file, the final FileSplit is created,
but the creation does not respect the difference of how the first vs. the
rest of the FileSplits are created.
This results in the first line of the final InputSplit being skipped. I've
created a patch to NLineInputFormat, and this fixes the problem.

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493465#comment-13493465
]

Robert Joseph Evans commented on MAPREDUCE-4782:

Marked this as critical, as data loss is serious. Mark, can you post your patch?

NLineInputFormat skips first line of last InputSplit

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Mark Fuhs (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mark Fuhs updated MAPREDUCE-4782:
-

Attachment: MAPREDUCE-4782.patch

I confess I'm not terribly familiar with git, so this is just a git diff.

NLineInputFormat skips first line of last InputSplit

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Attachment: MR-4782.txt

I was able to reproduce the issue, and I have updated the test case to
reproduce it as well. The original test case did not check the last split; I
don't know why. I also found that this bug exists in branch-1 as well.

NLineInputFormat skips first line of last InputSplit

Key: MAPREDUCE-4782
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
Environment: job.setMapperClass(Mapper.class); // just pass text
lines through to output
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 100);
NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Priority: Critical
Attachments: MAPREDUCE-4782.patch, MR-4782.txt

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

 Target Version/s: 0.23.5
Affects Version/s: 0.23.0
   1.0.0
   2.0.0-alpha
   Status: Patch Available  (was: Open)

 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.0-alpha, 1.0.0, 0.23.0, 0.22.0, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Priority: Critical
 Attachments: MAPREDUCE-4782.patch, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference of how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Updated] (MAPREDUCE-4666) JVM metrics for history server

2012-11-08 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4666:
--

Attachment: MAPREDUCE-4666.patch

Patch to add metrics2 JvmMetrics to the history server.  I manually tested this 
to verify the metrics could be used via a sink configured in 
hadoop-metrics2.properties.  I also verified the JvmMetrics bean shows up via 
the JMX web service.

 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.


[jira] [Updated] (MAPREDUCE-4666) JVM metrics for history server

2012-11-08 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4666:
--

Assignee: Jason Lowe
Target Version/s: 2.0.3-alpha, 0.23.5
  Status: Patch Available  (was: Open)

 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493571#comment-13493571
 ] 

Hadoop QA commented on MAPREDUCE-4782:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552709/MR-4782.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapred.TestClusterMRNotification

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//console

This message is automatically generated.

 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Priority: Critical
 Attachments: MAPREDUCE-4782.patch, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference of how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493579#comment-13493579
 ] 

Hadoop QA commented on MAPREDUCE-4666:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552712/MAPREDUCE-4666.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3001//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3001//console

This message is automatically generated.

 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493584#comment-13493584
]

Robert Joseph Evans commented on MAPREDUCE-4782:

The patch looks good to me and I am +1 on it, but I added the test, so if
someone else could take a look I would appreciate it.

NLineInputFormat skips first line of last InputSplit

Key: MAPREDUCE-4782
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
Environment: job.setMapperClass(Mapper.class); // just pass text
lines through to output
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 100);
NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Priority: Critical
Attachments: MAPREDUCE-4782.patch, MR-4782.txt

[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-08 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493658#comment-13493658
 ] 

Alejandro Abdelnur commented on MAPREDUCE-2454:
---

Asokan, if I understood you correctly, you were working on a new test case. It 
is not in the latest patch, correct? When you upload the patch with the new 
test case, please fix the following nit:

*Fetcher.java has 2 unused imports*

import java.io.InputStream;
import java.io.OutputStream;

Then, IMO we are good to go.

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
 ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-4782:
--

Target Version/s: 1.1.1, 0.23.5  (was: 0.23.5)

 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Priority: Critical
 Attachments: MAPREDUCE-4782.patch, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference of how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-08 Thread Matt Foley (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493669#comment-13493669
]

Matt Foley commented on MAPREDUCE-4782:
---

Nasty. Could you please port to branch-1 and I'll include it in the next
release?

NLineInputFormat skips first line of last InputSplit

Key: MAPREDUCE-4782
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
Environment: job.setMapperClass(Mapper.class); // just pass text
lines through to output
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 100);
NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Priority: Critical
Attachments: MAPREDUCE-4782.patch, MR-4782.txt

[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-08 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:
---

Attachment: MAPREDUCE-4751-20121108.txt

Here's a first attempt at the patch. It is very raw and has no tests yet. I want 
to be sure that I am understanding your comments correctly.

Bobby/Ravi/Jason, can you please have a quick look at it? Thanks.

I have a feeling we need to do something similar in Job as well, even though 
that is not the current bug, assuming it is TaskImpl itself that is stuck today.

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg


 We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them 
 as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a 
 few maps running. All these maps were scheduled on nodes which are now in the 
 RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state


[jira] [Created] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-08 Thread Eli Collins (JIRA)

Eli Collins created MAPREDUCE-4783:
--

 Summary: data_join mavenization broke the mr1 build
 Key: MAPREDUCE-4783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


MR-4238 didn't update build.xml and forgot to nuke the old data_join directory.


[jira] [Updated] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-08 Thread Eli Collins (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-4783:
---

Attachment: mapreduce-4783.txt

Patch attached.

 data_join mavenization broke the mr1 build
 --

 Key: MAPREDUCE-4783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: mapreduce-4783.txt


 MR-4238 didn't update build.xml and forgot to nuke the old data_join 
 directory.


[jira] [Updated] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-08 Thread Eli Collins (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-4783:
---

Status: Patch Available  (was: Open)

 data_join mavenization broke the mr1 build
 --

 Key: MAPREDUCE-4783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: mapreduce-4783.txt


 MR-4238 didn't update build.xml and forgot to nuke the old data_join 
 directory.


[jira] [Commented] (MAPREDUCE-4783) data_join mavenization broke the mr1 build