date:20121109


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493899#comment-13493899
 ] 

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Yarn-trunk #31 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/31/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java


 Fetch failures can take way too long for a map to be restarted
 --

 Key: MAPREDUCE-4772
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MR-4772-0.23.txt, MR-4772-trunk.txt


 In one particular case we saw an NM go down at just the right time, so that 
 most of the reducers got the output of the map tasks, but not all of them.
 The ones that failed to get the output reported to the AM rather quickly that 
 they could not fetch from the NM. But because the other reducers were still 
 running, the AM would not relaunch the map task: no more than 50% of the 
 running reducers had reported fetch failures. Then, because of the 
 exponential back-off for fetches on the reducers, it took until 1 hour 45 min 
 for the reduce tasks to hit another 10 fetch failures and report in again. 
 At that point the other reducers had finished and the job relaunched the map 
 task. If the reducers had still been running at 1:45, I have no idea how long 
 it would have taken for each of the tasks to get to 30 fetch failures.
 We need to trigger the map relaunch based on the percentage of reducers that 
 are shuffling, not the percentage of reducers that are running. We also need 
 a maximum limit on the back-off, so that a reducer never waits for days to 
 try to fetch map output.
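
 The two changes asked for above could look roughly like the sketch below. 
 It is not taken from the attached patches: the class, method and parameter 
 names are made up for illustration, and the 50% threshold is simply the 
 figure quoted in the report. It shows a back-off that doubles per failure 
 but is capped, and a relaunch check that divides by the number of reducers 
 still shuffling rather than the number still running.

```java
final class FetchFailurePolicySketch {

    // Hypothetical helper: the delay doubles with every failure but is never
    // allowed to exceed maxDelayMs, so a reducer cannot end up waiting for days.
    static long backoffMillis(int failures, long baseDelayMs, long maxDelayMs) {
        if (failures <= 0) {
            return 0L;
        }
        long delay = baseDelayMs;
        for (int i = 1; i < failures; i++) {
            if (delay > maxDelayMs / 2) {
                return maxDelayMs; // doubling again would pass the cap
            }
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }

    // Hypothetical helper: compare the reducers that reported a fetch failure
    // against the reducers that are still shuffling, not against all running ones.
    static boolean shouldRelaunchMap(int reducersReportingFailure,
                                     int shufflingReducers,
                                     double threshold) {
        if (shufflingReducers <= 0) {
            return false;
        }
        return (double) reducersReportingFailure / shufflingReducers > threshold;
    }

    public static void main(String[] args) {
        // 3 reducers cannot fetch; 100 are running but only 4 are still shuffling.
        System.out.println("vs. running:   " + shouldRelaunchMap(3, 100, 0.5));
        System.out.println("vs. shuffling: " + shouldRelaunchMap(3, 4, 0.5));
        System.out.println("20th failure waits " + backoffMillis(20, 1000L, 60000L) + " ms");
    }
}
```

 With the denominator restricted to shuffling reducers, a handful of stuck 
 reducers is enough to cross the threshold once everyone else has moved on.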

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

2012-11-09 Thread Ivan A. Veselovsky (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493959#comment-13493959
]

Ivan A. Veselovsky commented on MAPREDUCE-4764:
---

Hi, Daryn,
I'd like to clarify our plan of improvements in this test.

Currently the test writes the token into a file, then sets the file name as
MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY value in the config, and also
passes the same file name as a value of a dedicated config property
(KEY_SECURITY_TOKEN).
In the job: it gets the tokens from the job context
(context.getCredentials().getAllTokens()), and gets the delegation token from
there by the known key: let it be token X.
After that it gets the binary file name from the job config (key
KEY_SECURITY_TOKEN), reads the file, de-serializing the token: let it be token
Y.
Then the job asserts X.equals(Y).

This way the binary token propagation and serialization/de-serialization is
checked, and this pretty much corresponds to the test name.
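
The shape of that check (write a token out, read it back, assert equality) can
be sketched without any Hadoop classes. Everything below is a stand-in:
SimpleToken is a hypothetical type, not Hadoop's Token, and the byte layout is
invented for the example. It only illustrates the X.equals(Y) round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class TokenRoundTripSketch {

    // Hypothetical stand-in for a delegation token: opaque identifier and
    // password bytes plus a kind string.
    static final class SimpleToken {
        final byte[] identifier;
        final byte[] password;
        final String kind;

        SimpleToken(byte[] identifier, byte[] password, String kind) {
            this.identifier = identifier.clone();
            this.password = password.clone();
            this.kind = kind;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof SimpleToken)) {
                return false;
            }
            SimpleToken t = (SimpleToken) other;
            return Arrays.equals(identifier, t.identifier)
                && Arrays.equals(password, t.password)
                && kind.equals(t.kind);
        }

        @Override
        public int hashCode() {
            return 31 * (31 * Arrays.hashCode(identifier) + Arrays.hashCode(password))
                + kind.hashCode();
        }
    }

    // Writes the token in an invented length-prefixed layout.
    static byte[] serialize(SimpleToken token) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(token.identifier.length);
            out.write(token.identifier);
            out.writeInt(token.password.length);
            out.write(token.password);
            out.writeUTF(token.kind);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads a token back from the layout written by serialize().
    static SimpleToken deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] identifier = new byte[in.readInt()];
            in.readFully(identifier);
            byte[] password = new byte[in.readInt()];
            in.readFully(password);
            return new SimpleToken(identifier, password, in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleToken x = new SimpleToken(new byte[] {1, 2, 3}, new byte[] {9, 8}, "EXAMPLE_KIND");
        SimpleToken y = deserialize(serialize(x)); // plays the role of the binary file
        System.out.println("X.equals(Y): " + x.equals(y));
    }
}
```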

As I understand it, you suggested also checking that the same delegation token
is present in UserGroupInformation.getCurrentUser().getTokens(), right?
So, if I add this check, will you be okay with that test? Or do you have other
suggestions on how to improve it?

repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

Key: MAPREDUCE-4764
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4764
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ivan A. Veselovsky
Attachments: MAPREDUCE-4764-trunk.patch

The test is @Ignore-d and fails when enabled.
It should be repaired to fill the coverage gap.
Problems fixed in the test:
(1) The MRConfig.FRAMEWORK_NAME and YarnConfiguration.RM_PRINCIPAL properties
must be set correctly in the configuration to enable security in the way
this test implies.
(2) The property MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY is no longer
passed into the Job configuration -- it is intentionally deleted from there.
So, we pass the binary file name in another dedicated property.
(3) The test was using deprecated cluster classes. All of them are updated to
their modern equivalents.
(4) The delegation token found in the job context is now correctly compared
to the one deserialized from the binary file.


[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493972#comment-13493972
 ] 

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Hdfs-0.23-Build #430 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/430/])
svn merge -c 1407118 FIXES: MAPREDUCE-4772. Fetch failures can take way too 
long for a map to be restarted (bobby) (Revision 1407128)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407128
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java



[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493981#comment-13493981
 ] 

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Hdfs-trunk #1221 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1221/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java



[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494006#comment-13494006
 ] 

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Mapreduce-trunk #1251 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1251/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java



[jira] [Assigned] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned MAPREDUCE-4782:


Assignee: Mark Fuhs

 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Critical
 Attachments: MAPREDUCE-4782.patch, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but that creation does not apply the same first-vs.-rest distinction used 
 for the other FileSplits.
 As a result, the first line of the final InputSplit is skipped. I've 
 created a patch to NLineInputFormat that fixes the problem.
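
 The effect can be reproduced with a small stand-alone model. The sketch 
 below does not use Hadoop: readSplit is a simplified stand-in for 
 LineRecordReader (it assumes a reader that discards everything up to the 
 first newline when a split does not start at byte 0, and keeps reading 
 lines while the position is <= the split end), and createSplit is a 
 hypothetical helper for the first-vs.-rest adjustment described above. 
 All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class NLineSplitSketch {

    // Simplified reader model: skip to the next line start unless the split
    // begins at byte 0, then read whole lines while pos <= start + length.
    static List<String> readSplit(String data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            pos = nextLineStart(data, pos);
        }
        List<String> lines = new ArrayList<>();
        while (pos <= end && pos < data.length()) {
            int next = nextLineStart(data, pos);
            lines.add(data.substring(pos, next).replace("\n", ""));
            pos = next;
        }
        return lines;
    }

    // Index just past the next '\n' at or after pos, or end of data if none.
    static int nextLineStart(String data, int pos) {
        int newline = data.indexOf('\n', pos);
        return newline < 0 ? data.length() : newline + 1;
    }

    // First split: shorten by one byte so the reader does not run into the
    // next split. Later splits: start one byte early, on the previous
    // newline, so the reader's "skip first line" only discards that newline.
    static int[] createSplit(int begin, int length) {
        return begin == 0
            ? new int[] {0, length - 1}
            : new int[] {begin - 1, length};
    }

    public static void main(String[] args) {
        String data = "a\nb\nc\nd\ne\n"; // five lines, two lines per split
        int[] first = createSplit(0, 4);
        int[] second = createSplit(4, 4);
        int[] last = createSplit(8, 2);
        System.out.println(readSplit(data, first[0], first[1]));
        System.out.println(readSplit(data, second[0], second[1]));
        System.out.println("adjusted last split:   " + readSplit(data, last[0], last[1]));
        // Passing begin and length through unchanged drops the first line.
        System.out.println("unadjusted last split: " + readSplit(data, 8, 2));
    }
}
```

 Routing every split, including the final one, through the same helper is 
 what keeps the last split from losing its first line in this model.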


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Attachment: MR-4782-branch-1.txt

Patch for branch-1. The patch is identical to the one for trunk except for
line numbers and the location of the files.


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Priority: Blocker  (was: Critical)


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494050#comment-13494050
]

Robert Joseph Evans commented on MAPREDUCE-4782:

Also, now that I think about it more, this really is a Blocker, not a Critical.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494052#comment-13494052
]

Hadoop QA commented on MAPREDUCE-4782:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12552844/MR-4782-branch-1.txt
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3003//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494059#comment-13494059
]

Jason Lowe commented on MAPREDUCE-4782:
---

+1, thanks Mark and Bobby. Bobby or Matt, feel free to commit.


[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-4266:
-

Status: Patch Available  (was: Open)

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml


[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-4266:
-

Attachment: MAPREDUCE-4266.sh

Shell script to remove the directories and XML files. Run it as 
./MAPREDUCE-4266.sh svn.


[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494062#comment-13494062
 ] 

Hadoop QA commented on MAPREDUCE-4266:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552845/MAPREDUCE-4266.sh
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3004//console

This message is automatically generated.


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

[
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Resolution: Fixed
Fix Version/s: 0.23.5, 2.0.3-alpha, 3.0.0, 1.2.0, 1.1.1
Status: Resolved (was: Patch Available)

Thanks Mark,

This is a great catch; I just wish we had found it sooner. I put this into
trunk, branch-2, branch-0.23, branch-1, and branch-1.1.

If I missed any branches that people want it in, please let me know and I will
see what I can do.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494080#comment-13494080
 ] 

Hudson commented on MAPREDUCE-4782:
---

Integrated in Hadoop-trunk-Commit #2988 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2988/])
MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark 
Fuhs via bobby) (Revision 1407505)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java


 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, /path/to/a_file_with_many_lines.txt);
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Blocker
 Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference of how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.
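
 The effect described above can be reproduced without Hadoop. The following is
 a minimal sketch, not the code from the patch: the reader rule (a split that
 does not start at offset 0 discards its first line, and lines are read while
 the position is at or before the split end) is an assumption about how
 LineRecordReader behaves, and the names NLineSplitSketch, readSplit and
 adjust are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class NLineSplitSketch {

    // Index just past the next '\n' at or after 'from' (or the end of the data).
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Assumed reader rule: skip the first line unless the split starts at 0,
    // then keep reading whole lines while the position is <= start + length.
    static List<String> readSplit(byte[] data, long start, long length) {
        long end = start + length;
        int pos = (int) start;
        if (start != 0) {
            pos = nextLineStart(data, pos);
        }
        List<String> lines = new ArrayList<>();
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            int textEnd = (data[next - 1] == '\n') ? next - 1 : next;
            lines.add(new String(data, pos, textEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    // Hypothetical adjustment: the first split is shortened by one byte, every
    // later split begins one byte early so that the "skipped line" is only the
    // previous line's newline character.
    static long[] adjust(long begin, long length) {
        return (begin == 0) ? new long[] {0, length - 1} : new long[] {begin - 1, length};
    }

    public static void main(String[] args) {
        byte[] data = "a\nb\nc\nd\ne\n".getBytes(StandardCharsets.UTF_8);
        // Two lines per split: lines start at bytes 0, 2, 4, 6, 8; the final split is (8, 2).
        System.out.println(readSplit(data, 8, 2));       // unadjusted final split: []
        long[] last = adjust(8, 2);
        System.out.println(readSplit(data, last[0], last[1])); // adjusted final split: [e]
    }
}
```

 With the final split left unadjusted, the line "e" is discarded as the
 split's "first line"; applying the same adjustment as for the other
 non-first splits restores it.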

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Mark Fuhs (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494115#comment-13494115
 ] 

Mark Fuhs commented on MAPREDUCE-4782:
--

I'm glad I could contribute!


[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494139#comment-13494139
 ] 

Robert Joseph Evans commented on MAPREDUCE-4266:


The shell script looks good and does what we want.  +1.  I'll check this in.  
I'll also take a look at Jenkins to see if there are any builds still calling 
into ant for trunk.

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml


[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494141#comment-13494141
 ] 

Robert Joseph Evans commented on MAPREDUCE-4266:


Oh and I'll also update the build/release instructions on twiki to remove ant :)


[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494145#comment-13494145
 ] 

Hudson commented on MAPREDUCE-4266:
---

Integrated in Hadoop-trunk-Commit #2989 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2989/])
MAPREDUCE-4266. remove Ant remnants from MR (tgraves via bobby) (Revision 
1407551)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407551
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/build-utils.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/build.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/src



[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494162#comment-13494162
 ] 

Hudson commented on MAPREDUCE-4266:
---

Integrated in Hadoop-Mapreduce-trunk #1252 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/])
MAPREDUCE-4266. remove Ant remnants from MR (tgraves via bobby) (Revision 
1407551)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407551
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/build-utils.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/build.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/src



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494163#comment-13494163
 ] 

Hudson commented on MAPREDUCE-4782:
---

Integrated in Hadoop-Mapreduce-trunk #1252 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/])
MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark 
Fuhs via bobby) (Revision 1407505)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java



[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4266:
---

   Resolution: Fixed
Fix Version/s: 0.23.5
               3.0.0
       Status: Resolved  (was: Patch Available)

Thanks Tom,

I put this into trunk, branch-2, and branch-0.23. I also updated Jenkins and 
the wiki.

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml


[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Open  (was: Patch Available)

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.


[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Attachment: mapreduce-2454.patch

Hi Alejandro,
  Thanks for catching the unused imports.  I updated Fetcher.java.  I have also 
added a test in the latest patch.

-- Asokan


 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.


[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Patch Available  (was: Open)


[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494276#comment-13494276
 ] 

Hadoop QA commented on MAPREDUCE-2454:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552875/mapreduce-2454.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapreduce.TestJobMonitorAndPrint
  org.apache.hadoop.mapred.TestClusterMRNotification

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3005//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3005//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

[
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494279#comment-13494279
]

Robert Joseph Evans commented on MAPREDUCE-4751:

I have been doing a quick once over on this, and I have a few comments.

# I think it would be cleaner for KillWaitAttemptKilledTransition to have a
constructor that takes a TaskAttemptCompletionEventStatus, instead of having
the subclasses set it directly themselves.
# Remove the commented out if statement.
# I am not sure if HashSet is the correct data type for success, failed, etc.
They are likely to be sparse arrays with small amounts of data in them.
Probably not very important, but if there are thousands of tasks it starts to
add up.
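
The first suggestion can be sketched as follows. This is only an illustration
under assumptions: CompletionStatus stands in for
TaskAttemptCompletionEventStatus, and the class names are hypothetical rather
than taken from the patch.

```java
final class KillWaitSketch {

    // Hypothetical stand-in for TaskAttemptCompletionEventStatus.
    enum CompletionStatus { FAILED, KILLED }

    // Base transition: the status is fixed once, through the constructor,
    // so subclasses cannot forget to set it or change it later.
    static class KillWaitAttemptTransition {
        private final CompletionStatus status;

        KillWaitAttemptTransition(CompletionStatus status) {
            this.status = status;
        }

        CompletionStatus status() {
            return status;
        }
    }

    // Each subclass only states which status it reports.
    static final class AttemptKilled extends KillWaitAttemptTransition {
        AttemptKilled() {
            super(CompletionStatus.KILLED);
        }
    }

    static final class AttemptFailed extends KillWaitAttemptTransition {
        AttemptFailed() {
            super(CompletionStatus.FAILED);
        }
    }

    public static void main(String[] args) {
        System.out.println(new AttemptKilled().status());
        System.out.println(new AttemptFailed().status());
    }
}
```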

Overall it looks OK. I would like to see more tests though.

AM stuck in KILL_WAIT for days
--

Key: MAPREDUCE-4751
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg

We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them
as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a
few maps running. All these maps were scheduled on nodes which are now in the
RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state

[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

[
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494295#comment-13494295
]

Jason Lowe commented on MAPREDUCE-4751:
---

Part of the issue is that the job is hanging around waiting for all tasks to be
killed rather than just exiting and letting YARN shoot any straggling
containers. I think it would be simpler/safer for the AM to just write out the
final state stuff and exit, much like it does for the FAILED state. If the
job's KILL_WAIT state really is necessary then we'd need a corresponding
FAILED_WAIT state to handle waiting for task cleanup when a job fails.

If we don't need the job's KILL_WAIT state then similarly we can probably ditch
the task KILL_WAIT state -- it could just send off kills to all the
corresponding task attempts and sit in the KILLED state. Does it really need
to wait?

Removing KILL_WAIT is quite a bit bigger change than the current one, as it
would break a lot of tests that know and expect the KILL_WAIT state. However I
think it would be more robust in the long-term, as KILL_WAIT seems like a state
primed for hanging if we don't really need it. Since we're eager to get a fix
for this in soon we could address that in a followup JIRA.

AM stuck in KILL_WAIT for days
--

[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494303#comment-13494303
 ] 

Robert Joseph Evans commented on MAPREDUCE-4751:


Yes, I think that would be better.  But that is a much larger change that would 
need more tests.  Perhaps we do that in a follow-on JIRA.


[jira] [Commented] (MAPREDUCE-4783) data_join mavenization broke the mr1 build


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494306#comment-13494306
 ] 

Robert Joseph Evans commented on MAPREDUCE-4783:


I think this can be duped to MAPREDUCE-4266, which removed all of the ant code.

 data_join mavenization broke the mr1 build
 --

 Key: MAPREDUCE-4783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: mapreduce-4783.txt


 MR-4238 didn't update build.xml and forgot to nuke the old data_join 
 directory.


[jira] [Updated] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-09 Thread Eli Collins (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-4783:
---

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Great, thanks.


[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server

2012-11-09 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494353#comment-13494353
 ] 

Jonathan Eagles commented on MAPREDUCE-4666:


+1. simple change that works for me.

 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.


[jira] [Updated] (MAPREDUCE-4666) JVM metrics for history server

2012-11-09 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-4666:
---

   Resolution: Fixed
Fix Version/s: 0.23.5
               2.0.3-alpha
               3.0.0
       Status: Resolved  (was: Patch Available)

 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.


[jira] [Updated] (MAPREDUCE-4774) repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4774:
--

Attachment: MAPREDUCE-4774.patch

This test failure is pretty pervasive and annoying, so taking this to get it 
fixed quickly.  Patch ignores some asynchronous task events in the FAILED state 
much like we do in the ERROR state, along with corresponding unit tests to 
verify we're handling them properly.
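
To illustrate the idea, here is a minimal, self-contained sketch of a
table-driven state machine. It does not use Hadoop's StateMachineFactory and
is not the code from the patch; the TASK_FAILED event and the class name are
hypothetical, and ERROR stands in for the job's internal error state. An event
with no registered transition sends the job to ERROR, while registering the
late event as a self-transition on FAILED keeps the job in FAILED.

```java
import java.util.EnumMap;
import java.util.Map;

final class JobStateSketch {

    enum State { RUNNING, FAILED, ERROR }

    // TASK_FAILED is a hypothetical event used only for this illustration.
    enum Event { TASK_FAILED, JOB_TASK_ATTEMPT_COMPLETED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    JobStateSketch(boolean ignoreLateEventsWhenFailed) {
        add(State.RUNNING, Event.TASK_FAILED, State.FAILED);
        add(State.RUNNING, Event.JOB_TASK_ATTEMPT_COMPLETED, State.RUNNING);
        if (ignoreLateEventsWhenFailed) {
            // Self-transition: the event is accepted and has no effect.
            add(State.FAILED, Event.JOB_TASK_ATTEMPT_COMPLETED, State.FAILED);
        }
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    // An event with no entry for the current state is treated as an
    // invalid transition and moves the job to ERROR.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        current = (next == null) ? State.ERROR : next;
        return current;
    }

    public static void main(String[] args) {
        JobStateSketch strict = new JobStateSketch(false);
        strict.handle(Event.TASK_FAILED);
        System.out.println(strict.handle(Event.JOB_TASK_ATTEMPT_COMPLETED));   // ERROR

        JobStateSketch tolerant = new JobStateSketch(true);
        tolerant.handle(Event.TASK_FAILED);
        System.out.println(tolerant.handle(Event.JOB_TASK_ATTEMPT_COMPLETED)); // FAILED
    }
}
```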

 repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR
 -

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ivan A. Veselovsky
 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
  fails in mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
  , or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test aims to check Job status notifications received through HTTP 
 Servlet. It runs 3 jobs: successful, killed, and failed. 
 The test expects the servlet to receive some expected notifications in some 
 expected order. It also tries to test the retry-on-failure notification 
 functionality, so on each 1st notification the servlet answers 400 forcing 
 error, and on each 2nd notification attempt it answers ok. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from the expected.
 Investigation shows that actual root cause of the problem is an incorrect job 
 state transition: the 3rd job mapred task fails (by intentionally thrown  
 RuntimeException, see UtilsForTests#runJobFail()), and the state of the task 
 changes from RUNNING to FAILED.
 At this point JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in  
 method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId,
  TaskAttemptCompletionEventStatus)), and this event gets processed in 
 AsyncDispatcher, but this transition is impossible according to the event 
 transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
 This causes the following exception to be thrown upon the event processing:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
   at java.lang.Thread.run(Thread.java:662)
 So the job enters the INTERNAL_ERROR state, and a job end notification like 
 this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (here we can see ERROR status instead of FAILED)
 After that the notification servlet receives either only ERROR 
 notification, or one more notification ERROR after FAILED, which finally 
 causes the test to fail. (Some variation in the test behavior caused by 
 racing conditions because there are many asynchronous processings there, and 
 the test is flaky, in fact).
 In any way, it looks like the root cause of the problem is the possibility of 
 the forbidden transition Invalid event: JOB_TASK_ATTEMPT_COMPLETED at 
 FAILED. 
 Need an expert advice on how that should be fixed.
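
The failure mode described above can be sketched with a toy table-driven state machine. This is only an illustration, not the Hadoop StateMachineFactory API and not the attached patch: the class JobStateMachineSketch, its states and events, and the ignoreLateEvents flag are all hypothetical names. It assumes that a late task event arriving in FAILED either has no registered transition (and the job ends up in an error state) or is explicitly tolerated by a self-transition.

```java
import java.util.EnumMap;
import java.util.Map;

public class JobStateMachineSketch {
    // Hypothetical, simplified states and events (not the real JobImpl enums).
    enum State { RUNNING, FAILED, ERROR }
    enum Event { TASK_ATTEMPT_COMPLETED, TASK_FAILED }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    // Applies the event; an unregistered (state, event) pair sends the job to ERROR,
    // mirroring the "Invalid event ... at FAILED" behaviour in the report.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        current = (next == null) ? State.ERROR : next;
        return current;
    }

    // Builds the machine; ignoreLateEvents adds a FAILED -> FAILED self-transition
    // so that a late completion event is tolerated (an assumed fix strategy).
    static JobStateMachineSketch create(boolean ignoreLateEvents) {
        JobStateMachineSketch m = new JobStateMachineSketch();
        m.addTransition(State.RUNNING, Event.TASK_ATTEMPT_COMPLETED, State.RUNNING);
        m.addTransition(State.RUNNING, Event.TASK_FAILED, State.FAILED);
        if (ignoreLateEvents) {
            m.addTransition(State.FAILED, Event.TASK_ATTEMPT_COMPLETED, State.FAILED);
        }
        return m;
    }

    public static void main(String[] args) {
        for (boolean ignore : new boolean[] {false, true}) {
            JobStateMachineSketch m = create(ignore);
            m.handle(Event.TASK_FAILED);                        // job fails first
            State end = m.handle(Event.TASK_ATTEMPT_COMPLETED); // late asynchronous event
            System.out.println("ignoreLateEvents=" + ignore + " -> " + end);
        }
    }
}
```

Without the self-transition the late event moves the job from FAILED to ERROR, which matches the unexpected jobStatus=ERROR notification; with it the job stays in FAILED.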

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4774:
--

      Component/s: mrv2, applicationmaster
Affects Version/s: 0.23.3, 2.0.1-alpha
          Summary: JobImpl does not handle asynchronous task events in FAILED state
                   (was: repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR)

Editing headline to more accurately reflect the root cause.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
 fails in the mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
 or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test checks the job status notifications received through an HTTP 
 servlet. It runs 3 jobs: successful, killed, and failed. 
 The test expects the servlet to receive specific notifications in a specific 
 order. It also exercises the retry-on-failure notification functionality: 
 the servlet answers 400 to the first attempt of each notification to force 
 an error, and answers OK to the second attempt. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from what is expected.
 Investigation shows that the actual root cause of the problem is an incorrect 
 job state transition: the mapred task of the 3rd job fails (because of an 
 intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and 
 the state of the task changes from RUNNING to FAILED.
 At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted 
 (in method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId,
  TaskAttemptCompletionEventStatus)), and this event is processed by the 
 AsyncDispatcher. However, the job is already in the FAILED state, and the 
 event transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory) 
 defines no transition for this event in that state. 
 This causes the following exception to be thrown when the event is processed:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
   at java.lang.Thread.run(Thread.java:662)
 So the job goes into the INTERNAL_ERROR state, and a job end notification 
 like this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (here we can see the ERROR status instead of FAILED)
 After that the notification servlet receives either only the ERROR 
 notification, or one more ERROR notification after FAILED, which finally 
 causes the test to fail. (The test behavior varies because of race 
 conditions among the many asynchronous operations involved, so the test is 
 in fact flaky.)
 In any case, it looks like the root cause of the problem is that the 
 forbidden transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED" 
 is possible. 
 Expert advice is needed on how this should be fixed.
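
The retry-on-failure behaviour that the test relies on can be sketched as a small simulation. This is a sketch under assumptions, not the test's servlet code: NotificationRetrySketch, FlakyReceiver and notifyWithRetry are hypothetical names, the receiver is assumed to reject only the first attempt of each distinct notification, and the SUCCEEDED and KILLED status strings are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NotificationRetrySketch {
    // Stand-in for the test servlet: answers 400 to the first attempt of each
    // notification and 200 to every later attempt (assumed behaviour).
    static class FlakyReceiver {
        private final Map<String, Integer> attempts = new HashMap<>();
        final List<String> accepted = new ArrayList<>();

        int receive(String notification) {
            int n = attempts.merge(notification, 1, Integer::sum);
            if (n == 1) {
                return 400; // force the sender to retry
            }
            accepted.add(notification);
            return 200;
        }
    }

    // Sends the notification until it is accepted or maxAttempts is reached.
    static boolean notifyWithRetry(FlakyReceiver receiver, String notification, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (receiver.receive(notification) == 200) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FlakyReceiver receiver = new FlakyReceiver();
        // The trailing ERROR models the unexpected extra notification from the report.
        for (String status : new String[] {"SUCCEEDED", "KILLED", "FAILED", "ERROR"}) {
            boolean delivered = notifyWithRetry(receiver, "jobStatus=" + status, 2);
            System.out.println(status + " delivered=" + delivered);
        }
        System.out.println("accepted: " + receiver.accepted);
    }
}
```

A test that compares the accepted list against an expected sequence ending in FAILED would fail as soon as the extra ERROR entry shows up.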


[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494368#comment-13494368
 ] 

Hudson commented on MAPREDUCE-4666:
---

Integrated in Hadoop-trunk-Commit #2996 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2996/])
MAPREDUCE-4666. JVM metrics for history server. (jlowe via jeagles) 
(Revision 1407669)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407669
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java


 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.
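
For orientation, the kind of values such JVM metrics usually cover can be read with the standard java.lang.management API. This is only a sketch and deliberately does not use Hadoop's metrics2 classes; JvmMetricsSketch and the metric names in the map are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

public class JvmMetricsSketch {
    // Takes a point-in-time snapshot of a few JVM-level values.
    static Map<String, Long> snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("heapUsedBytes", memory.getHeapMemoryUsage().getUsed());
        metrics.put("nonHeapUsedBytes", memory.getNonHeapMemoryUsage().getUsed());
        metrics.put("threadCount", (long) threads.getThreadCount());
        return metrics;
    }

    public static void main(String[] args) {
        // Print each metric as name=value.
        snapshot().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```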


[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4774:
--

Assignee: Jason Lowe
Target Version/s: 2.0.3-alpha, 0.23.5
  Status: Patch Available  (was: Open)

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 2.0.1-alpha, 0.23.3
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4774.patch



[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494377#comment-13494377
 ] 

Robert Joseph Evans commented on MAPREDUCE-4774:


The change looks simple enough and does fix the failing test.  I am +1 pending 
Jenkins approval.


[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494389#comment-13494389
 ] 

Hadoop QA commented on MAPREDUCE-4774:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552903/MAPREDUCE-4774.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

  org.apache.hadoop.mapreduce.v2.app.TestRecovery

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494392#comment-13494392
 ] 

Robert Joseph Evans commented on MAPREDUCE-4774:


I ran TestRecovery manually and it looks like it is a spurious failure.  We 
should file a JIRA to fix it.  Checking in the patch now.


[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4774:
---

   Resolution: Fixed
Fix Version/s: 0.23.5, 2.0.3-alpha, 3.0.0
       Status: Resolved  (was: Patch Available)

Thanks, Jason.

I put this into trunk, branch-2, and branch-0.23.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4774.patch



[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494423#comment-13494423
 ] 

Mariappan Asokan commented on MAPREDUCE-2454:
-

Hi Alejandro,
  I ran the tests on my box.  The failing tests also fail without my patch, so 
the failures do not seem to be related to it.

-- Asokan


 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.
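
As a rough illustration of what a pluggable sorter could look like, here is a minimal sketch. It is not the interface proposed in the attached patches: PluggableSorterSketch, PluggableSorter, DefaultSorter and load are hypothetical names, and loading the implementation by class name through reflection is an assumed mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PluggableSorterSketch {
    // Hypothetical plugin contract: return the keys in the given order.
    interface PluggableSorter {
        <K> List<K> sort(List<K> keys, Comparator<? super K> order);
    }

    // Built-in implementation used when no external plugin is configured.
    static class DefaultSorter implements PluggableSorter {
        @Override
        public <K> List<K> sort(List<K> keys, Comparator<? super K> order) {
            List<K> copy = new ArrayList<>(keys); // leave the input untouched
            copy.sort(order);
            return copy;
        }
    }

    // Instantiates a sorter from its class name, as a framework might do
    // with a value read from the job configuration (assumed mechanism).
    static PluggableSorter load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(PluggableSorter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load sorter " + className, e);
        }
    }

    public static void main(String[] args) {
        PluggableSorter sorter = load(DefaultSorter.class.getName());
        List<Integer> sorted = sorter.sort(Arrays.asList(3, 1, 2), Comparator.naturalOrder());
        System.out.println(sorted);
    }
}
```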


[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494459#comment-13494459
 ] 

Hudson commented on MAPREDUCE-4774:
---

Integrated in Hadoop-trunk-Commit #2997 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2997/])
MAPREDUCE-4774. JobImpl does not handle asynchronous task events in FAILED 
state (jlowe via bobby) (Revision 1407679)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407679
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java


 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
  fails in the mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
  , or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test checks the job status notifications received through an HTTP 
 servlet. It runs 3 jobs: successful, killed, and failed. 
 The test expects the servlet to receive specific notifications in a specific 
 order. It also exercises the retry-on-failure notification functionality: 
 on the 1st attempt of each notification the servlet answers 400 to force an 
 error, and on the 2nd attempt it answers OK. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from what is expected.
 Investigation shows that the actual root cause of the problem is an incorrect 
 job state transition: the mapred task of the 3rd job fails (because of an 
 intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and 
 the state of the task changes from RUNNING to FAILED.
 At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in 
 method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId,
  TaskAttemptCompletionEventStatus)), and this event gets processed in 
 AsyncDispatcher, but this transition is impossible according to the event 
 transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
 This causes the following exception to be thrown when the event is processed:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
     at java.lang.Thread.run(Thread.java:662) 
 As a result, the job enters the INTERNAL_ERROR state, and a job end 
 notification like this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (note the ERROR status instead of FAILED).
 After that the notification servlet receives either only the ERROR 
 notification, or one more ERROR notification after FAILED, which finally 
 causes the test to fail. (Some variation in the test behavior caused by 
 racing

[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

[
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:
---

Attachment: MAPREDUCE-4751-20121109.txt

Addressed Bobby's comments on my earlier patch.

- Agree about HashSet. Started doing bitmaps, but it made the code unreadable.
Keeping HashSet but with an explicit initial capacity of 2 instead of the
default 16. Could've been 1, but HashSet/HashMap immediately resizes it to two.
- Addressed other changes.
- Wrote up a test which passes with the changes and fails without. Had to
spend a lot of time to get it right.
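
For readers unfamiliar with the capacity argument, here is a minimal sketch of a HashSet created with an explicit initial capacity of 2. The class name SmallSetSketch, the helper newSmallSet and the element values are hypothetical; the sketch only shows the constructor call and does not reproduce what the patch stores in the set.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names and element values are hypothetical.
public class SmallSetSketch {
    // HashSet(int) takes an initial capacity; the no-argument constructor uses 16.
    static Set<String> newSmallSet() {
        return new HashSet<String>(2);
    }

    public static void main(String[] args) {
        Set<String> small = newSmallSet();
        small.add("a");
        small.add("b");
        small.add("c"); // the capacity is only a starting point; the set grows on demand
        System.out.println(small.size());
    }
}
```

The initial capacity only affects the size of the backing table at creation, so a small value saves memory when many such sets each hold one or two entries.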

AM stuck in KILL_WAIT for days
--

[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:
---

Status: Patch Available  (was: Open)

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.2-alpha, 0.23.3
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, 
 MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg


 We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them 
 as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a 
 few maps running. All these maps were scheduled on nodes which are now in the 
 RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state


[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

[
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494518#comment-13494518
]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4751:

bq. Part of the issue is that the job is hanging around waiting for all tasks
to be killed rather than just exiting and letting YARN shoot any straggling
containers. I think it would be simpler/safer for the AM to just write out the
final state stuff and exit, much like it does for the FAILED state. If job's
KILL_WAIT really is necessary then we'd need a corresponding FAILED_WAIT state
to handle waiting for task cleanup when a job fails.
I agree. Sharad and I debated this for a while when we wrote this initially. We
left it as it is now, just to be sure that AMs exit sanely, but we can change
it. The only catch I can think of is that, while the AM does the remaining
cleanup work (jobhistory etc.), tasks will keep bombarding the AM with more
updates.

I didn't realize that we don't have a fail_wait state.

The change isn't much bigger, but it can break tests. Let's pursue that
separately.

The current bug is caused by Tasks waiting on TAs, which should be fixed by my
patch. Of course, that then exposes the job bug; let's fix that separately.

Regarding doing away with Task's kill_wait, I disagree. Tasks can get a kill
signal while the AM is running, so we should handle it explicitly by killing
and waiting for all attempts; otherwise we run the risk of dangling JVMs doing
nothing but occupying slots until the AM exits.
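
The "kill and wait for all attempts" behaviour can be sketched as follows. This is a simplified illustration, not the TaskImpl code: the class TaskKillWaitSketch, its methods and the string attempt IDs are hypothetical. It assumes a task leaves KILL_WAIT only after every live attempt has reported that it ended.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; not the real TaskImpl state machine.
public class TaskKillWaitSketch {
    enum State { RUNNING, KILL_WAIT, KILLED }

    private final Set<String> liveAttempts = new HashSet<String>();
    private State state = State.RUNNING;

    void attemptLaunched(String attemptId) {
        liveAttempts.add(attemptId);
    }

    // On a kill signal, wait for live attempts instead of finishing at once.
    void kill() {
        state = liveAttempts.isEmpty() ? State.KILLED : State.KILL_WAIT;
    }

    // Each attempt reports back once it has ended; the last report completes the kill.
    void attemptEnded(String attemptId) {
        liveAttempts.remove(attemptId);
        if (state == State.KILL_WAIT && liveAttempts.isEmpty()) {
            state = State.KILLED;
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        TaskKillWaitSketch task = new TaskKillWaitSketch();
        task.attemptLaunched("attempt_0");
        task.attemptLaunched("attempt_1");
        task.kill();
        task.attemptEnded("attempt_0");
        System.out.println(task.state()); // still waiting on attempt_1
        task.attemptEnded("attempt_1");
        System.out.println(task.state());
    }
}
```

In this sketch, an attempt that never reports back keeps the task in KILL_WAIT indefinitely, which mirrors the symptom in this issue of Tasks waiting on TAs.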

AM stuck in KILL_WAIT for days
--

[jira] [Commented] (MAPREDUCE-4749) Killing multiple attempts of a task taker longer as more attempts are killed

[
https://issues.apache.org/jira/browse/MAPREDUCE-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494523#comment-13494523
]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4749:

Looks good to me too. The algo is clean. And nice tests too! Checking it in.

Killing multiple attempts of a task taker longer as more attempts are killed

Key: MAPREDUCE-4749
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4749
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Attachments: MAPREDUCE-4749.branch-1.patch,
MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch,
MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch,
MAPREDUCE-4749.branch-1.patch

The following was noticed on an MR job running on Hadoop 1.1.0:
1. Start an MR job with 1 mapper
2. Wait for a minute
3. Kill the first attempt of the mapper and then subsequently kill the other
3 attempts in order to fail the job
The time taken to kill the task grew exponentially.
1st attempt was killed immediately.
2nd attempt took a little over a min
3rd attempt took approx. 20 mins
4th attempt took around 3 hrs.
The command used to kill the attempt was hadoop job -fail-task
Note that the command returned as soon as the fail attempt was accepted,
but the time at which the attempt was actually killed was as stated above.

[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494522#comment-13494522
 ] 

Hadoop QA commented on MAPREDUCE-4751:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12552951/MAPREDUCE-4751-20121109.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//console

This message is automatically generated.

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, 
 MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg


 We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them 
 as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a 
 few maps running. All these maps were scheduled on nodes which are now in the 
 RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state


[jira] [Resolved] (MAPREDUCE-4749) Killing multiple attempts of a task taker longer as more attempts are killed