[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

2012-11-09 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493814#comment-13493814 ]

Todd Lipcon commented on MAPREDUCE-4469:


The problem with the getrusage approach is that it only includes terminated 
children, which means it doesn't track their usage while the task is still 
running.

That said, maybe we really don't care, and we should just tally our own 
resource usage and then, at cleanup time, add in the children's?

 Resource calculation in child tasks is CPU-heavy
 

 Key: MAPREDUCE-4469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance, task
Affects Versions: 1.0.3
Reporter: Todd Lipcon
Assignee: Ahmed Radwan
 Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, 
 MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch


 In doing some benchmarking on a hadoop-1-derived codebase, I noticed that 
 each of the child tasks was making a huge number of syscalls. Stracing showed 
 that each task spends a lot of time looping through all the files in /proc to 
 calculate resource usage.
 As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
 within the tasks. On a CPU-bound 500G sort workload, this improved total job 
 runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
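One way to read Todd's suggestion, sketched in Java under stated assumptions: instead of walking every entry in /proc, a task reads only its own /proc/self/stat. On Linux, fields 14-15 (utime, stime) give the task's own CPU time, and fields 16-17 (cutime, cstime) give the time of children that have exited and been waited for, which has the same limitation as getrusage: children only count once they terminate. The class and method names are hypothetical and not part of Hadoop's ResourceCalculatorPlugin API; values are in clock ticks, not milliseconds.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a Hadoop class.
class SelfCpuUsage {
  /**
   * Parses one /proc/[pid]/stat line and returns {own ticks, reaped-children ticks}:
   * utime + stime (fields 14-15) and cutime + cstime (fields 16-17).
   */
  static long[] parseTicks(String statLine) {
    // Field 2 (comm) is parenthesized and may itself contain spaces or ')',
    // so start splitting after the LAST ')'.
    int close = statLine.lastIndexOf(')');
    String[] f = statLine.substring(close + 1).trim().split("\\s+");
    // f[0] is field 3 (state), so field N lives at f[N - 3].
    long own = Long.parseLong(f[11]) + Long.parseLong(f[12]);      // utime + stime
    long children = Long.parseLong(f[13]) + Long.parseLong(f[14]); // cutime + cstime
    return new long[] {own, children};
  }

  /** Reads only this JVM's own stat file instead of scanning every /proc entry. */
  static long[] readSelf() throws IOException {
    String line = new String(Files.readAllBytes(Paths.get("/proc/self/stat")),
        StandardCharsets.UTF_8);
    return parseTicks(line);
  }
}
```

A task could report the first value during progress updates and add the second once at cleanup, after its children have exited and been reaped. This covers CPU time only; memory figures would need other fields.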

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted

2012-11-09 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493899#comment-13493899 ]

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Yarn-trunk #31 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/31/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java


 Fetch failures can take way too long for a map to be restarted
 --

 Key: MAPREDUCE-4772
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MR-4772-0.23.txt, MR-4772-trunk.txt


 In one particular case we saw an NM go down at just the right time: most 
 of the reducers had already fetched the map output, but not all of them.
 The reducers that failed to fetch reported it to the AM fairly quickly, but 
 because the other reducers were still running, the AM would not relaunch the 
 map task: no more than 50% of the running reducers had reported fetch 
 failures. Then, because of the exponential back-off for fetches on the 
 reducers, it took 1 hour 45 minutes for the failing reduce tasks to hit 
 another 10 fetch failures and report in again. By that point the other 
 reducers had finished, and the job relaunched the map task. If the reducers 
 had still been running at 1:45, I have no idea how long it would have taken 
 for each of the tasks to get to 30 fetch failures.
 We need to trigger the map relaunch based on the percentage of reducers 
 shuffling, not the percentage of reducers running. We also need a maximum 
 limit on the back-off, so that a reducer never waits for days to try to 
 fetch map output.
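A minimal Java sketch of the two changes proposed above, assuming hypothetical names and parameters (FetchFailurePolicy, baseMillis, maxMillis, threshold) rather than the configuration keys the committed patch actually added: a capped exponential back-off for fetch retries, and a relaunch rule whose denominator is the reducers still shuffling instead of all running reducers.

```java
// Hypothetical policy class; not the code from the MAPREDUCE-4772 patch.
class FetchFailurePolicy {
  /** Delay before the next fetch attempt: base * 2^(failures - 1), never above maxMillis. */
  static long backoffMillis(int failures, long baseMillis, long maxMillis) {
    long delay = baseMillis;
    // Double once per prior failure, stopping as soon as the cap is reached
    // (which also keeps the value from overflowing).
    for (int i = 1; i < failures && delay < maxMillis; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxMillis);
  }

  /**
   * Relaunch the map once more than 'threshold' of the reducers that are still
   * shuffling have reported fetch failures for it. Reducers that have finished
   * shuffling no longer need the output, so they are left out of the denominator.
   */
  static boolean shouldRelaunchMap(int reducersReportingFailures, int reducersShuffling,
      double threshold) {
    return reducersShuffling > 0
        && (double) reducersReportingFailures / reducersShuffling > threshold;
  }
}
```

For example, with 100 reducers running but only 5 still shuffling, 3 failure reports give 3/100 under the old rule, well below 50%, but 3/5 = 60% under the new one, which triggers a relaunch. The cap bounds every retry wait at maxMillis no matter how many failures accumulate.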


[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

2012-11-09 Thread Ivan A. Veselovsky (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493959#comment-13493959 ]

Ivan A. Veselovsky commented on MAPREDUCE-4764:
---

Hi, Daryn,
I'd like to clarify our plan for improving this test.

Currently the test writes the token into a file, then sets the file name as the 
MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY value in the config, and also 
passes the same file name as the value of a dedicated config property 
(KEY_SECURITY_TOKEN).
In the job, it gets the tokens from the job context 
(context.getCredentials().getAllTokens()) and looks up the delegation token by 
its known key; call it token X.
It then gets the binary file name from the job config (key 
KEY_SECURITY_TOKEN), reads the file, and deserializes the token; call it 
token Y.
Finally, the job asserts X.equals(Y).

This way, binary token propagation and serialization/deserialization are 
checked, which matches the test name.

As I understand it, you suggested also checking that the same delegation token 
is present in UserGroupInformation.getCurrentUser().getTokens(), right?
So, if I add this check, will you be okay with the test? Or do you have other 
suggestions on how to improve it?
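The check described above (write token X to a binary file, read it back as token Y, assert X.equals(Y)) boils down to a serialization round trip. The sketch below shows that shape in plain Java; SimpleToken and roundTrip are hypothetical stand-ins, not Hadoop's Token class or its Credentials file format, so they illustrate the round-trip logic only.

```java
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Objects;

class TokenFileRoundTrip {
  /** Hypothetical stand-in for a delegation token: identifier, password, kind, service. */
  static final class SimpleToken {
    final byte[] identifier;
    final byte[] password;
    final String kind;
    final String service;

    SimpleToken(byte[] identifier, byte[] password, String kind, String service) {
      this.identifier = identifier.clone();
      this.password = password.clone();
      this.kind = kind;
      this.service = service;
    }

    void write(DataOutput out) throws IOException {
      out.writeInt(identifier.length);
      out.write(identifier);
      out.writeInt(password.length);
      out.write(password);
      out.writeUTF(kind);
      out.writeUTF(service);
    }

    static SimpleToken read(DataInput in) throws IOException {
      byte[] id = new byte[in.readInt()];
      in.readFully(id);
      byte[] pw = new byte[in.readInt()];
      in.readFully(pw);
      // Java evaluates arguments left to right, matching the write order.
      return new SimpleToken(id, pw, in.readUTF(), in.readUTF());
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SimpleToken)) return false;
      SimpleToken t = (SimpleToken) o;
      return Arrays.equals(identifier, t.identifier) && Arrays.equals(password, t.password)
          && kind.equals(t.kind) && service.equals(t.service);
    }

    @Override
    public int hashCode() {
      return Objects.hash(Arrays.hashCode(identifier), Arrays.hashCode(password), kind, service);
    }
  }

  /** Writes token X to a temporary binary file, reads it back as token Y, deletes the file. */
  static SimpleToken roundTrip(SimpleToken x) {
    try {
      Path file = Files.createTempFile("token", ".bin");
      try {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
          x.write(out);
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
          return SimpleToken.read(in);
        }
      } finally {
        Files.delete(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The final assertion is then simply x.equals(roundTrip(x)); the extra UGI check suggested in the thread would be a separate lookup against the current user's tokens.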

 repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
 

 Key: MAPREDUCE-4764
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4764
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ivan A. Veselovsky
 Attachments: MAPREDUCE-4764-trunk.patch


 The test is @Ignore-ed, and fails when enabled.
 We suggest repairing it to fill the coverage gap.
 Problems fixed in the test: 
 (1) The MRConfig.FRAMEWORK_NAME and YarnConfiguration.RM_PRINCIPAL properties 
 must be set correctly in the configuration to enable security the way this 
 test expects. 
 (2) The property MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY is no longer 
 passed into the Job configuration; it is intentionally removed from there. 
 So we pass the binary file name in a separate, dedicated property. 
 (3) The test was using deprecated cluster classes. All of them have been 
 updated to their modern equivalents.
 (4) The delegation token found in the job context is now correctly compared 
 to the one deserialized from the binary file.


[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted

2012-11-09 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493972#comment-13493972 ]

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Hdfs-0.23-Build #430 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/430/])
svn merge -c 1407118 FIXES: MAPREDUCE-4772. Fetch failures can take way too 
long for a map to be restarted (bobby) (Revision 1407128)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407128
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java


[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted

2012-11-09 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493981#comment-13493981 ]

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Hdfs-trunk #1221 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1221/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118


[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted

2012-11-09 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494006#comment-13494006 ]

Hudson commented on MAPREDUCE-4772:
---

Integrated in Hadoop-Mapreduce-trunk #1251 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1251/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be 
restarted (bobby) (Revision 1407118)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118


[jira] [Assigned] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Thomas Graves (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves reassigned MAPREDUCE-4782:


Assignee: Mark Fuhs

 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Critical
 Attachments: MAPREDUCE-4782.patch, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit than for the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but its construction does not respect this difference between the first 
 FileSplit and the rest.
 As a result, the first line of the final InputSplit is skipped. I've 
 created a patch to NLineInputFormat that fixes the problem.
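To make the off-by-one concrete, here is a Java sketch under assumptions: the first split is shortened by one byte, and every later split starts one byte early (on the preceding newline), because a reader whose split does not start at offset 0 discards everything through the next newline. Those exact offsets, the helper names, and the simplified line-reader model are our assumptions, modeled on the description above rather than copied from NLineInputFormat or LineRecordReader. Routing the final split through the same helper as the others is what keeps its first line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the NLineInputFormat source or the attached patch.
class NLineSplitSketch {
  /** Builds one split as {start, length}, applying the first-vs-rest adjustment. */
  static long[] createSplit(long begin, long length) {
    return begin == 0
        ? new long[] {begin, length - 1}  // first split: stop one byte short of the next split
        : new long[] {begin - 1, length}; // later splits: start on the preceding newline
  }

  /** Groups lines (byte lengths, newline included) into splits of n lines each. */
  static List<long[]> getSplits(int[] lineLengths, int n) {
    List<long[]> splits = new ArrayList<>();
    long begin = 0;
    long length = 0;
    int numLines = 0;
    for (int len : lineLengths) {
      numLines++;
      length += len;
      if (numLines == n) {
        splits.add(createSplit(begin, length));
        begin += length;
        length = 0;
        numLines = 0;
      }
    }
    if (numLines != 0) {
      // The reported bug: this final split was built as (begin, length), unadjusted.
      splits.add(createSplit(begin, length));
    }
    return splits;
  }

  /**
   * Simplified reader model: a split not starting at offset 0 first discards everything
   * through the next newline, then yields each line starting at or before start + length.
   * Returns the indexes of the lines it yields.
   */
  static List<Integer> linesRead(int[] lineLengths, long start, long length) {
    long end = start + length;
    long pos = 0; // byte offset where line i starts
    int i = 0;
    if (start != 0) {
      while (i < lineLengths.length && pos <= start) {
        pos += lineLengths[i++];
      }
    }
    List<Integer> out = new ArrayList<>();
    while (i < lineLengths.length && pos <= end) {
      out.add(i);
      pos += lineLengths[i++];
    }
    return out;
  }
}
```

With five 10-byte lines and two lines per split, the splits are {0, 19}, {19, 20} and {39, 10}, and together they yield lines 0 to 4 exactly once; an unadjusted final split {40, 10} yields nothing, losing line 4.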


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Robert Joseph Evans (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Attachment: MR-4782-branch-1.txt

Patch for branch-1.  The patch is identical to the one for trunk except for 
line numbers and the location of the files.


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Robert Joseph Evans (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

Priority: Blocker  (was: Critical)


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494050#comment-13494050 ]

Robert Joseph Evans commented on MAPREDUCE-4782:


Also, now that I think about it more, this really is a Blocker, not a Critical.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494052#comment-13494052 ]

Hadoop QA commented on MAPREDUCE-4782:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552844/MR-4782-branch-1.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3003//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Jason Lowe (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494059#comment-13494059 ]

Jason Lowe commented on MAPREDUCE-4782:
---

+1, thanks Mark and Bobby.  Bobby or Matt, feel free to commit.


[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Thomas Graves (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated MAPREDUCE-4266:
-

Status: Patch Available  (was: Open)

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml


[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Thomas Graves (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated MAPREDUCE-4266:
-

Attachment: MAPREDUCE-4266.sh

Shell script to remove the directories and XML files. Run it as: 
./MAPREDUCE-4266.sh svn

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml



[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494062#comment-13494062
 ] 

Hadoop QA commented on MAPREDUCE-4266:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552845/MAPREDUCE-4266.sh
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3004//console

This message is automatically generated.

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml



[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4782:
---

   Resolution: Fixed
Fix Version/s: 0.23.5
   2.0.3-alpha
   3.0.0
   1.2.0
   1.1.1
   Status: Resolved  (was: Patch Available)

Thanks Mark,

This is a great catch, I just wish we had found it sooner.  I put this into 
trunk, branch-2, branch-0.23, branch-1, and branch-1.1.

If I missed any branches that people want it in please let me know and I will 
see what I can do.

 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Blocker
 Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference in how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.
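To make the off-by-one concrete, here is a small self-contained Java model of the behaviour described above. It is a sketch under stated assumptions, not the NLineInputFormat or LineRecordReader source: Split, readSplit, adjust, computeSplits and readAll are hypothetical names, only '\n'-terminated lines are modelled, and the reader is assumed to (a) discard everything up to and including the next newline when its split does not start at offset 0, and (b) keep reading lines that start at or before the split end. Under those assumptions, a last split built without the begin - 1 shift loses its first line.

```java
import java.util.ArrayList;
import java.util.List;

public class NLineSplitSketch {
  // Hypothetical stand-in for FileSplit: a byte offset and a length.
  record Split(long start, long length) {}

  // Simplified reader model: a split not starting at offset 0 first discards
  // everything up to and including the next newline, then reads whole lines
  // as long as each line starts at or before the split end.
  static List<String> readSplit(String data, Split s) {
    List<String> lines = new ArrayList<>();
    int pos = (int) s.start();
    int end = (int) (s.start() + s.length());
    if (pos != 0) {
      int nl = data.indexOf('\n', pos);
      pos = (nl < 0) ? data.length() : nl + 1;
    }
    while (pos <= end && pos < data.length()) {
      int nl = data.indexOf('\n', pos);
      lines.add(data.substring(pos, (nl < 0) ? data.length() : nl));
      pos = (nl < 0) ? data.length() : nl + 1;
    }
    return lines;
  }

  // First-vs-rest construction: shift each split boundary back by one byte.
  static Split adjust(long begin, long length) {
    return (begin == 0) ? new Split(begin, length - 1) : new Split(begin - 1, length);
  }

  // Groups the data into splits of n lines; fixLast decides whether the
  // trailing partial split gets the same adjustment as the others.
  static List<Split> computeSplits(String data, int n, boolean fixLast) {
    List<Split> splits = new ArrayList<>();
    long begin = 0;
    long length = 0;
    int numLines = 0;
    int pos = 0;
    while (pos < data.length()) {
      int nl = data.indexOf('\n', pos);
      int next = (nl < 0) ? data.length() : nl + 1;
      length += next - pos;
      pos = next;
      if (++numLines == n) {
        splits.add(adjust(begin, length));
        begin += length;
        length = 0;
        numLines = 0;
      }
    }
    if (numLines != 0) {
      // The reported bug: the final split was created without the adjustment.
      splits.add(fixLast ? adjust(begin, length) : new Split(begin, length));
    }
    return splits;
  }

  // Concatenates the lines read from every split, in split order.
  static List<String> readAll(String data, int n, boolean fixLast) {
    List<String> all = new ArrayList<>();
    for (Split s : computeSplits(data, n, fixLast)) {
      all.addAll(readSplit(data, s));
    }
    return all;
  }

  public static void main(String[] args) {
    String data = "a\nb\nc\nd\ne\n";
    System.out.println("buggy: " + readAll(data, 2, false)); // prints "buggy: [a, b, c, d]"
    System.out.println("fixed: " + readAll(data, 2, true));  // prints "fixed: [a, b, c, d, e]"
  }
}
```

With five one-character lines and N = 2, the unadjusted last split (8, 2) skips "e" entirely, while the adjusted split (7, 2) starts on the preceding newline, so the skip consumes only that one byte.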



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494080#comment-13494080
 ] 

Hudson commented on MAPREDUCE-4782:
---

Integrated in Hadoop-trunk-Commit #2988 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2988/])
MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark 
Fuhs via bobby) (Revision 1407505)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java


 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Blocker
 Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference in how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Mark Fuhs (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494115#comment-13494115
 ] 

Mark Fuhs commented on MAPREDUCE-4782:
--

I'm glad I could contribute!

 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Blocker
 Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference in how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494139#comment-13494139
 ] 

Robert Joseph Evans commented on MAPREDUCE-4266:


The shell script looks good and does what we want.  +1.  I'll check this in.  
I'll also take a look at Jenkins to see if there are any builds still calling 
into ant for trunk.

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml



[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494141#comment-13494141
 ] 

Robert Joseph Evans commented on MAPREDUCE-4266:


Oh, and I'll also update the build/release instructions on the twiki to remove Ant :)

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml



[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494145#comment-13494145
 ] 

Hudson commented on MAPREDUCE-4266:
---

Integrated in Hadoop-trunk-Commit #2989 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2989/])
MAPREDUCE-4266. remove Ant remnants from MR (tgraves via bobby) (Revision 
1407551)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407551
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/build-utils.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/build.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/src


 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml



[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494162#comment-13494162
 ] 

Hudson commented on MAPREDUCE-4266:
---

Integrated in Hadoop-Mapreduce-trunk #1252 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/])
MAPREDUCE-4266. remove Ant remnants from MR (tgraves via bobby) (Revision 
1407551)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407551
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/build-utils.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/build.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy
* /hadoop/common/trunk/hadoop-mapreduce-project/ivy.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/src


 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml



[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

2012-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494163#comment-13494163
 ] 

Hudson commented on MAPREDUCE-4782:
---

Integrated in Hadoop-Mapreduce-trunk #1252 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/])
MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark 
Fuhs via bobby) (Revision 1407505)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java


 NLineInputFormat skips first line of last InputSplit
 

 Key: MAPREDUCE-4782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
 Environment: job.setMapperClass(Mapper.class);  // just pass text 
 lines through to output
 job.setInputFormatClass(NLineInputFormat.class);
 NLineInputFormat.setNumLinesPerSplit(job, 100);
 NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Blocker
 Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt


 NLineInputFormat creates FileSplits that are then used by LineRecordReader to 
 generate Text values. To deal with an idiosyncrasy of LineRecordReader, the 
 begin and length fields of the FileSplit are constructed differently for the 
 first FileSplit vs. the rest.
 After looping through all lines of a file, the final FileSplit is created, 
 but the creation does not respect the difference in how the first vs. the 
 rest of the FileSplits are created.
 This results in the first line of the final InputSplit being skipped. I've 
 created a patch to NLineInputFormat, and this fixes the problem.



[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR

2012-11-09 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4266:
---

   Resolution: Fixed
Fix Version/s: 0.23.5
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks Tom,

I put this into trunk, branch-2, and branch-0.23. I also updated Jenkins and the
wiki.

 remove Ant remnants from MR
 ---

 Key: MAPREDUCE-4266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh


 Remove:
 hadoop-mapreduce-project/src/*
 hadoop-mapreduce-project/ivy/*
 hadoop-mapreduce-project/build.xml
 hadoop-mapreduce-project/ivy.xml



[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-09 Thread Mariappan Asokan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Open  (was: Patch Available)

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.
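For readers unfamiliar with the proposal, a plugin boundary of this kind might look roughly like the sketch below. It is hypothetical and is not the interface defined in the attached MapOutputSorter.java, ReduceInputSorter.java or KeyValueIterator.java: MapOutputSorterPlugin, InMemorySorter and runMapSide are made-up names, and the in-memory sort only stands in for whatever an external sorter would actually do.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SorterPluginSketch {
  // Hypothetical map-side contract: the framework pushes records in,
  // then pulls them back out in key order.
  interface MapOutputSorterPlugin<K, V> {
    void collect(K key, V value);
    Iterator<Map.Entry<K, V>> sortedOutput();
  }

  // Hypothetical default implementation: buffer in memory, sort by key.
  static class InMemorySorter<K extends Comparable<K>, V>
      implements MapOutputSorterPlugin<K, V> {
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();

    @Override
    public void collect(K key, V value) {
      buffer.add(Map.entry(key, value));
    }

    @Override
    public Iterator<Map.Entry<K, V>> sortedOutput() {
      // List.sort is stable, so records with equal keys keep arrival order.
      buffer.sort(Map.Entry.comparingByKey());
      return buffer.iterator();
    }
  }

  // "Framework" code depends only on the interface, so an external
  // implementation could be swapped in without changing this method.
  static <K, V> List<String> runMapSide(MapOutputSorterPlugin<K, V> sorter,
                                        List<Map.Entry<K, V>> records) {
    for (Map.Entry<K, V> r : records) {
      sorter.collect(r.getKey(), r.getValue());
    }
    List<String> out = new ArrayList<>();
    Iterator<Map.Entry<K, V>> it = sorter.sortedOutput();
    while (it.hasNext()) {
      Map.Entry<K, V> e = it.next();
      out.add(e.getKey() + "=" + e.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> records =
        List.of(Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3));
    // prints "[a=2, b=1, b=3]"
    System.out.println(runMapSide(new InMemorySorter<String, Integer>(), records));
  }
}
```

The design point the sketch tries to capture is that the framework only ever talks to the interface; which class sits behind it would be a configuration choice.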



[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-09 Thread Mariappan Asokan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Attachment: mapreduce-2454.patch

Hi Alejandro,
  Thanks for catching the unused imports.  I updated Fetcher.java.  I have also 
added a test in the latest patch.

-- Asokan


 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-09 Thread Mariappan Asokan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-2454:


Status: Patch Available  (was: Open)

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494276#comment-13494276
 ] 

Hadoop QA commented on MAPREDUCE-2454:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552875/mapreduce-2454.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapreduce.TestJobMonitorAndPrint
  org.apache.hadoop.mapred.TestClusterMRNotification

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3005//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3005//console

This message is automatically generated.

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494279#comment-13494279
 ] 

Robert Joseph Evans commented on MAPREDUCE-4751:


I have been doing a quick once over on this, and I have a few comments.

# I think it would be cleaner for KillWaitAttemptKilledTransition to have a 
constructor that takes a TaskAttemptCompletionEventStatus, instead of having 
the subclasses set it directly themselves.
# Remove the commented out if statement.
# I am not sure if HashSet is the correct data type for success, failed, etc.  
They are likely to be sparse arrays with small amounts of data in them.  
Probably not very important, but if there are thousands of tasks it starts to 
add up.
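A minimal sketch of the first suggestion, assuming a transition class that only needs to remember which completion status to report. CompletionStatus is a local stand-in for TaskAttemptCompletionEventStatus (only the values used here), and the Succeeded/Failed subclasses are hypothetical; the point is just that the status arrives through the constructor and lives in a final field instead of being assigned by each subclass.

```java
public class KillWaitTransitionSketch {
  // Local stand-in for TaskAttemptCompletionEventStatus (subset only).
  enum CompletionStatus { SUCCEEDED, FAILED, KILLED }

  // Suggested shape: status passed in through the constructor and held in
  // a final field, rather than subclasses setting it directly.
  static class KillWaitAttemptKilledTransition {
    private final CompletionStatus finalStatus;

    KillWaitAttemptKilledTransition() {
      this(CompletionStatus.KILLED);
    }

    protected KillWaitAttemptKilledTransition(CompletionStatus finalStatus) {
      this.finalStatus = finalStatus;
    }

    CompletionStatus getFinalStatus() {
      return finalStatus;
    }
  }

  // Hypothetical subclasses: each one only chooses the status to report.
  static class KillWaitAttemptSucceededTransition extends KillWaitAttemptKilledTransition {
    KillWaitAttemptSucceededTransition() {
      super(CompletionStatus.SUCCEEDED);
    }
  }

  static class KillWaitAttemptFailedTransition extends KillWaitAttemptKilledTransition {
    KillWaitAttemptFailedTransition() {
      super(CompletionStatus.FAILED);
    }
  }

  public static void main(String[] args) {
    System.out.println(new KillWaitAttemptKilledTransition().getFinalStatus());    // prints "KILLED"
    System.out.println(new KillWaitAttemptSucceededTransition().getFinalStatus()); // prints "SUCCEEDED"
    System.out.println(new KillWaitAttemptFailedTransition().getFinalStatus());    // prints "FAILED"
  }
}
```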

Overall it looks OK.  I would like to see more tests, though.

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg


 We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them 
 as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a 
 few maps running. All these maps were scheduled on nodes which are now in the 
 RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.



[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494295#comment-13494295
 ] 

Jason Lowe commented on MAPREDUCE-4751:
---

Part of the issue is that the job is hanging around waiting for all tasks to be 
killed rather than just exiting and letting YARN shoot any straggling 
containers.  I think it would be simpler/safer for the AM to just write out the 
final state stuff and exit, much like it does for the FAILED state.  If the job's
KILL_WAIT state really is necessary, then we'd need a corresponding FAILED_WAIT
state to handle waiting for task cleanup when a job fails.

If we don't need the job's KILL_WAIT state, then similarly we can probably ditch
the task KILL_WAIT state -- it could just send off kills to all the 
corresponding task attempts and sit in the KILLED state.  Does it really need 
to wait?

Removing KILL_WAIT is a considerably bigger change than the current one, as it
would break a lot of tests that know about and expect the KILL_WAIT state.  However, I
think it would be more robust in the long term, as KILL_WAIT seems like a state
primed for hanging if we don't really need it.  Since we're eager to get a fix
for this in soon, we could address that in a follow-up JIRA.
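The hang described in this issue can be reduced to a toy Java model. This is an illustration only, not the MRAppMaster state machine: Task, killAndWait, onAttemptKilled and killWithoutWaiting are hypothetical names, and actually sending the kill events is not modelled. An attempt on a lost node never confirms its kill, so a task that waits for every confirmation stays in KILL_WAIT, while a task that moves straight to KILLED does not depend on that confirmation.

```java
import java.util.HashSet;
import java.util.Set;

public class KillWaitModel {
  enum TaskState { RUNNING, KILL_WAIT, KILLED }

  static class Task {
    TaskState state = TaskState.RUNNING;
    // Attempts that have not yet confirmed they were killed.
    final Set<String> pendingAttempts = new HashSet<>();

    Task(Set<String> attempts) {
      pendingAttempts.addAll(attempts);
    }

    // Waiting design (simplified): after kills are sent, stay in KILL_WAIT
    // until every attempt has confirmed.
    void killAndWait() {
      state = pendingAttempts.isEmpty() ? TaskState.KILLED : TaskState.KILL_WAIT;
    }

    // Called when an attempt confirms that it was killed.
    void onAttemptKilled(String attemptId) {
      pendingAttempts.remove(attemptId);
      if (state == TaskState.KILL_WAIT && pendingAttempts.isEmpty()) {
        state = TaskState.KILLED;
      }
    }

    // Alternative (simplified): after kills are sent, go straight to KILLED
    // and leave any straggling containers for YARN to clean up.
    void killWithoutWaiting() {
      state = TaskState.KILLED;
    }
  }

  public static void main(String[] args) {
    Task waiting = new Task(Set.of("attempt_0", "attempt_1"));
    waiting.killAndWait();
    waiting.onAttemptKilled("attempt_0");
    // attempt_1 ran on a lost node and never confirms, so the task is stuck.
    System.out.println("with KILL_WAIT: " + waiting.state);    // prints "with KILL_WAIT: KILL_WAIT"

    Task noWait = new Task(Set.of("attempt_0", "attempt_1"));
    noWait.killWithoutWaiting();
    System.out.println("without KILL_WAIT: " + noWait.state); // prints "without KILL_WAIT: KILLED"
  }
}
```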

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg


 We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them 
 as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a 
 few maps running. All these maps were scheduled on nodes which are now in the 
 RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.



[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494303#comment-13494303
 ] 

Robert Joseph Evans commented on MAPREDUCE-4751:


Yes, I think that would be better, but it is a much larger change that would
need more tests.  Perhaps we do that in a follow-on JIRA.

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg


 We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them 
 as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a 
 few maps running. All these maps were scheduled on nodes which are now in the 
 RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.



[jira] [Commented] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494306#comment-13494306
 ] 

Robert Joseph Evans commented on MAPREDUCE-4783:


I think this can be closed as a duplicate of MAPREDUCE-4266, which removed all 
of the ant code.

 data_join mavenization broke the mr1 build
 --

 Key: MAPREDUCE-4783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: mapreduce-4783.txt


 MR-4238 didn't update build.xml and forgot to nuke the old data_join 
 directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4783) data_join mavenization broke the mr1 build

2012-11-09 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-4783:
---

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Great, thanks.

 data_join mavenization broke the mr1 build
 --

 Key: MAPREDUCE-4783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Attachments: mapreduce-4783.txt


 MR-4238 didn't update build.xml and forgot to nuke the old data_join 
 directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server

2012-11-09 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494353#comment-13494353
 ] 

Jonathan Eagles commented on MAPREDUCE-4666:


+1. Simple change that works for me.

 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.
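
As a rough illustration of what "JVM metrics" covers: a JVM metrics source typically reports values such as heap usage, garbage-collection counts and times, and thread counts. The sketch below reads comparable values directly from the JDK's java.lang.management beans. It is an illustration under stated assumptions only: it does not use the Hadoop metrics2 API, and the class name JvmSnapshot and the metric keys are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: takes a snapshot of a few JVM statistics of the kind a
// JVM metrics source publishes. Plain JDK code; this is NOT the Hadoop metrics2 API.
final class JvmSnapshot {

    static Map<String, Long> snapshot() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        // Heap usage from the platform MemoryMXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heapUsedBytes", heap.getUsed());
        metrics.put("heapCommittedBytes", heap.getCommitted());

        // Sum collection counts/times over all collectors; each bean returns -1
        // when the value is undefined, so clamp those to 0.
        long gcCount = 0;
        long gcTimeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0L, gc.getCollectionCount());
            gcTimeMillis += Math.max(0L, gc.getCollectionTime());
        }
        metrics.put("gcCount", gcCount);
        metrics.put("gcTimeMillis", gcTimeMillis);

        // Live thread count from the ThreadMXBean.
        metrics.put("threadCount", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        return metrics;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

A real metrics source would publish such values periodically through a metrics system rather than printing them once.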

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4666) JVM metrics for history server

2012-11-09 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-4666:
---

   Resolution: Fixed
Fix Version/s: 0.23.5
   2.0.3-alpha
   3.0.0
   Status: Resolved  (was: Patch Available)

 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4774) repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR

2012-11-09 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4774:
--

Attachment: MAPREDUCE-4774.patch

This test failure is pervasive and annoying, so I'm taking this to get it 
fixed quickly. The patch ignores certain asynchronous task events in the FAILED 
state, much as we already do in the ERROR state, and adds corresponding unit 
tests to verify that we handle them properly.
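
A minimal sketch of the approach described in the patch comment, assuming a much simplified job state machine: once the job is FAILED, late asynchronous task events are treated as no-ops instead of raising an invalid-transition error. The class MiniJob, the TASK_FAILED event, and the ignoreLateEvents flag are hypothetical; only JOB_TASK_ATTEMPT_COMPLETED and the FAILED/INTERNAL_ERROR states come from this issue, and this is not Hadoop's JobImpl or StateMachineFactory code.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified, hypothetical model of a job state machine; not Hadoop's JobImpl.
final class MiniJob {
    enum State { RUNNING, FAILED, INTERNAL_ERROR }
    enum Event { TASK_FAILED, JOB_TASK_ATTEMPT_COMPLETED }

    // Events that may still arrive asynchronously after the job has failed.
    private static final Set<Event> IGNORED_WHEN_FAILED =
        EnumSet.of(Event.JOB_TASK_ATTEMPT_COMPLETED);

    private final boolean ignoreLateEvents;
    private State state = State.RUNNING;

    MiniJob(boolean ignoreLateEvents) {
        this.ignoreLateEvents = ignoreLateEvents;
    }

    State state() {
        return state;
    }

    void handle(Event e) {
        try {
            state = transition(state, e);
        } catch (IllegalStateException ex) {
            // As in the report: an unhandled event pushes the job into an error state.
            state = State.INTERNAL_ERROR;
        }
    }

    private State transition(State s, Event e) {
        switch (s) {
            case RUNNING:
                if (e == Event.TASK_FAILED) return State.FAILED;
                if (e == Event.JOB_TASK_ATTEMPT_COMPLETED) return State.RUNNING;
                break;
            case FAILED:
                // The fix: explicitly listed late events leave FAILED unchanged.
                if (ignoreLateEvents && IGNORED_WHEN_FAILED.contains(e)) return State.FAILED;
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event: " + e + " at " + s);
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            MiniJob job = new MiniJob(fix);
            job.handle(Event.TASK_FAILED);                // RUNNING -> FAILED
            job.handle(Event.JOB_TASK_ATTEMPT_COMPLETED); // late, asynchronous event
            System.out.println("ignoreLateEvents=" + fix + " -> " + job.state());
        }
        // prints:
        // ignoreLateEvents=false -> INTERNAL_ERROR
        // ignoreLateEvents=true -> FAILED
    }
}
```

Ignoring only an explicit set of events, rather than every event, keeps genuinely unexpected transitions visible as errors.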

 repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR
 -

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ivan A. Veselovsky
 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
 fails in the mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
 or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test checks the job status notifications received by an HTTP servlet. It 
 runs 3 jobs: one that succeeds, one that is killed, and one that fails. 
 The test expects the servlet to receive a specific set of notifications in a 
 specific order. It also exercises the retry-on-failure notification 
 functionality: the servlet answers the first attempt of each notification with 
 HTTP 400, forcing an error, and answers the second attempt with OK. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from what is expected.
 Investigation shows that the actual root cause is an incorrect job state 
 transition: a task of the 3rd job fails (by an intentionally thrown 
 RuntimeException, see UtilsForTests#runJobFail()), and the state of the task 
 changes from RUNNING to FAILED.
 At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in 
 method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, TaskAttemptCompletionEventStatus)), 
 and this event is processed by the AsyncDispatcher, but this transition is not 
 defined in the event transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
 As a result, the following exception is thrown while processing the event:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
     at java.lang.Thread.run(Thread.java:662)
 So the job goes into the INTERNAL_ERROR state, and a job-end notification like 
 this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (note the ERROR status instead of FAILED).
 After that, the notification servlet receives either only the ERROR 
 notification, or an additional ERROR notification after the FAILED one, which 
 causes the test to fail. (The behavior varies between runs because of race 
 conditions in the heavily asynchronous processing, so the test is in fact 
 flaky.)
 In any case, the root cause of the problem appears to be the forbidden 
 transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED". 
 Expert advice is needed on how this should be fixed.
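
The notification retry protocol the test relies on can be modelled without a servlet container. This sketch is hypothetical (NotificationRecorder and its methods are invented, and it is not the code of TestClusterMRNotification): it only parses the jobId and jobStatus query parameters seen in the notification URL above, rejects the first delivery of each notification with HTTP 400, and records the status on a later delivery. It shows how a stray ERROR notification changes the recorded sequence the test compares against.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the test's notification endpoint: the first delivery of
// each (jobId, jobStatus) pair is answered with 400 so that the sender retries;
// later deliveries are answered with 200 and their status is recorded.
final class NotificationRecorder {
    private final Map<String, Integer> attempts = new HashMap<>();
    private final List<String> accepted = new ArrayList<>();

    /** Handles a query string such as "jobId=job_1_0002&jobStatus=FAILED". */
    int handle(String query) {
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        String key = params.get("jobId") + ":" + params.get("jobStatus");
        int n = attempts.merge(key, 1, Integer::sum);
        if (n == 1) {
            return 400; // force the sender to retry
        }
        accepted.add(params.get("jobStatus"));
        return 200;
    }

    List<String> acceptedStatuses() {
        return accepted;
    }

    public static void main(String[] args) {
        NotificationRecorder r = new NotificationRecorder();
        String failed = "jobId=job_1352199715842_0002&jobStatus=FAILED";
        String error = "jobId=job_1352199715842_0002&jobStatus=ERROR";
        System.out.println(r.handle(failed) + " " + r.handle(failed)); // 400 200
        // A spurious ERROR notification, as described in the report:
        System.out.println(r.handle(error) + " " + r.handle(error));   // 400 200
        System.out.println(r.acceptedStatuses());                        // [FAILED, ERROR]
    }
}
```

A test that expects exactly [FAILED] for the failed job sees [FAILED, ERROR] (or only [ERROR]) instead, which matches the flaky failures described above.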

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4774:
--

  Component/s: mrv2
   applicationmaster
Affects Version/s: 0.23.3
   2.0.1-alpha
  Summary: JobImpl does not handle asynchronous task events in 
FAILED state  (was: repair test 
org.apache.hadoop.mapred.TestClusterMRNotification.testMR)

Editing headline to more accurately reflect the root cause.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
 Attachments: MAPREDUCE-4774.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: 

[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server

2012-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494368#comment-13494368
 ] 

Hudson commented on MAPREDUCE-4666:
---

Integrated in Hadoop-trunk-Commit #2996 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2996/])
MAPREDUCE-4666. JVM metrics for history server. (jlowe via jeagles) 
(Revision 1407669)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407669
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java


 JVM metrics for history server
 --

 Key: MAPREDUCE-4666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4666.patch


 It would be nice if the job history server provided the same JVM metrics via 
 metrics2 that other Hadoop daemons are already providing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4774:
--

Assignee: Jason Lowe
Target Version/s: 2.0.3-alpha, 0.23.5
  Status: Patch Available  (was: Open)

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 2.0.1-alpha, 0.23.3
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4774.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494377#comment-13494377
 ] 

Robert Joseph Evans commented on MAPREDUCE-4774:


The change looks simple enough and does fix the failing test. I am +1 pending 
Jenkins approval.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4774.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494389#comment-13494389
 ] 

Hadoop QA commented on MAPREDUCE-4774:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552903/MAPREDUCE-4774.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

  org.apache.hadoop.mapreduce.v2.app.TestRecovery

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//console

This message is automatically generated.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4774.patch



[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494392#comment-13494392
 ] 

Robert Joseph Evans commented on MAPREDUCE-4774:


I ran TestRecovery manually, and it looks like a spurious failure. We should 
file a JIRA to fix it. Checking in the patch now.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
  fails in mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
  , or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test aims to check Job status notifications received through HTTP 
 Servlet. It runs 3 jobs: successfull, killed, and failed. 
 The test expects the servlet to receive some expected notifications in some 
 expected order. It also tries to test the retry-on-failure notification 
 functionality, so on each 1st notification the servlet answers 400 forcing 
 error, and on each 2nd notification attempt it answers ok. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from the expected.
 Investigation shows that actual root cause of the problem is an incorrect job 
 state transition: the 3rd job mapred task fails (by intentionally thrown  
 RuntimeException, see UtilsForTests#runJobFail()), and the state of the task 
 changes from RUNNING to FAILED.
 At this point JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in  
 method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId,
  TaskAttemptCompletionEventStatus)), and this event gets processed in 
 AsyncDispatcher, but this transition is impossible according to the event 
 transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
 This causes the following exception to be thrown upon the event processing:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
     at java.lang.Thread.run(Thread.java:662)
 As a result, the job moves to the INTERNAL_ERROR state, and a job-end 
 notification like this one is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (note the ERROR status instead of FAILED).
 After that, the notification servlet receives either only an ERROR 
 notification, or an additional ERROR notification after the FAILED one, which 
 causes the test to fail. (The test behavior varies because of race conditions 
 among the many asynchronous processing steps involved, so the test is in fact 
 flaky.)
 In any case, the root cause of the problem appears to be that a 
 JOB_TASK_ATTEMPT_COMPLETED event can arrive after the job is already FAILED 
 ("Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED"). 
 Expert advice is needed on how this should be fixed.
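To make the retry behavior and the resulting mismatch concrete, here is a minimal, self-contained Java sketch. It does not use the real servlet, HTTP, or Hadoop's job-end notifier; `FlakyNotificationReceiver`, `NotificationRetryDemo`, and `deliverWithRetry` are hypothetical names invented for illustration. The receiver rejects the first attempt of each distinct notification with 400 and accepts the retry with 200, as the test description states, and it records only accepted notifications, so the extra ERROR notification shows up as a deviation from the single FAILED notification the test expects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the test's notification servlet. The first attempt of each
// distinct (jobId, jobStatus) notification is rejected with 400; the retry gets 200.
class FlakyNotificationReceiver {
    private final Map<String, Integer> attempts = new HashMap<>();
    private final List<String> accepted = new ArrayList<>();

    int receive(String jobId, String jobStatus) {
        String key = jobId + ":" + jobStatus;
        int n = attempts.merge(key, 1, Integer::sum); // count attempts per notification
        if (n == 1) {
            return 400; // force the sender to retry
        }
        accepted.add(jobStatus); // only accepted notifications are recorded
        return 200;
    }

    List<String> acceptedStatuses() {
        return accepted;
    }
}

class NotificationRetryDemo {
    // Hypothetical sender: keeps retrying until it gets a 200 or runs out of attempts.
    static boolean deliverWithRetry(FlakyNotificationReceiver receiver, String jobId,
                                    String jobStatus, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (receiver.receive(jobId, jobStatus) == 200) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FlakyNotificationReceiver receiver = new FlakyNotificationReceiver();
        // What the test expects for the failing job: one FAILED notification.
        deliverWithRetry(receiver, "job_0002", "FAILED", 2);
        // The invalid transition adds an ERROR notification, breaking the expected sequence.
        deliverWithRetry(receiver, "job_0002", "ERROR", 2);
        System.out.println(receiver.acceptedStatuses()); // prints [FAILED, ERROR]
    }
}
```

With only one allowed attempt, every delivery in this sketch fails, which is why the retry budget matters for the test's expectations.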

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4774:
---

   Resolution: Fixed
Fix Version/s: 0.23.5, 2.0.3-alpha, 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks, Jason.

I committed this to trunk, branch-2, and branch-0.23.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4774.patch


[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-09 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494423#comment-13494423
 ] 

Mariappan Asokan commented on MAPREDUCE-2454:
-

Hi Alejandro,
  I ran the tests on my box. The failing tests also fail without my patch, 
so the failures do not appear to be related to it.

-- Asokan


 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, 
 MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.
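As a rough illustration of the plugin idea, here is a self-contained Java sketch of a pluggable sorter that the framework codes against, with the concrete implementation loaded by class name as plugin frameworks commonly do. All names here (`PluggableSorter`, `InMemorySorter`, `SorterFactory`, `SorterPluginDemo`) are hypothetical and are not the interfaces defined in the attached patches or design document; String keys and values stand in for serialized map output.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical plugin contract: collect key/value pairs, then iterate them in key order.
interface PluggableSorter {
    void add(String key, String value);

    Iterator<Map.Entry<String, String>> sortedIterator();
}

// One possible implementation: buffer everything in memory and sort once.
class InMemorySorter implements PluggableSorter {
    private final List<Map.Entry<String, String>> buffer = new ArrayList<>();

    @Override
    public void add(String key, String value) {
        buffer.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    @Override
    public Iterator<Map.Entry<String, String>> sortedIterator() {
        // List.sort is stable, so records with equal keys keep their insertion order.
        buffer.sort(Map.Entry.comparingByKey());
        return buffer.iterator();
    }
}

class SorterFactory {
    // Instantiate the sorter named in configuration (the class name is assumed to come
    // from some hypothetical configuration property).
    static PluggableSorter create(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(PluggableSorter.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}

class SorterPluginDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        PluggableSorter sorter = SorterFactory.create(InMemorySorter.class.getName());
        sorter.add("b", "2");
        sorter.add("a", "1");
        Iterator<Map.Entry<String, String>> it = sorter.sortedIterator();
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Keeping the framework dependent only on the interface is what lets an external sorter replace the built-in one without changes to the calling code.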


[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494459#comment-13494459
 ] 

Hudson commented on MAPREDUCE-4774:
---

Integrated in Hadoop-trunk-Commit #2997 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2997/])
MAPREDUCE-4774. JobImpl does not handle asynchronous task events in FAILED 
state (jlowe via bobby) (Revision 1407679)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407679
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
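The commit above changes JobImpl so that asynchronous task events arriving after the job has FAILED are handled. The sketch below is not the actual patch and does not use YARN's StateMachineFactory; it is a hypothetical, self-contained Java state machine (`JobState`, `JobEvent`, `MiniJob`, and `LateEventDemo` are invented names) showing the general idea: a terminal state explicitly lists the late events it ignores, so they no longer fall through to the invalid-transition path that sends the job to INTERNAL_ERROR.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified job states and events (not JobStateInternal/JobEventType).
enum JobState { RUNNING, FAILED, INTERNAL_ERROR }

enum JobEvent { JOB_TASK_ATTEMPT_COMPLETED, JOB_TASK_COMPLETED, JOB_FAIL }

class MiniJob {
    // Events each state explicitly ignores (self-transition with no side effects).
    private static final Map<JobState, Set<JobEvent>> IGNORED = new EnumMap<>(JobState.class);

    static {
        IGNORED.put(JobState.RUNNING, EnumSet.noneOf(JobEvent.class));
        // The key idea: late asynchronous task events are tolerated once the job has FAILED.
        IGNORED.put(JobState.FAILED,
                EnumSet.of(JobEvent.JOB_TASK_ATTEMPT_COMPLETED, JobEvent.JOB_TASK_COMPLETED));
        IGNORED.put(JobState.INTERNAL_ERROR, EnumSet.allOf(JobEvent.class));
    }

    private JobState state = JobState.RUNNING;

    JobState state() {
        return state;
    }

    void handle(JobEvent event) {
        if (IGNORED.get(state).contains(event)) {
            return;
        }
        if (state == JobState.RUNNING) {
            if (event == JobEvent.JOB_FAIL) {
                state = JobState.FAILED;
            }
            // Task completion events in RUNNING would update counters etc.; omitted here.
            return;
        }
        // No transition defined for this event in this state: mimic the
        // "Invalid event ... at STATE" handling, which moves the job to INTERNAL_ERROR.
        System.err.println("Invalid event: " + event + " at " + state);
        state = JobState.INTERNAL_ERROR;
    }
}

class LateEventDemo {
    public static void main(String[] args) {
        MiniJob job = new MiniJob();
        job.handle(JobEvent.JOB_FAIL);                   // RUNNING -> FAILED
        job.handle(JobEvent.JOB_TASK_ATTEMPT_COMPLETED); // late event, ignored
        System.out.println(job.state());                 // prints FAILED
    }
}
```

Events that are not listed for a state still take the error path, so the approach only silences the late events that are known to be harmless.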


 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4774.patch


[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:
---

Attachment: MAPREDUCE-4751-20121109.txt

Addressed Bobby's comments on my earlier patch.

 - Agreed about the HashSet. I started with bitmaps, but they made the code unreadable. 
Keeping the HashSet, but with an explicit initial capacity of 2 instead of the 
default 16. It could have been 1, but HashSet/HashMap immediately resizes it to 2 anyway.
 - Addressed the other comments.
 - Wrote a test that passes with the changes and fails without them. It took a lot 
of time to get it right.
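The HashSet sizing choice described above can be illustrated with a small, hypothetical Java class: a per-task set of attempt IDs that have not yet reached a final state, created with an explicit initial capacity of 2, presumably because a task rarely has more than a couple of live attempts at once. `AttemptTracker` and its methods are invented for this sketch and are not code from the patch. The capacity argument is only a sizing hint; the set still grows correctly if more attempts are added.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-task tracker of attempts that have not yet reached a final state.
class AttemptTracker {
    // Explicit initial capacity of 2 instead of the default 16. Assumption: most tasks
    // have only one or two live attempts, so a smaller table saves memory per task.
    private final Set<String> inFlight = new HashSet<>(2);

    void attemptStarted(String attemptId) {
        inFlight.add(attemptId);
    }

    void attemptFinished(String attemptId) {
        inFlight.remove(attemptId);
    }

    boolean allAttemptsFinished() {
        return inFlight.isEmpty();
    }

    Set<String> inFlightAttempts() {
        return Collections.unmodifiableSet(inFlight);
    }
}

class AttemptTrackerDemo {
    public static void main(String[] args) {
        AttemptTracker tracker = new AttemptTracker();
        tracker.attemptStarted("attempt_0");
        tracker.attemptFinished("attempt_0");
        System.out.println(tracker.allAttemptsFinished()); // prints true
    }
}
```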

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, 
 MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg


 We found some jobs that were stuck in KILL_WAIT for days on end. The RM shows them 
 as RUNNING. The AM shows the job in the KILL_WAIT state with a few maps still 
 running. All of these maps were scheduled on nodes that are now in the RM's 
 lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.


[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:
---

Status: Patch Available  (was: Open)

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.2-alpha, 0.23.3
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, 
 MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg


[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494518#comment-13494518
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4751:


bq. Part of the issue is that the job is hanging around waiting for all tasks 
to be killed rather than just exiting and letting YARN shoot any straggling 
containers. I think it would be simpler/safer for the AM to just write out the 
final state stuff and exit, much like it does for the FAILED state. If the job's 
KILL_WAIT really is necessary, then we'd need a corresponding FAILED_WAIT state 
to handle waiting for task cleanup when a job fails.
I agree. Sharad and I debated this for a while when we first wrote it. We left 
it as it is now just to be sure that AMs exit sanely, but we can change it. The 
only catch I can think of is that while the AM does the remaining cleanup work 
(job history, etc.), tasks will keep bombarding the AM with more updates.

I didn't realize that we don't have a FAILED_WAIT state.

The change isn't much bigger, but it can break tests. Let's pursue that 
separately.

The current bug is caused by tasks waiting on task attempts (TAs), which my 
patch should fix. Of course, that then exposes the job-level bug; let's fix that 
separately.

Regarding doing away with the Task's KILL_WAIT, I disagree. Tasks can receive a 
kill signal while the AM is running, so we should handle it explicitly by killing 
and waiting for all attempts; otherwise we risk dangling JVMs that do nothing but 
occupy slots until the AM exits.
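The task-level kill-and-wait behavior argued for above can be sketched as a small state machine. This is a hypothetical Java illustration (`KillableTask`, `TaskState`, and `KillWaitDemo` are invented names; it is not TaskImpl): on a kill, the task asks every unfinished attempt to die and enters KILL_WAIT, and it leaves KILL_WAIT only after each attempt has reported a final state. As the comment above notes, the task waits on its attempts, so if some attempt's final report never arrives (as with the attempts stuck in FAIL_CONTAINER_CLEANUP on lost nodes), the task stays in KILL_WAIT; in this sketch, every path that ends an attempt, including a failure, has to call back.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical task states for this sketch.
enum TaskState { RUNNING, KILL_WAIT, KILLED }

class KillableTask {
    private TaskState state = TaskState.RUNNING;
    private final Set<String> unfinishedAttempts = new HashSet<>(2);
    // Records the kill requests sent, standing in for real container kill calls.
    private final List<String> killRequests = new ArrayList<>();

    void addAttempt(String attemptId) {
        unfinishedAttempts.add(attemptId);
    }

    // Kill signal while the AM is running: kill every live attempt, then wait for them.
    void kill() {
        if (state != TaskState.RUNNING) {
            return;
        }
        killRequests.addAll(unfinishedAttempts);
        state = TaskState.KILL_WAIT;
        maybeFinishKill(); // a task with no live attempts is killed immediately
    }

    // Must be called when an attempt reaches any final state (killed, failed, or succeeded).
    void attemptReachedFinalState(String attemptId) {
        unfinishedAttempts.remove(attemptId);
        if (state == TaskState.KILL_WAIT) {
            maybeFinishKill();
        }
    }

    private void maybeFinishKill() {
        if (unfinishedAttempts.isEmpty()) {
            state = TaskState.KILLED;
        }
    }

    TaskState state() {
        return state;
    }

    List<String> killRequests() {
        return killRequests;
    }
}

class KillWaitDemo {
    public static void main(String[] args) {
        KillableTask task = new KillableTask();
        task.addAttempt("attempt_0");
        task.kill();
        System.out.println(task.state()); // prints KILL_WAIT
        task.attemptReachedFinalState("attempt_0");
        System.out.println(task.state()); // prints KILLED
    }
}
```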

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, 
 MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg


[jira] [Commented] (MAPREDUCE-4749) Killing multiple attempts of a task taker longer as more attempts are killed

2012-11-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494523#comment-13494523
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4749:


Looks good to me too. The algo is clean. And nice tests too! Checking it in.

 Killing multiple attempts of a task taker longer as more attempts are killed
 

 Key: MAPREDUCE-4749
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4749
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: MAPREDUCE-4749.branch-1.patch, 
 MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, 
 MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, 
 MAPREDUCE-4749.branch-1.patch


 The following was noticed on an MR job running on Hadoop 1.1.0:
 1. Start an MR job with 1 mapper.
 2. Wait for a minute.
 3. Kill the first attempt of the mapper, and then kill the other 
 3 attempts in turn in order to fail the job.
 The time taken to kill each attempt grew exponentially:
 1st attempt was killed immediately.
 2nd attempt took a little over a minute.
 3rd attempt took approx. 20 minutes.
 4th attempt took around 3 hours.
 The command used to kill the attempts was hadoop job -fail-task
 Note that the command returned as soon as the fail request was accepted; 
 the times above are when each attempt was actually killed.


[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

2012-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494522#comment-13494522
 ] 

Hadoop QA commented on MAPREDUCE-4751:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12552951/MAPREDUCE-4751-20121109.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//console

This message is automatically generated.

 AM stuck in KILL_WAIT for days
 --

 Key: MAPREDUCE-4751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-4751-20121108.txt, 
 MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg


[jira] [Resolved] (MAPREDUCE-4749) Killing multiple attempts of a task taker longer as more attempts are killed

2012-11-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-4749.


   Resolution: Fixed
Fix Version/s: 1.1.1
 Hadoop Flags: Reviewed

I just committed this to branch-1 and branch-1.1. Thanks Arpit!

 Killing multiple attempts of a task taker longer as more attempts are killed
 

 Key: MAPREDUCE-4749
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4749
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Fix For: 1.1.1

 Attachments: MAPREDUCE-4749.branch-1.patch, 
 MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, 
 MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, 
 MAPREDUCE-4749.branch-1.patch

