[jira] [Commented] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable

2013-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594387#comment-13594387
 ] 

Hadoop QA commented on MAPREDUCE-5049:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572258/MAPREDUCE-5049.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3388//console

This message is automatically generated.

> CombineFileInputFormat counts all compressed files non-splitable
> 
>
> Key: MAPREDUCE-5049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5049.patch
>
>
> In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec 
> into account and thinks that all compressible input files aren't splittable.  
> This is a regression from when handling for non-splitable compression codecs 
> was originally added in MAPREDUCE-1597, and seems to have somehow gotten in 
> when the code was pulled from 0.22 to branch-1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable

2013-03-05 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5049:
--

Status: Patch Available  (was: Open)

> CombineFileInputFormat counts all compressed files non-splitable
> 
>
> Key: MAPREDUCE-5049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5049.patch
>
>
> In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec 
> into account and thinks that all compressible input files aren't splittable.  
> This is a regression from when handling for non-splitable compression codecs 
> was originally added in MAPREDUCE-1597, and seems to have somehow gotten in 
> when the code was pulled from 0.22 to branch-1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable

2013-03-05 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5049:
--

Attachment: MAPREDUCE-5049.patch

> CombineFileInputFormat counts all compressed files non-splitable
> 
>
> Key: MAPREDUCE-5049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5049.patch
>
>
> In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec 
> into account and thinks that all compressible input files aren't splittable.  
> This is a regression from when handling for non-splitable compression codecs 
> was originally added in MAPREDUCE-1597, and seems to have somehow gotten in 
> when the code was pulled from 0.22 to branch-1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API

2013-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594334#comment-13594334
 ] 

Hadoop QA commented on MAPREDUCE-5038:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12572246/MAPREDUCE-5038-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3387//console

This message is automatically generated.

> old API CombineFileInputFormat missing fixes that are in new API 
> -
>
> Key: MAPREDUCE-5038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but 
> neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 made CombineFileInputFormat work with paths not on the default 
> FS
> In trunk this is not an issue as the one in mapred extends the one in 
> mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API

2013-03-05 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5038:
--

Status: Patch Available  (was: Open)

> old API CombineFileInputFormat missing fixes that are in new API 
> -
>
> Key: MAPREDUCE-5038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but 
> neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 made CombineFileInputFormat work with paths not on the default 
> FS
> In trunk this is not an issue as the one in mapred extends the one in 
> mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API

2013-03-05 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594329#comment-13594329
 ] 

Sandy Ryza commented on MAPREDUCE-5038:
---

Filed MAPREDUCE-5049 to handle SplittableCompressionCodec.  Uploaded a new 
patch that includes MAPREDUCE-1423.

> old API CombineFileInputFormat missing fixes that are in new API 
> -
>
> Key: MAPREDUCE-5038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but 
> neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 made CombineFileInputFormat work with paths not on the default 
> FS
> In trunk this is not an issue as the one in mapred extends the one in 
> mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API

2013-03-05 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5038:
--

Attachment: MAPREDUCE-5038-1.patch

> old API CombineFileInputFormat missing fixes that are in new API 
> -
>
> Key: MAPREDUCE-5038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but 
> neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 made CombineFileInputFormat work with paths not on the default 
> FS
> In trunk this is not an issue as the one in mapred extends the one in 
> mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable

2013-03-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5049:
-

 Summary: CombineFileInputFormat counts all compressed files 
non-splitable
 Key: MAPREDUCE-5049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza


In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec 
into account and thinks that all compressible input files aren't splittable.  
This is a regression from when handling for non-splitable compression codecs 
was originally added in MAPREDUCE-1597, and seems to have somehow gotten in 
when the code was pulled from 0.22 to branch-1.
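For reference, the splittability check that the new-API CombineFileInputFormat 
uses (and that branch-1 appears to be missing) treats a file as non-splittable 
only when it is compressed with a codec that does not implement 
SplittableCompressionCodec. A minimal sketch of that check against the old 
mapred API (illustrative only, not the attached patch):

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;
import org.apache.hadoop.mapred.JobConf;

public class SplittabilityCheckSketch {
  // A file with no compression codec is always splittable; a compressed file
  // is splittable only when its codec implements SplittableCompressionCodec
  // (e.g. splittable bzip2).
  static boolean isSplitable(JobConf conf, Path file) {
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
    if (codec == null) {
      return true;  // not compressed
    }
    return codec instanceof SplittableCompressionCodec;
  }
}
{code}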


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters

2013-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594253#comment-13594253
 ] 

Hadoop QA commented on MAPREDUCE-5047:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572230/MAPREDUCE-5047.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3386//console

This message is automatically generated.

> keep.failed.task.files=true causes job failure on secure clusters
> -
>
> Key: MAPREDUCE-5047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task, tasktracker
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5047.patch
>
>
> To support IsolationRunner, split info is written to local directories.  This 
> occurs inside MapTask#localizeConfiguration, which is called both by the 
> tasktracker and by the child JVM.  On a secure cluster, the tasktracker's 
> attempt to write
> it fails, because the tasktracker does not have permission to write to the 
> user's directory. It is likely that the call to localizeConfiguration in the 
> tasktracker can be removed. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters

2013-03-05 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594251#comment-13594251
 ] 

Sandy Ryza commented on MAPREDUCE-5047:
---

localizeConfiguration is needed in the tasktracker in order to set 
task-specific configuration options, but split.info does not need to be created 
at that time.  The patch moves the action of writing out split.info into a new 
writeFilesRequiredForRerun method.  This method is called by the Child, but not 
by the tasktracker.

Tested on a pseudo-distributed cluster and on a secure distributed cluster; 
verified that the permissions error no longer shows up and that split.info is 
still written out to the correct location.
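A rough sketch of the split described above (the method names come from the 
comment; the bodies are placeholders, not the attached patch):

{code:java}
import java.io.IOException;
import org.apache.hadoop.mapred.JobConf;

class MapTaskSketch {
  private final String taskId;

  MapTaskSketch(String taskId) {
    this.taskId = taskId;
  }

  // Called by both the tasktracker and the child JVM: only sets task-specific
  // configuration, and no longer writes anything to the user's directory.
  void localizeConfiguration(JobConf conf) {
    conf.set("mapred.task.id", taskId);
  }

  // Called only from Child (the task JVM), which runs as the job's user, so
  // writing split.info into the user-owned directory succeeds on a secure
  // cluster as well.
  void writeFilesRequiredForRerun(JobConf conf) throws IOException {
    // write split.info to the task's local directory for IsolationRunner
    // (details omitted)
  }
}
{code}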

> keep.failed.task.files=true causes job failure on secure clusters
> -
>
> Key: MAPREDUCE-5047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task, tasktracker
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5047.patch
>
>
> To support IsolationRunner, split info is written to local directories.  This 
> occurs inside MapTask#localizeConfiguration, which is called both by the 
> tasktracker and by the child JVM.  On a secure cluster, the tasktracker's 
> attempt to write
> it fails, because the tasktracker does not have permission to write to the 
> user's directory. It is likely that the call to localizeConfiguration in the 
> tasktracker can be removed. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters

2013-03-05 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5047:
--

Status: Patch Available  (was: Open)

> keep.failed.task.files=true causes job failure on secure clusters
> -
>
> Key: MAPREDUCE-5047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task, tasktracker
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5047.patch
>
>
> To support IsolationRunner, split info is written to local directories.  This 
> occurs inside MapTask#localizeConfiguration, which is called both by the 
> tasktracker and by the child JVM.  On a secure cluster, the tasktracker's 
> attempt to write
> it fails, because the tasktracker does not have permission to write to the 
> user's directory. It is likely that the call to localizeConfiguration in the 
> tasktracker can be removed. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters

2013-03-05 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5047:
--

Attachment: MAPREDUCE-5047.patch

> keep.failed.task.files=true causes job failure on secure clusters
> -
>
> Key: MAPREDUCE-5047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task, tasktracker
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5047.patch
>
>
> To support IsolationRunner, split info is written to local directories.  This 
> occurs inside MapTask#localizeConfiguration, which is called both by the 
> tasktracker and by the child JVM.  On a secure cluster, the tasktracker's 
> attempt to write
> it fails, because the tasktracker does not have permission to write to the 
> user's directory. It is likely that the call to localizeConfiguration in the 
> tasktracker can be removed. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer

2013-03-05 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594248#comment-13594248
 ] 

Mariappan Asokan commented on MAPREDUCE-4842:
-

Hi Ravi,
  Thanks for the compliment.  I will look at the patch for MAPREDUCE-3685 and 
post my comments there once I understand it completely.

-- Asokan


> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Mariappan Asokan
>Priority: Blocker
> Fix For: 2.0.3-alpha, 0.23.6
>
> Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
> mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, 
> mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered

2013-03-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5042:
--

Attachment: MAPREDUCE-5042.patch

This is complicated by the fact that the job token currently serves a dual role 
to authenticate both the shuffle *and* the task umbilical.  The former is 
something that should persist across app attempts, while the latter should not. 
 We don't want old task attempts authenticating with the new app attempt, at 
least not at this point.  It would only serve to confuse the new app attempt.

Therefore I propose the following:

* The current job token remains primarily as-is for authenticating the task 
umbilical, and each AM attempt continues to generate its own job token.
* A new secret key, the shuffle secret, will be generated by the job client 
when the job is submitted and stored as part of the job's credentials.  Each 
app attempt will extract the shuffle secret from the job's credentials and use 
it as the shared secret to authenticate the shuffle.

Attaching the first draft of a patch that implements that proposal.  It needs 
unit tests, but I've manually tested that it can recover map tasks and 
successfully shuffle their data.
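An illustrative sketch of the proposal (the key alias and helper names are 
assumptions, not the actual patch): the client generates the shuffle secret 
once at submission time and stores it in the job's credentials, and every AM 
attempt reads the same secret back, so map outputs produced by a previous 
attempt can still be verified.

{code:java}
import javax.crypto.KeyGenerator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;

class ShuffleSecretSketch {
  private static final Text SHUFFLE_SECRET = new Text("shuffle.secret");

  // Job client side: generate the secret once when the job is submitted.
  static void addShuffleSecret(Credentials jobCredentials) throws Exception {
    KeyGenerator keyGen = KeyGenerator.getInstance("HmacSHA1");
    byte[] secret = keyGen.generateKey().getEncoded();
    jobCredentials.addSecretKey(SHUFFLE_SECRET, secret);
  }

  // AM side: every app attempt extracts the same secret and hands it to the
  // shuffle, while the per-attempt job token stays umbilical-only.
  static byte[] getShuffleSecret(Credentials jobCredentials) {
    return jobCredentials.getSecretKey(SHUFFLE_SECRET);
  }
}
{code}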

> Reducer unable to fetch for a map task that was recovered
> -
>
> Key: MAPREDUCE-5042
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am, security
>Affects Versions: 0.23.7, 2.0.4-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-5042.patch
>
>
> If an application attempt fails and is relaunched the AM will try to recover 
> previously completed tasks.  If a reducer needs to fetch the output of a map 
> task attempt that was recovered then it will fail with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL: 
> http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see the shuffle failed due to 
> "Verification of the hashReply failed".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered

2013-03-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-5042:
-

Assignee: Jason Lowe

> Reducer unable to fetch for a map task that was recovered
> -
>
> Key: MAPREDUCE-5042
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am, security
>Affects Versions: 0.23.7, 2.0.4-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>
> If an application attempt fails and is relaunched the AM will try to recover 
> previously completed tasks.  If a reducer needs to fetch the output of a map 
> task attempt that was recovered then it will fail with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL: 
> http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see the shuffle failed due to 
> "Verification of the hashReply failed".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594174#comment-13594174
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5028:
---

Thanks Karthik. I've committed the patch for branch-1. Thanks Chris for 
reviewing it.

+1 for the trunk patch. I'll wait a bit to see if there are comments from 
others before committing it.

> Maps fail when io.sort.mb is set to high value
> --
>
> Key: MAPREDUCE-5028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
> mr-5028-branch1.patch, mr-5028-trunk.patch
>
>
> Verified the problem exists on branch-1 with the following configuration:
> Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
> io.sort.mb=1280, dfs.block.size=2147483648
> Run teragen to generate 4 GB data
> Maps fail when you run wordcount on this configuration with the following 
> error: 
> {noformat}
> java.io.IOException: Spill failed
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
>   at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
>   at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>   at 
> org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
>   at 
> org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
>   at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>   at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>   at 
> org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594141#comment-13594141
 ] 

Hudson commented on MAPREDUCE-5027:
---

Integrated in Hadoop-trunk-Commit #3421 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3421/])
MAPREDUCE-5027. Shuffle does not limit number of outstanding connections 
(Robert Parker via jeagles) (Revision 1453098)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453098
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java


> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Fix For: 3.0.0, 0.23.7, 2.0.4-beta
>
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.
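The idea behind the fix, as a generic sketch (counter and config handling are 
illustrative only, not the ShuffleHandler code): keep a count of open shuffle 
connections and refuse new ones once a configured cap is reached, so a burst 
of reducers cannot exhaust the nodemanager's file descriptors.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionLimiterSketch {
  private final int maxConnections;           // e.g. read from a max-connections key
  private final AtomicInteger open = new AtomicInteger();

  ConnectionLimiterSketch(int maxConnections) {
    this.maxConnections = maxConnections;
  }

  // Returns false when the new connection should be closed immediately.
  boolean tryAccept() {
    if (open.incrementAndGet() > maxConnections) {
      open.decrementAndGet();
      return false;
    }
    return true;
  }

  // Call when a shuffle connection finishes or is dropped.
  void release() {
    open.decrementAndGet();
  }
}
{code}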

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-5027:
---

   Resolution: Fixed
Fix Version/s: 2.0.4-beta
   0.23.7
   3.0.0
   Status: Resolved  (was: Patch Available)

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Fix For: 3.0.0, 0.23.7, 2.0.4-beta
>
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-3688) Need better Error message if AM is killed/throws exception

2013-03-05 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated MAPREDUCE-3688:


Attachment: mapreduce-3688-h0.23-v01.patch

This has been a pain for our users as well.

I don't think this patch will fly well with the reviewers, but maybe it'll help 
move the discussion forward. 

I didn't see a good way of communicating the error message to the caller, so I 
decided to sacrifice stdout, which the current MRAppMaster does not use.

After the patch, the web UI would show:

{quote}
Diagnostics: Application application_1362527487477_0005 failed 1 times due 
to AM Container for appattempt_1362527487477_0005_01 exited with exitCode: 
1 due to: Error starting MRAppMaster: org.apache.hadoop.yarn.YarnException: 
java.io.IOException: Split metadata size exceeded 20. Aborting job 
job_1362527487477_0005 at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1290)
 at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1146)
 at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1118)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:382)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
 at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:823) at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:121) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1094)
 at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:998) 
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1273) 
at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1221)
 at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1269)
 at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1226) 
Caused by: java.io.IOException: Split metadata size exceeded 20. Aborting job 
job_1362527487477_0005 at 
org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
 at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1285)
 ... 16 more .Failing this attempt.. Failing the application.
{quote}

(This patch is based on 0.23)
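A rough illustration of the approach (the startup call below is a placeholder, 
not the attached patch): since the container's stdout is otherwise unused by 
MRAppMaster, a startup failure printed there can be surfaced in the 
application's diagnostics.

{code:java}
import org.apache.hadoop.util.StringUtils;

public class AppMasterMainSketch {
  public static void main(String[] args) {
    try {
      startAppMaster(args);  // placeholder for the real initAndStartAppMaster path
    } catch (Throwable t) {
      // Goes to the container's stdout, which the diagnostics string can include.
      System.out.println("Error starting MRAppMaster: "
          + StringUtils.stringifyException(t));
      System.exit(1);
    }
  }

  private static void startAppMaster(String[] args) {
    // Example failure only, standing in for a real startup error.
    throw new RuntimeException("Split metadata size exceeded 20");
  }
}
{code}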

> Need better Error message if AM is killed/throws exception
> --
>
> Key: MAPREDUCE-3688
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3688
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am, mrv2
>Affects Versions: 0.23.1
>Reporter: David Capwell
>Assignee: Sandy Ryza
> Fix For: 0.23.2
>
> Attachments: mapreduce-3688-h0.23-v01.patch
>
>
> We need better error messages in the UI if the AM gets killed or throws an 
> Exception.
> If the following error gets thrown: 
> java.lang.NumberFormatException: For input string: "9223372036854775807l" // 
> last char is an L
> then the UI should show this exception.  Instead I get the following:
> Application application_1326504761991_0018 failed 1 times due to AM Container 
> for appattempt_1326504761991_0018_01
> exited with exitCode: 1 due to: Exception from container-launch: 
> org.apache.hadoop.util.Shell$ExitCodeException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594127#comment-13594127
 ] 

Jonathan Eagles commented on MAPREDUCE-5027:


+1. This patch looks good. Thanks, Rob.

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager

2013-03-05 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594124#comment-13594124
 ] 

Ravi Prakash commented on MAPREDUCE-3685:
-

This is probably more for my own reference than anything. Here's my 
understanding from reading the code; it is very approximate and may be 
inaccurate in some cases.

IntermediateMemoryToMemoryMerger - Can be toggled on / off
- Merges map outputs *from* memory *to* memory
- When is it triggered? (If enabled at all, which it isn't by default) When the 
number of in-memory map outputs > memToMemMergeOutputsThreshold
I am guessing this was put in on the premise that it might be faster to merge a 
smaller number of streams, even in memory, and that we can perhaps merge while 
still waiting on fetches.

InMemoryMerger
- Merges map outputs *from* memory *to* disk
- When is it triggered? When storing more map outputs in memory would push us 
over the memory allocated for shuffle.
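The two triggers, restated as a sketch (field and threshold names are 
illustrative, not the MergeManager source):

{code:java}
class MergeTriggerSketch {
  int inMemoryMapOutputs;             // map outputs currently held in memory
  long usedShuffleMemory;             // bytes of shuffle memory in use
  long shuffleMemoryLimit;            // total memory allocated for shuffle
  int memToMemMergeOutputsThreshold;  // threshold for the mem-to-mem merger
  boolean memToMemMergeEnabled;       // off by default

  // IntermediateMemoryToMemoryMerger: memory -> memory
  boolean shouldStartMemToMemMerge() {
    return memToMemMergeEnabled
        && inMemoryMapOutputs > memToMemMergeOutputsThreshold;
  }

  // InMemoryMerger: memory -> disk, when accepting one more map output would
  // push us over the shuffle memory budget
  boolean shouldStartInMemoryMerge(long incomingOutputSize) {
    return usedShuffleMemory + incomingOutputSize > shuffleMemoryLimit;
  }
}
{code}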


> There are some bugs in implementation of MergeManager
> -
>
> Key: MAPREDUCE-3685
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.1
>Reporter: anty.rao
>Assignee: anty
>Priority: Critical
> Attachments: MAPREDUCE-3685-branch-0.23.1.patch, 
> MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, 
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, 
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, 
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, 
> MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, 
> MAPREDUCE-3685.patch, MAPREDUCE-3685.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer

2013-03-05 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594110#comment-13594110
 ] 

Ravi Prakash commented on MAPREDUCE-4842:
-

Hi Mariappan,

bq. This is a tangent to point 1. The mergeFactor is set to the configured 
value for IntermediateMemoryToMemoryMerger but to Integer.MAX_VALUE for 
InMemoryMerger and OnDiskMerger. We have to find out the rationale behind these 
choices.

Thanks for all your work on the MergeManager. It is so much cleaner now! 
Thanks much.

Anyway, since you have been in this area of the code, I was wondering if you 
could please review MAPREDUCE-3685? The mergeFactor for the OnDiskMerger was 
wrong. For inMemoryMerger it seems to be correct (because io.sort.factor is 
defined as "The number of streams to merge at once while sorting files. This 
determines the number of open file handles."). Besides I wonder if we want to 
really go into the level of detail of the number of fetched cache lines and not 
just simplify by assuming constant access to all memory. Please consider 
continuing the discussion there.

Thanks



> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Mariappan Asokan
>Priority: Blocker
> Fix For: 2.0.3-alpha, 0.23.6
>
> Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, 
> mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, 
> mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5048) streaming combiner feature breaks when input binary, output text

2013-03-05 Thread Antonio Piccolboni (JIRA)
Antonio Piccolboni created MAPREDUCE-5048:
-

 Summary: streaming combiner feature breaks when input binary, 
output text
 Key: MAPREDUCE-5048
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5048
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 1.0.2
 Environment: centos 6.2
Reporter: Antonio Piccolboni


When running a hadoop streaming job with binary input and shuffle but text 
output, with the combiner on, it fails with the error

java.lang.RuntimeException: java.io.IOException: wrong key class: class 
org.apache.hadoop.io.Text is not class 
org.apache.hadoop.typedbytes.TypedBytesWritable


repro:

hadoop jar  -D  'stream.map.input=typedbytes' -D 
'stream.map.output=typedbytes' -D 'stream.reduce.input=typedbytes'  
 -input   -output
-mapper cat -combiner cat -reducer cat -inputformat 
'org.apache.hadoop.streaming.AutoInputFormat'  

If you remove the -combiner option, it works, with only performance 
implications. If you additionally specify -D 'stream.reduce.output=typedbytes', 
it succeeds but outputs raw typedbytes (without the sequence file 
superstructure).

I asked in the discussion of HADOOP-1722 (where typedbytes was first 
introduced) whether this is a bug or my misunderstanding of that spec, and a 
committer chipped in saying it seems like a bug to him too.
Originally reported by a user of the rmr2 package for R and filed by me here: 
https://github.com/RevolutionAnalytics/rmr2/issues/16

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API

2013-03-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594002#comment-13594002
 ] 

Sangjin Lee commented on MAPREDUCE-5038:


I filed MAPREDUCE-5046 to backport MAPREDUCE-1423, then found this.

I took a look at the patch here, but I'm not sure if it subsumes the changes 
contained in MAPREDUCE-1423. Specifically, rackToNodes still seems to be static, 
which is a thread-safety problem. Could you absorb the fix that's in 
MAPREDUCE-1423? I'd be happy to look at that if you want.
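A minimal sketch of the concern (illustrative, not the MAPREDUCE-1423 patch): a 
static rackToNodes map is shared by every CombineFileInputFormat instance in 
the JVM, so concurrent getSplits() calls can interleave updates; making it an 
instance field removes the sharing.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RackToNodesSketch {
  // was (roughly): private static Map<String, List<String>> rackToNodes = ...
  private final Map<String, List<String>> rackToNodes =
      new HashMap<String, List<String>>();

  void addNodeToRack(String rack, String node) {
    List<String> nodes = rackToNodes.get(rack);
    if (nodes == null) {
      nodes = new ArrayList<String>();
      rackToNodes.put(rack, nodes);
    }
    nodes.add(node);
  }
}
{code}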

> old API CombineFileInputFormat missing fixes that are in new API 
> -
>
> Key: MAPREDUCE-5038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5038.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but 
> neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 made CombineFileInputFormat work with paths not on the default 
> FS
> In trunk this is not an issue as the one in mapred extends the one in 
> mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593919#comment-13593919
 ] 

Hadoop QA commented on MAPREDUCE-5027:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12572165/MAPREDUCE-5027-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3385//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3385//console

This message is automatically generated.

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters

2013-03-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5047:
-

 Summary: keep.failed.task.files=true causes job failure on secure 
clusters
 Key: MAPREDUCE-5047
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task, tasktracker
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza


To support IsolationRunner, split info is written to local directories.  This 
occurs inside MapTask#localizeConfiguration, which is called both by the 
tasktracker and by the child JVM.  On a secure cluster, the tasktracker's 
attempt to write
it fails, because the tasktracker does not have permission to write to the 
user's directory. It is likely that the call to localizeConfiguration in the 
tasktracker can be removed. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: (was: MAPREDUCE-5027-4.patch)

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: MAPREDUCE-5027-4.patch

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593886#comment-13593886
 ] 

Hadoop QA commented on MAPREDUCE-5027:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12572161/MAPREDUCE-5027-b023-2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3384//console

This message is automatically generated.

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: MAPREDUCE-5027-b023-2.patch
MAPREDUCE-5027-4.patch

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-03-05 Thread Robert Parker (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593881#comment-13593881
 ] 

Robert Parker commented on MAPREDUCE-5027:
--

Jon, I have uploaded a new patch for trunk and branch 0.23; I have eliminated 
the timing issues in the test.

> Shuffle does not limit number of outstanding connections
> 
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, 
> MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, 
> MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limits to the number of 
> outstanding connections allowed.  Therefore a node with many map outputs and 
> many reducers in the cluster trying to fetch those outputs can exhaust a 
> nodemanager out of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5046) backport MAPREDUCE-1423 to mapred.lib.CombineFileInputFormat

2013-03-05 Thread Sangjin Lee (JIRA)
Sangjin Lee created MAPREDUCE-5046:
--

 Summary: backport MAPREDUCE-1423 to 
mapred.lib.CombineFileInputFormat
 Key: MAPREDUCE-5046
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5046
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 1.1.1
Reporter: Sangjin Lee


The CombineFileInputFormat class in org.apache.hadoop.mapred.lib (the old API) 
has a couple of issues. These issues were addressed in the new API 
(MAPREDUCE-1423), but the old class was not fixed.

The main issue that JIRA refers to is a performance problem. However, IMO there 
is a more serious problem, a thread-safety issue (rackToNodes), which was fixed 
alongside it.

What is the policy on addressing issues in the old API? Can we backport this to 
the old class?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5045) UtilTest#isCygwin method appears to be unused

2013-03-05 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5045:


 Summary: UtilTest#isCygwin method appears to be unused
 Key: MAPREDUCE-5045
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5045
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: contrib/streaming, test
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Priority: Trivial


Method {{UtilTest#isCygwin}} in 
/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/UtilTest.java
 appears to be unused.  If so, then we need to remove it.  If anything is 
calling it, then we need to update the naming to isWindows, or perhaps just 
change call sites to use {{Shell#WINDOWS}}.
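If any callers do turn up, a sketch of the suggested direction, using the 
existing platform flag rather than a renamed helper:

{code:java}
import org.apache.hadoop.util.Shell;

class PlatformCheckSketch {
  // Replaces a hypothetical isCygwin()/isWindows() helper at the call site.
  static boolean shouldSkipOnWindows() {
    return Shell.WINDOWS;
  }
}
{code}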

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5043) Fetch failure processing can cause AM event queue to backup and eventually OOM

2013-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593384#comment-13593384
 ] 

Hudson commented on MAPREDUCE-5043:
---

Integrated in Hadoop-Hdfs-trunk #1335 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1335/])
MAPREDUCE-5043. Fetch failure processing can cause AM event queue to backup 
and eventually OOM (Jason Lowe via bobby) (Revision 1452372)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452372
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/TaskAttempt.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MockJobs.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRuntimeEstimators.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedTaskAttempt.java


> Fetch failure processing can cause AM event queue to backup and eventually OOM
> --
>
> Key: MAPREDUCE-5043
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5043
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 0.23.7, 2.0.4-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 3.0.0, 0.23.7, 2.0.4-beta
>
> Attachments: MAPREDUCE-5043.patch
>
>
> Saw an MRAppMaster with a 3G heap OOM.  Upon investigating another instance 
> of it running, we saw the UI in a weird state where the task table and task 
> attempt tables in the job overview page weren't consistent.  The AM log 
> showed the AsyncDispatcher had hundreds of thousands of events in the event 
> queue, and jstacks showed it spending a lot of time in fetch failure 
> processing.  It turns out fetch failure processing is currently *very* 
> expensive, with a triple {{for}} loop where the inner loop is calling the 
> quite-expensive {{TaskAttempt.getReport}}.  That function ends up 
> type-converting the entire task report, counters and all, and performing 
> locale conversions among other things.  It does this for every reduce task in 
> the job, for every map task that failed.  And when it's done building up the 
> large task report, it pulls out one field, the phase, then throws the report 
> away.
> While the AM is busy processing fetch failures, task attempts are continuing 
> to send events to the AM, including memory-expensive events like status 
> updates which include the counters.  These back up in the AsyncDispatcher 
> event queue, and eventually even an AM with a large heap size will run out of 
> memory and crash, or expire because it thrashes in garbage collection.
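
To make the cost pattern concrete, here is a small self-contained sketch (not 
Hadoop code; every name in it is an illustrative assumption) contrasting 
"build a full report per attempt just to read the phase" with "expose the phase 
directly". In the AM this per-attempt cost sits inside loops over failed maps 
and reduce tasks, so it is multiplied many times over.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Self-contained sketch (not Hadoop code) of the cost pattern described above:
 * materializing a full report per attempt versus reading one field directly.
 */
public class FetchFailureCostSketch {

  enum Phase { SHUFFLE, SORT, REDUCE }

  static class Attempt {
    Phase phase = Phase.SHUFFLE;
    Map<String, Long> counters = new HashMap<>();

    Attempt() {
      for (int i = 0; i < 100; i++) {            // a modest counter count
        counters.put("counter-" + i, (long) i);
      }
    }

    /** Expensive: copies every counter, mimicking full report conversion. */
    Map<String, Object> buildReport() {
      Map<String, Object> report = new HashMap<>(counters);
      report.put("phase", phase);
      return report;
    }

    /** Cheap: return only the field the caller actually needs. */
    Phase getPhase() {
      return phase;
    }
  }

  public static void main(String[] args) {
    List<Attempt> attempts = new ArrayList<>();
    for (int i = 0; i < 2_000; i++) {
      attempts.add(new Attempt());
    }

    long t0 = System.nanoTime();
    for (Attempt a : attempts) {
      if (a.buildReport().get("phase") == Phase.SHUFFLE) { /* notify */ }
    }
    long expensive = System.nanoTime() - t0;

    long t1 = System.nanoTime();
    for (Attempt a : attempts) {
      if (a.getPhase() == Phase.SHUFFLE) { /* notify */ }
    }
    long cheap = System.nanoTime() - t1;

    System.out.printf("full report: %d ms, direct phase: %d ms%n",
        expensive / 1_000_000, cheap / 1_000_000);
  }
}
{code}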

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5043) Fetch failure processing can cause AM event queue to backup and eventually OOM

2013-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593307#comment-13593307
 ] 

Hudson commented on MAPREDUCE-5043:
---

Integrated in Hadoop-Yarn-trunk #146 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/146/])
MAPREDUCE-5043. Fetch failure processing can cause AM event queue to backup 
and eventually OOM (Jason Lowe via bobby) (Revision 1452372)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452372
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/TaskAttempt.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MockJobs.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRuntimeEstimators.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedTaskAttempt.java


> Fetch failure processing can cause AM event queue to backup and eventually OOM
> --
>
> Key: MAPREDUCE-5043
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5043
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 0.23.7, 2.0.4-beta
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 3.0.0, 0.23.7, 2.0.4-beta
>
> Attachments: MAPREDUCE-5043.patch
>
>
> Saw an MRAppMaster with a 3G heap OOM.  Upon investigating another instance 
> of it running, we saw the UI in a weird state where the task table and task 
> attempt tables in the job overview page weren't consistent.  The AM log 
> showed the AsyncDispatcher had hundreds of thousands of events in the event 
> queue, and jstacks showed it spending a lot of time in fetch failure 
> processing.  It turns out fetch failure processing is currently *very* 
> expensive, with a triple {{for}} loop where the inner loop is calling the 
> quite-expensive {{TaskAttempt.getReport}}.  That function ends up 
> type-converting the entire task report, counters and all, and performing 
> locale conversions among other things.  It does this for every reduce task in 
> the job, for every map task that failed.  And when it's done building up the 
> large task report, it pulls out one field, the phase, then throws the report 
> away.
> While the AM is busy processing fetch failures, task attempts are continuing 
> to send events to the AM, including memory-expensive events like status 
> updates which include the counters.  These back up in the AsyncDispatcher 
> event queue, and eventually even an AM with a large heap size will run out of 
> memory and crash, or expire because it thrashes in garbage collection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira