[jira] [Commented] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-30 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809987#comment-13809987
 ] 

Chuan Liu commented on MAPREDUCE-5604:
--

+1. Change looks good to me.

> TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max 
> path length
> ---
>
> Key: MAPREDUCE-5604
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: MAPREDUCE-5604.1.patch
>
>
> The test uses the full class name as a component of the 
> {{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
> causes container launch to fail when trying to access files at a path longer 
> than the maximum of 260 characters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5392) "mapred job -history all" command throws IndexOutOfBoundsException

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809983#comment-13809983
 ] 

Hadoop QA commented on MAPREDUCE-5392:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611258/MAPREDUCE-5392.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs:

  org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4161//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4161//console

This message is automatically generated.

> "mapred job -history all" command throws IndexOutOfBoundsException
> --
>
> Key: MAPREDUCE-5392
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch
>
>
> When I use the "all" option with the "mapred job -history" command, the 
> following exception is displayed and the command does not work.
> {code}
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -3
> at java.lang.String.substring(String.java:1875)
> at 
> org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117)
> at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233)
> {code}
> This is because the node name recorded in the history file is not prefixed 
> with "tracker_". Therefore this patch modifies the code so that the history 
> file can be read even if the node name lacks the "tracker_" prefix.
> In addition, it fixes the URL of the displayed task log.
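The stack trace shows the failure in convertTrackerNameToHostName, which strips a fixed-length "tracker_" prefix unconditionally. A minimal sketch of the defensive behavior the description calls for (this is an illustrative helper, not the actual patch):

```java
public class HostUtilSketch {
    // Hedged sketch: strip the "tracker_" prefix only when it is actually
    // present, so a node name recorded without the prefix no longer triggers
    // StringIndexOutOfBoundsException.
    static String convertTrackerNameToHostName(String trackerName) {
        int colon = trackerName.indexOf(':');
        String host = (colon == -1) ? trackerName : trackerName.substring(0, colon);
        return host.startsWith("tracker_")
            ? host.substring("tracker_".length())
            : host;
    }

    public static void main(String[] args) {
        System.out.println(convertTrackerNameToHostName("tracker_node1:50060")); // node1
        System.out.println(convertTrackerNameToHostName("node1:50060"));         // node1
    }
}
```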



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809977#comment-13809977
 ] 

Hadoop QA commented on MAPREDUCE-5604:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611259/MAPREDUCE-5604.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4160//console

This message is automatically generated.

> TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max 
> path length
> ---
>
> Key: MAPREDUCE-5604
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: MAPREDUCE-5604.1.patch
>
>
> The test uses the full class name as a component of the 
> {{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
> causes container launch to fail when trying to access files at a path longer 
> than the maximum of 260 characters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-5604:
-

Attachment: MAPREDUCE-5604.1.patch

I'm attaching a patch that applies the same fix we've used in similar cases: 
use the simple class name instead of the fully qualified class name so that 
the testing directory path is shorter.
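For illustration, the difference between the two name forms (shown with a JDK class, since the test's actual package path is environment-specific):

```java
public class ClassNameDemo {
    public static void main(String[] args) {
        // The fully qualified name embeds every package component...
        String full = java.util.concurrent.ConcurrentHashMap.class.getName();
        // ...while the simple name is only the final component.
        String simple = java.util.concurrent.ConcurrentHashMap.class.getSimpleName();
        System.out.println(full);   // java.util.concurrent.ConcurrentHashMap
        System.out.println(simple); // ConcurrentHashMap
        // Using the simple name as a directory component keeps the
        // NodeManager local-dirs path well under the Windows 260-char limit.
    }
}
```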

> TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max 
> path length
> ---
>
> Key: MAPREDUCE-5604
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: MAPREDUCE-5604.1.patch
>
>
> The test uses the full class name as a component of the 
> {{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
> causes container launch to fail when trying to access files at a path longer 
> than the maximum of 260 characters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-5604:
-

Status: Patch Available  (was: Open)

> TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max 
> path length
> ---
>
> Key: MAPREDUCE-5604
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: MAPREDUCE-5604.1.patch
>
>
> The test uses the full class name as a component of the 
> {{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
> causes container launch to fail when trying to access files at a path longer 
> than the maximum of 260 characters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5392) "mapred job -history all" command throws IndexOutOfBoundsException

2013-10-30 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated MAPREDUCE-5392:
--

Attachment: MAPREDUCE-5392.patch

> "mapred job -history all" command throws IndexOutOfBoundsException
> --
>
> Key: MAPREDUCE-5392
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch
>
>
> When I use the "all" option with the "mapred job -history" command, the 
> following exception is displayed and the command does not work.
> {code}
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -3
> at java.lang.String.substring(String.java:1875)
> at 
> org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117)
> at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233)
> {code}
> This is because the node name recorded in the history file is not prefixed 
> with "tracker_". Therefore this patch modifies the code so that the history 
> file can be read even if the node name lacks the "tracker_" prefix.
> In addition, it fixes the URL of the displayed task log.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-30 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5604:


 Summary: TestMRAMWithNonNormalizedCapabilities fails on Windows 
due to exceeding max path length
 Key: MAPREDUCE-5604
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Affects Versions: 2.2.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The test uses the full class name as a component of the 
{{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
causes container launch to fail when trying to access files at a path longer 
than the maximum of 260 characters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5392) "mapred job -history all" command throws IndexOutOfBoundsException

2013-10-30 Thread Shinichi Yamashita (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809963#comment-13809963
 ] 

Shinichi Yamashita commented on MAPREDUCE-5392:
---

Jenkins hit an OutOfMemoryError. I checked the Jenkins log, and the OOM 
occurred in the native-maven-plugin phase.
To make sure, I am attaching the same patch again to trigger another Jenkins run.

> "mapred job -history all" command throws IndexOutOfBoundsException
> --
>
> Key: MAPREDUCE-5392
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch
>
>
> When I use the "all" option with the "mapred job -history" command, the 
> following exception is displayed and the command does not work.
> {code}
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -3
> at java.lang.String.substring(String.java:1875)
> at 
> org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117)
> at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233)
> {code}
> This is because the node name recorded in the history file is not prefixed 
> with "tracker_". Therefore this patch modifies the code so that the history 
> file can be read even if the node name lacks the "tracker_" prefix.
> In addition, it fixes the URL of the displayed task log.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809703#comment-13809703
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4159//console

This message is automatically generated.

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED, meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.
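The regression comes down to *when* the cache hint is issued. A toy sketch of the MR1-style behavior the description says is correct; the `Fadviser` interface here is hypothetical, standing in for the native posix_fadvise call (the real ShuffleHandler goes through Hadoop's native IO layer):

```java
import java.util.ArrayList;
import java.util.List;

public class FadviseSketch {
    // Hypothetical stand-in for the posix_fadvise(DONTNEED) call.
    interface Fadviser {
        void dontNeed(long offset, long len);
    }

    // Only drop the pages once the reducer actually consumed all bytes; an
    // abandoned fetch then leaves the map output in the page cache for retry.
    static void finishTransfer(long offset, long len,
                               boolean allBytesSent, Fadviser fadviser) {
        if (allBytesSent) {
            fadviser.dontNeed(offset, len);
        }
    }

    public static void main(String[] args) {
        List<long[]> hints = new ArrayList<>();
        Fadviser recorder = (off, len) -> hints.add(new long[] { off, len });
        finishTransfer(0, 4096, false, recorder); // abandoned fetch: no hint
        finishTransfer(0, 4096, true, recorder);  // completed fetch: hint issued
        System.out.println(hints.size()); // 1
    }
}
```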



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809680#comment-13809680
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

The patch compiles fine for me locally.  The failure seems to be some sort of 
javah issue that I've seen in other builds as well.

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED, meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809641#comment-13809641
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4158//console

This message is automatically generated.

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED, meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809636#comment-13809636
 ] 

Andrey Klochkov commented on MAPREDUCE-4980:


The build failed due to OOM while processing native code. Not related to the 
patch.

> Parallel test execution of hadoop-mapreduce-client-core
> ---
>
> Key: MAPREDUCE-4980
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
> MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
> MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
> MAPREDUCE-4980.patch
>
>
> The maven surefire plugin supports a parallel testing feature. By using it, 
> the tests can run faster.
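For reference, surefire's parallel execution is typically enabled through pom.xml configuration along these lines (the values shown are illustrative, not the patch's actual settings):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Run whole test classes concurrently; the thread count is illustrative. -->
    <parallel>classes</parallel>
    <threadCount>4</threadCount>
  </configuration>
</plugin>
```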



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809611#comment-13809611
 ] 

Hadoop QA commented on MAPREDUCE-4980:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611165/MAPREDUCE-4980--n8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 125 
new or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4157//console

This message is automatically generated.

> Parallel test execution of hadoop-mapreduce-client-core
> ---
>
> Key: MAPREDUCE-4980
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
> MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
> MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
> MAPREDUCE-4980.patch
>
>
> The maven surefire plugin supports a parallel testing feature. By using it, 
> the tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-4980:
---

Attachment: MAPREDUCE-4980--n8.patch

Attaching rebased patch.

> Parallel test execution of hadoop-mapreduce-client-core
> ---
>
> Key: MAPREDUCE-4980
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
> MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
> MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
> MAPREDUCE-4980.patch
>
>
> The maven surefire plugin supports a parallel testing feature. By using it, 
> the tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-30 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809577#comment-13809577
 ] 

Andrey Klochkov commented on MAPREDUCE-3860:


Also, it could be that the timeouts I set in the tests are still too low for 
you, if your machine is that slow. Can you increase them by up to an order of 
magnitude to check that? 

> [Rumen] Bring back the removed Rumen unit tests
> ---
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Andrey Klochkov
> Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, 
> MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, 
> MAPREDUCE-3860.patch, 
> org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, 
> rumen-test-data.tar.gz
>
>
> MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder 
> and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need 
> to be brought back:
> TestZombieJob.java
> TestRumenJobTraces.java
> TestRumenFolder.java
> TestRumenAnonymization.java
> TestParsedLine.java
> TestConcurrentRead.java



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-30 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-3860:
---

Attachment: MAPREDUCE-3860--n4.patch

Jonathan,
The logs don't provide much info on why tests fail. Per your description it 
seems that the tests hang indefinitely, so probably printing thread dumps on 
test timeouts would help. I'm attaching a patch which modifies Rumen's pom.xml 
by adding a JUnit listener that prints thread dumps. I could not reproduce any 
failures in Rumen tests, tried to use 4 different machines (osx, centos, fedora 
on h/w nodes, and rhel on a VM). Please reproduce the failures in your 
environment one more time and attach Console output of Maven and all Surefire 
logs (not just *-output.txt). Thanks for working on this. 
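The core of such a listener is just a full thread dump. A minimal, hedged sketch of what the timeout/failure hook could print (class and method names here are illustrative, not the patch's actual listener):

```java
public class ThreadDumpSketch {
    // Capture a stack trace for every live thread, similar to what a JUnit
    // RunListener wired into surefire could print when a test times out.
    public static String capture() {
        StringBuilder sb = new StringBuilder();
        for (java.util.Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capture());
    }
}
```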

> [Rumen] Bring back the removed Rumen unit tests
> ---
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Andrey Klochkov
> Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, 
> MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, 
> MAPREDUCE-3860.patch, 
> org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, 
> rumen-test-data.tar.gz
>
>
> MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder 
> and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need 
> to be brought back:
> TestZombieJob.java
> TestRumenJobTraces.java
> TestRumenFolder.java
> TestRumenAnonymization.java
> TestParsedLine.java
> TestConcurrentRead.java



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809558#comment-13809558
 ] 

Hadoop QA commented on MAPREDUCE-5603:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611157/MAPREDUCE-5603.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4156//console

This message is automatically generated.

> Ability to disable FileInputFormat listLocatedStatus optimization to save 
> client memory
> ---
>
> Key: MAPREDUCE-5603
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mrv2
>Affects Versions: 0.23.10, 2.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-5603.patch
>
>
> It would be nice if users had the option to disable the listLocatedStatus 
> optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5603:
--

Attachment: MAPREDUCE-5603.patch

Patch that adds a mapreduce.input.fileinputformat.uselocatedstatus config to 
control whether the listLocatedStatus optimization is enabled.  The property 
defaults to true.
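Assuming a standard Hadoop configuration file, opting out would then look like this (the property name is taken from the patch description above; the snippet itself is illustrative):

```xml
<!-- mapred-site.xml: disable the listLocatedStatus optimization -->
<property>
  <name>mapreduce.input.fileinputformat.uselocatedstatus</name>
  <value>false</value>
</property>
```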

> Ability to disable FileInputFormat listLocatedStatus optimization to save 
> client memory
> ---
>
> Key: MAPREDUCE-5603
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mrv2
>Affects Versions: 0.23.10, 2.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-5603.patch
>
>
> It would be nice if users had the option to disable the listLocatedStatus 
> optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5603:
--

Status: Patch Available  (was: Open)

> Ability to disable FileInputFormat listLocatedStatus optimization to save 
> client memory
> ---
>
> Key: MAPREDUCE-5603
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mrv2
>Affects Versions: 2.2.0, 0.23.10
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-5603.patch
>
>
> It would be nice if users had the option to disable the listLocatedStatus 
> optimization in FileInputFormat to save client memory.





[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809542#comment-13809542
 ] 

Jason Lowe commented on MAPREDUCE-5603:
---

Sample OOM backtrace for reference:

{noformat}
Exception in thread "main" java.io.IOException: Failed on local exception:
java.io.IOException: Error reading responses; Host Details : local host is: 
"x/x.x.x.x"; destination host is: ""x":x;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:738)
at org.apache.hadoop.ipc.Client.call(Client.java:1098)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
at com.sun.proxy.$Proxy6.getListing(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
at com.sun.proxy.$Proxy6.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1286)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$1.<init>(DistributedFileSystem.java:418)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:409)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1654)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:225)
at 
org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:265)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:500)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:492)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:568)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:568)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:844)
at x
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at x
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Error reading responses
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:764)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
at java.lang.StringBuilder.<init>(StringBuilder.java:97)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:216)
at org.apache.hadoop.hdfs.DeprecatedUTF8.readString(DeprecatedUTF8.java:59)
at 
org.apache.hadoop.hdfs.protocol.DatanodeID.readFields(DatanodeID.java:212)
at 
org.apache.hadoop.hdfs.protocol.DatanodeInfo.readFields(DatanodeInfo.java:389)
at 
org.apache.hadoop.hdfs.protocol.LocatedBlock.readFields(LocatedBlock.java:146)
at 
org.apache.hadoop.hdfs.protocol.LocatedBlocks.readFields(LocatedBlocks.java:223)
at 
org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus.readFields(HdfsLocatedFileStatus.java:87)
at 
org.apache.hadoop.hdfs.protocol.DirectoryListing.readFields(DirectoryListing.java:120)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:833)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:757)
{noformat}

> Ability to disable FileInputFormat listLocatedStatus optimization to save 
> client memory
> --

[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809537#comment-13809537
 ] 

Jason Lowe commented on MAPREDUCE-5603:
---

Recently we ran across a jobclient that failed with an OOM error once we 
updated the cluster to 0.23.10.  The OOM was triggered by the FileInputFormat 
listLocatedStatus optimization from MAPREDUCE-1981, as the client now caches 
the BlockLocations of all files along with the FileStatus objects it was 
caching before.  Normally the user can bump the heap size of the client to work 
around this issue.  However, if a job has an input with a particularly large 
number of BlockLocations, as this job did, it would be nice if the user had the 
option to disable the optimization and reduce the memory required for input 
split calculations.
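The memory pressure described here can be put in rough numbers. The per-location byte count and the file/block/replica figures below are illustrative assumptions, not measurements:

```java
// Back-of-envelope model of the extra client memory from caching block
// locations with each file status. All numbers are illustrative
// assumptions, not measured values.
class LocatedStatusCost {
    // Each LocatedFileStatus adds one BlockLocation per block, and each
    // BlockLocation holds host/name strings per replica.
    static long extraBytes(long files, long blocksPerFile, int replication,
                           long bytesPerLocation) {
        return files * blocksPerFile * replication * bytesPerLocation;
    }

    public static void main(String[] args) {
        // e.g. 1M files x 8 blocks x 3 replicas x ~100 bytes ~= 2.4 GB
        System.out.println(extraBytes(1_000_000L, 8, 3, 100));
    }
}
```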

> Ability to disable FileInputFormat listLocatedStatus optimization to save 
> client memory
> ---
>
> Key: MAPREDUCE-5603
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mrv2
>Affects Versions: 0.23.10, 2.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
>
> It would be nice if users had the option to disable the listLocatedStatus 
> optimization in FileInputFormat to save client memory.





[jira] [Created] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-5603:
-

 Summary: Ability to disable FileInputFormat listLocatedStatus 
optimization to save client memory
 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor


It would be nice if users had the option to disable the listLocatedStatus 
optimization in FileInputFormat to save client memory.





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809423#comment-13809423
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Or you're saying we would pass the amount of unreserved memory remaining?

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809418#comment-13809418
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Was worried about that as well.  But the fetcher doesn't know whether it's 
going to abandon the request before it sends it.

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.





[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809055#comment-13809055
 ] 

Hudson commented on MAPREDUCE-5596:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


> Allow configuring the number of threads used to serve shuffle connections
> -
>
> Key: MAPREDUCE-5596
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 3.0.0, 2.3.0, 2.2.1
>
> Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch
>
>
> MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
> default 2 * Runtime.availableProcessors().  We should make this configurable.
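The configurable thread count described above can be sketched as follows. The property name added by the patch is not quoted in this thread, so `mapreduce.shuffle.max.threads` is an assumption, and `Properties` stands in for Hadoop's `Configuration`:

```java
import java.util.Properties;

// Sketch of making the shuffle worker thread count configurable.
// The property name is an assumption; 0 keeps the old Netty default.
class ShuffleThreads {
    static final String KEY = "mapreduce.shuffle.max.threads";

    static int workerThreads(Properties conf, int availableProcessors) {
        int configured = Integer.parseInt(conf.getProperty(KEY, "0"));
        // 0 (the default) preserves Netty's 2 * available processors.
        return configured > 0 ? configured : 2 * availableProcessors;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(workerThreads(conf, 8));   // 16 (Netty default)
        conf.setProperty(KEY, "40");
        System.out.println(workerThreads(conf, 8));   // 40
    }
}
```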





[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809058#comment-13809058
 ] 

Hudson commented on MAPREDUCE-5598:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


> TestUserDefinedCounters.testMapReduceJob is flakey
> --
>
> Key: MAPREDUCE-5598
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: trunk, 2.2.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 3.0.0, 2.3.0, 2.2.1
>
> Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch
>
>
> {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
> We sometimes see it fail:
> {noformat}
> junit.framework.AssertionFailedError
>   at junit.framework.Assert.fail(Assert.java:48)
>   at junit.framework.Assert.assertTrue(Assert.java:20)
>   at junit.framework.Assert.assertTrue(Assert.java:27)
>   at 
> org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
> {noformat}
> Upon investigation, the problem is that the input for the MR job in this test 
> is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
> earlier test wrote some files there, this test will use them as part of its 
> input.  This can cause all sorts of problems with this test because it's not 
> expecting the additional input data.
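The isolation fix can be sketched as giving each run a fresh input directory instead of a shared `test.build.data` location that earlier tests may have polluted. The helper name below is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the test-isolation pattern: a unique input directory per
// invocation, so leftover files from other tests can never leak into
// this job's input. freshInputDir is a hypothetical helper name.
class IsolatedTestInput {
    static Path freshInputDir(String testName) throws IOException {
        return Files.createTempDirectory(testName + "-input");
    }

    public static void main(String[] args) throws IOException {
        Path a = freshInputDir("testMapReduceJob");
        Path b = freshInputDir("testMapReduceJob");
        System.out.println(!a.equals(b)); // directories never collide
    }
}
```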





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809036#comment-13809036
 ] 

Todd Lipcon commented on MAPREDUCE-5601:


Good find.

One question: could we improve this even further by having the client send a 
header like "Max-response-size: ", and then have the server avoid doing 
any IO for the case where the client is going to abandon the request anyway? 
Seems like we might be incurring extra seeks in some cases due to the behavior 
you described above. It would be unrelated to this JIRA, just thought of it now.
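Todd's suggestion amounts to a server-side size check before any disk IO. The header name comes from his comment, but the decision logic below is a hypothetical sketch, not an existing protocol:

```java
// Sketch of the suggested "Max-response-size" negotiation: if the client
// advertises the most data it can accept, the server can answer with just
// the size header and skip disk IO for a fetch the reducer will abandon.
class MaxResponseSize {
    // clientMax == null models an old client that sent no header.
    static boolean shouldReadFromDisk(long mapOutputBytes, Long clientMax) {
        if (clientMax == null) return true;  // preserve current behavior
        return mapOutputBytes <= clientMax;
    }

    public static void main(String[] args) {
        System.out.println(shouldReadFromDisk(10_000_000L, 1_000_000L)); // false: skip IO
        System.out.println(shouldReadFromDisk(10_000_000L, null));       // true
    }
}
```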

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.





[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808999#comment-13808999
 ] 

Hudson commented on MAPREDUCE-5598:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


> TestUserDefinedCounters.testMapReduceJob is flakey
> --
>
> Key: MAPREDUCE-5598
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: trunk, 2.2.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 3.0.0, 2.3.0, 2.2.1
>
> Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch
>
>
> {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
> We sometimes see it fail:
> {noformat}
> junit.framework.AssertionFailedError
>   at junit.framework.Assert.fail(Assert.java:48)
>   at junit.framework.Assert.assertTrue(Assert.java:20)
>   at junit.framework.Assert.assertTrue(Assert.java:27)
>   at 
> org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
> {noformat}
> Upon investigation, the problem is that the input for the MR job in this test 
> is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
> earlier test wrote some files there, this test will use them as part of its 
> input.  This can cause all sorts of problems with this test because it's not 
> expecting the additional input data.





[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808996#comment-13808996
 ] 

Hudson commented on MAPREDUCE-5596:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


> Allow configuring the number of threads used to serve shuffle connections
> -
>
> Key: MAPREDUCE-5596
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 3.0.0, 2.3.0, 2.2.1
>
> Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch
>
>
> MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
> default 2 * Runtime.availableProcessors().  We should make this configurable.





[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808963#comment-13808963
 ] 

Hudson commented on MAPREDUCE-5596:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


> Allow configuring the number of threads used to serve shuffle connections
> -
>
> Key: MAPREDUCE-5596
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 3.0.0, 2.3.0, 2.2.1
>
> Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch
>
>
> MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
> default 2 * Runtime.availableProcessors().  We should make this configurable.





[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808966#comment-13808966
 ] 

Hudson commented on MAPREDUCE-5598:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


> TestUserDefinedCounters.testMapReduceJob is flakey
> --
>
> Key: MAPREDUCE-5598
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: trunk, 2.2.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 3.0.0, 2.3.0, 2.2.1
>
> Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch
>
>
> {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
> We sometimes see it fail:
> {noformat}
> junit.framework.AssertionFailedError
>   at junit.framework.Assert.fail(Assert.java:48)
>   at junit.framework.Assert.assertTrue(Assert.java:20)
>   at junit.framework.Assert.assertTrue(Assert.java:27)
>   at 
> org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
> {noformat}
> Upon investigation, the problem is that the input for the MR job in this test 
> is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
> earlier test wrote some files there, this test will use them as part of its 
> input.  This can cause all sorts of problems with this test because it's not 
> expecting the additional input data.





[jira] [Created] (MAPREDUCE-5602) cygwin path error

2013-10-30 Thread Amit Cahanovich (JIRA)
Amit Cahanovich created MAPREDUCE-5602:
--

 Summary: cygwin path error
 Key: MAPREDUCE-5602
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5602
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.6-alpha
 Environment: cygwin
Reporter: Amit Cahanovich


The path for a file is constructed incorrectly because the code does not take 
Cygwin into consideration.
/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/TaskLog.java:

static final String USERLOGS_DIR_NAME = "userlogs";

The outcome is a mixed-separator path:
 C:\cygwin\home\AMITCA\hadoop-2.0.6-alpha\logs/userlogs is not a valid path





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808859#comment-13808859
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4155//console

This message is automatically generated.

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.





[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Attachment: MAPREDUCE-5601.patch

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808849#comment-13808849
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611003/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4154//console

This message is automatically generated.

> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> --
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reduce will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED, meaning that the 
> next time it's asked for it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808844#comment-13808844
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Attached a patch that fixes the problem by only fadvising as DONTNEED if the 
Netty transfer completes successfully.  With the patch applied, the average 
reducer shuffle time for my job goes down from 80 seconds to 34, on par with 
MR1. 
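The patch's control flow can be sketched roughly as follows. This is a simplified, self-contained model rather than the actual Hadoop/Netty code: `fadviseDontNeed` stands in for the native posix_fadvise(POSIX_FADV_DONTNEED) call, and the `Consumer<Boolean>` callback stands in for Netty's ChannelFuture completion listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FadviseOnSuccess {
    // Records which map outputs were evicted from the page cache.
    // Real code would call posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED).
    static final List<String> evicted = new ArrayList<>();

    static void fadviseDontNeed(String mapOutput) {
        evicted.add(mapOutput);
    }

    // Models the async file-region transfer: the callback is invoked with
    // whether the transfer completed (cf. ChannelFuture.isSuccess()).
    static void transfer(String mapOutput, boolean success,
                         Consumer<Boolean> onComplete) {
        onComplete.accept(success);
    }

    public static void main(String[] args) {
        // One successful fetch and one abandoned fetch of the same output.
        for (boolean success : new boolean[] {true, false}) {
            String out = "attempt_0_m_000000.out";
            transfer(out, success, ok -> {
                if (ok) {  // the fix: only evict after a complete transfer
                    fadviseDontNeed(out);
                }
            });
        }
        // The abandoned fetch leaves the page cache intact for the retry.
        System.out.println(evicted.size()); // prints 1
    }
}
```

Before the fix, the eviction happened unconditionally on channel close, so the abandoned fetch would also have discarded the cached region.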






[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Status: Patch Available  (was: Open)






[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Attachment: MAPREDUCE-5601.patch






[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Description: 
When a reducer initiates a fetch request, it does not know whether it will be 
able to fit the fetched data in memory.  The first part of the response tells 
how much data will be coming.  If space is not currently available, the reduce 
will abandon its request and try again later.  When this occurs, the 
ShuffleHandler still fadvises the file region as DONTNEED, meaning that the 
next time it's asked for it will definitely be read from disk, even if it 
happened to be in the page cache before the request.

I noticed this when trying to figure out why my job was doing so much more disk 
IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found that disk 
reads went to nearly 0 on machines that had enough memory to fit map outputs 
into the page cache.  I then straced the NodeManager and noticed that there 
were over four times as many fadvise DONTNEED calls as map-reduce pairs.  
Further logging showed the same map outputs being fetched about this many times.

This is a regression from MR1, which only did the fadvise DONTNEED after all 
the bytes were transferred.

  was:
When a reducer initiates a fetch request, it does not know whether it will be 
able to fit the fetched data in memory.  The first part of the response tells 
how much data will be coming.  If space is not currently available, the reduce 
will abandon its request and try again later.  Unfortunately, this has some 
consequences on the server side - it forces unnecessary disk and network IO as 
the server begins to read the output data that will go nowhere.  Also, when the 
channel is closed, it triggers an fadvise DONTNEED that causes the data region 
to be evicted from the OS page cache, meaning that the next time it's asked 
for it will definitely be read from disk, even if it happened to be in the 
page cache before the request.

I noticed this when trying to figure out why my job was doing so much more disk 
IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found that disk 
reads went to nearly 0 on machines that had enough memory to fit map outputs 
into the page cache.  I then straced the NodeManager and noticed that there 
were over four times as many fadvise DONTNEED calls as map-reduce pairs.  Further 
logging showed the same map outputs being fetched about this many times.

The fix would be to reserve space in the reducer before fetching the data.  
Currently, fetching the size of the data and fetching the actual data happen 
in the same HTTP request.  Fixing this would require doing them in separate 
HTTP requests, or transferring the sizes through the AM.
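The reserve-before-fetch alternative in the original description could be sketched like this. All names are illustrative assumptions for the sketch (`fetchSize`, `fetchData`, and the semaphore budget are not the real MapReduce merge-manager API):

```java
import java.util.concurrent.Semaphore;

public class ReserveBeforeFetch {
    // Models the reducer's in-memory shuffle budget, in bytes.
    static final Semaphore memoryBudget = new Semaphore(1024);

    // First round trip: ask the shuffle server only for the output's size.
    static int fetchSize() {
        return 600;
    }

    // Second round trip: fetch the bytes. This is issued only after memory
    // is reserved, so the server never reads data that will be discarded.
    static byte[] fetchData(int size) {
        return new byte[size];
    }

    public static void main(String[] args) {
        int size = fetchSize();
        if (memoryBudget.tryAcquire(size)) {
            try {
                byte[] data = fetchData(size);
                System.out.println(data.length); // prints 600
            } finally {
                memoryBudget.release(size);
            }
        } else {
            // Not enough memory: retry later, with no server-side disk or
            // page-cache IO wasted on an abandoned fetch.
            System.out.println("retry later");
        }
    }
}
```

The cost of this design is an extra round trip per map output, which is why the description also floats transferring the sizes through the AM instead.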








[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Summary: ShuffleHandler fadvises file regions as DONTNEED even when fetch 
fails  (was: Fetches when reducer can't fit them result in unnecessary reads on 
shuffle server)



