[jira] [Created] (MAPREDUCE-6193) Hadoop 2.x MapReduce Job Counter Data Local Maps Lower than Hadoop 1.x

2014-12-12 Thread Xu Chen (JIRA)
Xu Chen created MAPREDUCE-6193:
--

 Summary: Hadoop 2.x MapReduce Job Counter Data Local Maps Lower 
than Hadoop 1.x
 Key: MAPREDUCE-6193
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6193
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.0
Reporter: Xu Chen
Assignee: Xu Chen


I run MapReduce jobs on a cluster of 400+ nodes based on Hadoop 2.4.0, with 
FairScheduler as the configured scheduler, and I noticed that the Data-Local 
job counter is much lower than on Hadoop 1.x.

For example:
Hadoop 1.x Data-local 99% Rack-Local 1%
Hadoop 2.4.0 Data-Local 75% Rack-Local 25%

So I looked at the source code of the Hadoop 2.4.0 MRAppMaster and the YARN 
FairScheduler. There are some situations that may lead to this kind of 
problem.

We know that MRAppMaster builds a map of 
Priority->ResourceName->Capability->RemoteRequest->NumContainer.

Too many containers are assigned to MRAppMaster from FairScheduler 

MRAppMaster's addContainerReq() and assignContainer() change NumContainer, 
and the updated RemoteRequest is sent to FairScheduler. FairScheduler resets 
its value of NumContainer from each MRAppMaster heartbeat, but it also changes 
NumContainer itself when it handles a NODE_UPDATE event. So if the 
NumContainer in MRAppMaster's next heartbeat is larger than FairScheduler's 
NumContainer, FairScheduler allocates extra containers that are redundant for 
MRAppMaster, and MRAppMaster assigns such a container to a Rack-Local task 
because no task needs that container's host any more.

Besides, when one task requires more than one host, it will also cause this 
problem.

So the conclusion is that FairScheduler's NumContainer is reset both by 
MRAppMaster's heartbeat and by NODE_UPDATE event handling, and the two are 
asynchronous with respect to each other.
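
The race described above can be illustrated with a small single-threaded 
simulation. This is only a sketch under stated assumptions: PendingRequest, 
onHeartbeat and onNodeUpdate are hypothetical names, not the FairScheduler or 
MRAppMaster API, and the interleaving is hard-coded to the worst case in which 
the stale ask equals the original ask.

```java
// Hypothetical model of the scheduler-side counter for one priority.
// None of these names come from the Hadoop code base.
class PendingRequest {
    private int numContainers;   // scheduler's view of outstanding containers
    private int allocated;       // containers handed to the AM so far

    // AM heartbeat: the scheduler overwrites its counter with the AM's value.
    void onHeartbeat(int askFromAm) {
        numContainers = askFromAm;
    }

    // NODE_UPDATE: allocate one container if any are outstanding and
    // decrement the scheduler's own counter.
    boolean onNodeUpdate() {
        if (numContainers <= 0) {
            return false;
        }
        numContainers--;
        allocated++;
        return true;
    }

    int allocated() {
        return allocated;
    }
}

class StaleAskDemo {
    // Returns how many containers are allocated beyond what the AM needs.
    static int surplus(int needed) {
        PendingRequest req = new PendingRequest();
        req.onHeartbeat(needed);          // AM asks for `needed` containers
        for (int i = 0; i < needed; i++) {
            req.onNodeUpdate();           // scheduler satisfies the whole ask
        }
        // The AM has not yet processed the allocations, so its next heartbeat
        // still carries the old ask and resets the scheduler's counter.
        req.onHeartbeat(needed);
        while (req.onNodeUpdate()) {
            // every further allocation is surplus for the AM
        }
        return req.allocated() - needed;
    }

    public static void main(String[] args) {
        System.out.println("surplus containers: " + surplus(3));
    }
}
```

In this model every surplus container arrives on whichever node happens to 
heartbeat next, which is why it tends to be used for a rack-local task.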


I found that FairScheduler's configuration has the properties 
yarn.scheduler.fair.locality.threshold.node and 
yarn.scheduler.fair.locality.threshold.rack.

I'm confused: I would expect FairScheduler's assignContainer() to invoke 
app.addSchedulingOpportunity(priority) after the NODE_LOCAL assignment logic, 
but currently it is the opposite. That means the opportunity count is 
incremented whenever the application has a chance to be assigned a container, 
so when the application misses a NODE_LOCAL node, the opportunity count is 
already greater than locality.threshold.node most of the time. Those 
properties are therefore useless for me.
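
To make the expected ordering concrete, here is a simplified sketch of delay 
scheduling in which a missed opportunity is counted only after the node-local 
check has failed. It assumes the threshold is a fraction of the cluster size. 
LocalityDelay and its members are hypothetical names, and this is not the 
FairScheduler implementation.

```java
// Simplified delay-scheduling sketch; all names are hypothetical.
class LocalityDelay {
    enum Decision { NODE_LOCAL, RACK_LOCAL, WAIT }

    private final int numNodes;
    private final double nodeThreshold;  // fraction of cluster size, 0..1
    private int missedOpportunities;

    LocalityDelay(int numNodes, double nodeThreshold) {
        this.numNodes = numNodes;
        this.nodeThreshold = nodeThreshold;
    }

    // Called each time a node offers resources to the application.
    Decision offer(boolean nodeHasLocalData) {
        if (nodeHasLocalData) {
            missedOpportunities = 0;      // a node-local hit resets the count
            return Decision.NODE_LOCAL;
        }
        // Count the opportunity only after the node-local check failed.
        missedOpportunities++;
        if (missedOpportunities > numNodes * nodeThreshold) {
            missedOpportunities = 0;      // relax to rack-local and start over
            return Decision.RACK_LOCAL;
        }
        return Decision.WAIT;
    }
}
```

With 10 nodes and a threshold of 0.5, the application waits through five 
non-local offers and accepts a rack-local container on the sixth.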
——
Also, if the AppMaster sends no RemoteRequest.ANY request at a given priority, 
the Scheduler gets an NPE and the ResourceManager exits immediately.

See this:

public synchronized int getTotalRequiredResources(Priority priority) {
  // NPE here when no ANY request exists for this priority
  return getResourceRequest(priority, RMNode.ANY).getNumContainers();
}
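
One defensive variant of the lookup above is sketched below. It is an 
illustration only: RequestTable is a hypothetical stand-in for the scheduler's 
request bookkeeping, priorities are plain ints, and returning 0 for a missing 
ANY entry is an assumption rather than the project's chosen fix. Rejecting 
such a request when it is received would be another option.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheduler's per-priority request table.
class RequestTable {
    static final String ANY = "*";

    // priority -> (resource name -> requested container count)
    private final Map<Integer, Map<String, Integer>> requests = new HashMap<>();

    void put(int priority, String resourceName, int numContainers) {
        requests.computeIfAbsent(priority, p -> new HashMap<>())
                .put(resourceName, numContainers);
    }

    // Returns 0 instead of dereferencing a missing ANY entry.
    int getTotalRequiredResources(int priority) {
        Map<String, Integer> byName = requests.get(priority);
        if (byName == null) {
            return 0;
        }
        Integer any = byName.get(ANY);
        return any == null ? 0 : any;
    }
}
```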

If anyone has ideas about these issues, please comment.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory

2014-12-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245131#comment-14245131
 ] 

Hadoop QA commented on MAPREDUCE-4879:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687005/MAPREDUCE-4879.004.patch
  against trunk revision 0e37bbc.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-examples.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5078//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5078//console

This message is automatically generated.

> TeraOutputFormat may overwrite an existing output directory
> ---
>
> Key: MAPREDUCE-4879
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 1.2.1, 2.6.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-4879-trunk-rev1.patch, 
> MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch, MAPREDUCE-4879.004.patch
>
>
> Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs 
> from writing into an existing directory, and potentially overwriting previous 
> runs.
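
The kind of fail-fast check the description asks for can be sketched as 
follows. This is an analogy on the local file system using java.nio, not the 
attached patch: OutputDirGuard and checkOutputDir are hypothetical names, and 
a real output format would go through Hadoop's FileSystem API instead.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating an "output must not exist" check.
class OutputDirGuard {
    // Throws before any work is done if the output directory already exists.
    static void checkOutputDir(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                "Output directory " + outputDir + " already exists");
        }
    }
}
```

Running the check at job submission time, rather than when the first record 
is written, means a previous run's output is never partially overwritten.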





[jira] [Updated] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory

2014-12-12 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-4879:
-
Attachment: MAPREDUCE-4879.004.patch

Fixing javac warnings in 004.






[jira] [Updated] (MAPREDUCE-6133) Rumen is not generating json for .hist file

2014-12-12 Thread Justin Kestelyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Kestelyn updated MAPREDUCE-6133:
---
Assignee: (was: Justin Kestelyn)

> Rumen is not generating json for .hist file
> ---
>
> Key: MAPREDUCE-6133
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6133
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 2.4.1
>Reporter: ARPAN BHANDARI
> Attachments: 
> job_1413468876912_0002-1413563166476-impadmin-word+count-1413563185844-1-1-SUCCEEDED-default-1413563171610.jhist
>
>
> Rumen is creating a blank json for .hist file. Please find below the command 
> that is being run:
> java -cp 
> hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar:hadoop-2.4.1/share/hadoop/tools/lib/hadoop-rumen-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-cli-1.2.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-configuration-1.6.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-lang-2.6.jar:hadoop-2.4.1/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:hadoop-2.4.1/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:hadoop-2.4.1/share/hadoop/tools/lib/guava-11.0.2.jar:hadoop-2.4.1/share/hadoop/tools/lib/commons-collections-3.2.1.jar:hadoop-2.4.1/share/hadoop/common/lib/hadoop-auth-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/slf4j-api-1.7.5.jar:hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/log4j-1.2.17.jar
>  org.apache.hadoop.tools.rumen.TraceBuilder file://job-trace.json 
> file://topology  
> file://job_1413468876912_0002-1413563166476-impadmin-word+count-1413563185844-1-1-SUCCEEDED-default-1413563171610.jhist





[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory

2014-12-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244708#comment-14244708
 ] 

Hadoop QA commented on MAPREDUCE-4879:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12686922/MAPREDUCE-4879.003.patch
  against trunk revision 3681de2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1222 javac 
compiler warnings (more than the trunk's current 1221 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-examples.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5077//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5077//artifact/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5077//console

This message is automatically generated.

> TeraOutputFormat may overwrite an existing output directory
> ---
>
> Key: MAPREDUCE-4879
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 1.2.1, 2.6.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-4879-trunk-rev1.patch, 
> MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch
>
>
> Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs 
> from writing into an existing directory, and potentially overwriting previous 
> runs.





[jira] [Created] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml

2014-12-12 Thread Ray Chiang (JIRA)
Ray Chiang created MAPREDUCE-6192:
-

 Summary: Create unit test to automatically compare MR related 
classes and mapred-default.xml
 Key: MAPREDUCE-6192
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor


Create a unit test that will automatically compare the fields in the various 
MapReduce related classes and mapred-default.xml. It should throw an error if a 
property is missing in either the class or the file.
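
A minimal sketch of such a comparison is shown below, assuming the property 
names are held in static final String fields and the XML file lists each 
property in a <name> element. DemoConfigKeys and ConfigXmlComparer are 
hypothetical names used only for illustration; the real test would point at 
the MapReduce configuration classes and at mapred-default.xml.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

// Hypothetical constants class standing in for a real configuration class.
class DemoConfigKeys {
    public static final String MAP_MEMORY = "demo.map.memory.mb";
    public static final String REDUCE_MEMORY = "demo.reduce.memory.mb";
}

class ConfigXmlComparer {
    // Collects the values of all static final String fields of a class.
    static Set<String> keysFromClass(Class<?> clazz) throws IllegalAccessException {
        Set<String> keys = new TreeSet<>();
        for (Field f : clazz.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m) && Modifier.isFinal(m)
                    && f.getType() == String.class) {
                keys.add((String) f.get(null));
            }
        }
        return keys;
    }

    // Collects the text of every <name> element in the XML document.
    static Set<String> keysFromXml(String xml) throws Exception {
        Set<String> keys = new TreeSet<>();
        NodeList names = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getElementsByTagName("name");
        for (int i = 0; i < names.getLength(); i++) {
            keys.add(names.item(i).getTextContent().trim());
        }
        return keys;
    }

    // Keys present in `left` but absent from `right`.
    static Set<String> missingFrom(Set<String> left, Set<String> right) {
        Set<String> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }
}
```

A unit test would compute missingFrom in both directions and fail if either 
result is non-empty.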





[jira] [Updated] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory

2014-12-12 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-4879:
-
 Target Version/s: 2.7.0
Affects Version/s: (was: trunk)
   1.2.1
   2.6.0
   Status: Patch Available  (was: Open)

> TeraOutputFormat may overwrite an existing output directory
> ---
>
> Key: MAPREDUCE-4879
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 2.6.0, 1.2.1
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-4879-trunk-rev1.patch, 
> MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch
>
>
> Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs 
> from writing into an existing directory, and potentially overwriting previous 
> runs.





[jira] [Updated] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory

2014-12-12 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-4879:
-
Attachment: MAPREDUCE-4879.003.patch

Thanks for picking up the review [~cnauroth]!  Please check 
MAPREDUCE-4879.003.patch

> TeraOutputFormat may overwrite an existing output directory
> ---
>
> Key: MAPREDUCE-4879
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: trunk
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-4879-trunk-rev1.patch, 
> MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch
>
>
> Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs 
> from writing into an existing directory, and potentially overwriting previous 
> runs.





[jira] [Commented] (MAPREDUCE-6149) Document override log4j.properties in MR job

2014-12-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244428#comment-14244428
 ] 

Hadoop QA commented on MAPREDUCE-6149:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12686862/MAPREDUCE-6149.patch
  against trunk revision bda748a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 13 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5076//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5076//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5076//console

This message is automatically generated.

> Document override log4j.properties in MR job
> 
>
> Key: MAPREDUCE-6149
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6149
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6149.patch
>
>
> This new feature comes from MAPREDUCE-6052. Some documentation requirements 
> from Vinod are below:
> * Document the new config in mapred-default.xml.
> * Mention in that documentation that if no scheme is given in the path, it 
> defaults to a log4j file on the local FS.
> * Modify the documentation of log-level configs to say that if you override 
> to have your own log4j.properties file, the log-level configs may not work.
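
The "no scheme means local FS" rule in the requirements above can be 
illustrated with a short sketch. Log4jPathResolver is a hypothetical helper 
and does not show how MapReduce itself resolves the path; it only shows what 
defaulting to the local file system looks like at the URI level.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical helper; not part of Hadoop.
class Log4jPathResolver {
    // Keeps a URI that already has a scheme, otherwise treats the value as
    // a path on the local file system.
    static URI resolve(String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri;
        }
        return Paths.get(path).toAbsolutePath().toUri();
    }
}
```

For example, an hdfs:// value is returned unchanged, while 
/tmp/log4j.properties becomes a file: URI.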





[jira] [Updated] (MAPREDUCE-6149) Document override log4j.properties in MR job

2014-12-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6149:
--
Status: Patch Available  (was: Open)






[jira] [Updated] (MAPREDUCE-6149) Document override log4j.properties in MR job

2014-12-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6149:
--
Attachment: MAPREDUCE-6149.patch






[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java

2014-12-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244275#comment-14244275
 ] 

Hudson commented on MAPREDUCE-6046:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1990 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1990/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 
0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
* hadoop-mapreduce-project/CHANGES.txt


> Change the class name for logs in RMCommunicator.java
> -
>
> Key: MAPREDUCE-6046
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 3.0.0
>Reporter: Devaraj K
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: newbie
> Fix For: 2.7.0
>
> Attachments: MAPREDUCE-6046-01.patch
>
>
> It is a little confusing when logs are generated with the class name 
> RMContainerAllocator by code that is not in RMContainerAllocator.java.
> {code:title=RMCommunicator.java|borderStyle=solid}
>   private static final Log LOG = LogFactory.getLog(RMContainerAllocator.class);
> {code}
> In the above, RMContainerAllocator.class needs to be changed to 
> RMCommunicator.class.





[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java

2014-12-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244248#comment-14244248
 ] 

Hudson commented on MAPREDUCE-6046:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #40 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/40/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 
0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java







[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java

2014-12-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244205#comment-14244205
 ] 

Hudson commented on MAPREDUCE-6046:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #36 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/36/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 
0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java







[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java

2014-12-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244196#comment-14244196
 ] 

Hudson commented on MAPREDUCE-6046:
---

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1970 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1970/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 
0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java







[jira] [Commented] (MAPREDUCE-5767) Data corruption when single value exceeds map buffer size (io.sort.mb)

2014-12-12 Thread Ben Roling (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244169#comment-14244169
 ] 

Ben Roling commented on MAPREDUCE-5767:
---

Thanks for the feedback [~tomdeleu] -- I'm glad you found the post helpful.  It 
was quite tedious to get to the root cause when we encountered the issue, and I 
figured some others had probably run into it.

> Data corruption when single value exceeds map buffer size (io.sort.mb)
> --
>
> Key: MAPREDUCE-5767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 0.20.1
>Reporter: Ben Roling
>
> There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause 
> data corruption when the size of a single value produced by the mapper 
> exceeds the size of the map output buffer (roughly io.sort.mb).
> I experienced this issue in CDH4.2.1, but am logging the issue here for 
> greater visibility in case anyone else might run across the issue.
> The issue does not exist in 0.21 and beyond due to the implementation of 
> MAPREDUCE-64.  That JIRA significantly changes the way the map output 
> buffering is done and it looks like the issue has been resolved by those 
> changes.
> I expect this bug will likely be closed / won't fix due to the fact that 0.20 
> is obsolete.  As stated previously, I am just logging this issue for 
> visibility in case anyone else is still running something based on 0.20 and 
> encounters the same problem.
> In my situation the issue manifested as an ArrayIndexOutOfBoundsException in 
> the reduce phase when deserializing a key -- causing the job to fail.  
> However, I think the problem could manifest in a more dangerous fashion where 
> the affected job succeeds, but produces corrupt output.  The stack trace I 
> saw was:
> 2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.ArrayIndexOutOfBoundsException: 24
>   at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>   at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>   at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>   at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>   at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
>   at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:86)
>   at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:70)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:135)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:114)
>   at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:291)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:163)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
> The problem appears to me to be in 
> org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, 
> int).  The sequence of events that leads up to the issue is:
> * some complete records (cumulative size less than total buffer size) written 
> to buffer
> * large (over io.sort.mb) record starts writing
> * [soft buffer limit 
> exceeded|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1030]
>  - spill starts
> * write of large record continues
> * buffer becomes 
> [full|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012]
> * 
> [wrap|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1013]
>  evaluates to true, suggesting the buffer can be safely wrapped
> * writing the large record continues until 

[jira] [Commented] (MAPREDUCE-5767) Data corruption when single value exceeds map buffer size (io.sort.mb)

2014-12-12 Thread Tom De Leu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244062#comment-14244062
 ] 

Tom De Leu commented on MAPREDUCE-5767:
---

Thanks a lot for this analysis! We had the exact same problem at work, running 
CDH3.5, with Crunch 0.8.4 and Avro 1.7.7.

It was only after a couple of days of trying to find the cause of our problem, 
without success, that I came across this issue via a lucky Google search.
I can confirm that increasing *io.sort.mb* solved our problem.

Thank you for saving us probably weeks of investigation :) 
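
For anyone sizing the buffer, the condition that triggers the problem can be 
checked with a sketch like the one below. SortBufferCheck is a hypothetical 
helper, and it treats the buffer as exactly io.sort.mb megabytes, whereas the 
issue describes the usable size only as roughly io.sort.mb, so it is wise to 
leave headroom.

```java
// Hypothetical helper; not part of Hadoop.
class SortBufferCheck {
    // Converts an io.sort.mb value to bytes.
    static long bufferBytes(int ioSortMb) {
        return (long) ioSortMb << 20;
    }

    // True when a single serialized record cannot fit in a buffer of
    // io.sort.mb megabytes.
    static boolean exceedsBuffer(long recordBytes, int ioSortMb) {
        return recordBytes > bufferBytes(ioSortMb);
    }
}
```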

> Data corruption when single value exceeds map buffer size (io.sort.mb)
> --
>
> Key: MAPREDUCE-5767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 0.20.1
>Reporter: Ben Roling
>
> There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause 
> data corruption when the size of a single value produced by the mapper 
> exceeds the size of the map output buffer (roughly io.sort.mb).
> I experienced this issue in CDH4.2.1, but am logging the issue here for 
> greater visibility in case anyone else might run across the issue.
> The issue does not exist in 0.21 and beyond due to the implementation of 
> MAPREDUCE-64.  That JIRA significantly changes the way the map output 
> buffering is done and it looks like the issue has been resolved by those 
> changes.
> I expect this bug will likely be closed / won't fix due to the fact that 0.20 
> is obsolete.  As stated previously, I am just logging this issue for 
> visibility in case anyone else is still running something based on 0.20 and 
> encounters the same problem.
> In my situation the issue manifested as an ArrayIndexOutOfBoundsException in 
> the reduce phase when deserializing a key -- causing the job to fail.  
> However, I think the problem could manifest in a more dangerous fashion where 
> the affected job succeeds, but produces corrupt output.  The stack trace I 
> saw was:
> 2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.ArrayIndexOutOfBoundsException: 24
>   at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>   at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>   at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>   at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>   at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
>   at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:86)
>   at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:70)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:135)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:114)
>   at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:291)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:163)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
> The problem appears to me to be in 
> org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, int).
> The sequence of events that leads up to the issue is:
> * some complete records (cumulative size less than total buffer size) written to buffer
> * large (over io.sort.mb) record starts writing
> * [soft buffer limit exceeded|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1030] - spill starts
> * write of large record continues
> * buffer becomes [full|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012]
> * 
> [wrap|https://github.com/apache/hadoop-common/blob/re

[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java

2014-12-12 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244026#comment-14244026 ]

Hudson commented on MAPREDUCE-6046:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #773 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/773/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 
0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
* hadoop-mapreduce-project/CHANGES.txt


> Change the class name for logs in RMCommunicator.java
> -
>
> Key: MAPREDUCE-6046
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am
> Affects Versions: 3.0.0
> Reporter: Devaraj K
> Assignee: Sahil Takiar
> Priority: Minor
> Labels: newbie
> Fix For: 2.7.0
>
> Attachments: MAPREDUCE-6046-01.patch
>
>
> It is a little confusing that log messages are generated with the class name 
> RMContainerAllocator even though the logging code is not in 
> RMContainerAllocator.java.
> {code:title=RMCommunicator.java|borderStyle=solid}
>   private static final Log LOG = LogFactory.getLog(RMContainerAllocator.class);
> {code}
> In the above, RMContainerAllocator.class needs to be changed to 
> RMCommunicator.class.
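
The idea behind the requested change, naming a logger after the class that actually emits the messages, can be sketched as follows. This is a hedged illustration, not the committed patch: it uses java.util.logging as a stand-in for the commons-logging Log/LogFactory shown above so that it runs without extra dependencies, and LoggerNameSketch and RMCommunicatorSketch are hypothetical names.

```java
import java.util.logging.Logger;

public class LoggerNameSketch {

    /** Hypothetical stand-in for RMCommunicator; not the real Hadoop class. */
    public static class RMCommunicatorSketch {
        // The logger is named after the enclosing class, so log output
        // points at the file where the logging call actually lives.
        public static final Logger LOG =
            Logger.getLogger(RMCommunicatorSketch.class.getName());
    }

    public static void main(String[] args) {
        // Print the logger name to show which class the log lines would be attributed to.
        System.out.println(RMCommunicatorSketch.LOG.getName());
    }
}
```

With the logger tied to its own class, a reader who sees the class name in a log line can go straight to the matching source file.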



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java

2014-12-12 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243970#comment-14243970 ]

Hudson commented on MAPREDUCE-6046:
---

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #38 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/38/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 
0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java




