[jira] [Commented] (MAPREDUCE-6730) Use StandardCharsets instead of String overload

2016-08-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405393#comment-15405393
 ] 

ASF GitHub Bot commented on MAPREDUCE-6730:
---

Github user SahilKang commented on the issue:

https://github.com/apache/hadoop/pull/114
  
@aajisaka, this latest commit should fix the two checkstyle warnings; I'll 
send another patch soon applying the analogous changes to 
org.apache.hadoop.mapred.TextOutputFormat.
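
For reference, a minimal sketch of what that analogous change could look like 
(illustrative only; the holder class and field name are not the actual 
TextOutputFormat source):

{code:java}
import java.nio.charset.StandardCharsets;

// With the String overload, resolving "UTF-8" by name forces a static
// initializer and a try/catch for the checked UnsupportedEncodingException.
// With the StandardCharsets.UTF_8 constant (a Charset guaranteed to exist
// on every JVM), String#getBytes(Charset) declares no checked exception,
// so the field can be initialized inline.
class NewlineHolder {
  static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);
}
{code}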


> Use StandardCharsets instead of String overload
> ---
>
> Key: MAPREDUCE-6730
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6730
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Sahil Kang
>Assignee: Sahil Kang
>Priority: Minor
>
> In TextOutputFormat.java, instead of:
> {code:java}
> private static final String utf8 = "UTF-8";
> private static final byte[] newline;
> static {
>   try {
>     newline = "\n".getBytes(utf8);
>   } catch (UnsupportedEncodingException uee) {
>     throw new IllegalArgumentException("can't find " + utf8 + " encoding");
>   }
> }
> {code}
> Let's do something like:
> {code:java}
> private static final byte[] newline = "\n".getBytes(StandardCharsets.UTF_8);
> {code}






[jira] [Commented] (MAPREDUCE-6745) Job directories should be cleaned in staging directory /tmp/hadoop-yarn/staging after a MapReduce job finishes successfully

2016-08-02 Thread mujunchao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405252#comment-15405252
 ] 

mujunchao commented on MAPREDUCE-6745:
--

We will move the .staging dir away once the job has finished or failed. Since 
the job is no longer alive, I think there is no need to keep the .staging dir 
at that point.

> Job directories should be cleaned in staging directory /tmp/hadoop-yarn/staging 
> after a MapReduce job finishes successfully
> -
>
> Key: MAPREDUCE-6745
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6745
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.7.2
> Environment: Suse 11 sp3
>Reporter: liuxiaoping
>Priority: Blocker
>
> If the MapReduce client sets mapreduce.task.files.preserve.failedtasks=true, 
> the temporary job directory will not be deleted from the staging directory 
> /tmp/hadoop-yarn/staging.
> As time goes by, the job files accumulate, eventually leading to the 
> exception below:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemExceededException):
> The directory item limit of /tmp/hadoop-yarn/staging/username/.staging is 
> exceeded: limit=1048576 items=1048576
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:936)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:981)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.unprotectedMkdir(FSDirMkdirOp.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createSingleDirectory(FSDirMkdirOp.java:191)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createChildrenDirectories(FSDirMkdirOp.java:166)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:97)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3788)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:986)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:624)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolProtos.$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:624)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
>   at java.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
>   
>   
> The official description of the configuration 
> mapreduce.task.files.preserve.failedtasks is as follows:
> Should the files for failed tasks be kept. This should only be used on 
> jobs that are failing, because the storage is never reclaimed. 
> It also prevents the map outputs from being erased from the reduce 
> directory as they are consumed.
>   
> According to this description, I think the temporary files for successful 
> tasks shouldn't be kept.






[jira] [Commented] (MAPREDUCE-6745) Job directories should be cleaned in staging directory /tmp/hadoop-yarn/staging after a MapReduce job finishes successfully

2016-08-02 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405195#comment-15405195
 ] 

Akira Ajisaka commented on MAPREDUCE-6745:
--

However, the documentation is confusing to me. I'd like to add a parameter 
"mapreduce.tasks.files.preserve.failedjobs" to keep the .staging dir only for 
failing jobs. What do you think?
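
If such a parameter were adopted, client code might toggle it like this 
(hypothetical sketch; the property is only proposed in this comment and does 
not exist in Hadoop):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ProposedFlagExample {
  public static void main(String[] args) {
    // Hypothetical property from the proposal above -- not yet in Hadoop.
    // The intent: keep .staging only when the job actually fails, instead
    // of preserving it unconditionally the way
    // mapreduce.task.files.preserve.failedtasks does today.
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.tasks.files.preserve.failedjobs", true);
  }
}
{code}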




[jira] [Commented] (MAPREDUCE-6745) Job directories should be cleaned in staging directory /tmp/hadoop-yarn/staging after a MapReduce job finishes successfully

2016-08-02 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405191#comment-15405191
 ] 

Akira Ajisaka commented on MAPREDUCE-6745:
--

Probably MAPREDUCE-6607 is related. As I commented 
[there|https://issues.apache.org/jira/browse/MAPREDUCE-6607?focusedCommentId=15140967&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15140967],
 if the parameter is set, the files for failed tasks are kept. However, the 
files in .staging can be used by all tasks, so it's difficult to tell which 
files in .staging belong to the failed tasks. That's why, if the parameter is 
set, all the files in .staging are preserved for now.
Therefore you need to set the parameter to true only for failing jobs to 
avoid the issue. I think this is what the documentation is trying to say.
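
For example, a minimal sketch of such a job-level override (the job name and 
setup are illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PreserveOnlyForDebugJob {
  public static void main(String[] args) throws Exception {
    // Turn preservation on only for the one job being debugged, so jobs
    // that normally succeed keep cleaning up their .staging directories.
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.task.files.preserve.failedtasks", true);
    Job job = Job.getInstance(conf, "debug-failing-job");
    // ... set mapper/reducer/paths as usual, then job.waitForCompletion(true)
  }
}
{code}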




[jira] [Commented] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403531#comment-15403531
 ] 

Hadoop QA commented on MAPREDUCE-6310:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 1s 
{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
29s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-mapreduce-project/hadoop-mapreduce-client hadoop-mapreduce-project 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 50s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3549 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 4m 1s 
{color} | {color:red} The patch has 8 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 15s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-mapreduce-project/hadoop-mapreduce-client hadoop-mapreduce-project 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 140m 15s 
{color} | {color:red} hadoop-mapreduce-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s 
{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 48s 
{color} | {color:green} hadoop-mapreduce-client-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 114m 28s 
{color} | {color:green} hadoop-mapreduce

[jira] [Commented] (MAPREDUCE-6724) Single shuffle to memory must not exceed Integer#MAX_VALUE

2016-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403505#comment-15403505
 ] 

Hudson commented on MAPREDUCE-6724:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #10190 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10190/])
MAPREDUCE-6724. Single shuffle to memory must not exceed Integer#MAX_VALUE (gera: rev 
6890d5b472320fa7592ed1b08b623c55a27089c6)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java


> Single shuffle to memory must not exceed Integer#MAX_VALUE
> --
>
> Key: MAPREDUCE-6724
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6724
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6724.009.patch, mapreduce6724.001.patch, 
> mapreduce6724.002.patch, mapreduce6724.003.patch, mapreduce6724.004.patch, 
> mapreduce6724.005.patch, mapreduce6724.006.patch, mapreduce6724.007.patch, 
> mapreduce6724.008.patch
>
>
> When shuffle is done in memory, MergeManagerImpl converts the requested size 
> to an int to allocate an instance of InMemoryMapOutput. This results in an 
> overflow if the requested size is bigger than Integer.MAX_VALUE and 
> eventually causes the reducer to fail.
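
A minimal sketch of the kind of guard the fix implies (illustrative only, not 
the actual MergeManagerImpl code, which weighs several thresholds when 
deciding between in-memory and on-disk shuffle):

{code:java}
class ShuffleSizeGuard {
  // A segment may be fetched to memory only if its size fits in an int,
  // because InMemoryMapOutput is ultimately backed by a byte[] whose
  // length cannot exceed Integer.MAX_VALUE. Larger segments go to disk.
  static boolean canShuffleToMemory(long requestedSize,
                                    long maxSingleShuffleLimit) {
    return requestedSize <= maxSingleShuffleLimit
        && requestedSize <= Integer.MAX_VALUE;
  }
}
{code}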


