[ https://issues.apache.org/jira/browse/MAPREDUCE-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405195#comment-15405195 ]
Akira Ajisaka commented on MAPREDUCE-6745:
------------------------------------------

However, the document is confusing for me. I'd like to add a parameter
"mapreduce.tasks.files.preserve.failedjobs" to keep the .staging dir only
for failing jobs. What do you think?

> Job directories should be cleaned from the staging directory
> /tmp/hadoop-yarn/staging after a MapReduce job finishes successfully
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6745
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6745
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.7.2
>         Environment: Suse 11 sp3
>            Reporter: liuxiaoping
>            Priority: Blocker
>
> If the MapReduce client sets mapreduce.task.files.preserve.failedtasks=true,
> the temporary job directory is not deleted from the staging directory
> /tmp/hadoop-yarn/staging.
> As time goes by, job files accumulate, eventually leading to the exception
> below:
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemExceededException):
> The directory item limit of /tmp/hadoop-yarn/staging/username/.staging is
> exceeded: limit=1048576 items=1048576
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:936)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:981)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.unprotectedMkdir(FSDirMkdirOp.java:237)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createSingleDirectory(FSDirMkdirOp.java:191)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createChildrenDirectories(FSDirMkdirOp.java:166)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:97)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3788)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:986)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:624)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:624)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
>
> The official description of the configuration property
> mapreduce.task.files.preserve.failedtasks reads:
>
>     Should the files for failed tasks be kept. This should only be used on
>     jobs that are failing, because the storage is never reclaimed.
>     It also prevents the map outputs from being erased from the reduce
>     directory as they are consumed.
>
> According to this description, I think the temporary files for successful
> tasks shouldn't be kept.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
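For illustration, the proposal in the comment above might look like the
following mapred-site.xml fragment. Note that only
mapreduce.task.files.preserve.failedtasks is a real Hadoop property today;
mapreduce.tasks.files.preserve.failedjobs is merely the name proposed in
this thread and does not exist in any Hadoop release:

```xml
<!-- Sketch only: the second property is hypothetical, proposed in this
     JIRA comment, and is NOT implemented in Hadoop. -->
<configuration>
  <property>
    <!-- Existing switch. When true, task files are preserved for ALL
         jobs, so entries under .staging are never reclaimed (the
         behavior that triggers the MaxDirectoryItemExceededException
         reported above). -->
    <name>mapreduce.task.files.preserve.failedtasks</name>
    <value>false</value>
  </property>
  <property>
    <!-- Proposed switch (hypothetical): keep the .staging directory
         only for jobs that actually fail, so successful jobs are
         cleaned up and the directory item limit is not exhausted. -->
    <name>mapreduce.tasks.files.preserve.failedjobs</name>
    <value>true</value>
  </property>
</configuration>
```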