[ https://issues.apache.org/jira/browse/OOZIE-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617206#comment-16617206 ]

Clemens Valiente commented on OOZIE-2536:
-----------------------------------------

[~satishsaley] thanks for this patch. We were running into the same issue, and 
backporting it to our Oozie sharelib fixed it.

However, we are now hitting another issue that looks very similar:
{code:java}
2018-09-15 00:31:09,416 [AsyncDispatcher event handler] INFO  org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl  - Num completed Tasks: 1
2018-09-15 00:31:09,417 [AsyncDispatcher event handler] INFO  org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl  - job_1535972033593_259806Job Transitioned from RUNNING to COMMITTING
2018-09-15 00:31:09,419 [CommitterEvent Processor #1] INFO  org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler  - Processing the event EventType: JOB_COMMIT
2018-09-15 00:31:09,455 [uber-SubtaskRunner] WARN  org.apache.hadoop.mapred.LocalContainerLauncher  - Unable to delete unexpected local file/dir .action.xml.crc: insufficient permissions?
2018-09-15 00:31:09,455 [uber-SubtaskRunner] WARN  org.apache.hadoop.mapred.LocalContainerLauncher  - Unable to delete unexpected local file/dir .action.xml.crc: insufficient permissions?
2018-09-15 00:31:09,459 [CommitterEvent Processor #1] FATAL org.apache.hadoop.conf.Configuration  - error parsing conf sqoop-site.xml
java.io.FileNotFoundException: /appdata/hdfs/v7/yarn/nm/usercache/SEM/appcache/application_1535972033593_259806/container_e100_1535972033593_259806_01_000001/sqoop-site.xml (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
        at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2483)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2554)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2507)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2413)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:984)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1034)
        at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1254)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.touchz(CommitterEventHandler.java:268)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:282)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}
It looks like the same race condition can also hit sqoop-site.xml: the uber-mode cleanup deletes the local file, and the next time Configuration lazily reloads its resources in the AM, the commit fails with the FileNotFoundException above. I will investigate whether the same approach as your patch also fixes this case.
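To illustrate what I think is going on, here is a minimal, self-contained sketch of Configuration's lazy resource loading. Everything in it is a placeholder (the file creation, the addResource(URL) call and the property key only stand in for however the sharelib actually registers the generated sqoop-site.xml), so it is meant to show the window, not the sharelib code: registering a file-backed resource does not parse it immediately, and if the cleanup removes the file before the first property lookup, that lookup fails the same way as the trace above.
{code:java}
import java.io.File;
import java.io.FileWriter;

import org.apache.hadoop.conf.Configuration;

public class LazyConfLoadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder for the sqoop-site.xml written into the container's
        // working directory.
        File site = new File("sqoop-site.xml");
        try (FileWriter w = new FileWriter(site)) {
            w.write("<configuration/>");
        }

        Configuration conf = new Configuration(false);
        // Registering the resource does not parse the file yet; Configuration
        // loads its resources lazily, on the first property lookup.
        conf.addResource(site.toURI().toURL());

        // Simulate the uber-mode cleanup deleting the "unexpected local file"
        // before anything has read the conf.
        if (!site.delete()) {
            throw new IllegalStateException("could not delete " + site);
        }

        // The lookup now triggers loadResources() -> parse(url), which fails
        // with a FileNotFoundException (wrapped in a RuntimeException by
        // Configuration), matching the stack trace above.
        conf.getInt("some.placeholder.property", 0);
    }
}
{code}
If that is indeed the window, then either keeping the file out of the cleanup's reach or parsing it eagerly before the cleanup can run should close it.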

> Hadoop's cleanup of local directory in uber mode causing failures
> -----------------------------------------------------------------
>
>                 Key: OOZIE-2536
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2536
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Blocker
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2536-1.patch
>
>
> In our environment, we faced an issue where an uberized Shell action was getting 
> stuck even though the shell action completed with status 0. Please refer to the 
> attached syslog and stdout of the launcher job; here is the relevant part of the
> stdout:
> {quote}
> >>> Invoking Shell command line now >>
> Stdoutput myshellType=qmyshellUpdate
> Exit code of the Shell command 0
> <<< Invocation of Shell command completed <<<
> <<< Invocation of Main class completed <<<
> {quote} 
> syslog
> {quote}
> 2016-05-23 11:15:52,587 WARN [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Unable to delete unexpected 
> local file/dir .action.xml.crc: insufficient permissions?
> 2016-05-23 11:15:52,588 FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.conf.Configuration: error parsing conf propagation-conf.xml
> java.io.FileNotFoundException: 
> /tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml
>  (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at java.io.FileInputStream.<init>(FileInputStream.java:93)
>     at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>     at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>     at java.net.URL.openStream(URL.java:1038)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>     at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>     at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
>     at 
> org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
>     at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.getMemoryRequired(TaskAttemptImpl.java:568)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.updateMillisCounters(TaskAttemptImpl.java:1295)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createJobCounterUpdateEventTASucceeded(TaskAttemptImpl.java:1323)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.access$3500(TaskAttemptImpl.java:147)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1710)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1701)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1085)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1394)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1386)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-23 11:15:52,590 FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /grid/5/tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml
>  (No such file or directory)
>     at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2639)
>     at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
>     at 
> org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
>     at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.getMemoryRequired(TaskAttemptImpl.java:568)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.updateMillisCounters(TaskAttemptImpl.java:1295)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createJobCounterUpdateEventTASucceeded(TaskAttemptImpl.java:1323)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.access$3500(TaskAttemptImpl.java:147)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1710)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1701)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1085)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1394)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1386)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: 
> /tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml
>  (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at java.io.FileInputStream.<init>(FileInputStream.java:93)
>     at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>     at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>     at java.net.URL.openStream(URL.java:1038)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>     at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>     ... 22 more
> 2016-05-23 11:15:52,591 INFO [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2016-05-23 11:15:52,591 ERROR [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[AsyncDispatcher ShutDown handler,5,main] threw an Exception.
> java.lang.SecurityException: Intercepted System.exit(-1)
>     at 
> org.apache.oozie.action.hadoop.LauncherSecurityManager.checkExit(LauncherMapper.java:637)
>     at java.lang.Runtime.exit(Runtime.java:107)
>     at java.lang.System.exit(System.java:971)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$2.run(AsyncDispatcher.java:294)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-23 11:16:44,589 WARN [LeaseRenewer:sa...@namenode.com:8020] 
> org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final 
> parameter: hadoop.tmp.dir;  Ignoring.
> 2016-05-23 11:20:53,677 INFO [Socket Reader #2 for port 50500] 
> SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for saley 
> (auth:SIMPLE)
> {quote}



