Hi,

When I run my application only on the local NodeManager, everything works fine, but when I try to start it on multiple nodes, it fails. Looking in the NodeManager logs, I found a possible cause of the error:
2014-06-21 10:24:05,118 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.FileNotFoundException: File file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml does not exist

This path is valid only on the local machine, where Twill generates the folder and files, not on the remote machines. Why isn't that file copied to HDFS for distribution? Is there anything I can change so that the file gets delivered together with everything else?

I'm using Apache Twill 0.3.0-snapshot with Hadoop 2.3.0, and my application also uses the Hadoop 2.3.0 libraries.

Thank you

The complete log file is available, formatted, here: https://gist.github.com/pgrm/68d07084b1e2cb9e2ce4

It is also included below:

2014-06-21 10:24:03,729 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1403345039835_0001_01_000009
2014-06-21 10:24:04,961 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1403345039835_0001_01_000010 by user ubuntu
2014-06-21 10:24:04,961 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu IP=10.216.60.23 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1403345039835_0001 CONTAINERID=container_1403345039835_0001_01_000010
2014-06-21 10:24:04,961 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1403345039835_0001_01_000010 to application application_1403345039835_0001
2014-06-21 10:24:04,966 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1403345039835_0001_01_000010 transitioned from NEW to LOCALIZING
2014-06-21 10:24:04,966 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1403345039835_0001
2014-06-21 10:24:04,966 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml transitioned from INIT to DOWNLOADING
2014-06-21 10:24:04,967 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1403345039835_0001_01_000010
2014-06-21 10:24:04,977 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1403345039835_0001_01_000010.tokens. Credentials list:
2014-06-21 10:24:04,980 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user ubuntu
2014-06-21 10:24:05,050 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1403345039835_0001_01_000010.tokens to /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001/container_1403345039835_0001_01_000010.tokens
2014-06-21 10:24:05,050 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001 = file:/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001
2014-06-21 10:24:05,118 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.FileNotFoundException: File file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml does not exist
2014-06-21 10:24:05,120 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml, 1403346214000, FILE, null }, File file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml does not exist
2014-06-21 10:24:05,121 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml transitioned from DOWNLOADING to FAILED
2014-06-21 10:24:05,121 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1403345039835_0001_01_000010 transitioned from LOCALIZING to LOCALIZATION_FAILED
2014-06-21 10:24:05,121 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1403345039835_0001_01_000010 sent RELEASE event on a resource request { file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml, 1403346214000, FILE, null } not present in cache.
2014-06-21 10:24:05,122 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
    at java.util.concurrent.FutureTask.get(FutureTask.java:187)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1025)
    at org.apache.hadoop.ipc.Client.call(Client.java:1379)
    at org.apache.hadoop.ipc.Client.call(Client.java:1359)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy25.heartbeat(Unknown Source)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
2014-06-21 10:24:05,122 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1403345039835_0001 CONTAINERID=container_1403345039835_0001_01_000010
2014-06-21 10:24:05,122 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Unknown localizer with localizerId container_1403345039835_0001_01_000010 is sending heartbeat. Ordering it to DIE
2014-06-21 10:24:05,122 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1403345039835_0001_01_000010 transitioned from LOCALIZATION_FAILED to DONE
2014-06-21 10:24:05,122 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001/container_1403345039835_0001_01_000010
2014-06-21 10:24:05,123 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: delete returned false for path: [/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001/container_1403345039835_0001_01_000010]
2014-06-21 10:24:05,123 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1403345039835_0001_01_000010 from application application_1403345039835_0001
2014-06-21 10:24:05,123 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1403345039835_0001
2014-06-21 10:24:05,968 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1403345039835_0001_01_000010
2014-06-21 10:24:06,730 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1403345039835_0001_01_000010
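
P.S. To make the symptom in the FileNotFoundException concrete: the resource that fails to localize has a file: URI, which each NodeManager resolves against its own local disk. Below is a minimal sketch of how such URIs could be flagged. ResourceUriCheck and isLocalOnly are hypothetical names of mine, not part of Twill or Hadoop, and the hdfs:// URI (host, port, and path) is made up purely for contrast.

```java
import java.net.URI;

// Hypothetical helper (not a Twill or Hadoop class): flags resource URIs that
// can only be resolved on the machine that created them.
final class ResourceUriCheck {

    // A URI with no scheme, or with the "file" scheme, refers to a local disk,
    // so a remote NodeManager would look for it on its own filesystem.
    static boolean isLocalOnly(String resource) {
        String scheme = URI.create(resource).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        // Resource URI copied from the NodeManager log above.
        String fromLog = "file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/"
                + "logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml";
        // Made-up HDFS location, for comparison only.
        String hypotheticalHdfs = "hdfs://namenode:8020/twill/logback-template.xml";

        System.out.println(fromLog + " -> local only: " + isLocalOnly(fromLog));
        System.out.println(hypotheticalHdfs + " -> local only: " + isLocalOnly(hypotheticalHdfs));
    }
}
```

This only classifies the URI by its scheme. It does not explain why Twill produced a file: location for this resource in the first place, which is the part I am asking about.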
