Hi all, I'm new to YARN and am trying to have YARN download the Samza job tarball (https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html). From the log, it seems that the download failed, even though I've verified with curl that the file is available. The error message is: org/apache/samza/util/Logging
I appreciate any suggestions.

Roger

2015-03-24 17:13:05,469 INFO [Socket Reader #1 for port 33749] ipc.Server (Server.java:saslProcess(1294)) - Auth successful for appattempt_1427226422217_0005_000002 (auth:SIMPLE)
2015-03-24 17:13:05,473 INFO [IPC Server handler 15 on 33749] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(572)) - Start request for container_1427226422217_0005_02_000001 by user opintel
2015-03-24 17:13:05,473 INFO [IPC Server handler 15 on 33749] nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=opintel IP=10.53.152.54 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1427226422217_0005 CONTAINERID=container_1427226422217_0005_02_000001
2015-03-24 17:13:05,473 INFO [AsyncDispatcher event handler] application.Application (ApplicationImpl.java:transition(296)) - Adding container_1427226422217_0005_02_000001 to application application_1427226422217_0005
2015-03-24 17:13:05,474 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(884)) - Container container_1427226422217_0005_02_000001 transitioned from NEW to LOCALIZING
2015-03-24 17:13:05,474 INFO [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:handle(175)) - Got event CONTAINER_INIT for appId application_1427226422217_0005
2015-03-24 17:13:05,475 INFO [AsyncDispatcher event handler] localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource http://somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz transitioned from INIT to DOWNLOADING
2015-03-24 17:13:05,475 INFO [AsyncDispatcher event handler] localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(596)) - Created localizer for container_1427226422217_0005_02_000001
2015-03-24 17:13:05,480 INFO [LocalizerRunner for container_1427226422217_0005_02_000001] localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1029)) - Writing credentials to the nmPrivate file /tmp/hadoop-opintel/nm-local-dir/nmPrivate/container_1427226422217_0005_02_000001.tokens. Credentials list:
2015-03-24 17:13:05,481 INFO [LocalizerRunner for container_1427226422217_0005_02_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(469)) - Initializing user opintel
2015-03-24 17:13:05,492 INFO [LocalizerRunner for container_1427226422217_0005_02_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from /tmp/hadoop-opintel/nm-local-dir/nmPrivate/container_1427226422217_0005_02_000001.tokens to /tmp/hadoop-opintel/nm-local-dir/usercache/opintel/appcache/application_1427226422217_0005/container_1427226422217_0005_02_000001.tokens
2015-03-24 17:13:05,492 INFO [LocalizerRunner for container_1427226422217_0005_02_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to /tmp/hadoop-opintel/nm-local-dir/usercache/opintel/appcache/application_1427226422217_0005 = file:/tmp/hadoop-opintel/nm-local-dir/usercache/opintel/appcache/application_1427226422217_0005
2015-03-24 17:13:05,520 INFO [IPC Server handler 4 on 8040] localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(928)) - DEBUG: FAILED { http://somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz, 0, ARCHIVE, null }, org/apache/samza/util/Logging
2015-03-24 17:13:05,520 INFO [IPC Server handler 4 on 8040] localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource http://somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz transitioned from DOWNLOADING to FAILED
2015-03-24 17:13:05,520 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(884)) - Container container_1427226422217_0005_02_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2015-03-24 17:13:05,521 INFO [AsyncDispatcher event handler] localizer.LocalResourcesTrackerImpl (LocalResourcesTrackerImpl.java:handle(137)) - Container container_1427226422217_0005_02_000001 sent RELEASE event on a resource request { http://somehost.fake.com/samza/web-log-0.0.1-dist.tar.gz, 0, ARCHIVE, null } not present in cache.
2015-03-24 17:13:05,521 WARN [LocalizerRunner for container_1427226422217_0005_02_000001] ipc.Client (Client.java:call(1388)) - interrupted waiting to send rpc request to server
java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
    at java.util.concurrent.FutureTask.get(FutureTask.java:187)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1029)
    at org.apache.hadoop.ipc.Client.call(Client.java:1383)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy31.heartbeat(Unknown Source)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:995)
java.io.IOException: java.lang.InterruptedException
    at org.apache.hadoop.ipc.Client.call(Client.java:1389)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy31.heartbeat(Unknown Source)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:995)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
    at java.util.concurrent.FutureTask.get(FutureTask.java:187)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1029)
    at org.apache.hadoop.ipc.Client.call(Client.java:1383)
    ... 8 more
2015-03-24 17:13:05,521 WARN [AsyncDispatcher event handler] nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=opintel OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1427226422217_0005 CONTAINERID=container_1427226422217_0005_02_000001
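
In case it helps anyone triaging a similar NodeManager log, here is a minimal Python sketch that pulls the failed localization entries (resource URL, resource type, and the trailing message) out of log lines. The regular expression is inferred only from the single FAILED entry in the log above, so the pattern and the names FAILED_RE and find_failed_resources are assumptions for illustration, not a documented Hadoop log format.

```python
import re

# Assumed shape, inferred from the one entry in this log:
#   FAILED { <url>, <timestamp>, <type>, <pattern> }, <message>
# This is not a documented format; adjust it if your log differs.
FAILED_RE = re.compile(
    r"FAILED \{\s*(?P<url>\S+),\s*(?P<timestamp>\d+),\s*"
    r"(?P<type>\w+),\s*(?P<pattern>\S+)\s*\},\s*(?P<message>.*)$"
)


def find_failed_resources(log_lines):
    """Return a (url, resource_type, message) tuple per failed localization entry."""
    failures = []
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            failures.append(
                (
                    match.group("url"),
                    match.group("type"),
                    match.group("message").strip(),
                )
            )
    return failures
```

Calling find_failed_resources(open(path)) on a saved copy of the log (path being wherever you stored it) returns one tuple per FAILED entry. Lines that merely report a state transition to FAILED or LOCALIZATION_FAILED are skipped because they lack the braces.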