Hi, I fixed that problem by putting all the Spark JARs in spark-archive.zip and uploading it to HDFS (that was indeed the cause of the earlier failure).
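For anyone who hits the same thing, the setup was roughly as below. The archive name matches the hdfs:/spark-jars.zip that shows up later in the log; the local paths are specific to my cluster, so treat them as placeholders:

```shell
# Bundle the JARs shipped with the Spark distribution into one archive
# (zip the contents of the jars/ directory, not the directory itself).
cd /appdata/spark-2.3.0-bin-hadoop2.7/jars
zip -r /tmp/spark-jars.zip .

# Put the archive on HDFS so YARN containers can localize it.
hdfs dfs -put /tmp/spark-jars.zip /spark-jars.zip

# Then point Spark at it in conf/spark-defaults.conf:
#   spark.yarn.archive   hdfs:///spark-jars.zip
```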
But I'm facing a new issue now; this is the new RPC error I get (stack trace below):

2018-06-08 14:26:43 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-08 14:26:45 INFO SparkContext:54 - Running Spark version 2.3.0
2018-06-08 14:26:45 INFO SparkContext:54 - Submitted application: EndToEnd_FeatureEngineeringPipeline
2018-06-08 14:26:45 INFO SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:45 INFO SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:45 INFO SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:45 INFO SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:45 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bblite); groups with view permissions: Set(); users with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:45 INFO Utils:54 - Successfully started service 'sparkDriver' on port 41957.
2018-06-08 14:26:45 INFO SparkEnv:54 - Registering MapOutputTracker
2018-06-08 14:26:45 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-06-08 14:26:45 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-08 14:26:45 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-08 14:26:45 INFO DiskBlockManager:54 - Created local directory at /appdata/spark/tmp/blockmgr-7b035871-a1f7-47ff-aad8-f7a43367836e
2018-06-08 14:26:45 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-06-08 14:26:45 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-08 14:26:45 INFO log:192 - Logging initialized @3659ms
2018-06-08 14:26:45 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-08 14:26:45 INFO Server:414 - Started @3733ms
2018-06-08 14:26:45 INFO AbstractConnector:278 - Started ServerConnector@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:45 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c3409b5{/jobs,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7f1ba569{/jobs/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@493631a1{/jobs/job,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b12f33c{/jobs/job/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@490023da{/stages,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31c3a862{/stages/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4da2454f{/stages/stage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@552f182d{/stages/stage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@a78a7fa{/stages/pool,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15142105{/stages/pool/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7589c977{/storage,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@584a599b{/storage/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1742621f{/storage/rdd,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23ea75fb{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1813d280{/environment,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@129fc698{/environment/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16c91c4e{/executors,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@667ce6c1{/executors/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60fdbf5c{/executors/threadDump,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c3a1edd{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@52cf5878{/static,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b7c7cff{/,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7691ad8{/api,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bb96483{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24a994f7{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-06-08 14:26:45 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://:4040
2018-06-08 14:26:46 INFO RMProxy:98 - Connecting to ResourceManager at /192.168.49.37:8032
2018-06-08 14:26:46 INFO Client:54 - Requesting a new application from cluster with 4 NodeManagers
2018-06-08 14:26:46 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-06-08 14:26:46 INFO Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2018-06-08 14:26:46 INFO Client:54 - Setting up container launch context for our AM
2018-06-08 14:26:46 INFO Client:54 - Setting up the launch environment for our AM container
2018-06-08 14:26:46 INFO Client:54 - Preparing resources for our AM container
2018-06-08 14:26:48 INFO Client:54 - Source and destination file systems are the same. Not copying hdfs:/spark-jars.zip
2018-06-08 14:26:48 INFO Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/pyspark.zip
2018-06-08 14:26:48 INFO Client:54 - Uploading resource file:/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/py4j-0.10.6-src.zip
2018-06-08 14:26:48 INFO Client:54 - Uploading resource file:/appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab/__spark_conf__4300362365336835927.zip -> hdfs://192.168.49.37:9000/user/bblite/.sparkStaging/application_1528296308262_0017/__spark_conf__.zip
2018-06-08 14:26:48 INFO SecurityManager:54 - Changing view acls to: bblite
2018-06-08 14:26:48 INFO SecurityManager:54 - Changing modify acls to: bblite
2018-06-08 14:26:48 INFO SecurityManager:54 - Changing view acls groups to:
2018-06-08 14:26:48 INFO SecurityManager:54 - Changing modify acls groups to:
2018-06-08 14:26:48 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bblite); groups with view permissions: Set(); users with modify permissions: Set(bblite); groups with modify permissions: Set()
2018-06-08 14:26:48 INFO Client:54 - Submitting application application_1528296308262_0017 to ResourceManager
2018-06-08 14:26:48 INFO YarnClientImpl:273 - Submitted application application_1528296308262_0017
2018-06-08 14:26:48 INFO SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1528296308262_0017 and attemptId None
2018-06-08 14:26:49 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:49 INFO Client:54 -
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1528448208475
	 final status: UNDEFINED
	 tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
	 user: bblite
2018-06-08 14:26:50 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:51 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:52 WARN TransportChannelHandler:78 - Exception in connection from /192.168.49.38:38862
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:53 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:54 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:55 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO Client:54 - Application report for application_1528296308262_0017 (state: ACCEPTED)
2018-06-08 14:26:56 INFO YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> MasterNode, PROXY_URI_BASES -> http://MasterNode:8088/proxy/application_1528296308262_0017), /proxy/application_1528296308262_0017
2018-06-08 14:26:56 INFO JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2018-06-08 14:26:57 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2018-06-08 14:26:57 INFO Client:54 - Application report for application_1528296308262_0017 (state: RUNNING)
2018-06-08 14:26:57 INFO Client:54 -
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.49.39
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1528448208475
	 final status: UNDEFINED
	 tracking URL: http://MasterNode:8088/proxy/application_1528296308262_0017/
	 user: bblite
2018-06-08 14:26:57 INFO YarnClientSchedulerBackend:54 - Application application_1528296308262_0017 has started running.
2018-06-08 14:26:57 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45193.
2018-06-08 14:26:57 INFO NettyBlockTransferService:54 - Server created on MasterNode:45193
2018-06-08 14:26:57 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-08 14:26:57 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO BlockManagerMasterEndpoint:54 - Registering block manager MasterNode:45193 with 366.3 MB RAM, BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, MasterNode, 45193, None)
2018-06-08 14:26:57 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@261e16df{/metrics/json,null,AVAILABLE,@Spark}
2018-06-08 14:26:59 ERROR YarnClientSchedulerBackend:70 - Yarn application has already exited with state FINISHED!
2018-06-08 14:26:59 INFO AbstractConnector:318 - Stopped Spark@3080efb7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-08 14:26:59 INFO SparkUI:54 - Stopped Spark web UI at http://:4040
2018-06-08 14:26:59 ERROR TransportClient:233 - Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
2018-06-08 14:26:59 ERROR Utils:91 - Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:566)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1752)
	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1924)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1923)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
Caused by: java.io.IOException: Failed to send RPC 7860815347855476907 to /192.168.49.39:53074: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2018-06-08 14:26:59 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-06-08 14:26:59 INFO MemoryStore:54 - MemoryStore cleared
2018-06-08 14:26:59 INFO BlockManager:54 - BlockManager stopped
2018-06-08 14:26:59 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
	at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
	at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:59 INFO SparkContext:54 - SparkContext already stopped.
Traceback (most recent call last):
  File "/appdata/bblite-codebase/automl/backend/feature_extraction/trigger_feature_engineering_pipeline.py", line 18, in <module>
    .appName("EndToEnd_FeatureEngineeringPipeline")\
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/session.py", line 173, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 331, in getOrCreate
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 270, in _initialize_context
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1428, in __call__
  File "/appdata/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
	at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
	at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
2018-06-08 14:26:59 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-06-08 14:26:59 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-06-08 14:26:59 INFO SparkContext:54 - Successfully stopped SparkContext
2018-06-08 14:26:59 INFO ShutdownHookManager:54 - Shutdown hook called
2018-06-08 14:26:59 INFO ShutdownHookManager:54 - Deleting directory /appdata/spark/tmp/spark-35d9709e-8f20-4b57-82d3-f3ef0926d3ab
2018-06-08 14:26:59 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-1b471b46-0c5a-4f75-94c1-c99d9d674228

It seems the name-node and data-nodes cannot talk to each other correctly, but I have no clue why. Has anyone faced this problem? Any help would be appreciated.

Thanks,
Aakash.
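To narrow down whether this really is a connectivity problem between the nodes, these are the standard Hadoop/YARN CLI checks I'd run from the master (the application ID is the one from the log above):

```shell
# List datanodes and their status; dead or missing datanodes would
# confirm an HDFS-level connectivity problem.
hdfs dfsadmin -report

# Confirm all four NodeManagers are RUNNING and reachable.
yarn node -list

# Pull the aggregated container logs for the failed application;
# the ApplicationMaster's stderr usually says why it exited.
yarn logs -applicationId application_1528296308262_0017
```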
On Fri, Jun 8, 2018 at 2:17 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> In Spark on YARN, error code 13 means the SparkContext doesn't initialize in
> time. You can check the yarn application log to get more information.
>
> BTW, did you just write a plain python script without creating a
> SparkContext/SparkSession?
>
> On Fri, Jun 8, 2018 at 4:15 PM, Aakash Basu <aakash.spark....@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to run a program on a cluster using YARN.
>>
>> YARN is present there along with HADOOP.
>>
>> The problem I'm running into is as below -
>>
>>> Container exited with a non-zero exit code 13
>>> Failing this attempt. Failing the application.
>>> ApplicationMaster host: N/A
>>> ApplicationMaster RPC port: -1
>>> queue: default
>>> start time: 1528297574594
>>> final status: FAILED
>>> tracking URL: http://MasterNode:8088/cluster/app/application_1528296308262_0004
>>> user: bblite
>>> Exception in thread "main" org.apache.spark.SparkException: Application
>>> application_1528296308262_0004 finished with failed status
>>
>> I checked on the net, and most of the Stack Overflow answers say that the
>> users had given *.master('local[*]')* in the code while invoking the
>> Spark Session and, at the same time, *--master yarn* while doing the
>> spark-submit, hence the error due to the conflict.
>>
>> But in my case, I've not mentioned any master at all in the code. I'm just
>> trying to run it on yarn by giving *--master yarn* while doing the
>> spark-submit. Below is the code invoking Spark -
>>
>> spark = SparkSession\
>>     .builder\
>>     .appName("Temp_Prog")\
>>     .getOrCreate()
>>
>> Below is the spark-submit -
>>
>> *spark-submit --master yarn --deploy-mode cluster --num-executors 3
>> --executor-cores 6 --executor-memory 4G
>> /appdata/codebase/backend/feature_extraction/try_yarn.py*
>>
>> I've tried without --deploy-mode too, still no help.
>>
>> Thanks,
>> Aakash.
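P.S. Following up on Saisai's pointer about exit code 13: the aggregated application log is the quickest way to see why the ApplicationMaster died. A sketch, assuming YARN log aggregation is enabled on the cluster (the app ID is the failed one quoted above):

```shell
# The AM container's stderr section of the aggregated log normally
# names the real cause behind "Container exited with a non-zero exit code 13".
yarn logs -applicationId application_1528296308262_0004 | less
```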