Hi, I am using Spark on YARN, specifically Spark in Python (PySpark). I am trying to run:
myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json")
myrdd.getNumPartitions()

Unfortunately it seems that Spark tries to load everything into RAM, or at least after this has been running for a while everything slows down and I get the errors in the log below. Everything works fine for datasets smaller than RAM, but I would expect Spark to handle this without keeping everything in memory. So I would like to ask whether I am missing some setting in Spark on YARN. Thank you in advance for any help.

14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-375] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
11744,575: [Full GC 1194515K->1192839K(1365504K), 2,2367150 secs]
11746,814: [Full GC 1194507K->1193186K(1365504K), 2,1788150 secs]
11748,995: [Full GC 1194507K->1193278K(1365504K), 1,3511480 secs]
11750,347: [Full GC 1194507K->1193263K(1365504K), 2,2735350 secs]
11752,622: [Full GC 1194506K->1193192K(1365504K), 1,2700110 secs]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 391, in getNumPartitions
    return self._jrdd.partitions().size()
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o112.partitions.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:07:07 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at NativeMethodAccessorImpl.java:-2
11753,896: [Full GC 1194506K->947839K(1365504K), 2,1483780 secs]
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-309] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/11/01 22:07:09 INFO Remoting: Remoting shut down
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@5ca1c790
14/11/01 22:07:09 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@5ca1c790
java.nio.channels.CancelledKeyException
	at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
	at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
14/11/01 22:07:09 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768) not found
14/11/01 22:07:10 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FINISHED
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/11/01 22:07:10 INFO ui.SparkUI: Stopped Spark web UI at http://ip-172-31-20-69.us-west-2.compute.internal:4040
14/11/01 22:07:10 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Stopped
14/11/01 22:07:11 ERROR spark.MapOutputTrackerMaster: Error communicating with MapOutputTracker
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had already been terminated.
	at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
	at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
	at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
	at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
Exception in thread "Yarn Application State Checker" org.apache.spark.SparkException: Error communicating with MapOutputTracker
	at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
	at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
	at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had already been terminated.
	at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
	at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
	... 5 more
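For what it's worth, the OutOfMemoryError in the trace above is raised in the driver JVM (note the [sparkDriver] actor system), while partitions() is building split metadata for every S3 key the glob matches; it is not the executors loading data into RAM. So one thing worth trying before anything else is a larger driver heap. A minimal sketch, assuming the job is launched through spark-submit in yarn-client mode (the 4g value and the script name are illustrative placeholders, not from the original message):

```shell
# Sketch: raise the driver heap so it can hold split metadata for the
# full S3 listing. "4g" and "my_job.py" are placeholder values.
spark-submit \
  --master yarn-client \
  --driver-memory 4g \
  my_job.py
```

Note that the driver heap has to be set at launch time; setting it inside an already-running SparkContext is too late, because the driver JVM has already started.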
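The failure mode above, a driver OOM while merely computing partitions, can be illustrated in plain Python with no Spark involved (the helper names are hypothetical, purely for the analogy): record processing behaves like a generator and keeps memory flat, while split metadata is materialized eagerly, one entry per matched file, so only the latter grows with the size of the glob.

```python
def stream_records(partitions):
    """Analogy for executor-side reads: records are yielded one at a
    time, so memory use does not grow with the size of the input."""
    for part in partitions:
        for record in part:
            yield record

def list_splits(files):
    """Analogy for driver-side planning: one metadata entry per matched
    file is built up front and held in memory all at once."""
    return [(index, name) for index, name in enumerate(files)]
```

With hundreds of thousands of matched keys, the eagerly built metadata list is what exhausts the driver heap, even though no record is ever collected on the driver.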