On Sun, Nov 2, 2014 at 1:35 AM, <jan.zi...@centrum.cz> wrote:

> Hi,
>
> I am using Spark on Yarn, particularly Spark in Python. I am trying to run:
>
> myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json")

How many files do you have, and what is the average size of each file?

> myrdd.getNumPartitions()
>
> Unfortunately, it seems that Spark tries to load everything into RAM, or at
> least after a while of running everything slows down and then I get the
> errors in the log below. Everything works fine for datasets smaller than
> RAM, but I would expect Spark to handle this without storing everything in
> RAM. So I would like to ask whether I am missing some setting in Spark on
> Yarn?
>
> Thank you in advance for any help.
>
> 14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-375] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 11744,575: [Full GC 1194515K->1192839K(1365504K), 2,2367150 secs]
> 11746,814: [Full GC 1194507K->1193186K(1365504K), 2,1788150 secs]
> 11748,995: [Full GC 1194507K->1193278K(1365504K), 1,3511480 secs]
> 11750,347: [Full GC 1194507K->1193263K(1365504K), 2,2735350 secs]
> 11752,622: [Full GC 1194506K->1193192K(1365504K), 1,2700110 secs]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/hadoop/spark/python/pyspark/rdd.py", line 391, in getNumPartitions
>     return self._jrdd.partitions().size()
>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError14/11/01 22:07:07 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at NativeMethodAccessorImpl.java:-2
> : An error occurred while calling o112.partitions.
> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> >>> 11753,896: [Full GC 1194506K->947839K(1365504K), 2,1483780 secs]
> 14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-309] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 14/11/01 22:07:09 INFO Remoting: Remoting shut down
> 14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
> 14/11/01 22:07:09 INFO network.ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@5ca1c790
> 14/11/01 22:07:09 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@5ca1c790
> java.nio.channels.CancelledKeyException
>         at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
>         at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
> 14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
> 14/11/01 22:07:09 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768) not found
> 14/11/01 22:07:10 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FINISHED
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
> 14/11/01 22:07:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
> 14/11/01 22:07:10 INFO ui.SparkUI: Stopped Spark web UI at http://ip-172-31-20-69.us-west-2.compute.internal:4040
> 14/11/01 22:07:10 INFO scheduler.DAGScheduler: Stopping DAGScheduler
> 14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
> 14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Stopped
> 14/11/01 22:07:11 ERROR spark.MapOutputTrackerMaster: Error communicating with MapOutputTracker
> akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had already been terminated.
>         at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
>         at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
>         at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
>         at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
>         at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
> Exception in thread "Yarn Application State Checker" org.apache.spark.SparkException: Error communicating with MapOutputTracker
>         at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
>         at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
>         at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
>         at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
> Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had already been terminated.
>         at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
>         at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
>         ... 5 more
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
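One more thought: with a wildcard like that, every matching file contributes at least one input split, and the driver has to build and hold all of that split metadata in its own heap before any task runs. That fits your stack trace, where the OOM happens inside partitions() / getNumPartitions() on the driver, not on the executors. If the glob matches a very large number of small files, a larger driver heap may be all you need. A minimal sketch (the 4g value and my_job.py name are placeholders, not values tuned to your cluster):

```shell
# In yarn-client mode the driver runs on the machine where you launch
# the job, so --driver-memory sizes that local JVM's heap:
spark-submit --master yarn-client --driver-memory 4g my_job.py

# Equivalently, set it once in conf/spark-defaults.conf:
#   spark.driver.memory  4g
```

Beyond that, the usual fix is to shrink the file count itself, e.g. by concatenating the small JSON files into fewer, larger objects before pointing sc.textFile() at them; both the S3 listing and the split computation scale with the number of objects, so fewer and larger files keep driver memory flat.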