Hi,

I am using Spark on YARN, specifically PySpark. I am trying to run:

myrdd = sc.textFile("s3n://mybucket/files/*/*/*.json")
myrdd.getNumPartitions()
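To illustrate why this call alone can be heavy (a rough, back-of-envelope sketch — the file count and per-split overhead below are assumed numbers, not measurements): getNumPartitions() makes the driver expand the glob and build one input-split descriptor per matched file, so the split metadata scales with the number of files, not the data size:

```python
# Back-of-envelope sketch (all numbers assumed, not measured):
# the driver holds one input-split object per matched file, so the
# split metadata alone grows with the file count, not the data size.
num_files = 500_000        # hypothetical number of files the glob matches
bytes_per_split = 2_000    # assumed bookkeeping overhead per split
heap_needed_mb = num_files * bytes_per_split // (1024 ** 2)
print(heap_needed_mb, "MB")  # roughly 953 MB of driver heap for metadata alone
```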

Unfortunately, it seems that Spark tries to load everything into RAM, or at 
least after this has been running for a while everything slows down and I get 
errors with the log below. Everything works fine for datasets smaller than 
RAM, but I would expect Spark to handle this without keeping everything in 
RAM. Am I missing some setting in Spark on YARN?
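For reference, the only setting I have found so far that looks relevant (just a guess on my side — the 4g value is an arbitrary placeholder, not a measured requirement): raising the driver heap when launching the shell. In yarn-client mode the driver JVM is already running before any SparkConf is applied, so it apparently has to be passed on the command line rather than set in code:

```python
# Hypothetical launch command (4g is a placeholder, not a measured value):
# in yarn-client mode the driver heap must be set before the JVM starts,
# so it goes on the command line, not into SparkConf after startup.
launch = ["pyspark", "--master", "yarn-client", "--driver-memory", "4g"]
print(" ".join(launch))
```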


Thank you in advance for any help.


14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.actor.default-dispatcher-375] shutting down ActorSystem 
[sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:06:57 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem 
[sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
11744,575: [Full GC 1194515K->1192839K(1365504K), 2,2367150 secs]
11746,814: [Full GC 1194507K->1193186K(1365504K), 2,1788150 secs]
11748,995: [Full GC 1194507K->1193278K(1365504K), 1,3511480 secs]
11750,347: [Full GC 1194507K->1193263K(1365504K), 2,2735350 secs]
11752,622: [Full GC 1194506K->1193192K(1365504K), 1,2700110 secs]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 391, in getNumPartitions
    return self._jrdd.partitions().size()
  File 
"/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 
538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", 
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o112.partitions.
14/11/01 22:07:07 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile 
at NativeMethodAccessorImpl.java:-2
: java.lang.OutOfMemoryError: GC overhead limit exceeded
 
11753,896: [Full GC 1194506K->947839K(1365504K), 2,1483780 secs]
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.actor.default-dispatcher-381] shutting down ActorSystem 
[sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:07:09 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.actor.default-dispatcher-309] shutting down ActorSystem 
[sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.
14/11/01 22:07:09 INFO Remoting: Remoting shut down
14/11/01 22:07:09 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Remoting shut down.
14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,55871)
14/11/01 22:07:09 INFO network.ConnectionManager: Key not valid ? 
sun.nio.ch.SelectionKeyImpl@5ca1c790
14/11/01 22:07:09 INFO network.ConnectionManager: key already cancelled ? 
sun.nio.ch.SelectionKeyImpl@5ca1c790
java.nio.channels.CancelledKeyException
at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
at 
org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
14/11/01 22:07:09 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768)
14/11/01 22:07:09 ERROR network.ConnectionManager: Corresponding 
SendingConnection to 
ConnectionManagerId(ip-172-31-18-35.us-west-2.compute.internal,52768) not found
14/11/01 22:07:10 ERROR cluster.YarnClientSchedulerBackend: Yarn application 
already ended: FINISHED
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/metrics/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/static,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/executors,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/environment/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/environment,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/storage,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/pool,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/stage,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages/json,null}
14/11/01 22:07:10 INFO handler.ContextHandler: stopped 
o.e.j.s.ServletContextHandler{/stages,null}
14/11/01 22:07:10 INFO ui.SparkUI: Stopped Spark web UI at 
http://ip-172-31-20-69.us-west-2.compute.internal:4040
14/11/01 22:07:10 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all 
executors
14/11/01 22:07:10 INFO cluster.YarnClientSchedulerBackend: Stopped
14/11/01 22:07:11 ERROR spark.MapOutputTrackerMaster: Error communicating with 
MapOutputTracker
akka.pattern.AskTimeoutException: 
Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had 
already been terminated.
at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
Exception in thread "Yarn Application State Checker" 
org.apache.spark.SparkException: Error communicating with MapOutputTracker
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:117)
at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:80)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1024)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:131)
Caused by: akka.pattern.AskTimeoutException: 
Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#132167931]] had 
already been terminated.
at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:106)
... 5 more 
 
