Hello, I have a question. We are seeing the exceptions below, and at the moment we
are enabling a JVM profiler to look into GC activity on the workers. If you have
any other suggestions, please let us know. We don't just want to increase the RPC
timeout (from 120 to, say, 600 seconds); we want to get to the reason why the
workers time out.
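One way to see GC activity on the workers without touching the timeout is to turn on GC logging in the executor JVMs. Below is a minimal sketch, assuming the job is launched with spark-submit and the executors run a JDK 7/8 JVM. The three flags are the ones the Spark tuning guide gives for measuring GC, and `spark.executor.extraJavaOptions` / `spark.rpc.askTimeout` are real Spark settings. Pinning the timeout at 120s is an illustrative choice, and the jar and class names are hypothetical placeholders. The helper only builds the command; it does not run it.

```python
import shlex

# JVM flags from the Spark tuning guide's GC section (JDK 7/8 syntax).
# They make each executor print its GC events to its stdout log.
GC_LOG_FLAGS = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"


def build_submit_command(app_jar, main_class, extra_conf=None):
    """Return a spark-submit argv that turns on executor GC logging.

    spark.rpc.askTimeout is pinned at 120s (an assumption for this sketch)
    so any timeout seen with logging on can be matched against GC pauses.
    """
    conf = {
        "spark.executor.extraJavaOptions": GC_LOG_FLAGS,
        "spark.rpc.askTimeout": "120s",
    }
    conf.update(extra_conf or {})
    argv = ["spark-submit", "--class", main_class]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_jar)
    return argv


if __name__ == "__main__":
    # Hypothetical jar and class names; shlex.join quotes the value with spaces.
    print(shlex.join(build_submit_command("my-job.jar", "com.example.MyJob")))
```

Passing the flags through `--conf` rather than editing the cluster's defaults keeps the extra logging limited to the job under investigation.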

The other question is that this timeout appears to cause data loss: data in
the mappers does not make it to the reducers and back to the driver, where we
collect it.

So resiliency seems to be lost, because Spark does not appear to retry the data
at some later time; it gives up, and the input records are lost.

-Andrew


16/02/08 08:33:37 ERROR YarnScheduler: Lost executor 265 on ip-172-20-35-115.ec2.internal: remote Rpc client disassociated
[Stage 4313813:>                                                  (0 + 94) / 95]
16/02/08 08:35:35 ERROR ContextCleaner: Error cleaning broadcast 4311376
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242)
	at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:136)
	at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
	at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
	at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:67)
	at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1136)
	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
[2/8/16, 9:25 AM] Chad Ladensack (c...@revcontent.com): I'll still send over the log file
[2/8/16, 9:25 AM] Chad Ladensack (c...@revcontent.com): Some guy online was asking about garbage collection and how it can throw off executors
[2/8/16, 9:25 AM] Chad Ladensack (c...@revcontent.com): http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning
[2/8/16, 9:26 AM] Chad Ladensack (c...@revcontent.com): He referenced the tuning guide and it shows how to measure it
[2/8/16, 9:26 AM] Chad Ladensack (c...@revcontent.com): idk, maybe worth looking into?
[2/8/16, 9:26 AM] Chad Ladensack (c...@revcontent.com): I'll get the log and email it now
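With the tuning guide's GC flags enabled, each executor writes its collections to its stdout log, so the measurement that matters here is whether single pauses approach the 120-second spark.rpc.askTimeout. Below is a small sketch of that check. It assumes JDK 8 output from -XX:+PrintGCDetails -XX:+PrintGCTimeStamps with one complete event per line, which is what the Parallel collector typically prints; CMS and G1 spread events over several lines and would need a different parser. The function names are hypothetical, and the sample lines in the comments are illustrative only and were not taken from the logs above.

```python
import re
from typing import Iterable, List, NamedTuple

# Start of a GC event (JDK 8, -XX:+PrintGCTimeStamps), e.g.
#   "12.345: [GC (Allocation Failure) ..." or "130.500: [Full GC (Ergonomics) ..."
EVENT_RE = re.compile(r"^\s*(\d+\.\d+): \[(Full GC|GC)")
# Wall-clock length from the trailing "[Times: user=... sys=..., real=N secs]".
REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


class GcPause(NamedTuple):
    uptime_secs: float  # JVM uptime when the collection started
    kind: str           # "GC" (young collection) or "Full GC"
    real_secs: float    # wall-clock pause length


def parse_pauses(lines: Iterable[str]) -> List[GcPause]:
    """Return one GcPause per line holding a complete GC event; skip other lines."""
    pauses = []
    for line in lines:
        event = EVENT_RE.match(line)
        real = REAL_RE.search(line)
        if event and real:
            pauses.append(GcPause(float(event.group(1)), event.group(2), float(real.group(1))))
    return pauses


def long_pauses(pauses: List[GcPause], threshold_secs: float) -> List[GcPause]:
    """Return the pauses whose wall-clock length is at least threshold_secs."""
    return [p for p in pauses if p.real_secs >= threshold_secs]
```

Calling long_pauses(parse_pauses(lines), 120.0) on an executor's stdout would show whether a single collection alone could outlast the timeout. Many shorter pauses close together would not appear there and would need a cumulative view instead.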
