[ https://issues.apache.org/jira/browse/SPARK-12831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brett Stime updated SPARK-12831:
--------------------------------
    Description:
Getting the following error in my executor logs:

ERROR akka.ErrorMonitor: Transient association error (association remains live)
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Actor[akka.tcp://sparkDriver@172.21.25.199:51562/user/CoarseGrainedScheduler#-2039547722]: max allowed size 134217728 bytes, actual size of encoded class org.apache.spark.rpc.akka.AkkaMessage was 134419636 bytes.

Seems like the quick fix would be to make AkkaUtils.reservedSizeBytes a little bigger--maybe proportional to spark.akka.frameSize and/or user configurable.

A more robust solution might be to catch OversizedPayloadException and retry using the BlockManager.

I should also mention that this has the effect of stalling the entire job (my use case also requires fairly liberal timeouts). For now, I'll see if setting spark.akka.frameSize a little smaller gives me more proportional overhead.

Thanks.

  was:
Getting the following error in my executor logs:

ERROR akka.ErrorMonitor: Transient association error (association remains live)
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Actor[akka.tcp://sparkDriver@172.21.25.199:51562/user/CoarseGrainedScheduler#-2039547722]: max allowed size 134217728 bytes, actual size of encoded class org.apache.spark.rpc.akka.AkkaMessage was 134419636 bytes.

Seems like the quick fix would be to make AkkaUtils.reservedSizeBytes a little bigger--maybe proportional to spark.akka.frameSize and/or user configurable.

A more robust solution might be to catch OversizedPayloadException and retry using the BlockManager.

For now, I'll see if setting spark.akka.frameSize a little smaller gives me more overhead.

Thanks.
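For reference, the sizes in the error message can be checked with a little arithmetic. The sketch below is only an illustration, not Spark's code: the two byte counts come from the log line in the description, while the 200 KiB fixed reserve, the 1% fraction, and all function names are assumptions made up for this example. It shows how far the encoded message overshot the frame limit and what a reserve proportional to the frame size might look like.

```python
MIB = 1024 * 1024

# Values copied from the error message in the description.
max_frame_bytes = 134217728   # "max allowed size"
actual_bytes = 134419636      # "actual size of encoded class ... AkkaMessage"

# Assumption for illustration: a fixed 200 KiB reserve (not stated in the report).
ASSUMED_RESERVED_BYTES = 200 * 1024


def overshoot(actual, limit):
    """Bytes by which the encoded message exceeded the frame limit."""
    return actual - limit


def direct_send_threshold(frame_bytes, reserved_bytes):
    """Largest result size that would still be sent directly in one frame."""
    return frame_bytes - reserved_bytes


def proportional_reserve(frame_bytes, percent=1, floor=ASSUMED_RESERVED_BYTES):
    """Hypothetical alternative: reserve a percentage of the frame size,
    but never less than the fixed floor."""
    return max(floor, frame_bytes * percent // 100)


if __name__ == "__main__":
    print("frame size (MiB):", max_frame_bytes // MIB)
    print("overshoot (bytes):", overshoot(actual_bytes, max_frame_bytes))
    print("threshold with fixed reserve:",
          direct_send_threshold(max_frame_bytes, ASSUMED_RESERVED_BYTES))
    print("threshold with 1% reserve:",
          direct_send_threshold(max_frame_bytes,
                                proportional_reserve(max_frame_bytes)))
```

The message was about 197 KiB over the 128 MiB limit, so under these assumptions a reserve that scales with the frame size would leave more headroom at large frame sizes than a small fixed constant does.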
> akka.remote.OversizedPayloadException on DirectTaskResult
> ---------------------------------------------------------
>
>                 Key: SPARK-12831
>                 URL: https://issues.apache.org/jira/browse/SPARK-12831
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Brett Stime
>
> Getting the following error in my executor logs:
> ERROR akka.ErrorMonitor: Transient association error (association remains
> live)
> akka.remote.OversizedPayloadException: Discarding oversized payload sent to
> Actor[akka.tcp://sparkDriver@172.21.25.199:51562/user/CoarseGrainedScheduler#-2039547722]:
> max allowed size 134217728 bytes, actual size of encoded class
> org.apache.spark.rpc.akka.AkkaMessage was 134419636 bytes.
> Seems like the quick fix would be to make AkkaUtils.reservedSizeBytes a
> little bigger--maybe proportional to spark.akka.frameSize and/or user
> configurable.
> A more robust solution might be to catch OversizedPayloadException and retry
> using the BlockManager.
> I should also mention that this has the effect of stalling the entire job (my
> use case also requires fairly liberal timeouts). For now, I'll see if setting
> spark.akka.frameSize a little smaller gives me more proportional overhead.
> Thanks.