[ https://issues.apache.org/jira/browse/SPARK-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595060#comment-14595060 ]
Ilya Rakitsin commented on SPARK-8503:
--------------------------------------

The structure is a simple cyclic graph, as you would imagine it:

public abstract class Edge implements Serializable {
    private static final long serialVersionUID = MavenVersion.VERSION.getUID();

    private int id;
    protected Vertex fromv;
    protected Vertex tov;
    ...
}

public abstract class Vertex implements Serializable, Cloneable {
    private String name;
    private transient Edge[] incoming = new Edge[0];
    private transient Edge[] outgoing = new Edge[0];
    ...
}

So, as you can see, the edge arrays in Vertex are transient, so they are serialized correctly (basically, not serialized) with both Kryo and regular Java serialization. But when broadcasting, the size is computed in an endless loop until it goes negative (at least it seems that way), due to the cycles in the graph and the transient edges not being handled. Does this help?

Another issue is that the negative value returned by the estimator is not handled in SizeTracker#takeSample() either. Do you think this should be a separate issue, or could you investigate it as well?

Hope this helps.

> SizeEstimator returns negative value for recursive data structures
> ------------------------------------------------------------------
>
>                 Key: SPARK-8503
>                 URL: https://issues.apache.org/jira/browse/SPARK-8503
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Ilya Rakitsin
>
> When estimating the size of recursive data structures like graphs, with transient fields referencing one another, SizeEstimator may return a negative value if the structure is big enough.
> This then affects the logic of other components, e.g. SizeTracker#takeSample(), and may lead to incorrect behavior and exceptions like:
>
> java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036854691384
>     at scala.Predef$.require(Predef.scala:233)
>     at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:810)
>     at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:637)
>     at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:991)
>     at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
>     at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>     at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>     at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
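As a rough illustration of the scenario described above, here is a minimal, self-contained reproduction sketch in Java. The concrete Vertex/Edge subclasses, the class and variable names, and the graph size are hypothetical (the report only says the structure must be "big enough", so whether this particular size drives the estimate negative is an assumption); only the broadcast call and the expected failure point come from the stack trace above.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class NegativeSizeRepro {

    // Simplified, concrete versions of the abstract Edge/Vertex classes quoted above.
    static class Vertex implements Serializable {
        String name;
        transient Edge[] incoming = new Edge[0];
        transient Edge[] outgoing = new Edge[0];
        Vertex(String name) { this.name = name; }
    }

    static class Edge implements Serializable {
        int id;
        Vertex fromv;
        Vertex tov;
        Edge(int id, Vertex fromv, Vertex tov) { this.id = id; this.fromv = fromv; this.tov = tov; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SPARK-8503-repro").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Build a cyclic graph: edges hold non-transient references to their endpoints,
        // while each vertex keeps its edges only in transient arrays.
        int n = 500_000; // illustrative size; the report only says "big enough"
        List<Vertex> vertices = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            vertices.add(new Vertex("v" + i));
        }
        for (int i = 0; i < n; i++) {
            Vertex from = vertices.get(i);
            Vertex to = vertices.get((i + 1) % n); // wrap around to close the cycle
            Edge e = new Edge(i, from, to);
            from.outgoing = new Edge[] { e };
            to.incoming = new Edge[] { e };
        }

        // Broadcasting walks the in-memory object graph for size estimation
        // (TorrentBroadcast.writeBlocks -> BlockManager.putSingle in the stack trace above);
        // per the report, the estimate can go negative and fail the
        // "sizeInBytes was negative" requirement in BlockInfo.markReady.
        Broadcast<List<Vertex>> broadcasted = sc.broadcast(vertices);
        System.out.println("broadcast id: " + broadcasted.id());

        sc.stop();
    }
}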