[ 
https://issues.apache.org/jira/browse/SPARK-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Dave updated SPARK-3649:
------------------------------
    Description: 
As 
[reported|http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassCastException-java-lang-Long-cannot-be-cast-to-scala-Tuple2-td13926.html#a14501]
 on the mailing list, GraphX throws

{code}
java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
        at 
org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39)
 
        at 
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
 
        at 
org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
{code}

when sort-based shuffle attempts to spill to disk. This is because GraphX 
defines custom serializers for shuffling pair RDDs that assume Spark will 
always serialize the entire pair object rather than breaking it up into its 
components. However, the spill code path in sort-based shuffle [violates this 
assumption|https://github.com/apache/spark/blob/f9d6220c792b779be385f3022d146911a22c2130/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L329].

GraphX uses the custom serializers to compress vertex ID keys using 
variable-length integer encoding. However, since the serializer can no longer 
rely on the key and value being serialized and deserialized together, 
performing such encoding would require writing a tag byte. Therefore it may be 
better to simply remove the custom serializers.

  was:
As 
[reported|http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassCastException-java-lang-Long-cannot-be-cast-to-scala-Tuple2-td13926.html#a14501]
 on the mailing list, GraphX throws

{code}
java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
        at 
org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39)
 
        at 
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
 
        at 
org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
{code}

when sort-based shuffle attempts to spill to disk. This is because GraphX 
defines custom serializers for shuffling pair RDDs that assume Spark will 
always serialize the entire pair object rather than breaking it up into its 
components. However, the spill code path in sort-based shuffle [violates this 
assumption|https://github.com/apache/spark/blob/f9d6220c792b779be385f3022d146911a22c2130/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L329].

GraphX uses the custom serializers to compress vertex ID keys using 
variable-length integer encoding. However, since the serializer can no longer 
rely on the key and value being serialized and deserialized together, 
performing such encoding would require writing a tag byte.


> ClassCastException in GraphX custom serializers when sort-based shuffle spills
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-3649
>                 URL: https://issues.apache.org/jira/browse/SPARK-3649
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.2.0
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>
> As 
> [reported|http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassCastException-java-lang-Long-cannot-be-cast-to-scala-Tuple2-td13926.html#a14501]
>  on the mailing list, GraphX throws
> {code}
> java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
>         at 
> org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39)
>  
>         at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
>  
>         at 
> org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
> {code}
> when sort-based shuffle attempts to spill to disk. This is because GraphX 
> defines custom serializers for shuffling pair RDDs that assume Spark will 
> always serialize the entire pair object rather than breaking it up into its 
> components. However, the spill code path in sort-based shuffle [violates this 
> assumption|https://github.com/apache/spark/blob/f9d6220c792b779be385f3022d146911a22c2130/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L329].
> GraphX uses the custom serializers to compress vertex ID keys using 
> variable-length integer encoding. However, since the serializer can no longer 
> rely on the key and value being serialized and deserialized together, 
> performing such encoding would require writing a tag byte. Therefore it may 
> be better to simply remove the custom serializers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to