Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/19586

Hi @cloud-fan, in most cases the data type should be the same, so I think this optimization is valuable: it can save considerable space and CPU resources. What about setting a flag on the RDD that indicates whether the RDD contains only elements of the same type? If that's not acceptable, could we put it in the ml package as a special serializer that users could configure? In that case, though, the exact ClassTag of the RDD must be provided for serialization, because of the relocation of serialized objects in UnsafeShuffleWriter.
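
To illustrate the space argument above, here is a minimal sketch in plain Python. It is not Spark's serializer API; the function names and the byte layout are hypothetical, and it handles only 64-bit integers. It compares writing a type tag before every element with writing the tag once for a collection known to hold a single type.

```python
import struct


def _encode_tag(tag: str) -> bytes:
    # Hypothetical layout: 2-byte big-endian length, then the UTF-8 type name.
    raw = tag.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def serialize_tagged_each(values):
    """Write a type tag before every element (the general, mixed-type case)."""
    out = bytearray()
    for v in values:
        out += _encode_tag(type(v).__name__)
        out += struct.pack(">q", v)  # sketch assumption: 64-bit integers only
    return bytes(out)


def serialize_tagged_once(values):
    """Write the type tag once; valid only if all elements share one type."""
    types = {type(v) for v in values}
    if len(types) > 1:
        raise TypeError("mixed element types; fall back to per-element tags")
    tag = types.pop().__name__ if types else ""
    out = bytearray(_encode_tag(tag))
    out += struct.pack(">I", len(values))  # element count
    for v in values:
        out += struct.pack(">q", v)
    return bytes(out)


def deserialize_tagged_once(data):
    """Inverse of serialize_tagged_once: returns (type name, list of values)."""
    (tag_len,) = struct.unpack_from(">H", data, 0)
    offset = 2 + tag_len
    tag = data[2:offset].decode("utf-8")
    (count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    values = [struct.unpack_from(">q", data, offset + 8 * i)[0] for i in range(count)]
    return tag, values
```

In this sketch the reader of the single-tag format has to know the element type from the header before decoding anything else, which loosely mirrors the point that the exact ClassTag must be available up front.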