[ https://issues.apache.org/jira/browse/SPARK-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049284#comment-15049284 ]
Adam Roberts commented on SPARK-9858:
-------------------------------------

Yep, I added System.identityHashCode(serializer) prints in both the creation method and at the point of use (both in the Exchange class):

    Creating new unsafe row serializer ADAMTEST. myUnsafeRowSerializer identity hash: -555078685
    Creating new unsafe row serializer ADAMTEST. myUnsafeRowSerializer identity hash: 1088823803
    preparing shuffle dependency ADAMTEST. In needToCopy function and serializer hash is: 1088823803

So two serializer instances get created, and the one that reaches needToCopy is the second.

New development: on Intel (an LE platform), if we take the first 200 elements and print them, we get 20 consecutive rows containing (3,[0,13,5,ff00000000000000]). On our BE platforms this isn't the case; everything is (3,[0,13,5,0]), the same as the rest of the file on Intel. This print is in DAGScheduler's submitMapStage method:

    val rdd = dependency.rdd
    rdd.take(200).foreach(println)

(Minimal sketches of these diagnostics follow the quoted issue details below.)

> Introduce an ExchangeCoordinator to estimate the number of post-shuffle
> partitions.
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-9858
>                 URL: https://issues.apache.org/jira/browse/SPARK-9858
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>             Fix For: 1.6.0
>
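First, a minimal, self-contained sketch of the identity-hash diagnostic: two serializer instances are created with distinct identity hashes, and the use site sees only the second one, as in the output above. StubSerializer, the method names, and the enclosing object are stand-ins so the sketch compiles on its own; the real prints sit next to the UnsafeRowSerializer inside Spark's Exchange class.

    object IdentityHashProbe {
      class StubSerializer // stand-in for UnsafeRowSerializer

      def createSerializer(): StubSerializer = {
        val serializer = new StubSerializer
        println("Creating new unsafe row serializer ADAMTEST. " +
          s"myUnsafeRowSerializer identity hash: ${System.identityHashCode(serializer)}")
        serializer
      }

      def needToCopy(serializer: StubSerializer): Unit =
        println("preparing shuffle dependency ADAMTEST. In needToCopy function " +
          s"and serializer hash is: ${System.identityHashCode(serializer)}")

      def main(args: Array[String]): Unit = {
        val first  = createSerializer() // one hash, e.g. -555078685 in the run above
        val second = createSerializer() // a different hash, e.g. 1088823803
        needToCopy(second)              // matches the second instance, as observed
      }
    }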
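Next, a toy illustration of the LE/BE suspicion (an assumption about what could produce the ff00000000000000 pattern, not a confirmed diagnosis): the same eight bytes decode to different long values depending on the byte order applied when reading them back, so a small value written under one ordering and read under the other shows up shifted into the top byte.

    import java.nio.{ByteBuffer, ByteOrder}

    object ByteOrderDemo {
      def main(args: Array[String]): Unit = {
        // The same eight raw bytes, decoded both ways.
        val bytes = Array[Byte](0, 0, 0, 0, 0, 0, 0, 0xff.toByte)
        val asBigEndian    = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getLong
        val asLittleEndian = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong
        println(f"big-endian:    $asBigEndian%016x")    // 00000000000000ff
        println(f"little-endian: $asLittleEndian%016x") // ff00000000000000
      }
    }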
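Finally, a stand-alone, runnable version of the take(200) probe quoted above; in the actual experiment it sits inside DAGScheduler's submitMapStage with rdd = dependency.rdd, so the local-mode SparkContext and the sample data here are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    object TakeProbe {
      def main(args: Array[String]): Unit = {
        val sc  = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("take-probe"))
        val rdd = sc.parallelize(1 to 1000) // placeholder for dependency.rdd
        rdd.take(200).foreach(println)      // the same print used in submitMapStage
        sc.stop()
      }
    }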