[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065006#comment-14065006 ]
Matthew Farrellee commented on SPARK-2494:
------------------------------------------

[~davies] will you provide an example that demonstrates the issue?

> Hash of None is different cross machines in CPython
> ---------------------------------------------------
>
>                 Key: SPARK-2494
>                 URL: https://issues.apache.org/jira/browse/SPARK-2494
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.0.0, 1.0.1
>         Environment: CPython 2.x
>            Reporter: Davies Liu
>            Priority: Blocker
>              Labels: pyspark, shuffle
>             Fix For: 1.0.0, 1.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The hash of None, and of any tuple containing None, differs across machines,
> so the result will be wrong if None appears in the key of partitionBy().
> It should use a portable hash function as the default partition function,
> one that generates the same hash for all the builtin immutable types,
> especially tuple.

--
This message was sent by Atlassian JIRA (v6.2#6252)
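
A minimal sketch of the kind of portable hash the description asks for, not necessarily what the eventual fix looks like. The names portable_hash and partition_for are hypothetical, and the mixing constants are borrowed from the general shape of CPython's tuple hash as an assumption. The idea is to pin None to a fixed value (in CPython 2.x its hash appears to depend on the object's memory address) and to hash tuples recursively so that a None inside a key no longer leaks a machine-specific value:

```python
import sys


def portable_hash(x):
    """Hash None and (nested) tuples the same way on every machine.

    Hypothetical helper: other types fall back to the builtin hash(), which
    is only stable across processes for types such as int.
    """
    if x is None:
        # Fixed value instead of an address-derived one.
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for item in x:
            h ^= portable_hash(item)   # recurse so nested None is covered
            h *= 1000003
            h &= sys.maxsize           # keep the value in machine-word range
        h ^= len(x)
        return h
    return hash(x)


def partition_for(key, num_partitions):
    """Map a key to a partition index in [0, num_partitions)."""
    return portable_hash(key) % num_partitions
```

A function like this could be passed as the partition function argument of partitionBy() in place of the builtin hash. Note that on Python 3 the builtin hash of str is randomized per process unless PYTHONHASHSEED is fixed, so the fallback branch would need the same care there.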