Davies Liu created SPARK-2494:
---------------------------------

             Summary: Hash of None is different cross machines in CPython
                 Key: SPARK-2494
                 URL: https://issues.apache.org/jira/browse/SPARK-2494
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.0.0, 1.0.1
         Environment: CPython 2.x
            Reporter: Davies Liu
            Priority: Blocker
             Fix For: 1.0.1, 1.0.0

The hash of None, and of any tuple containing None, differs across machines, so the result will be wrong if None appears in a key passed to partitionBy(). The default partition function should use a portable hash function that generates the same hash for all the built-in immutable types, especially tuples.
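
To illustrate the kind of portable hash described above, here is a minimal sketch. It is not necessarily the fix adopted in Spark. The names `portable_hash`, `partition_for`, and `MASK` are hypothetical, as is the choice to map None to 0 and to combine tuple elements recursively with a fixed 63-bit mask so the result does not depend on the machine's word size.

```python
# Hypothetical sketch of a portable hash for partitioning keys.
# Assumption: None maps to a fixed constant, and tuples are hashed
# element by element so a None inside a tuple is also stable.

MASK = 0x7FFFFFFFFFFFFFFF  # fixed 63-bit mask, independent of word size


def portable_hash(x):
    """Return a hash that does not depend on where None lives in memory."""
    if x is None:
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for item in x:
            h ^= portable_hash(item)  # recurse so nested None is handled
            h *= 1000003
            h &= MASK
        h ^= len(x)
        return h
    # Fallback to the built-in hash(). This is stable for ints, but note
    # that str/bytes hashes are randomized per process in Python 3 unless
    # PYTHONHASHSEED is set to a fixed value.
    return hash(x)


def partition_for(key, num_partitions):
    """Pick a partition index for a key (hypothetical helper)."""
    return portable_hash(key) % num_partitions


if __name__ == "__main__":
    # The same key containing None always lands in the same partition.
    print(partition_for((None, 1), 4))
    print(partition_for(None, 4))
```

With a function of this shape, every worker computes the same partition index for a key such as `(None, 1)`, which is the property that partitionBy() needs from its default partition function.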