Davies Liu created SPARK-2494:
---------------------------------

             Summary: Hash of None is different across machines in CPython
                 Key: SPARK-2494
                 URL: https://issues.apache.org/jira/browse/SPARK-2494
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.0.0, 1.0.1
         Environment: CPython 2.x 
            Reporter: Davies Liu
            Priority: Blocker
             Fix For: 1.0.1, 1.0.0


The hash of None, and of any tuple containing None, differs across machines, so
the result will be wrong if None appears in a key used by partitionBy().

The default partition function should be a portable hash function that
generates the same hash for all the built-in immutable types, especially tuples.
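
A minimal sketch of what such a portable hash could look like is given below. It is an illustration, not necessarily the fix that will be committed. The names portable_hash and partition_for and the mixing constants are assumptions made for this example. It also assumes that the built-in hash() is already stable across machines for the other key types in use, and that all workers share the same word size.

```python
import sys


def portable_hash(x):
    """Hash None and tuples without relying on memory addresses."""
    if x is None:
        # Fixed value, so every machine agrees on the hash of None.
        return 0
    if isinstance(x, tuple):
        # Combine element hashes recursively so a nested None is also stable.
        h = 0x345678
        for item in x:
            h ^= portable_hash(item)
            h *= 1000003
            # Keep the value within a machine word (assumes equal word size
            # on all workers).
            h &= sys.maxsize
        h ^= len(x)
        return h
    # Assumption: hash() is already stable for the remaining key types.
    return hash(x)


def partition_for(key, num_partitions):
    """Hypothetical helper: map a key to a partition index."""
    return portable_hash(key) % num_partitions
```

With a function like this used as the partition function, a key such as (None, 1) would map to the same partition index on every worker, because no step depends on where the None object lives in memory.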



--
This message was sent by Atlassian JIRA
(v6.2#6252)
