Which version of Spark is this (or is it a dev build)? We've recently made some improvements to PYTHONHASHSEED propagation.
On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@sevencal.com> wrote:

> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <paulhtremb...@gmail.com>:
>
>> When I try to do a groupByKey() in my Spark environment, I get the error
>> described here:
>> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>>
>> In order to attempt to fix the problem, I set up my IPython environment
>> with the additional line:
>>
>>     PYTHONHASHSEED=1
>>
>> When I fire up my IPython shell and do:
>>
>>     In [7]: hash("foo")
>>     Out[7]: -2457967226571033580
>>
>>     In [8]: hash("foo")
>>     Out[8]: -2457967226571033580
>>
>> So my hash function is now seeded and returns consistent values. But when
>> I do a groupByKey(), I get the same error:
>>
>>     Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED
>>
>> Does anyone know how to fix this problem in Python 3.4?
>
> Independent of the Python version, you have to ensure that Python on the
> Spark master and workers is started with PYTHONHASHSEED set, e.g. by
> adding it to the environment of the Spark processes.
>
> Best
> Eike

--
Cell: 425-233-8271
Twitter: https://twitter.com/holdenkarau
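The point Eike makes is that setting PYTHONHASHSEED inside the driver's IPython session only seeds that one interpreter; the executor-side Python workers are separate processes that each get their own random hash seed unless the variable is in their environment at startup (e.g. via `export PYTHONHASHSEED=0` in each node's `conf/spark-env.sh`, or the documented `spark.executorEnv.[EnvironmentVariableName]` config). A minimal sketch of the underlying Python behavior, independent of Spark, showing that fresh interpreters only agree on `hash("foo")` when the variable is set in their environment (the `hash_foo` helper is illustrative, not part of any Spark API):

```python
import os
import subprocess
import sys

def hash_foo(extra_env):
    """Start a fresh Python interpreter (like a Spark worker would be)
    and return the value of hash("foo") computed inside it."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", 'print(hash("foo"))'],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

# With PYTHONHASHSEED fixed, every new interpreter computes the same hash,
# which is the consistency groupByKey() needs across Python workers.
seeded = {hash_foo({"PYTHONHASHSEED": "0"}) for _ in range(3)}
print(len(seeded))  # 1: all three interpreters agree
```

Setting the variable only in an already-running shell, as in the question above, changes nothing: `hash()` is seeded once at interpreter startup, so the environment has to be in place before each Spark Python process launches.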