It is fixed in https://issues.apache.org/jira/browse/SPARK-13330
Holden Karau <hol...@pigscanfly.ca> wrote on Wednesday, April 5, 2017 at 12:03 AM:

> Which version of Spark is this (or is it a dev build)? We've recently made
> some improvements with PYTHONHASHSEED propagation.
>
> On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@seven
> cal.com> wrote:
>
> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <paulhtremb...@gmail.com>:
>
> When I try to do a groupByKey() in my spark environment, I get the
> error described here:
>
> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>
> In order to attempt to fix the problem, I set up my ipython environment
> with the additional line:
>
> PYTHONHASHSEED=1
>
> When I fire up my ipython shell, and do:
>
> In [7]: hash("foo")
> Out[7]: -2457967226571033580
>
> In [8]: hash("foo")
> Out[8]: -2457967226571033580
>
> So my hash function is now seeded so it returns consistent values. But
> when I do a groupByKey(), I get the same error:
>
> Exception: Randomness of hash of string should be disabled via
> PYTHONHASHSEED
>
> Anyone know how to fix this problem in python 3.4?
>
> Independent of the python version, you have to ensure that Python on
> spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
> adding it to the environment of the spark processes.
>
> Best
>
> Eike
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
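Eike's point is that setting PYTHONHASHSEED in the driver's ipython shell is not enough: each Spark worker is a separate Python interpreter, and hash() only agrees across interpreter runs when every process is started with the same seed. A minimal sketch of that effect, outside Spark (the helper name is made up for illustration; the spark.executorEnv.* configuration key shown in the comment is an assumption to be checked against your Spark version's docs):

```python
import subprocess
import sys

def hash_in_fresh_interpreter(seed):
    """Launch a new Python process with PYTHONHASHSEED=seed and hash 'foo'."""
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('foo'))"],
        env={"PYTHONHASHSEED": seed},
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

# With a fixed seed, separate interpreters (like Spark workers) agree:
assert hash_in_fresh_interpreter("1") == hash_in_fresh_interpreter("1")

# In PySpark, the analogous fix is to set the variable in the executors'
# environment, e.g. (assumed config key, verify for your Spark version):
#   conf = SparkConf().set("spark.executorEnv.PYTHONHASHSEED", "1")
```

Without the variable set, each fresh interpreter picks a random seed, so the driver and the workers disagree on hash values, which is exactly what the groupByKey() check guards against.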