I saw the bug fix. I am using the latest Spark available on AWS EMR, which I
think is 2.0.1. I am at work and can't check my home config. I don't think
AWS has merged in this fix.
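
In the meantime, the usual workaround is to push the variable into every
executor's environment through Spark's configuration, rather than only
setting it in the driver's shell. A sketch (these are standard Spark/YARN
property names; the seed value 0 is arbitrary, and the appMasterEnv line
only matters on YARN, which EMR uses):

    spark.executorEnv.PYTHONHASHSEED        0
    spark.yarn.appMasterEnv.PYTHONHASHSEED  0

e.g. in spark-defaults.conf, or passed as --conf flags to spark-submit.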

Henry

On Tue, Apr 4, 2017 at 4:42 PM, Jeff Zhang <zjf...@gmail.com> wrote:

>
> It is fixed in https://issues.apache.org/jira/browse/SPARK-13330
>
>
>
> Holden Karau <hol...@pigscanfly.ca> wrote on Wed, Apr 5, 2017 at 12:03 AM:
>
>> Which version of Spark is this (or is it a dev build)? We've recently
>> made some improvements with PYTHONHASHSEED propagation.
>>
>> On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@sevencal.com> wrote:
>>
>> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <paulhtremb...@gmail.com>:
>>
>> When I try to do a groupByKey() in my Spark environment, I get the
>> error described here:
>>
>> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>>
>> In order to attempt to fix the problem, I set up my ipython environment
>> with the additional line:
>>
>> PYTHONHASHSEED=1
>>
>> When I fire up my ipython shell, and do:
>>
>> In [7]: hash("foo")
>> Out[7]: -2457967226571033580
>>
>> In [8]: hash("foo")
>> Out[8]: -2457967226571033580
>>
>> So my hash function is now seeded and returns consistent values. But
>> when I do a groupByKey(), I get the same error:
>>
>>
>> Exception: Randomness of hash of string should be disabled via
>> PYTHONHASHSEED
>>
>> Anyone know how to fix this problem in python 3.4?
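
The symptom above is consistent with the seed being set only in the driver's shell: hash randomization is fixed per interpreter process, so every worker's Python interpreter needs PYTHONHASHSEED in its own environment, not just the one running IPython. A standalone sketch of the mechanism (plain Python, no Spark; the helper name is mine):

```python
import os
import subprocess
import sys

CODE = 'print(hash("foo"))'

def child_hash(env):
    """Run a fresh interpreter and return its hash("foo") output."""
    out = subprocess.run([sys.executable, "-c", CODE],
                         env=env, capture_output=True, text=True)
    return out.stdout.strip()

# With PYTHONHASHSEED in the child's environment, every interpreter
# computes the same hash -- this is what has to happen on each Spark
# worker, not just in the driver's shell.
seeded = dict(os.environ, PYTHONHASHSEED="1")
a, b = child_hash(seeded), child_hash(seeded)
print("seeded children agree:", a == b)

# Without it, each new interpreter draws its own random seed, so two
# workers will generally disagree about hash("foo").
unseeded = {k: v for k, v in os.environ.items() if k != "PYTHONHASHSEED"}
c, d = child_hash(unseeded), child_hash(unseeded)
print("unseeded children agree:", c == d)
```

The seeded children always agree; the unseeded ones almost never do, which is exactly the mismatch groupByKey() trips over when keys are partitioned by hash across executors.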
>>
>>
>> Independent of the Python version, you have to ensure that Python on the
>> Spark master and the workers is started with PYTHONHASHSEED set, e.g. by
>> adding it to the environment of the Spark processes.
>>
>> Best
>>
>> Eike
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>


-- 
Paul Henry Tremblay
Robert Half Technology
