[ https://issues.apache.org/jira/browse/SPARK-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996217#comment-15996217 ]
Al Johri commented on SPARK-13330:
----------------------------------

Can this be backported to 2.0 or 2.1? I'm having trouble using Python 3 on Spark at the moment. Currently I have to set `SPARK_YARN_USER_ENV=PYTHONHASHSEED=0` before running spark-submit. Until 2.2 is released, would it be best practice to put this variable into spark-env.sh?

> PYTHONHASHSEED is not propagated to python worker
> -------------------------------------------------
>
>                 Key: SPARK-13330
>                 URL: https://issues.apache.org/jira/browse/SPARK-13330
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>             Fix For: 2.2.0
>
> When using Python 3.3, PYTHONHASHSEED is only set in the driver, but not propagated to the executors, which causes the following error:
> {noformat}
>   File "/Users/jzhang/github/spark/python/pyspark/rdd.py", line 74, in portable_hash
>     raise Exception("Randomness of hash of string should be disabled via PYTHONHASHSEED")
> Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED
> 	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
> 	at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
> 	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
> 	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> 	at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:77)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:45)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:81)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
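Until the fix in 2.2.0 is available, the workaround described in the comment can be set either cluster-wide in conf/spark-env.sh or per application on the spark-submit command line. `spark.executorEnv.*` and `spark.yarn.appMasterEnv.*` are standard Spark configuration keys; `my_job.py` is a placeholder, and which setting takes effect can depend on deploy mode, so treat this as a sketch rather than a verified recipe:

```shell
# Option 1: in conf/spark-env.sh (on YARN, forwarded to containers via SPARK_YARN_USER_ENV):
export SPARK_YARN_USER_ENV="PYTHONHASHSEED=0"

# Option 2: per application, on the spark-submit command line:
spark-submit \
  --conf spark.executorEnv.PYTHONHASHSEED=0 \
  --conf spark.yarn.appMasterEnv.PYTHONHASHSEED=0 \
  my_job.py
```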
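The root cause of the error above is Python 3's string hash randomization: unless PYTHONHASHSEED is pinned, each interpreter process (the driver and every executor worker) computes a different hash for the same string, which would make hash-based partitioning inconsistent, so PySpark's `portable_hash` refuses to run. A minimal sketch, independent of Spark, showing that fixing the seed makes fresh interpreter processes agree:

```python
import os
import subprocess
import sys

CODE = 'print(hash("spark"))'

def hash_in_fresh_interpreter(seed):
    """Compute hash("spark") in a brand-new interpreter with PYTHONHASHSEED pinned."""
    env = dict(os.environ)
    env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run([sys.executable, "-c", CODE],
                            env=env, capture_output=True, text=True)
    return int(result.stdout)

# With PYTHONHASHSEED=0, every interpreter process computes the same hash,
# which is what Spark's portable_hash requires across driver and executors.
hashes = {hash_in_fresh_interpreter(seed=0) for _ in range(5)}
print(len(hashes))  # 1: deterministic across processes
```

Without the pinned seed, repeated runs would typically produce different hashes, which is exactly why setting the variable only on the driver (and not on the workers) triggers the exception in the stack trace.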