Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-05 Thread Marcelo Vanzin
Sorry, I can't help you if that doesn't work. Your YARN RM really should not have SPARK_HOME set if you want to use more than one Spark version.

On Thu, Oct 4, 2018 at 9:54 PM Jianshi Huang wrote:
> Hi Marcelo,
>
> I see what you mean. I tried it but still got the same error message.
>
>> Error from

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Jianshi Huang
Hi Marcelo,

I see what you mean. I tried it but still got the same error message.

Error from python worker:
> Traceback (most recent call last):
>   File "/usr/local/Python-3.6.4/lib/python3.6/runpy.py", line 183, in _run_module_as_main
>     mod_name, mod_spec, code =

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Jianshi Huang
Thanks Marcelo,

But I don't want to install 2.3.2 on the worker nodes. I just want Spark to use the path of the files uploaded to YARN instead of SPARK_HOME.

On Fri, Oct 5, 2018 at 1:25 AM Marcelo Vanzin wrote:
> Try "spark.executorEnv.SPARK_HOME=$PWD" (in quotes so it does not get

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Jianshi Huang
Yes, that's right.

On Fri, Oct 5, 2018 at 3:35 AM Gourav Sengupta wrote:
> Hi Marcelo,
> It would be great if you could illustrate what you mean; I would be interested to know.
>
> Hi Jianshi,
> So just to be sure: you want to work with Spark 2.3 while having Spark 2.1 installed in your cluster?

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Gourav Sengupta
Hi Marcelo,

It would be great if you could illustrate what you mean; I would be interested to know.

Hi Jianshi,

So just to be sure: you want to work with Spark 2.3 while having Spark 2.1 installed in your cluster?

Regards,
Gourav Sengupta

On Thu, Oct 4, 2018 at 6:26 PM Marcelo Vanzin wrote:
> Try

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Marcelo Vanzin
Try "spark.executorEnv.SPARK_HOME=$PWD" (in quotes so it does not get expanded by the shell). But it's really weird to be setting SPARK_HOME in the environment of your node managers. YARN shouldn't need to know about that. On Thu, Oct 4, 2018 at 10:22 AM Jianshi Huang wrote: > >

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Jianshi Huang
https://github.com/apache/spark/blob/88e7e87bd5c052e10f52d4bb97a9d78f5b524128/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala#L31

The code shows that Spark will try to find the path if SPARK_HOME is specified. And on my worker node, SPARK_HOME is specified in .bashrc, for the
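
A quick way to check which copy the workers actually pick up is to run a tiny job that reports the worker-side module path and interpreter. This diagnostic is not from the thread, just a minimal sketch:

    # Report which pyspark module and Python interpreter each executor uses,
    # to see whether the preinstalled SPARK_HOME copy shadows the shipped zips.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("yarn").getOrCreate()

    def report(_):
        import pyspark, sys
        return [(pyspark.__version__, pyspark.__file__, sys.executable)]

    print(spark.sparkContext.parallelize([0], numSlices=1).flatMap(report).collect())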

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Marcelo Vanzin
Normally the version of Spark installed on the cluster does not matter, since Spark is uploaded from your gateway machine to YARN by default. You probably have some configuration (in spark-defaults.conf) that tells YARN to use a cached copy. Get rid of that configuration, and you can use whatever
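
The settings that typically pin a cached build are spark.yarn.jars or spark.yarn.archive in spark-defaults.conf. Besides removing them, a single job can override the cached copy; a minimal sketch, assuming a 2.3.2 jars archive has already been uploaded to HDFS (the path is illustrative):

    # Override the cluster-wide cached Spark build for this job only.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("yarn")
             .config("spark.yarn.archive",
                     "hdfs:///user/someuser/spark-2.3.2-jars.zip")
             .getOrCreate())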

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Apostolos N. Papadopoulos
Maybe this can help:

https://stackoverflow.com/questions/32959723/set-python-path-for-spark-worker

On 04/10/2018 12:19 PM, Jianshi Huang wrote:

Hi, I have a problem using multiple versions of PySpark on YARN. The driver and worker nodes all have Spark 2.2.1 preinstalled, for
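
The linked answer is along the lines of pointing the workers' Python path at the archives shipped with the job rather than at a local install. A rough sketch of that idea using standard Spark settings; the values are assumptions (the Python 3.6.4 prefix is taken from the traceback quoted elsewhere in the thread, and the py4j filename depends on the Spark version):

    # Put the shipped zips (relative to each YARN container's working directory)
    # on the executors' PYTHONPATH, and pin the worker Python interpreter.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("yarn")
             .config("spark.executorEnv.PYTHONPATH",
                     "pyspark.zip:py4j-0.10.7-src.zip")
             .config("spark.executorEnv.PYSPARK_PYTHON",
                     "/usr/local/Python-3.6.4/bin/python3.6")
             .getOrCreate())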

Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Jianshi Huang
Hi,

I have a problem using multiple versions of PySpark on YARN. The driver and worker nodes all have Spark 2.2.1 preinstalled for production tasks, and I want to use 2.3.2 for my personal EDA. I've tried both the 'pyFiles=' option and sparkContext.addPyFiles(); however, on the worker node, the
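
For context, a minimal sketch of the kind of setup being attempted, assuming the 2.3.2 zips have been downloaded to the gateway machine (the local paths and the exact py4j filename are illustrative, not taken from the thread):

    # Ship a newer PySpark to the YARN executors via spark.submit.pyFiles,
    # a conf-style counterpart to the pyFiles= option mentioned in the question.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("yarn")
             .appName("eda-with-spark-2.3.2")
             .config("spark.submit.pyFiles",
                     "/opt/spark-2.3.2/python/lib/pyspark.zip,"
                     "/opt/spark-2.3.2/python/lib/py4j-0.10.7-src.zip")
             .getOrCreate())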