Should zeppelin.pyspark.python be used on the worker nodes?
I'm trying to use zeppelin.pyspark.python as the variable to set the Python that Spark worker nodes should use for my job, but it doesn't seem to be working. Am I missing something, or does this variable not do that?

My goal is to point that variable at different conda environments. These environments are available on all worker nodes, since they live in a shared location, so ideally every node would have access to the same libraries and dependencies.

Thanks,

~/William
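P.S. To be concrete, what I was hoping to set in the interpreter configuration is something along these lines (the conda path is just an illustration of my shared layout):

    zeppelin.pyspark.python=/shared/conda/envs/myenv/bin/python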
Re: Should zeppelin.pyspark.python be used on the worker nodes?
You can set the PYSPARK_PYTHON environment variable for that.

Not sure about zeppelin.pyspark.python; I think it does not work. See the comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265

Eventually, I think we can remove zeppelin.pyspark.python and use only PYSPARK_PYTHON instead, to avoid confusion.

--
Ruslan Dautkhanov
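P.S. For example, in conf/zeppelin-env.sh (the conda path here is illustrative; the setting takes effect when the interpreter process starts):

    # conf/zeppelin-env.sh
    export PYSPARK_PYTHON=/shared/conda/envs/myenv/bin/python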
Re: Should zeppelin.pyspark.python be used on the worker nodes?
Thanks for the quick response, Ruslan.

But given that it's an environment variable, I can't quickly change that value to point to a different Python environment without restarting the Zeppelin process, can I? I mean, is there a way to set the value of PYSPARK_PYTHON from the interpreter configuration screen?

Thanks,

--
~/William
Re: Should zeppelin.pyspark.python be used on the worker nodes?
When a property key in the interpreter configuration screen matches a certain condition [1], it'll be treated as an environment variable.

You can remove PYSPARK_PYTHON from conf/zeppelin-env.sh and place it in the interpreter configuration instead.

Thanks,
moon

[1] https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreter.java#L152
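P.S. Concretely, adding a property like the one below in the Spark interpreter settings (an upper-case key, which is roughly the shape the check in [1] looks for) should get exported as an environment variable to the interpreter process. The path is illustrative:

    PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python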
Re: Should zeppelin.pyspark.python be used on the worker nodes?
You're right - it will not be dynamic.

You may want to check:
https://issues.apache.org/jira/browse/ZEPPELIN-2195
https://github.com/apache/zeppelin/pull/2079

It seems this is fixed in the current snapshot of Zeppelin (committed 3 weeks ago).

--
Ruslan Dautkhanov
Re: Should zeppelin.pyspark.python be used on the worker nodes?
Ah! Thanks, Ruslan! I'm still using 0.7.0. Let me update to 0.8.0, and I'll come back and update this thread with the results.

--
~/William
Re: Should zeppelin.pyspark.python be used on the worker nodes?
Hi moon, thanks for the tip. To summarize, my current settings are the following.

conf/zeppelin-env.sh has only the SPARK_HOME setting:

    export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/

Then in the interpreter configuration, through the web interface, I have:

    PYSPARK_PYTHON=/opt/miniconda2/envs/myenv/bin/python
    zeppelin.pyspark.python=python

But when I submit from the notebook I'm receiving "pyspark is not responding", and the log file outputs:

    Traceback (most recent call last):
      File "/tmp/zeppelin_pyspark-6480867511995958556.py", line 22, in <module>
        from pyspark.conf import SparkConf
    ImportError: No module named pyspark.conf

Any thoughts? Thanks a lot!

--
~/William
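P.S. One thing I can try is checking whether that Python can import pyspark outside of Zeppelin; if it can't, the interpreter's PYTHONPATH presumably doesn't include Spark's Python sources. A sketch (the py4j version depends on the Spark build; I believe 0.10.4 is what ships with Spark 2.1.0):

    # Run the same interpreter that PYSPARK_PYTHON points at:
    /opt/miniconda2/envs/myenv/bin/python -c "from pyspark.conf import SparkConf"
    # If that fails, PYTHONPATH needs Spark's Python sources, e.g.:
    export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH"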
Re: Should zeppelin.pyspark.python be used on the worker nodes?
> from pyspark.conf import SparkConf
> ImportError: No module named pyspark.conf

William, you probably meant "from pyspark import SparkConf"?

--
Ruslan Dautkhanov
Re: Should zeppelin.pyspark.python be used on the worker nodes?
It is dynamic; you can set the environment variable on the interpreter setting page.

Best Regards,
Jeff Zhang
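P.S. Switching conda environments then amounts to editing the property on the interpreter setting page and restarting only that interpreter, not the whole Zeppelin daemon. For example (path illustrative):

    PYSPARK_PYTHON=/shared/conda/envs/otherenv/bin/python

Save the setting, then restart the Spark interpreter from the same page.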