Hello Sasha,

I have no answer for Debian; my cluster is on Linux and I'm using CDH 5.4. Regarding your question, "Error from python worker: /cube/PY/Python27/bin/python: No module named pyspark":

On a single node (i.e., one server/machine/computer) I installed the pyspark binaries and they worked. I connected that node to PyCharm and it worked too. Next I tried executing the pyspark command on another node in the cluster (say, a worker) and got this error: "Error from python worker: PATH: No module named pyspark". My first guess was that the worker was not picking up the path of the pyspark binaries installed on the server. I tried many things, with no luck: hard-coding the pyspark path in the config.sh file on the worker; setting the path dynamically from the code in PyCharm; searching the web and asking the question in almost every online forum; and banging my head against pyspark/Hadoop books. Finally, one fine day, while brooding on this problem, a 'watermelon' dropped: I installed the pyspark binaries on all the worker machines. Now, when I executed just the pyspark command on the workers, it worked, and some simple program snippets ran on each worker too. A quick check for this is sketched below.
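If you want to confirm the same thing on your cluster, here is a minimal check; this is only a sketch of my own, and the interpreter path in the comment is taken from your error message (adjust it to whatever your executors actually use):

    # check_pyspark.py: run on each worker with the executors' interpreter,
    # e.g. /cube/PY/Python27/bin/python check_pyspark.py
    import sys

    try:
        import pyspark
        print("pyspark found at: %s" % pyspark.__file__)
    except ImportError:
        print("no pyspark importable; sys.path is:")
        for p in sys.path:
            print("  %s" % p)

If any worker takes the ImportError branch, that node cannot see the pyspark package, no matter what the driver does.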
I am not sure if this will help or not for your use case.
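One more thought on your yarn-client case: the PYTHONPATH your worker reports is only the spark-assembly jar, and if the worker's Python cannot read that jar (which would fit the Java 6/7 zip-library issue you mention), it will never find the pyspark package inside it. A workaround you could try is pointing the executors at an unpacked copy of pyspark instead. The sketch below is my own; the /opt/spark path and the py4j version are assumptions you would need to adjust to your install:

    from pyspark import SparkConf, SparkContext

    # paths are illustrative; point them at the Spark install on your nodes
    spark_py = "/opt/spark/python"
    py4j_zip = spark_py + "/lib/py4j-0.8.2.1-src.zip"  # version bundled with Spark 1.4.x

    conf = (SparkConf()
            .setAppName("PysparkPandas")
            .setMaster("yarn-client")
            # spark.executorEnv.<VAR> sets <VAR> in every executor's environment
            .set("spark.executorEnv.PYTHONPATH", spark_py + ":" + py4j_zip))
    sc = SparkContext(conf=conf)

Exporting PYTHONPATH in conf/spark-env.sh on every node should have the same effect if you would rather not touch the code.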
Sincerely,
Ashish

On Mon, Sep 7, 2015 at 11:04 PM, Sasha Kacanski <skacan...@gmail.com> wrote:

> Thanks Ashish,
> nice blog, but it does not cover my issue. Actually, I have PyCharm running and
> loading pyspark and the rest of the libraries perfectly fine.
> My issue is that I am not sure what is triggering:
>
> Error from python worker:
> /cube/PY/Python27/bin/python: No module named pyspark
> PYTHONPATH was:
> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>
> My question is: why is YARN not getting the Python package to run on the
> single node via YARN?
> Some people say to run with Java 6 due to zip library changes between 6/7/8,
> some identified a bug with Red Hat (I am on Debian), and there are some
> documentation errors, but nothing is really clear.
>
> I have binaries for Spark and Hadoop, and I did just fine with the Spark SQL
> module, Hive, Python, pandas, and YARN.
> Locally, as I said, the app works fine (pandas to Spark DataFrame to Parquet).
> But as soon as I move to yarn-client mode, YARN is not getting the packages
> required to run the app.
>
> If someone confirms that I need to build everything from source with a
> specific version of the software, I will do that, but at this point I am not
> sure what to do to remedy this situation...
>
> --sasha
>
> On Sun, Sep 6, 2015 at 8:27 PM, Ashish Dutt <ashish.du...@gmail.com> wrote:
>
>> Hi Aleksandar,
>> Quite some time ago I faced the same problem, and I found a solution
>> which I have posted here on my blog
>> <https://edumine.wordpress.com/category/apache-spark/>.
>> See if that can help you, and if it does not, then you can check out these
>> questions & solutions on the stackoverflow
>> <http://stackoverflow.com/search?q=no+module+named+pyspark> website.
>>
>> Sincerely,
>> Ashish Dutt
>>
>> On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski <skacan...@gmail.com> wrote:
>>
>>> Hi,
>>> I am successfully running the python app via PyCharm in local mode with
>>> setMaster("local[*]").
>>>
>>> When I turn on SparkConf().setMaster("yarn-client")
>>> and run via
>>>
>>> spark-submit PysparkPandas.py
>>>
>>> I run into this issue:
>>>
>>> Error from python worker:
>>> /cube/PY/Python27/bin/python: No module named pyspark
>>> PYTHONPATH was:
>>> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>>>
>>> I am running Java 8:
>>>
>>> hadoop@pluto:~/pySpark$ /opt/java/jdk/bin/java -version
>>> java version "1.8.0_31"
>>> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
>>>
>>> Should I try the same thing with Java 6/7?
>>>
>>> Is this a packaging issue, or do I have something wrong with my configuration?
>>>
>>> Regards,
>>>
>>> --
>>> Aleksandar Kacanski
>>
>
> --
> Aleksandar Kacanski