Hello Sasha,

I have no answer for Debian. My cluster is on Linux and I'm using CDH 5.4.
Your question: "Error from python worker:
  /cube/PY/Python27/bin/python: No module named pyspark"

On a single node (i.e., one server/machine/computer) I installed the PySpark
binaries and it worked. I connected it to PyCharm and that worked too.

Next I tried executing the pyspark command on another node (say, a worker) in
the cluster and I got this error message: "Error from python worker: PATH:
No module named pyspark".

My first guess was that the worker was not picking up the path of the
PySpark binaries installed on the server. I tried many things: hard-coding
the PySpark path in the config.sh file on the worker--NO LUCK; setting the
path dynamically from the code in PyCharm--NO LUCK; searching the web and
asking the question in almost every online forum--NO LUCK; banging my head
several times against PySpark/Hadoop books--NO LUCK. Finally, one fine day,
a 'watermelon' dropped while I was brooding on this problem, and I installed
the PySpark binaries on all the worker machines. Now when I tried executing
just the pyspark command on the workers, it worked. I tried some simple
program snippets on each worker, and those work too.
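
In case it helps, here is a minimal sketch of how the driver can point every
node at the same interpreter and PySpark install. The interpreter path, the
/opt/spark location, and the py4j version are only illustrative and will
differ per setup:

    import os
    from pyspark import SparkConf, SparkContext

    # The same interpreter must exist at this path on every node
    # (illustrative path, not from this thread).
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python2.7"

    conf = (SparkConf()
            .setMaster("yarn-client")
            .setAppName("pyspark-path-check")
            # Executors need pyspark and py4j on their PYTHONPATH too;
            # this assumes Spark is installed under /opt/spark on workers.
            .set("spark.executorEnv.PYTHONPATH",
                 "/opt/spark/python:/opt/spark/python/lib/py4j-0.8.2.1-src.zip"))
    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(10)).sum())  # quick smoke test: should print 45
    sc.stop()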

I am not sure whether this will help with your use case or not.



Sincerely,
Ashish

On Mon, Sep 7, 2015 at 11:04 PM, Sasha Kacanski <skacan...@gmail.com> wrote:

> Thanks Ashish,
> nice blog, but it does not cover my issue. Actually, I have PyCharm running
> and loading pyspark and the rest of the libraries perfectly fine.
> My issue is that I am not sure what is triggering the following:
>
> Error from python worker:
>   /cube/PY/Python27/bin/python: No module named pyspark
> PYTHONPATH was:
>   /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>
> The question is why YARN is not getting the Python package needed to run on
> the single node.
> Some people say to run with Java 6 because of zip library changes between
> Java 6/7/8; some identified a bug with Red Hat (I am on Debian); others
> point to documentation errors; but nothing is really clear.
>
> I have binaries for Spark and Hadoop, and I did just fine with the Spark SQL
> module, Hive, Python, pandas, and YARN.
> Locally, as I said, the app works fine (pandas to Spark DataFrame to Parquet).
> But as soon as I move to yarn-client mode, YARN does not get the packages
> required to run the app.
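>
> For reference, a simplified sketch of what the app does (pandas to Spark
> DataFrame to Parquet); the data and output path here are placeholders, not
> the real ones:
>
>     import pandas as pd
>     from pyspark import SparkConf, SparkContext
>     from pyspark.sql import SQLContext
>
>     conf = SparkConf().setMaster("yarn-client").setAppName("PysparkPandas")
>     sc = SparkContext(conf=conf)
>     sqlContext = SQLContext(sc)
>
>     pdf = pd.DataFrame({"id": [1, 2, 3]})  # placeholder data
>     sdf = sqlContext.createDataFrame(pdf)  # pandas -> Spark DataFrame
>     sdf.write.parquet("/tmp/out.parquet")  # Spark DataFrame -> Parquet
>     sc.stop()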
>
> If someone confirms that I need to build everything from source with a
> specific version of the software, I will do that, but at this point I am
> not sure what to do to remedy this situation...
>
> --sasha
>
>
> On Sun, Sep 6, 2015 at 8:27 PM, Ashish Dutt <ashish.du...@gmail.com>
> wrote:
>
>> Hi Aleksandar,
>> Quite some time ago, I faced the same problem and found a solution,
>> which I have posted on my blog
>> <https://edumine.wordpress.com/category/apache-spark/>.
>> See if that helps you; if it does not, you can check out these
>> questions and solutions on the Stack Overflow website
>> <http://stackoverflow.com/search?q=no+module+named+pyspark>.
>>
>>
>> Sincerely,
>> Ashish Dutt
>>
>>
>> On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski <skacan...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I am successfully running a Python app via PyCharm in local mode with
>>> setMaster("local[*]").
>>>
>>> When I turn on SparkConf().setMaster("yarn-client")
>>>
>>> and run via
>>>
>>> spark-submit PysparkPandas.py
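>>>
>>> For context, the master is set inside the app rather than on the
>>> spark-submit command line. A minimal sketch of that switch (the app
>>> name is a placeholder):
>>>
>>>     from pyspark import SparkConf, SparkContext
>>>
>>>     conf = SparkConf().setAppName("PysparkPandas")
>>>     conf.setMaster("yarn-client")  # was "local[*]" when running in PyCharm
>>>     sc = SparkContext(conf=conf)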
>>>
>>>
>>> I run into issue:
>>> Error from python worker:
>>>   /cube/PY/Python27/bin/python: No module named pyspark
>>> PYTHONPATH was:
>>>
>>> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>>>
>>> I am running java
>>> hadoop@pluto:~/pySpark$ /opt/java/jdk/bin/java -version
>>> java version "1.8.0_31"
>>> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
>>>
>>> Should I try the same thing with Java 6 or 7?
>>>
>>> Is this a packaging issue, or do I have something wrong in my configuration?
>>>
>>> Regards,
>>>
>>> --
>>> Aleksandar Kacanski
>>>
>>
>>
>
>
> --
> Aleksandar Kacanski
>
