Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-11 Thread Jeff Zhang
The error message is clear: it is caused by the folder permissions. Try running the command as root.
On Tue, Jun 12, 2018 at 7:42 AM, Manuel Sopena Ballesteros wrote:
> Ok, this is what I am getting
>
> $ /tmp/pythonvenv/bin/pip install pandas
>
> The directory '/home/zeppelin/.cache/pip/http' or its parent

RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-11 Thread Manuel Sopena Ballesteros
Ok, this is what I am getting:
$ /tmp/pythonvenv/bin/pip install pandas
The directory '/home/zeppelin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo,
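The message above is about who owns the pip cache directory. A minimal sketch for checking that from Python, assuming a POSIX system; the helper name `owned_by_current_user` is made up for illustration and the default cache path is only a guess at where pip looks for the current user:

```python
import os


def owned_by_current_user(path):
    """Return True if `path`, or its nearest existing ancestor, is owned by the current user."""
    path = os.path.abspath(path)
    # The cache directory may not exist yet, so walk up to the first
    # ancestor that does exist ("/" always does, so the loop terminates).
    while not os.path.exists(path):
        path = os.path.dirname(path)
    return os.stat(path).st_uid == os.geteuid()


if __name__ == "__main__":
    # Hypothetical default; in the thread the directory was /home/zeppelin/.cache/pip/http
    cache_dir = os.path.expanduser("~/.cache/pip/http")
    print(cache_dir, "owned by current user:", owned_by_current_user(cache_dir))
```

If this prints False for the user that runs pip, the ownership needs fixing or pip has to be run as a different user. pip also accepts `--no-cache-dir`, which skips the cache altogether, although that does not help if the install target itself is not writable.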

RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Manuel Sopena Ballesteros
Sorry for the stupid question. How can I use pip? Zeppelin will run pip through the shell interpreter, but my system global python is 2.6…
Thanks,
Manuel
From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Friday, June 8, 2018 1:45 PM To:

Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Jeff Zhang
pip should be available under your Python 3.6.5 installation; you can use that to install pandas.
On Fri, Jun 8, 2018 at 11:40 AM, Manuel Sopena Ballesteros wrote:
> Hi Jeff,
>
> Thank you very much for your quick response. My zeppelin is deployed using
> HDP (hortonworks platform) so I already have spark/yarn
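One way to make sure pandas lands in the right Python is to call pip as a module of that interpreter instead of relying on whichever `pip` is first on the PATH. A small sketch, not a definitive recipe; the function name `install` and its `dry_run` flag are made up for illustration:

```python
import subprocess
import sys


def install(package, python=sys.executable, dry_run=True):
    """Build (and optionally run) `<python> -m pip install <package>`."""
    cmd = [python, "-m", "pip", "install", package]
    if not dry_run:
        # Raises CalledProcessError if pip exits with a non-zero status.
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass dry_run=False to actually install.
    print(" ".join(install("pandas")))
```

Passing the interpreter explicitly, for example `install("pandas", python="/tmp/Python-3.6.5/python")`, targets the Python that pyspark is configured to use, assuming pip was installed along with it.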

RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Manuel Sopena Ballesteros
Hi Jeff,
Thank you very much for your quick response. My zeppelin is deployed using HDP (hortonworks platform) so I already have spark/yarn integration and I am using zeppelin.pyspark.python to tell pyspark to run python 3.6:
zeppelin.pyspark.python --> /tmp/Python-3.6.5/python
I do have root

Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Jeff Zhang
First, I would suggest you use Python 2.7 or Python 3.x, because Spark 2.x has dropped support for Python 2.6. Second, you need to configure PYSPARK_PYTHON in the Spark interpreter settings to point to the Python that you installed. (I don't know what you mean that you can't install pandas system
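To see whether the interpreter setting took effect, a quick check can be run from a pyspark paragraph (for example one starting with `%pyspark`). This is only a sketch: the helper name `describe_environment` is made up, it needs Python 3.4 or later, and it reports on the driver-side Python only, so executors on YARN may still use a different interpreter.

```python
import importlib.util
import sys


def describe_environment(module_name="pandas"):
    """Report which Python is running and whether `module_name` can be imported."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # find_spec returns None when a top-level module cannot be found.
        "module_available": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

If `executable` still points at the system Python 2.6, the interpreter setting has not been picked up, and restarting the interpreter after changing it is worth trying.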