Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-11 Thread Jeff Zhang
The error message is clear: it is due to the folder permissions. Try doing
that as the root user.
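The pip warnings quoted below say that /home/zeppelin/.cache/pip "or its parent directory" is not owned by the current user. Here is a minimal sketch for checking that. It assumes you run it as the same Unix user that runs pip. The helper names are illustrative and are not part of pip. Going by the log text, these ownership messages only disable pip's cache; the install itself fails on the SSL error further down.

```python
import os
import pwd


def nearest_existing(path):
    """Walk up from path to the closest ancestor that exists (pip checks the parent too)."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def ownership_report(path):
    """Return (checked path, owner name, True if the effective user owns it)."""
    target = nearest_existing(path)
    st = os.stat(target)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        owner = str(st.st_uid)
    return target, owner, st.st_uid == os.geteuid()
```

For example, `ownership_report("/home/zeppelin/.cache/pip")` returns three values: the nearest existing path, its owner, and whether the current user owns it. If the last value is False, one possible cause is an earlier pip run under sudo without `-H`, which the warning itself hints at. This is a guess, not a confirmed diagnosis.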



Manuel Sopena Ballesteros wrote on Tuesday, June 12, 2018, at 7:42 AM:

> Ok, this is what I am getting
>
> $/tmp/pythonvenv/bin/pip install pandas
>
> The directory '/home/zeppelin/.cache/pip/http' or its parent directory is
> not owned by the current user and the cache has been disabled. Please check
> the permissions and owner of that directory. If executing pip with sudo,
> you may want sudo's -H flag.
> pip is configured with locations that require TLS/SSL, however the ssl
> module in Python is not available.
> The directory '/home/zeppelin/.cache/pip' or its parent directory is not
> owned by the current user and caching wheels has been disabled. check the
> permissions and owner of that directory. If executing pip with sudo, you
> may want sudo's -H flag.
> Collecting pandas
>   Retrying (Retry(total=4, connect=None, read=None, redirect=None,
> status=None)) after connection broken by 'SSLError("Can't connect to HTTPS
> URL because the SSL module is not available.",)': /simple/pandas/
>   Retrying (Retry(total=3, connect=None, read=None, redirect=None,
> status=None)) after connection broken by 'SSLError("Can't connect to HTTPS
> URL because the SSL module is not available.",)': /simple/pandas/
>   Retrying (Retry(total=2, connect=None, read=None, redirect=None,
> status=None)) after connection broken by 'SSLError("Can't connect to HTTPS
> URL because the SSL module is not available.",)': /simple/pandas/
>   Retrying (Retry(total=1, connect=None, read=None, redirect=None,
> status=None)) after connection broken by 'SSLError("Can't connect to HTTPS
> URL because the SSL module is not available.",)': /simple/pandas/
>   Retrying (Retry(total=0, connect=None, read=None, redirect=None,
> status=None)) after connection broken by 'SSLError("Can't connect to HTTPS
> URL because the SSL module is not available.",)': /simple/pandas/
>   Could not find a version that satisfies the requirement pandas (from
> versions: )
> No matching distribution found for pandas
>   Could not fetch URL https://pypi.python.org/simple/pandas/: There was a
> problem confirming the ssl certificate: HTTPSConnectionPool(host='
> pypi.python.org', port=443): Max retries exceeded with url:
> /simple/pandas/ (Caused by SSLError("Can't connect to HTTPS URL because the
> SSL module is not available.",)) - skipping
>
> Manuel

RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-11 Thread Manuel Sopena Ballesteros
Ok, this is what I am getting

$/tmp/pythonvenv/bin/pip install pandas

The directory '/home/zeppelin/.cache/pip/http' or its parent directory is not 
owned by the current user and the cache has been disabled. Please check the 
permissions and owner of that directory. If executing pip with sudo, you may 
want sudo's -H flag.
pip is configured with locations that require TLS/SSL, however the ssl module 
in Python is not available.
The directory '/home/zeppelin/.cache/pip' or its parent directory is not owned 
by the current user and caching wheels has been disabled. check the permissions 
and owner of that directory. If executing pip with sudo, you may want sudo's -H 
flag.
Collecting pandas
  Retrying (Retry(total=4, connect=None, read=None, redirect=None, 
status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL 
because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None, 
status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL 
because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None, 
status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL 
because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None, 
status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL 
because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None, 
status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL 
because the SSL module is not available.",)': /simple/pandas/
  Could not find a version that satisfies the requirement pandas (from 
versions: )
No matching distribution found for pandas
  Could not fetch URL https://pypi.python.org/simple/pandas/: There was a 
problem confirming the ssl certificate: 
HTTPSConnectionPool(host='pypi.python.org', port=443): Max retries exceeded 
with url: /simple/pandas/ (Caused by SSLError("Can't connect to HTTPS URL 
because the SSL module is not available.",)) - skipping
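The decisive line in this output is "the ssl module in Python is not available". A virtualenv such as /tmp/pythonvenv runs on the interpreter it was created from, so that interpreter apparently lacks the compiled `_ssl` extension. A common reason for this is that the OpenSSL development headers were missing when Python was built from source, but that is an assumption about this particular build. Here is a minimal check, meant to be run with the venv's own interpreter (for example /tmp/pythonvenv/bin/python; the script file name is up to you):

```python
import sys


def ssl_available():
    """Return True if this interpreter can import ssl.

    The pure-Python ssl module wraps the compiled _ssl extension, so the
    import raises ImportError when Python was built without OpenSSL support.
    """
    try:
        import ssl  # noqa: F401  imported only to test availability
    except ImportError:
        return False
    return True


print(sys.executable, sys.version.split()[0], "ssl available:", ssl_available())
```

While this prints False, pip cannot reach https://pypi.python.org. The usual fix is to install the OpenSSL development package (openssl-devel on CentOS), rebuild that Python, and then recreate the venv.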

Manuel

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Friday, June 8, 2018 2:54 PM
To: users@zeppelin.apache.org
Subject: Re: how to load pandas into pyspark (centos 6 with python 2.6)


Just find pip in your Python 3.6 folder and run it using its full path, e.g.:

/tmp/Python-3.6.5/pip install pandas
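Instead of locating the pip script by hand, you can run pip as a module of the exact interpreter that Zeppelin uses (`python -m pip`). That way pandas is installed into that interpreter's site-packages rather than the system Python 2.6. Here is a small sketch. It assumes the /tmp/Python-3.6.5/python path from this thread and that pip is already installed for that interpreter. If it is not, the standard-library `ensurepip` module can usually bootstrap one. The helper names are illustrative.

```python
import subprocess

# Path taken from the zeppelin.pyspark.python setting mentioned in this thread;
# adjust it to your own interpreter.
ZEPPELIN_PYTHON = "/tmp/Python-3.6.5/python"


def pip_install_command(python_path, *packages):
    """Build the argv that runs pip as a module of the given interpreter."""
    return [python_path, "-m", "pip", "install"] + list(packages)


def pip_install(python_path, *packages):
    """Install packages for python_path; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(python_path, *packages))
```

Usage would be `pip_install(ZEPPELIN_PYTHON, "pandas")`. Downloading from PyPI this way still requires that interpreter to have a working ssl module.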


RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Manuel Sopena Ballesteros
Sorry for the stupid question

How can I use pip? Zeppelin will run pip through the shell interpreter, but my 
system-wide Python is 2.6…



thanks

Manuel



Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Jeff Zhang
pip should be available under your Python 3.6.5 installation; you can use it to
install pandas.




RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Manuel Sopena Ballesteros
Hi Jeff,

Thank you very much for your quick response. My Zeppelin is deployed using HDP 
(Hortonworks Data Platform), so I already have Spark/YARN integration, and I am 
using zeppelin.pyspark.python to tell PySpark to run Python 3.6:

zeppelin.pyspark.python --> /tmp/Python-3.6.5/python

I do have root access to the machine, but the OS is CentOS 6 (the system Python 
is 2.6), hence pip is not available.
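To confirm from inside Zeppelin which interpreter %spark2.pyspark paragraphs actually run on, and whether pandas is importable there, a paragraph along these lines may help. It is only a sketch. It reports on the Python process running the paragraph (the driver), which is not necessarily what the executors use. If the zeppelin.pyspark.python setting is being picked up, the first printed line should show /tmp/Python-3.6.5/python.

```python
# Paste below a %spark2.pyspark directive in a Zeppelin paragraph.
import sys


def interpreter_info():
    """Return (executable path, 'major.minor.micro') of the Python running this code."""
    return sys.executable, "%d.%d.%d" % tuple(sys.version_info[:3])


def pandas_version():
    """Return the installed pandas version, or None if pandas cannot be imported."""
    try:
        import pandas
    except ImportError:
        return None
    return pandas.__version__


path, version = interpreter_info()
print("Python:", path, version)
print("pandas:", pandas_version())
```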

Thank you

Manuel



Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Jeff Zhang
First, I would suggest you use Python 2.7 or Python 3.x, because Spark 2.x has
dropped support for Python 2.6.
Second, you need to configure PYSPARK_PYTHON in the Spark interpreter setting to
point to the Python that you installed. (I don't understand what you mean by
not being able to install pandas system-wide.) Do you mean you are not root and
don't have permission to install Python packages?
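Before pointing PYSPARK_PYTHON (or zeppelin.pyspark.python) at an interpreter, you can check its version from the outside. Below is a minimal sketch that follows the advice above that Spark 2.x no longer supports Python 2.6. The helper names are illustrative. The probe string deliberately avoids newer syntax so that it also runs under Python 2.6.

```python
import subprocess
import sys

# One-liner that prints "major.minor" and works on Python 2.6 as well as 3.x.
PROBE = "import sys; print('%d.%d' % sys.version_info[:2])"


def interpreter_version(python_path):
    """Return (major, minor) of the interpreter at python_path."""
    out = subprocess.check_output([python_path, "-c", PROBE])
    major, minor = out.decode("ascii").strip().split(".")
    return int(major), int(minor)


def acceptable_for_spark2(version):
    """Python 2.6 is not supported by Spark 2.x, so require at least 2.7."""
    return tuple(version) >= (2, 7)
```

For example, `acceptable_for_spark2(interpreter_version("/tmp/Python-3.6.5/python"))` should be True before that path is used in the interpreter setting.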



Manuel Sopena Ballesteros wrote on Friday, June 8, 2018, at 9:26 AM:

> Dear Zeppelin community,
>
> I am trying to load pandas into my Zeppelin %spark2.pyspark interpreter.
> The system I am using is CentOS 6 with Python 2.6, so I can't install
> pandas system-wide through pip as suggested in the documentation.
>
> What can I do if I want to add modules to the %spark2.pyspark interpreter?
>
> Thank you very much
>
> *Manuel Sopena Ballesteros* | Big Data Engineer
> *Garvan Institute of Medical Research*
> The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
> *T:* +61 (0)2 9355 5760 | *F:* +61 (0)2 9295 8507 | *E:* manuel...@garvan.org.au
>
>
> NOTICE
> Please consider the environment before printing this email. This message
> and any attachments are intended for the addressee named and may contain
> legally privileged/confidential/copyright information. If you are not the
> intended recipient, you should not read, use, disclose, copy or distribute
> this communication. If you have received this message in error please
> notify us at once by return email and then delete both messages. We accept
> no liability for the distribution of viruses or similar in electronic
> communications. This notice should not be removed.
>