Re: installing packages with pyspark

2016-03-19 Thread Felix Cheung
For some, like graphframes, which are published as Spark packages, you could also use
--packages on the spark-submit or pyspark command line. See
http://spark.apache.org/docs/latest/submitting-applications.html
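
For example, a minimal sketch using the graphframes coordinates that come up later in
this thread (substitute the coordinates for your own package and Spark version;
your_app.py is just a placeholder name):

    pyspark --packages graphframes:graphframes:0.1.0-spark1.5
    spark-submit --packages graphframes:graphframes:0.1.0-spark1.5 your_app.py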


Re: installing packages with pyspark

2016-03-19 Thread Franc Carter
Thanks - I'll give that a try

cheers




-- 
Franc


Re: installing packages with pyspark

2016-03-19 Thread Felix Cheung
You are running pyspark in Spark client deploy mode. I have run into the same
error as well, and I'm not sure if this is graphframes specific - the Python
process can't find the graphframes Python code when it is loaded as a Spark
package.
To work around this, I extract the graphframes Python directory into a directory
called graphframes in the location where I run pyspark.
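
A rough sketch of that extraction, assuming the jar cached by a previous --packages run
(the ~/.ivy2/jars path and the jar file name are assumptions about the default cache
location; adjust both for your environment):

    # extract the bundled Python package from the cached graphframes jar
    cd /path/where/you/run/pyspark
    unzip ~/.ivy2/jars/graphframes_graphframes-0.1.0-spark1.5.jar 'graphframes/*'
    # with the graphframes/ directory sitting next to where pyspark starts,
    # the driver's Python process can import it
    pyspark --master yarn --packages graphframes:graphframes:0.1.0-spark1.5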








Re: installing packages with pyspark

2016-03-19 Thread Ajinkya Kale
Thanks Jakob, Felix. I am aware you can do it with --packages, but I was
wondering if there is a way to do something like "!pip install <package>",
like I do for other Python packages from a Jupyter notebook. But I guess
I cannot add a package once I launch the pyspark context, right?



Re: installing packages with pyspark

2016-03-19 Thread Jakob Odersky
Hi,
Regarding 1, packages are resolved locally. That means that when you
specify a package, spark-submit will resolve the dependencies and
download any jars to the local machine before shipping* them to the
cluster. So, while I have no a priori knowledge of Dataproc clusters,
specifying packages there should work no differently.

Unfortunately I can't help with 2.

--Jakob

*shipping in this case means making them available via the network
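
A quick way to see that local resolution step on its own, without involving a cluster
(a sketch; local[2] is just an arbitrary choice of local master):

    # dependencies are resolved and downloaded on the machine running this command
    pyspark --master local[2] --packages graphframes:graphframes:0.1.0-spark1.5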





Re: installing packages with pyspark

2016-03-19 Thread Franc Carter
I'm having trouble with that combination of pyspark, YARN, and graphframes. I'm using:

pyspark --master yarn --packages graphframes:graphframes:0.1.0-spark1.5

which starts and gives me a REPL, but when I try

   from graphframes import *

I get

  No module named graphframes

without '--master yarn' it works as expected

thanks




-- 
Franc


Re: installing packages with pyspark

2016-03-19 Thread Jakob Odersky
> But I guess I cannot add a package once I launch the pyspark context, right?

Correct. Potentially, if you really really wanted to, you could maybe
(with lots of pain) load packages dynamically with some class-loader
black magic, but Spark does not provide that functionality.
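
So the package has to be on the command line before the notebook's SparkContext is
created. One way to arrange that, sketched here on the assumption that the notebook is
started from the same shell (PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS are
the variables the pyspark launcher reads to pick the driver Python):

    # launch Jupyter through pyspark so the context it creates already has the package
    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
    pyspark --master yarn --packages graphframes:graphframes:0.1.0-spark1.5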





installing packages with pyspark

2016-03-19 Thread Ajinkya Kale
Hi all,

I had a couple of questions.
1. Is there documentation on how to add graphframes, or any other
package for that matter, on Google Dataproc managed Spark clusters?

2. Is there a way to add a package to an existing pyspark context through a
Jupyter notebook?

--aj