Re: installing packages with pyspark
For some, like graphframes, that are Spark packages, you could also use --packages on the command line of spark-submit or pyspark. See http://spark.apache.org/docs/latest/submitting-applications.html

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
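A minimal sketch of the --packages usage described above; the graphframes coordinates are the ones used elsewhere in this thread, and the script name is a placeholder:

```shell
# Spark package coordinates follow group:artifact:version.
# Works for the interactive shell:
pyspark --packages graphframes:graphframes:0.1.0-spark1.5

# ...and for batch submission (my_job.py is a placeholder script):
spark-submit --packages graphframes:graphframes:0.1.0-spark1.5 my_job.py
```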
Re: installing packages with pyspark
Thanks - I'll give that a try.

cheers

--
Franc
Re: installing packages with pyspark
You are running pyspark in Spark client deploy mode. I have run into the same error as well, and I'm not sure if this is graphframes-specific - the Python process can't find the graphframes Python code when it is loaded as a Spark package.

To work around this, I extract the graphframes Python directory locally, where I run pyspark, into a directory called graphframes.
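One way to sketch the workaround Felix describes, assuming the package jar has already been fetched by --packages into the local ivy cache (the exact jar name and cache path are assumptions and may differ on your machine):

```shell
# Spark package jars bundle the package's Python sources alongside the
# compiled classes. Extract just the graphframes/ directory into the
# working directory where pyspark will be launched.
JAR="$HOME/.ivy2/jars/graphframes_graphframes-0.1.0-spark1.5.jar"
unzip -o "$JAR" 'graphframes/*' -d .

# pyspark started from this directory can now resolve:
#   from graphframes import *
```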
Re: installing packages with pyspark
Thanks Jakob, Felix. I am aware you can do it with --packages, but I was wondering if there is a way to do something like "!pip install " like I do for other packages from a jupyter notebook for Python. But I guess I cannot add a package once I launch the pyspark context, right?
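There is no pip-style install into a running context, but for a notebook one common pattern (a sketch, not something confirmed in this thread) is to set PYSPARK_SUBMIT_ARGS before the first SparkContext is created, so the package is resolved at startup:

```python
import os

# Must be set before the first SparkContext is constructed; it has no
# effect on an already-running context. The trailing "pyspark-shell"
# token is required by pyspark's launcher.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages graphframes:graphframes:0.1.0-spark1.5 pyspark-shell"
)

# from pyspark import SparkContext
# sc = SparkContext(appName="graphframes-notebook")
```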
Re: installing packages with pyspark
Hi,
regarding 1, packages are resolved locally. That means that when you specify a package, spark-submit will resolve the dependencies and download any jars on the local machine, before shipping* them to the cluster. So, without a priori knowledge of dataproc clusters, it should be no different to specify packages.

Unfortunately I can't help with 2.

--Jakob

*shipping in this case means making them available via the network
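The local resolution Jakob describes can be observed directly; the cache location below is the default ivy path, and may differ if spark.jars.ivy is configured:

```shell
# --packages triggers dependency resolution on the submitting machine;
# the fetched jars land in the local ivy cache before being shipped
# to the cluster (job.py is a placeholder script).
spark-submit --packages graphframes:graphframes:0.1.0-spark1.5 job.py

# Inspect what was resolved locally:
ls ~/.ivy2/jars/
```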
Re: installing packages with pyspark
I'm having trouble with that for pyspark, yarn and graphframes. I'm using:

pyspark --master yarn --packages graphframes:graphframes:0.1.0-spark1.5

which starts and gives me a REPL, but when I try

from graphframes import *

I get

No module named graphframes

Without '--master yarn' it works as expected.

thanks

--
Franc
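One workaround sometimes suggested for this symptom (a sketch under assumptions, not something confirmed in this thread): ship the package's Python sources explicitly so the YARN driver and executors can import them. The zip is assumed to be built from a locally extracted graphframes Python directory:

```shell
# Bundle the package's Python sources (directory path assumed):
zip -r graphframes.zip graphframes/

# --py-files distributes the zip and puts it on the PYTHONPATH of the
# driver and executors, alongside the jar pulled in by --packages.
pyspark --master yarn \
  --packages graphframes:graphframes:0.1.0-spark1.5 \
  --py-files graphframes.zip
```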
Re: installing packages with pyspark
> But I guess I cannot add a package once i launch the pyspark context right ?

Correct. Potentially, if you really really wanted to, you could maybe (with lots of pain) load packages dynamically with some class-loader black magic, but Spark does not provide that functionality.
installing packages with pyspark
Hi all,

I had a couple of questions.

1. Is there documentation on how to add graphframes, or any other package for that matter, on the Google Dataproc managed Spark clusters?

2. Is there a way to add a package to an existing pyspark context through a jupyter notebook?

--aj
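For question 1, one possible route (an assumption, not confirmed anywhere in this thread) is to pass Spark's spark.jars.packages property when submitting a job to Dataproc; the cluster and script names are placeholders:

```shell
# spark.jars.packages is the configuration equivalent of --packages.
gcloud dataproc jobs submit pyspark job.py \
  --cluster my-cluster \
  --properties spark.jars.packages=graphframes:graphframes:0.1.0-spark1.5
```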