Hi,

I am grateful for everyone's responses, but sadly no one here has actually read the question before responding.
Has anyone yet tried starting a SPARK cluster as mentioned in the link in my email? :)

Regards,
Gourav

On Mon, Feb 15, 2016 at 11:16 AM, Jorge Machado <jom...@me.com> wrote:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
>
> It will download everything for you and register it into your JVM. If you
> want to use it in your prod just package it with maven.
>
> On 15/02/2016, at 12:14, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> How do we include the following package:
> https://github.com/databricks/spark-csv while starting a SPARK standalone
> cluster as mentioned here:
> http://spark.apache.org/docs/latest/spark-standalone.html
>
> Thanks and Regards,
> Gourav Sengupta
>
> On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R <ramanatha...@gmail.com> wrote:
>
>> Hi Gourav,
>>
>> If your question is how to distribute Python package dependencies across
>> the Spark cluster programmatically, here is an example -
>>
>> $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
>>
>> And in code:
>>
>> sc.addPyFile('/path/to/thrift.zip')
>> sc.addPyFile('/path/to/happybase.zip')
>>
>> Regards,
>> Ram
>>
>> On 15 February 2016 at 15:16, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> So far no one is able to get my question at all. I know what it takes to
>>> load packages via SPARK shell or SPARK submit.
>>>
>>> How do I load packages when starting a SPARK cluster, as mentioned here:
>>> http://spark.apache.org/docs/latest/spark-standalone.html ?
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>>>
>>>> With the conf option:
>>>>
>>>> spark-submit --conf 'key=value'
>>>>
>>>> Hope that helps you.
>>>>
>>>> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com> wrote:
>>>>
>>>>> Hi Gourav,
>>>>> You can use the command below to load packages at the start of the
>>>>> spark shell:
>>>>>
>>>>> spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
>>>>>
>>>>> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was interested in knowing how to load the packages into a SPARK
>>>>>> cluster started locally. Can someone pass me the links for setting
>>>>>> the conf file so that the packages can be loaded?
>>>>>>
>>>>>> Regards,
>>>>>> Gourav
>>>>>>
>>>>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Gourav,
>>>>>>>
>>>>>>> The packages need to be loaded BEFORE you start the JVM, therefore
>>>>>>> you won't be able to add packages dynamically in code. You should
>>>>>>> use the --packages option with pyspark before you start your
>>>>>>> application. One option is to add a `conf` that will load some
>>>>>>> packages if you are constantly going to use them.
>>>>>>>
>>>>>>> Best,
>>>>>>> Burak
>>>>>>>
>>>>>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am creating a SparkContext in a SPARK standalone cluster as
>>>>>>>> mentioned here:
>>>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>> using the following code:
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> sc.stop()
>>>>>>>> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>>>>>>>>     .setMaster("spark://hostname:7077") \
>>>>>>>>     .set('spark.shuffle.service.enabled', True) \
>>>>>>>>     .set('spark.dynamicAllocation.enabled', 'true') \
>>>>>>>>     .set('spark.executor.memory', '20g') \
>>>>>>>>     .set('spark.driver.memory', '4g') \
>>>>>>>>     .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
>>>>>>>> conf.getAll()
>>>>>>>> sc = SparkContext(conf=conf)
>>>>>>>> ----- (we should definitely be able to optimise the configuration
>>>>>>>> but that is not the point here) -----
>>>>>>>>
>>>>>>>> I am not able to use packages, a list of which is mentioned here:
>>>>>>>> http://spark-packages.org, using this method.
>>>>>>>>
>>>>>>>> Whereas if I use the standard "pyspark --packages" option then the
>>>>>>>> packages load just fine.
>>>>>>>>
>>>>>>>> I will be grateful if someone could kindly let me know how to load
>>>>>>>> packages when starting a cluster as mentioned above.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Gourav Sengupta
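
One way to act on Burak's suggestion of pre-loading packages through a conf
rather than in code is the spark.jars.packages property in spark-defaults.conf,
the conf-file counterpart of --packages. A minimal sketch, assuming the Spark
version in use supports that property and that the application is launched with
pyspark or spark-submit against the standalone master (the hostname and package
coordinates below are placeholders taken from the thread):

    # $SPARK_HOME/conf/spark-defaults.conf on the machine that launches the driver
    spark.master           spark://hostname:7077
    spark.jars.packages    com.databricks:spark-csv_2.10:1.3.0

    # packages are then resolved at launch time, before the JVM starts
    $SPARK_HOME/bin/pyspark

For a SparkContext created from a plain Python session or notebook rather than
from the pyspark launcher, an alternative sketch (again assuming the PySpark
version in use honours this variable) is to set the submit arguments in the
environment before the context is created:

    export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell"

In both cases the packages are fetched when the application starts, which is
consistent with Burak's point that they cannot be added after the JVM is
running; they are an application-level setting rather than something the
standalone master and worker daemons need at cluster start-up.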