Hi,

I am grateful for everyone's responses, but sadly it seems that no one here
has actually read the question before responding.

Has anyone actually tried starting a SPARK cluster as mentioned in the link in
my email?

:)

Regards,
Gourav

On Mon, Feb 15, 2016 at 11:16 AM, Jorge Machado <jom...@me.com> wrote:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
>
>
>
> It will download everything for you and register it in your JVM. If you
> want to use it in production, just package it with Maven.
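>
> The same flag works with spark-submit when you submit an application rather
> than using the shell; a minimal sketch (your_app.py is just a placeholder):
>
>         $SPARK_HOME/bin/spark-submit \
>           --packages com.databricks:spark-csv_2.10:1.3.0 \
>           your_app.py
>
> The resolved jars are cached locally (by default under ~/.ivy2), so they
> should only be downloaded the first time.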
>
> On 15/02/2016, at 12:14, Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
> Hi,
>
> How do we include the following package:
> https://github.com/databricks/spark-csv while starting a SPARK standalone
> cluster as mentioned here:
> http://spark.apache.org/docs/latest/spark-standalone.html
>
>
>
> Thanks and Regards,
> Gourav Sengupta
>
> On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R <ramanatha...@gmail.com>
> wrote:
>
>> Hi Gourav,
>>
>> If your question is how to distribute Python package dependencies across
>> the Spark cluster programmatically, here is an example:
>>
>>         $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
>>
>> And in code:
>>
>>         sc.addPyFile('/path/to/thrift.zip')
>>         sc.addPyFile('/path/to/happybase.zip')
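>>
>> If the application is launched through spark-submit instead, the
>> command-line counterpart of addPyFile should achieve the same thing; a
>> rough sketch (your_app.py is only a placeholder):
>>
>>         $ spark-submit --py-files /path/to/thrift.zip,/path/to/happybase.zip \
>>               your_app.py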
>>
>> Regards,
>> Ram
>>
>>
>>
>> On 15 February 2016 at 15:16, Gourav Sengupta <gourav.sengu...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> So far no one has understood my question. I know how to load packages
>>> via the SPARK shell or SPARK submit.
>>>
>>> How do I load packages when starting a SPARK cluster, as mentioned here
>>> http://spark.apache.org/docs/latest/spark-standalone.html ?
>>>
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>>
>>>
>>>
>>> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com>
>>> wrote:
>>>
>>>> With the --conf option:
>>>>
>>>> spark-submit --conf 'key=value'
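>>>>
>>>> For loading packages specifically, the relevant key should be
>>>> spark.jars.packages (assuming your Spark version supports it); roughly
>>>> (your_app.py is just a placeholder):
>>>>
>>>> spark-submit --conf spark.jars.packages=com.databricks:spark-csv_2.10:1.3.0 your_app.py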
>>>>
>>>> Hope that helps you.
>>>>
>>>> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gourav,
>>>>> you can use the option below to load packages when starting the Spark
>>>>> shell:
>>>>>
>>>>> spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0
>>>>>
>>>>> On 14 February 2016 at 03:34, Gourav Sengupta <
>>>>> gourav.sengu...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was interested in knowing how to load packages into a SPARK
>>>>>> cluster started locally. Can someone point me to the links for setting up
>>>>>> the conf file so that the packages can be loaded?
>>>>>>
>>>>>> Regards,
>>>>>> Gourav
>>>>>>
>>>>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Gourav,
>>>>>>>
>>>>>>> The packages need to be loaded BEFORE you start the JVM, therefore
>>>>>>> you won't be able to add packages dynamically in code. You should use
>>>>>>> the --packages option with pyspark before you start your application.
>>>>>>> One option is to add a `conf` entry that loads those packages
>>>>>>> automatically, if you are going to use them constantly.
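>>>>>>>
>>>>>>> As a sketch of that conf approach (assuming the Spark version in use
>>>>>>> supports the spark.jars.packages property), you could add a line like
>>>>>>> this to $SPARK_HOME/conf/spark-defaults.conf so it is picked up every
>>>>>>> time pyspark or spark-submit launches the JVM:
>>>>>>>
>>>>>>>         spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0
>>>>>>>
>>>>>>> If the SparkContext is created from a plain Python process instead of
>>>>>>> pyspark/spark-submit, setting PYSPARK_SUBMIT_ARGS before starting
>>>>>>> Python should have the same effect:
>>>>>>>
>>>>>>>         export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell"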
>>>>>>>
>>>>>>> Best,
>>>>>>> Burak
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
>>>>>>> gourav.sengu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am creating a SparkContext in a SPARK standalone cluster as
>>>>>>>> mentioned here:
>>>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html using
>>>>>>>> the following code:
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------------------------------------------------------
>>>>>>>> import multiprocessing
>>>>>>>> from pyspark import SparkConf, SparkContext
>>>>>>>>
>>>>>>>> sc.stop()
>>>>>>>> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>>>>>>>>                   .setMaster("spark://hostname:7077") \
>>>>>>>>                   .set('spark.shuffle.service.enabled', True) \
>>>>>>>>                   .set('spark.dynamicAllocation.enabled', 'true') \
>>>>>>>>                   .set('spark.executor.memory', '20g') \
>>>>>>>>                   .set('spark.driver.memory', '4g') \
>>>>>>>>                   .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
>>>>>>>> conf.getAll()
>>>>>>>> sc = SparkContext(conf=conf)
>>>>>>>>
>>>>>>>> -----(we should definitely be able to optimise the configuration
>>>>>>>> but that is not the point here) ---
>>>>>>>>
>>>>>>>> I am not able to use packages (a list of which is available at
>>>>>>>> http://spark-packages.org) using this method.
>>>>>>>>
>>>>>>>> Whereas if I use the standard "pyspark --packages" option, the
>>>>>>>> packages load just fine.
>>>>>>>>
>>>>>>>> I would be grateful if someone could let me know how to load
>>>>>>>> packages when starting a cluster as mentioned above.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Gourav Sengupta
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
