Hi,

So far no one seems to have understood my question. I already know how to
load packages via spark-shell or spark-submit.

How do I load packages when starting a SPARK standalone cluster, as described
here: http://spark.apache.org/docs/latest/spark-standalone.html?
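
For example, is the idea that something like the following should go into
conf/spark-defaults.conf before the master and workers are started? (This is
only a guess on my part; I am assuming the spark.jars.packages property here,
and the spark-csv coordinates are purely for illustration.)

spark.jars.packages    com.databricks:spark-csv_2.10:1.1.0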


Regards,
Gourav Sengupta




On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com>
wrote:

> with the --conf option:
>
> spark-submit --conf key=value
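>
> for example (assuming the spark.jars.packages property is available in your
> Spark version; the spark-csv coordinates are just an illustration):
>
> spark-submit --conf spark.jars.packages=com.databricks:spark-csv_2.10:1.1.0 <your application>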
>
> Hope that helps you.
>
> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Hi Gourav,
>> you can use something like the following to load packages when starting the Spark shell:
>>
>> spark-shell  --packages com.databricks:spark-csv_2.10:1.1.0
>>
>> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I was interested in knowing how to load packages into a SPARK cluster
>>> started locally. Could someone point me to the documentation for setting up
>>> the conf file so that the packages can be loaded?
>>>
>>> Regards,
>>> Gourav
>>>
>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>>
>>>> Hello Gourav,
>>>>
>>>> The packages need to be loaded BEFORE the JVM starts, so you won't be able
>>>> to add packages dynamically in code. You should use the --packages option
>>>> with pyspark before you start your application.
>>>> One option is to add a conf entry that will load those packages if you are
>>>> going to use them constantly.
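>>>>
>>>> For example, something along these lines (a sketch; the file path is just a
>>>> placeholder and the spark-csv package is used purely as an illustration):
>>>>
>>>> # launch the shell with the package resolved before the JVM starts, e.g.:
>>>> #   pyspark --packages com.databricks:spark-csv_2.10:1.1.0
>>>> # inside the shell, the package's data source is then available:
>>>> df = sqlContext.read.format("com.databricks.spark.csv") \
>>>>                     .option("header", "true") \
>>>>                     .load("path/to/file.csv")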
>>>>
>>>> Best,
>>>> Burak
>>>>
>>>>
>>>>
>>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <
>>>> gourav.sengu...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am creating a SparkContext in a SPARK standalone cluster, as described
>>>>> here: http://spark.apache.org/docs/latest/spark-standalone.html, using
>>>>> the following code:
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------------------------------------------------------
>>>>> import multiprocessing
>>>>> from pyspark import SparkConf, SparkContext
>>>>>
>>>>> sc.stop()
>>>>> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>>>>>                   .setMaster("spark://hostname:7077") \
>>>>>                   .set('spark.shuffle.service.enabled', True) \
>>>>>                   .set('spark.dynamicAllocation.enabled', 'true') \
>>>>>                   .set('spark.executor.memory', '20g') \
>>>>>                   .set('spark.driver.memory', '4g') \
>>>>>                   .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
>>>>> conf.getAll()
>>>>> sc = SparkContext(conf=conf)
>>>>>
>>>>> ----- (we should definitely be able to optimise the configuration, but that is not the point here) -----
>>>>>
>>>>> I am not able to use packages (a list of which is available at
>>>>> http://spark-packages.org) with this method.
>>>>>
>>>>> Whereas if I use the standard "pyspark --packages" option, the packages
>>>>> load just fine.
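>>>>>
>>>>> For instance (the spark-csv coordinates are just an example):
>>>>>
>>>>> pyspark --packages com.databricks:spark-csv_2.10:1.1.0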
>>>>>
>>>>> I would be grateful if someone could kindly let me know how to load
>>>>> packages when starting a cluster as mentioned above.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Gourav Sengupta
>>>>>
>>>>
>>>>
>>>
>>
>
