Hello Gourav,

The packages need to be loaded BEFORE you start the JVM, therefore you
won't be able to add packages dynamically in code. You should use the
--packages option with pyspark before you start your application.
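For example, you could launch the shell like this (the spark-csv
coordinates below are just an illustration; substitute whatever package
you actually need):

    pyspark --master spark://hostname:7077 \
            --packages com.databricks:spark-csv_2.10:1.3.0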
Another option, if you are constantly going to use the same packages, is
to add them to a conf file (e.g. spark-defaults.conf) so that they are
loaded every time.
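A minimal sketch of that, again with illustrative coordinates:

    # in conf/spark-defaults.conf
    spark.jars.packages    com.databricks:spark-csv_2.10:1.3.0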

Best,
Burak



On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> I am creating a SparkContext on a Spark standalone cluster, as described
> here: http://spark.apache.org/docs/latest/spark-standalone.html, using
> the following code:
>
>
> --------------------------------------------------------------------------------------------------------------------------
> from pyspark import SparkConf, SparkContext
> import multiprocessing
>
> # stop the SparkContext created by the pyspark shell before building a new one
> sc.stop()
> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>                   .setMaster("spark://hostname:7077") \
>                   .set('spark.shuffle.service.enabled', True) \
>                   .set('spark.dynamicAllocation.enabled', 'true') \
>                   .set('spark.executor.memory', '20g') \
>                   .set('spark.driver.memory', '4g') \
>                   .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
> conf.getAll()
> sc = SparkContext(conf=conf)
>
> -----(we should definitely be able to optimise the configuration but that
> is not the point here) ---
>
> Using this method I am not able to load packages (a list of which is
> available at http://spark-packages.org).
>
> Whereas, if I use the standard "pyspark --packages" option, the packages
> load just fine.
>
> I will be grateful if someone could kindly let me know how to load
> packages when starting a cluster as mentioned above.
>
>
> Regards,
> Gourav Sengupta
>
