Hello Gourav,

The packages need to be loaded BEFORE you start the JVM, therefore you
won't be able to add packages dynamically in code. You should pass
--packages to pyspark before you start your application. If there are
packages you use all the time, one option is to add them to your conf
(e.g. conf/spark-defaults.conf) so they are loaded on every startup.
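For example (com.databricks:spark-csv below is just an illustrative
package coordinate; substitute whatever package you actually need):

    pyspark --packages com.databricks:spark-csv_2.10:1.3.0

or, to have it picked up automatically on every run, add the equivalent
spark.jars.packages entry to conf/spark-defaults.conf:

    spark.jars.packages    com.databricks:spark-csv_2.10:1.3.0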
Best,
Burak

On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
>
> I am creating a SparkContext in a Spark standalone cluster, as described
> here: http://spark.apache.org/docs/latest/spark-standalone.html, using
> the following code:
>
> --------------------------------------------------------------------------
> sc.stop()
> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>     .setMaster("spark://hostname:7077") \
>     .set('spark.shuffle.service.enabled', True) \
>     .set('spark.dynamicAllocation.enabled', 'true') \
>     .set('spark.executor.memory', '20g') \
>     .set('spark.driver.memory', '4g') \
>     .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
> conf.getAll()
> sc = SparkContext(conf=conf)
> ----- (we should definitely be able to optimise the configuration, but
> that is not the point here) -----
>
> Using this method, I am not able to use packages (a list of which is
> available at http://spark-packages.org).
>
> Whereas if I use the standard "pyspark --packages" option, the packages
> load just fine.
>
> I would be grateful if someone could kindly let me know how to load
> packages when starting a cluster as mentioned above.
>
> Regards,
> Gourav Sengupta