[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130259#comment-16130259
 ] 

Stavros Kontopoulos commented on SPARK-21752:
---------------------------------------------

[~jsnowacki] Do you have a step by step script to verify it, because I was not 
able to reproduce it although my notebook works fine (tested it with other 
commands in local mode), maybe I am missing something here. I am still curious 
btw to see the path called and see why this works accidently. Now back to the 
essence of stuff.
In the docs it states how you can create a spark session for an application eg. 
submitted jar or something, IMHO it does not imply use in a shell. For the 
shell everything is taken care for you regarding the session stuff and the only 
control you have is by passing parameters or env vars. Even if you dont use a 
shell you need to pass the options either in config or via env vars as the 
logic for packages is in the spark submit functionality (to the best of my 
knowledge).   
Jupyter integration is another story for which I am not sure if you cannot pass 
the vars elsewhere.
Of course there is a misleading statement in the docs:
"All of the examples on this page use sample data included in the Spark 
distribution and can be run in the spark-shell, pyspark shell, or sparkR shell."
I guess this does not apply for the spark session creation example.

> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>
>                 Key: SPARK-21752
>                 URL: https://issues.apache.org/jira/browse/SPARK-21752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config(conf=conf)\
>     .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to