[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135669#comment-16135669
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/21/17 8:01 PM:
----------------------------------------------------------------------

Before we update anything, I think we need to separate the issues here. One 
thing is to reproduce the current issue, which I didn't manage to do (details 
above), and to point to the code path that triggers the questioned behavior. 
Someone has to make sure this actually is an issue for spark.jars.packages 
(IMHO config is a public API and it is important to be consistent in behavior, 
which I think it is, since all configuration ends up in the spark-submit 
logic). The other thing is to document what the behavior of SparkSession is in 
interactive environments and, in general, to what extent Spark can be 
configured dynamically. Priority and urgency depend on which one we want to 
address first. I suggest we close the bug (if we can) and create an issue for 
the docs. If it does turn out to be a bug, we need a fix, and then we should 
consider the bigger problem of dynamic configuration and Spark config 
semantics.
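
For what it's worth, a minimal check along these lines could help pin down 
whether the builder-supplied value reaches the effective configuration at all 
(just a sketch; the package coordinates are the ones from the report below):

{code}
from pyspark.sql import SparkSession

# Build a session with the key set through the builder and read the value
# back from the effective configuration of the running SparkContext.
spark = SparkSession.builder \
    .master("local[*]") \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0") \
    .getOrCreate()

# If the key shows up here but no dependency download was triggered, the
# problem lies in how spark-submit consumes the value, not in the builder.
print(spark.sparkContext.getConf().get("spark.jars.packages", "NOT SET"))
{code}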



> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>
>                 Key: SPARK-21752
>                 URL: https://issues.apache.org/jira/browse/SPARK-21752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created, but no package download logs are printed, and 
> when I try to use the classes that should have been loaded (the Mongo 
> connector in this case, but it is the same for other packages) I get 
> {{java.lang.ClassNotFoundException}} for the missing classes.
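> For reference, a minimal way to trigger the missing-class error (assuming the 
> connector's {{com.mongodb.spark.sql.DefaultSource}} data source name) is any 
> read through the connector:
> {code}
> # Any read through the connector fails with ClassNotFoundException if the
> # package was never downloaded; the input URI comes from the config above.
> df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
> {code}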
> If I use the config file {{conf/spark-defaults.conf}} or the command line 
> option {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works fine as 
> well, e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config(conf=conf)\
>     .getOrCreate()
> {code}
> The above is in Python, but I've seen the same behavior in other languages, 
> though I didn't check R. 
> I have also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this concerns creating a new {{SparkSession}}, as pulling new 
> packages into an existing {{SparkSession}} does not really make sense. Thus 
> this will only work with bare Python, Scala or Java, and not in {{pyspark}} 
> or {{spark-shell}}, as they create the session automatically; in that case 
> one would need to use the {{--packages}} option. 
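> For the interactive shells, that means passing the coordinates when launching 
> the shell, e.g. (coordinates as above):
> {code}
> pyspark --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0
> {code}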


