[ https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129466#comment-16129466 ]

Jakub Nowacki commented on SPARK-21752:
---------------------------------------

Not really; maybe you're missing {{import pyspark}} at the top, but that's it. I 
checked it with at least two different Spark versions set up in two different 
environments, on different versions of Python, and it behaves the same, i.e. it 
imports the package and works fine. Note that you'd also need to change the 
{{mongo}} host in the address to one that matches your setup.
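
For reference, a minimal sketch of the full snippet with the import in place; the {{localhost}} host and the {{test.coll}} collection are placeholders, so adjust them to your own MongoDB setup:

{code}
import pyspark

# Build the session with the connector requested via spark.jars.packages;
# when this works, package resolution/download logs appear on session creation.
spark = pyspark.sql.SparkSession.builder \
    .appName('test-mongo') \
    .master('local[*]') \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0") \
    .config("spark.mongodb.input.uri", "mongodb://localhost/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://localhost/test.coll") \
    .getOrCreate()

# Sanity check: confirm the key actually landed in the session's configuration.
print(spark.sparkContext.getConf().get("spark.jars.packages", "not set"))
{code}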

> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>
>                 Key: SPARK-21752
>                 URL: https://issues.apache.org/jira/browse/SPARK-21752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
>
> If I set the config key {{spark.jars.packages}} using the {{SparkSession}} 
> builder as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created, but no package download logs are printed, and 
> when I use the loaded classes (the Mongo connector in this case, though it is 
> the same for other packages) I get {{java.lang.ClassNotFoundException}} for 
> the missing classes.
> If I use the config file {{conf/spark-defaults.conf}} or the command line 
> option {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config(conf=conf)\
>     .getOrCreate()
> {code}
> The above is in Python, but I've seen the same behavior in other languages, 
> though I didn't check R. 
> I have also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.



