[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129475#comment-16129475
 ] 

Jakub Nowacki edited comment on SPARK-21752 at 8/16/17 9:46 PM:
----------------------------------------------------------------

I'm aware that you cannot do this with the pyspark command, since a session is 
created automatically there. 

We use this Spark session creation in Jupyter notebooks and in some workflow 
scripts (e.g. run from Airflow), so this is pretty much bare Python with pyspark 
imported as a module, much like creating a SparkSession in the main function of 
a Scala object. I'm assuming you don't have a SparkSession running beforehand.

As for the double parenthesis in the first one, yes true, sorry. But it doesn't 
work nonetheless as the parenthesis gives you just a syntax error.


was (Author: jsnowacki):
OK so you don't need session creation with pyspark command line. We use this 
spark session creation with Jupyter notebook, so this is pretty much bare 
Python with pyspark being a module; much like creating SparkSession in Scala 
object's main function. I'm assuming you don't have SparkSession running 
beforehand.

As for the double parenthesis in the first one, yes true, sorry. But it doesn't 
work nonetheless as the parenthesis gives you just a syntax error.

> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>
>                 Key: SPARK-21752
>                 URL: https://issues.apache.org/jira/browse/SPARK-21752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
>
> If I set the config key {{spark.jars.packages}} using the {{SparkSession}} 
> builder as follows:
> {code}
> import pyspark
>
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created, but no package download logs are printed. If I 
> then use classes from the package (the Mongo connector in this case, but it's 
> the same for other packages), I get {{java.lang.ClassNotFoundException}} for 
> the missing classes.
> If I use the config file {{conf/spark-defaults.conf}} or the command line 
> option {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config(conf=conf)\
>     .getOrCreate()
> {code}
> The above is in Python, but I've seen the same behavior in other languages, 
> though I didn't check R. 
> I have also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
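
The environment-variable workaround from the description can be wrapped in a small helper for notebook or workflow use. This is only a sketch: {{build_submit_args}} and {{configure_packages}} are hypothetical names, not part of pyspark, and it assumes the variable is set before the first SparkSession or SparkContext is created in the Python process, since that is when pyspark reads it to launch the JVM.

```python
import os

# Maven coordinate taken from the issue description.
MONGO_CONNECTOR = "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0"


def build_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value requesting the given packages.

    Hypothetical helper (not part of pyspark). `packages` is a list of
    Maven coordinates; --packages expects them comma-separated, and the
    value has to end with `pyspark-shell`.
    """
    if not packages:
        raise ValueError("at least one package coordinate is required")
    return "--packages {} pyspark-shell".format(",".join(packages))


def configure_packages(packages, environ=os.environ):
    """Store the submit args in `environ` and return the stored value.

    Call this before importing pyspark and creating the session.
    """
    environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(packages)
    return environ["PYSPARK_SUBMIT_ARGS"]


# Set the variable first; create the SparkSession only after this point.
configure_packages([MONGO_CONNECTOR])
```

After this call the session can be built as in the description, without the {{spark.jars.packages}} entry in the builder config. Passing a plain dict as {{environ}} makes the helper easy to check without touching the real environment.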


