[ https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129475#comment-16129475 ]
Jakub Nowacki commented on SPARK-21752:
---------------------------------------

OK, so you don't need to create the session when using the pyspark command line. We use this SparkSession creation pattern with Jupyter notebooks, so it is pretty much bare Python with pyspark imported as a module, much like creating a SparkSession in a Scala object's main function. I'm assuming you don't have a SparkSession running beforehand. As for the double parenthesis in the first snippet: yes, true, sorry. It doesn't change the outcome, though; the stray parenthesis just gives you a syntax error.

> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>
>                 Key: SPARK-21752
>                 URL: https://issues.apache.org/jira/browse/SPARK-21752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
>
> If I put the config key {{spark.jars.packages}} using the {{SparkSession}} builder as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder \
>     .appName('test-mongo') \
>     .master('local[*]') \
>     .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0") \
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created, but no package download logs are printed, and when I use the loaded classes (the Mongo connector in this case, but it is the same for other packages) I get {{java.lang.ClassNotFoundException}} for the missing classes.
> If I use the config file {{conf/spark-defaults.conf}} or the command line option {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works fine as well, e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder \
>     .appName('test-mongo') \
>     .master('local[*]') \
>     .config(conf=conf) \
>     .getOrCreate()
> {code}
> The above is in Python, but I've seen the same behavior in other languages, though I didn't check R.
> I have also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the {{SparkSession}} builder config.
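For notebook users hitting the same problem, a minimal sketch of the workaround that does trigger package resolution, assuming no SparkSession or SparkContext exists in the kernel yet (the connector coordinates are the ones from the report; the Mongo URIs are illustrative):

{code}
import os

# spark.jars.packages is resolved when the JVM is launched (spark-submit time),
# so the --packages argument has to be in place before pyspark starts the gateway.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
)

import pyspark

# The remaining options are plain runtime config, so the builder handles them fine.
spark = pyspark.sql.SparkSession.builder \
    .appName('test-mongo') \
    .master('local[*]') \
    .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
    .getOrCreate()
{code}

The ordering is the important part: {{PYSPARK_SUBMIT_ARGS}} is only read when the first SparkContext launches the JVM, so setting it after {{getOrCreate()}} has no effect.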