[ https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135669#comment-16135669 ]
Stavros Kontopoulos edited comment on SPARK-21752 at 8/21/17 8:01 PM:
----------------------------------------------------------------------
Before we update anything, I think we need to separate the issues here so we get this right. One task is to reproduce the reported issue, which I didn't manage to do (details above), and to pinpoint the code path that triggers the questioned behavior. Someone has to confirm this actually is an issue for spark.jars.packages (IMHO the config is a public API, and it is important for it to be consistent in behavior, which I think it is, since all configuration ends up in the spark-submit logic). The other task is to document the behavior of SparkSession in interactive environments and, more generally, to what extent Spark can be configured dynamically. Priority and urgency depend on which we want to address first. I suggest we close the bug (if we can) and create a separate issue for the docs. If it does turn out to be a bug at the end of the day, we need a fix, and then we should consider the bigger problem of dynamic configuration and Spark config semantics.
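(A minimal sketch of the ordering constraint under discussion: since {{spark.jars.packages}} is consumed by the spark-submit logic before the driver JVM starts, the working path reported above sets {{PYSPARK_SUBMIT_ARGS}} before pyspark is imported. The helper below is hypothetical, just a quick sanity check; the commented-out session creation assumes a local pyspark installation.)

```python
import os

# Hypothetical sketch: spark.jars.packages must be visible to the
# spark-submit logic before the driver JVM launches, so the environment
# variable is set BEFORE pyspark is imported.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 "
    "pyspark-shell"
)

def submit_args():
    """Return the submit args the driver will see (sanity check only)."""
    return os.environ.get("PYSPARK_SUBMIT_ARGS", "")

print(submit_args())

# With pyspark installed, the session created afterwards would pick the
# package up; one way to verify the config actually took effect:
#
# import pyspark
# spark = (pyspark.sql.SparkSession.builder
#          .appName("test-mongo").master("local[*]").getOrCreate())
# print(spark.sparkContext.getConf().get("spark.jars.packages"))
```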
> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>
>                 Key: SPARK-21752
>                 URL: https://issues.apache.org/jira/browse/SPARK-21752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
>
> If I set the config key {{spark.jars.packages}} using the {{SparkSession}} builder as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created but no package download logs are printed, and if I use the loaded classes (the Mongo connector in this case, but it is the same for other packages) I get {{java.lang.ClassNotFoundException}} for the missing classes.
> If I use the config file {{conf/spark-defaults.conf}}, or the command-line option {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works fine as well, e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config(conf=conf)\
>     .getOrCreate()
> {code}
> The above is in Python, but I've seen the behavior in other languages as well, though I didn't check R.
> I have also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the {{SparkSession}} builder config.
> Note that this relates to creating a new {{SparkSession}}, as pulling new packages into an existing {{SparkSession}} indeed doesn't make sense. Thus this will only work with bare Python, Scala or Java, and not in {{pyspark}} or {{spark-shell}}, since they create the session automatically; in that case one would need to use the {{--packages}} option.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org