[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-21 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135669#comment-16135669
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/21/17 8:01 PM:
--

To get this right, before we update anything I think we need to separate the 
issues here. One thing is to reproduce the current issue, which I didn't manage 
to reproduce (details above), and to point to the code path that triggers the 
questioned behavior. As a result, someone has to make sure this actually is an 
issue for spark.jars.packages (IMHO config is a public API and it is important 
to be consistent in behavior, which I think it is, as all configuration ends up 
in the spark-submit logic). The other thing is to document the behavior of 
SparkSession in interactive environments and, in general, to what extent Spark 
can be configured dynamically. Priority and urgency depend on which we want to 
address first. I suggest we close the bug (if we can) and create an issue for 
docs. If it is a bug at the end of the day, we need a fix, and then we can 
consider the bigger problem of dynamic configuration and Spark config semantics.
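
For anyone trying to reproduce this, a minimal sketch of the check described 
above (assuming a plain Python process where no session exists yet, i.e. not 
pyspark or spark-shell; the app name is only illustrative):

{code:python}
import pyspark

# Set the package only through the builder config, as in the report.
spark = pyspark.sql.SparkSession.builder \
    .appName('repro-spark-21752') \
    .master('local[*]') \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0") \
    .getOrCreate()

# Two signals to compare: did the Ivy resolution log get printed at startup,
# and what does the effective configuration contain? Note the key can appear
# in the conf even when no packages were actually downloaded.
conf = spark.sparkContext.getConf()
print(conf.get("spark.jars.packages", "<not set>"))
print(conf.get("spark.jars", "<not set>"))
{code}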



> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I set the config key {{spark.jars.packages}} using the {{SparkSession}} 
> builder as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created, but no package download logs are printed, and 
> when I use the loaded classes (the Mongo connector in this case, but it is the 
> same for other packages) I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.conf}} or the command line 
> option {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python, but I've seen the same behavior in other languages 
> (though I didn't check R).
> I have also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating a new {{SparkSession}}, as pulling new 
> packages into an existing {{SparkSession}} indeed doesn't make sense. Thus this 
> will only work with bare Python, Scala or Java, and not in {{pyspark}} or 
> {{spark-shell}}, as they create the session automatically; in that case one 
> would need to use the {{--packages}} option. 







[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-18 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133565#comment-16133565
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/18/17 8:29 PM:
--

[~jsnowacki] What I am doing is not manual, it's just another legitimate way to 
start jupyter, although it creates a spark object upfront. Btw it's far from 
manual, as it works out of the box, but anyway the point here is the consistency 
of Spark's config API (since it's a public API).
I agree the other way to start things is more Pythonic, and that way is quite 
manual IMHO, but it's ok since it's common practice (that is why I insisted on 
all the details).

Now I did use pip install pyspark, then `jupyter notebook` (I could have just 
used plain python), followed your example, and got the following (some results 
verify what you have already observed):

a) Setting the env variable always works, regardless of the rest of the configuration:

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.getOrCreate()
{code}

[I 22:50:58.697 NotebookApp] Adapting to protocol v5.1 for kernel 
d05897ed-6de4-4ec2-842f-adb094bf0f0d
Ivy Default Cache set to: /home/stavros/.ivy2/cache
The jars for the packages stored in: /home/stavros/.ivy2/jars
:: loading settings :: url = 
jar:file:/usr/local/lib/python3.5/dist-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found org.mongodb.spark#mongo-spark-connector_2.11;2.2.0 in central
found org.mongodb#mongo-java-driver;3.4.2 in central
:: resolution report :: resolve 160ms :: artifacts dl 3ms
:: modules in use:
org.mongodb#mongo-java-driver;3.4.2 from central in [default]
org.mongodb.spark#mongo-spark-connector_2.11;2.2.0 from central in 
[default]
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  default |   2   |   0   |   0   |   0   ||   2   |   0   |
-
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 2 already retrieved (0kB/5ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
17/08/18 22:52:05 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
17/08/18 22:52:05 WARN Utils: Your hostname, universe resolves to a loopback 
address: 127.0.1.1; using 192.168.2.7 instead (on interface wlp2s0)
17/08/18 22:52:05 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address

b) Example 1 (the builder config) works for me without issues:

{code:java}
 import pyspark
spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
.config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
.config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
.getOrCreate()
{code}

Output:
Creating new notebook in 
[I 23:03:52.055 NotebookApp] Kernel started: 
bc93a17a-e7a5-4e83-8a63-df0adba97c79
[W 23:03:52.058 NotebookApp] 404 GET 
/nbextensions/widgets/notebook/js/extension.js?v=20170818230343 (127.0.0.1) 
1.46ms 
referer=http://localhost:/notebooks/Untitled2.ipynb?kernel_name=python3
[I 23:04:21.361 NotebookApp] Adapting to protocol v5.1 for kernel 
bc93a17a-e7a5-4e83-8a63-df0adba97c79
Ivy Default Cache set to: /home/stavros/.ivy2/cache
The jars for the packages stored in: /home/stavros/.ivy2/jars
:: loading settings :: url = 
jar:file:/usr/local/lib/python3.5/dist-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found org.mongodb.spark#mongo-spark-connector_2.11;2.2.0 in central
found org.mongodb#mongo-java-driver;3.4.2 in central
:: resolution report :: resolve 169ms :: artifacts dl 4ms
:: modules in use:
org.mongodb#mongo-java-driver;3.4.2 from central in [default]


[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-18 Thread Jakub Nowacki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131945#comment-16131945
 ] 

Jakub Nowacki edited comment on SPARK-21752 at 8/18/17 9:11 AM:


[~skonto] What you are doing is in fact starting pyspark ({{shell.py}}) manually 
inside jupyter, which creates the SparkSession, so what I wrote above doesn't 
have any effect, as it is the same as running the pyspark command.

A more Pythonic way of installing it is adding the modules from the bundled 
{{python}} folder to PYTHONPATH (e.g. 
http://sigdelta.com/blog/how-to-install-pyspark-locally/), which is very 
similar to what happens when you use a {{pip}}/{{conda}} install. Also, I am 
referring to a plain Python kernel in Jupyter (or any other Python interpreter) 
started without executing {{shell.py}}. BTW you can create kernels in Jupyter, 
e.g. https://gist.github.com/cogfor/903c911c9b1963dcd530bbc0b9d9f0ce, which 
will work as a pyspark shell, similar to your setup.
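
As a rough illustration of the PYTHONPATH approach mentioned above (a sketch 
only; the paths and the py4j zip name are assumptions that vary between 
distributions), a plain Python or Jupyter kernel can be pointed at the bundled 
modules before importing pyspark:

{code:python}
import glob
import os
import sys

# Assumes a downloaded Spark distribution; adjust SPARK_HOME accordingly.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
sys.path.insert(0, os.path.join(spark_home, "python"))
# The bundled py4j zip is versioned, e.g. py4j-0.10.4-src.zip.
sys.path.insert(0, glob.glob(
    os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0])

import pyspark  # now importable without a pip/conda install
{code}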

While I understand that using {{master}} or {{spark.jars.packages}} in the 
config is not a desired behavior, I'd like to work out a preferred way of 
passing configuration options to SparkSession, especially for notebook users. 
Also, my experience is that many options other than {{master}} and 
{{spark.jars.packages}} work quite well with the SparkSession config, e.g. 
{{spark.executor.memory}} etc., which sometimes need to be tuned to run 
specific jobs; in generic jobs I always rely on the defaults, which I often 
tune for a specific cluster.
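
For instance, a minimal sketch of passing such a runtime-tunable option through 
the builder (the app name and values are only illustrative), in contrast to 
{{spark.jars.packages}}, which as discussed above needs to be known at 
spark-submit time:

{code:python}
import pyspark

# As noted above, options like spark.executor.memory are picked up from the
# SparkSession config; values here are illustrative, not recommendations.
spark = (pyspark.sql.SparkSession.builder
         .appName('tuned-job')
         .config("spark.executor.memory", "4g")
         .config("spark.sql.shuffle.partitions", "64")
         .getOrCreate())
{code}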

So my question is: in case we need to add some custom configuration to a 
PySpark submission, should interactive Python users:
# add *all* configuration to {{PYSPARK_SUBMIT_ARGS}},
# add some configuration like {{master}} or {{packages}} to 
{{PYSPARK_SUBMIT_ARGS}}, while the rest can be passed in the SparkSession 
config (ideally documenting which ones belong where), or
# should we fix something in SparkSession creation to make the SparkSession 
config equally effective to {{PYSPARK_SUBMIT_ARGS}}?

Also, sometimes we know that e.g. a job (not interactive, run by 
{{spark-submit}}) requires more executor memory or a different number of 
partitions. Could we in that case use the SparkSession config, or should each 
of these tuned parameters be passed via {{spark-submit}} arguments?

I'm happy to extend the documentation with such a section for Python users, as 
I don't think it's currently clear, and it would be very useful.




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-17 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130525#comment-16130525
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/17/17 3:15 PM:
--

[~jsnowacki] I don't think I am doing anything wrong. I followed your 
instructions. I use the pyspark that comes with the Spark distro, so there is 
no need to install it on my system.

So when I do:
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_DRIVER_PYTHON=jupyter
and then run ./pyspark, I have a fully working jupyter notebook.
Also, typing {{spark}} in a cell shows that a Spark session is already defined 
(and so is {{sc}}):
SparkSession - in-memory
SparkContext: Spark UI | Version v2.3.0-SNAPSHOT | Master local[*] | AppName 
PySparkShell

So it's not the case that you need to set up the Spark session on your own, 
unless things are set up in some other way that I am not familiar with (likely).

Then I ran your example, but the --packages option had no effect.

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")

spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config(conf=conf)\
.getOrCreate()
people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), 
("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", 
None)], ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

{code}

Check here:
https://github.com/jupyter/notebook/issues/743
https://gist.github.com/ololobus/4c221a0891775eaa86b0
for some ways to start things.

Now, I suspect this is the responsible line:
https://github.com/apache/spark/blob/d695a528bef6291e0e1657f4f3583a8371abd7c8/python/pyspark/java_gateway.py#L54
where PYSPARK_SUBMIT_ARGS is taken into consideration, but as I said, from what 
I observed the Java gateway is launched only once, when my notebook is started. 
You can easily check that by modifying the file to print something, and also by 
checking whether spark is already defined, as in my case. I searched the places 
where this variable is used and found nothing related to SparkConf, unless you 
somehow go through spark-submit (which pyspark calls, btw). I will try 
installing pyspark with pip, but I'm not sure it will make any difference.
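
A quick sketch of that check from inside a notebook started this way (it 
assumes the shell has already created the {{spark}} object, as described above):

{code:python}
# The gateway (and spark-submit) ran once when the notebook started, so the
# session already exists; PYSPARK_SUBMIT_ARGS changed at this point is never
# read again, and later builder .config() calls only touch the existing conf.
print(type(spark), spark.sparkContext.master, spark.sparkContext.appName)
print(spark.sparkContext.getConf().get("spark.jars.packages", "<not set>"))
{code}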




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-17 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130525#comment-16130525
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/17/17 3:12 PM:
--

[~jsnowacki] I dont think I am doing anything wrong. I followed your 
instructions. I use pyspark which comes with the spark distro no need to 
install it on my system.

So when I do:
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_DRIVER_PYTHON=jupyter
and then ./pyspark
I have a fully working jupyter notebook.
Also by typing in a cell spark, a spark session is already defined and there is 
also sc defined.
SparkSession - in-memory
SparkContext
Spark UI
Version
v2.3.0-SNAPSHOT
Master
local[*]
AppName
PySparkShell

So its not the case that you need to setup spark session on your own unless 
things are setup in some other way I am not familiar to (likely).

Then I run your example but the --packages has no effect.

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")

spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config(conf=conf)\
.getOrCreate()
people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), 
("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", 
None)], ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

{code}

Check here:
https://github.com/jupyter/notebook/issues/743
https://gist.github.com/ololobus/4c221a0891775eaa86b0
for someways to start things. 

Now, I suspect this is the responsible line 
https://github.com/apache/spark/blob/d695a528bef6291e0e1657f4f3583a8371abd7c8/python/pyspark/java_gateway.py#L54
so that PYSPARK_SUBMIT_ARGS is taken into consideration but as I said from what 
I observed java gateway is used once when my pythonbook
is started. You can easily check that by modifying the file to print something 
and also by checking if you have spark already defined as in my case. I 
searched the places where this variable is utilized so nothing related to 
SparkConf unless somehow you use spark submit (pyspark calls that btw). I will 
try installing pyspark with pip but not sure if it will make any difference.



was (Author: skonto):
[~jsnowacki] I dont think I am doing anything wrong. I followed your 
instructions. I use pyspark which comes with the spark distro no need to 
install it on my system.

So when I do:
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_DRIVER_PYTHON=jupyter
and then ./pyspark
I have a fully working jupyter notebook.
Also by typing in a cell spark, a spark session is already defined and there is 
also sc defined.
SparkSession - in-memory
SparkContext
Spark UI
Version
v2.3.0-SNAPSHOT
Master
local[*]
AppName
PySparkShell

So its not the case that you need to setup spark session on your own unless 
things are setup in some other way I am not familiar to (likely).

Then I run your example but the --packages has no effect.

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")

spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config(conf=conf)\
.getOrCreate()
people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), 
("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", 
None)], ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

{code}

Check here:
https://github.com/jupyter/notebook/issues/743
https://gist.github.com/ololobus/4c221a0891775eaa86b0
for someways to start things. 

Now, I suspect this is the responsible line 
https://github.com/apache/spark/blob/d695a528bef6291e0e1657f4f3583a8371abd7c8/python/pyspark/java_gateway.py#L54
so that PYSPARK_SUBMIT_ARGS is taken into consideration but as I said from what 
I observed java gateway is used once when my pythonbook
is started. You can easily check that by modifying the file to print something 
and also by checking if you have spark already defined as in my case. I 
searched the places where this variable is utilized so nothing related 

[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-17 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130525#comment-16130525
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/17/17 3:11 PM:
--

[~jsnowacki] I dont think I am doing anything wrong. I followed your 
instructions. I use pyspark which comes with the spark distro no need to 
install it on my system.

So when I do:
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_DRIVER_PYTHON=jupyter
and then ./pyspark
I have a fully working jupyter notebook.
Also by typing in a cell spark, a spark session is already defined and there is 
also sc defined.
SparkSession - in-memory
SparkContext
Spark UI
Version
v2.3.0-SNAPSHOT
Master
local[*]
AppName
PySparkShell

So its not the case that you need to setup spark session on your own unless 
things are setup in some other way I am not familiar to (likely).

Then I run your example but the --packages has no effect.

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")

spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config(conf=conf)\
.getOrCreate()
people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), 
("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", 
None)], ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

{code}

Check here:
https://github.com/jupyter/notebook/issues/743
https://gist.github.com/ololobus/4c221a0891775eaa86b0
for someways to start things. 

Now, I suspect this is the responsible line 
https://github.com/apache/spark/blob/d695a528bef6291e0e1657f4f3583a8371abd7c8/python/pyspark/java_gateway.py#L54
so that PYSPARK_SUBMIT_ARGS is taken into consideration but as I said from what 
I observed java gateway is used once when my pythonbook
is started. You can easily check that by modifying the file to print something 
and also by checking if you have spark already defined as in my case. I 
searched the places where this variable is utilized so nothing related to 
SparkConf unless somehow you use spark submit (pyspark calls that btw).



was (Author: skonto):
[~jsnowacki] I dont think I am doing anything wrong. I followed your 
instructions. I use pyspark which comes with the spark distro no need to 
install it on my system.

So when I do:
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_DRIVER_PYTHON=jupyter
and then ./pyspark
I have a fully working jupyter notebook.
Also by typing in a cell spark, a spark session is already defined and there is 
also sc defined.
SparkSession - in-memory
SparkContext
Spark UI
Version
v2.3.0-SNAPSHOT
Master
local[*]
AppName
PySparkShell

So its not the case that you need to setup spark session on your own unless 
things are setup in some other way I am not familiar to (likely).

Then I run your example but the --packages has no effect.

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")

spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config(conf=conf)\
.getOrCreate()
people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), 
("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", 
None)], ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

{code}

Check here:
https://github.com/jupyter/notebook/issues/743
https://gist.github.com/ololobus/4c221a0891775eaa86b0
for someways to start things. 

Now, I suspect this is the responsible line 
https://github.com/apache/spark/blob/d695a528bef6291e0e1657f4f3583a8371abd7c8/python/pyspark/java_gateway.py#L54
so that PYSPARK_SUBMIT_ARGS is taken into consideration but as I said from what 
I observed java gateway is used once when my pythonbook
is started. You can easily check that by modifying the file to print something 
and also by checking if you have spark already defined as in my case. 


> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: 

[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-17 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130525#comment-16130525
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/17/17 3:02 PM:
--

[~jsnowacki] I don't think I am doing anything wrong. I followed your instructions. I use the pyspark that comes with the Spark distribution, so there is no need to install it on my system.

So when I do:
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_DRIVER_PYTHON=jupyter
and then ./pyspark
I get a fully working Jupyter notebook.
Also, by typing spark in a cell I can see that a Spark session is already defined, and sc is defined as well:
SparkSession - in-memory
SparkContext: Spark UI, Version v2.3.0-SNAPSHOT, Master local[*], AppName PySparkShell

So it's not the case that you need to set up the Spark session on your own, unless things are set up in some other way I am not familiar with (which is likely).

Then I run your example, but --packages has no effect.

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")

spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config(conf=conf)\
.getOrCreate()
people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), 
("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", 
None)], ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

{code}

Check here:
https://github.com/jupyter/notebook/issues/743
https://gist.github.com/ololobus/4c221a0891775eaa86b0
for some ways to start things.

Now, I suspect this is the responsible line,
https://github.com/apache/spark/blob/d695a528bef6291e0e1657f4f3583a8371abd7c8/python/pyspark/java_gateway.py#L54
where PYSPARK_SUBMIT_ARGS is taken into consideration, but as I said, from what I observed the Java gateway is launched only once, when my Python notebook is started. You can easily check that by modifying the file to print something, and also by checking whether spark is already defined, as in my case.



was (Author: skonto):
[~jsnowacki] I don't think I am doing anything wrong. I followed your instructions. I use the pyspark that comes with the Spark distribution, so there is no need to install it on my system.

So when I do:
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_DRIVER_PYTHON=jupyter
and then ./pyspark
I get a fully working Jupyter notebook.
Also, by typing spark in a cell I can see that a Spark session is already defined, and sc is defined as well:
SparkSession - in-memory
SparkContext: Spark UI, Version v2.3.0-SNAPSHOT, Master local[*], AppName PySparkShell

So it's not the case that you need to set up the Spark session on your own, unless things are set up in some other way I am not familiar with.

Then I run your example, but --packages has no effect.

{code:java}
import pyspark
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")

spark = pyspark.sql.SparkSession.builder\
.appName('test-mongo')\
.master('local[*]')\
.config(conf=conf)\
.getOrCreate()
people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), 
("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", 
None)], ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

{code}

Check here:
https://github.com/jupyter/notebook/issues/743
https://gist.github.com/ololobus/4c221a0891775eaa86b0
for some ways to start things.

Now, I suspect this is the responsible line,
https://github.com/apache/spark/blob/d695a528bef6291e0e1657f4f3583a8371abd7c8/python/pyspark/java_gateway.py#L54
where PYSPARK_SUBMIT_ARGS is taken into consideration, but as I said, from what I observed the Java gateway is launched only once, when my Python notebook is started. You can easily check that by modifying the file to print something, and also by checking whether spark is already defined, as in my case.


> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  

[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-17 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130132#comment-16130132
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/17/17 9:14 AM:
--

[~jerryshao] That is true. I was curious which code path was hit, by the way, since the packages property is handled in the spark-submit logic and I didn't see anything relevant in SparkConf; there is logic only for spark.jars when a new context is created.
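
To make that distinction concrete, here is a small sketch of my understanding (hedged, not taken from the Spark sources): package resolution happens only on the spark-submit side, before the JVM exists, while plain SparkConf values such as spark.jars are simply handed to the new context.

{code}
import os

# The --packages handling lives in spark-submit, so the env var has to be in place before the
# JVM gateway is launched (i.e. before the first SparkContext is created in this process).
os.environ['PYSPARK_SUBMIT_ARGS'] = ('--packages '
                                     'org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell')

import pyspark

sc = pyspark.SparkContext(master='local[*]', appName='packages-demo')
# If spark-submit resolved the coordinates, the downloaded jars are expected to be merged into
# spark.jars; setting spark.jars.packages directly on a SparkConf triggers no such resolution.
print(sc.getConf().get('spark.jars', '<not set>'))
{code}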


was (Author: skonto):
[~jerryshao] That is true. I was curious which code path was hit, by the way.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-17 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130132#comment-16130132
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/17/17 9:14 AM:
--

[~jerryshao] That is true. I was curious which code path was hit, by the way, since the packages property is handled in the spark-submit logic and I didn't see anything relevant in SparkConf; only for spark.jars is there logic to set it when a new context is created.


was (Author: skonto):
[~jerryshao] That is true. I was curious which code path was hit, by the way, since the packages property is handled in the spark-submit logic and I didn't see anything relevant in SparkConf; there is logic only for spark.jars when a new context is created.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129589#comment-16129589
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 11:18 PM:
---

[~jsnowacki] I tried the second example with jupyter.

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
./pyspark
This way it still fails. So you probably start Jupyter differently, right?
Here is the error:
https://gist.github.com/skonto/e6f6996c7665a6d0a826d20d820cfd4f

and my jupyter info:
The version of the notebook server is 4.2.3 and is running on:
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609]

Current Kernel Information:

Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
Type "copyright", "credits" or "license" for more information.





was (Author: skonto):
[~jsnowacki] I tried the second example with jupyter.

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
./pyspark
This way it still fails. So you probably start Jupyter differently, right?
Here is the error:
https://gist.github.com/skonto/e6f6996c7665a6d0a826d20d820cfd4f




> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129472#comment-16129472
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 10:29 PM:
---

@Jakub Nowacki I figured out the import; no, I mean it does not show any downloading of jars and fails the same way example one does. I am using spark-2.2.0-bin-hadoop2.7 downloaded from the official site.
Here is a gist with my run trying to override the session (which probably will not work, as it returns the same session):
https://gist.github.com/skonto/31f4d43ca4e095d72a5185d8b81fa526
https://gist.github.com/skonto/0ac92f0225bce155feec5234041bf62f
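
For reference, a minimal sketch of what that "override" attempt looks like and why it is expected to be a no-op (my illustration, not the contents of the gists):

{code}
import pyspark

existing = spark                                   # session pre-created by ./pyspark
rebuilt = (pyspark.sql.SparkSession.builder
           .config("spark.jars.packages",
                   "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
           .getOrCreate())

# getOrCreate() hands back the already-running session, so at best the option is copied
# onto its conf; nothing re-runs the spark-submit package resolution step.
print(rebuilt is existing)
{code}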


was (Author: skonto):
@Jakub Nowacki I figured out the import; no, I mean it does not show any downloading of jars and fails the same way example one does. I am using spark-2.2.0-bin-hadoop2.7 downloaded from the official site.
Here is a gist with my run trying to override the session (which probably will not work, as it returns the same session):
https://gist.github.com/skonto/31f4d43ca4e095d72a5185d8b81fa526

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129512#comment-16129512
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 10:27 PM:
---

[~jsnowacki] Actually it didn't help; it only changes the conf value, but no action is triggered for the packages.


was (Author: skonto):
[~jsnowacki] Actually it didn't help, as shown in the gist; it only changes the conf value, but no action is triggered for the packages.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129512#comment-16129512
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 10:15 PM:
---

[~jsnowacki] Actually it didn't help, as shown in the gist; it only changes the conf value, but no action is triggered for the packages.


was (Author: skonto):
[~jsnowacki] Actually it didn't help, I tried it; it only changes the conf value, but no action is triggered for the packages.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129512#comment-16129512
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 10:11 PM:
---

[~jsnowacki] Actually it didn't help, I tried it; it only changes the conf value, but no action is triggered for the packages.


was (Author: skonto):
[~jsnowacki] Actually it didnt help I tried it, it only changes the conf value 
but action is triggered for packages.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129488#comment-16129488
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 10:01 PM:
---

[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense.
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520, but it probably would not help. Anyway, I would suggest you just add to the description that this is reproducible in Jupyter.


was (Author: skonto):
[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense.
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520, but this is probably not the same case. Anyway, I would suggest you just add to the description that this is reproducible in Jupyter.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129472#comment-16129472
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:59 PM:
--

@Jakub Nowacki I figured out the import; no, I mean it does not show any downloading of jars and fails the same way example one does. I am using spark-2.2.0-bin-hadoop2.7 downloaded from the official site.
Here is a gist with my run trying to override the session (which probably will not work, as it returns the same session):
https://gist.github.com/skonto/31f4d43ca4e095d72a5185d8b81fa526


was (Author: skonto):
@Jakub Nowacki I figured out the import, no I mean it does now show any 
downloading of jars and fails the same way as example one fails. I am using 
spark-2.2.0-bin-hadoop2.7 downloaded from the official site.
Here is a gist with my run:
https://gist.github.com/skonto/31f4d43ca4e095d72a5185d8b81fa526

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129488#comment-16129488
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:58 PM:
--

[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense.
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520, but this is probably not the same case. Anyway, I would suggest you just add to the description that this is reproducible in Jupyter.


was (Author: skonto):
[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense.
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520, but this is probably not the same case. Anyway, just add to the description that this is reproducible in Jupyter.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129488#comment-16129488
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:57 PM:
--

[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense.
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520, but this is probably not the same case. Anyway, just add to the description that this is reproducible in Jupyter.


was (Author: skonto):
[~jsnowacki] I didnt say parenthesis is the root cause just saying to correct 
it in the description... Ok now it makes more sense, still though overriding 
spark session shouldnt be a problem to reproduce by using pyspark. Now my 
understanding is that example 2 should work from within pyspark as well, I 
would consider this simple env as a reference, no?
Btw there is a fix here for overriding confs: 
https://issues.apache.org/jira/browse/SPARK-15520 but probably this is not the 
same case. Anyway just add in the description that this is reproducible in 
Jupyter.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129488#comment-16129488
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:54 PM:
--

[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense; still, overriding the Spark session shouldn't be a problem to reproduce using pyspark. My understanding now is that example 2 should work from within pyspark as well; I would consider this simple environment the reference, no?
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520, but this is probably not the same case. Anyway, just add to the description that this is reproducible in Jupyter.


was (Author: skonto):
[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense; still, overriding the Spark session shouldn't be a problem to reproduce using pyspark. My understanding now is that example 2 should work from within pyspark as well; I would consider this simple environment the reference, no?
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129488#comment-16129488
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:52 PM:
--

[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense; still, overriding the Spark session shouldn't be a problem to reproduce using pyspark. My understanding now is that example 2 should work from within pyspark as well; I would consider this simple environment the reference, no?
By the way, there is a fix here for overriding confs: https://issues.apache.org/jira/browse/SPARK-15520


was (Author: skonto):
[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense; still, overriding the Spark session shouldn't be a problem to reproduce using pyspark. My understanding now is that example 2 should work from within pyspark as well; I would consider this simple environment the reference, no?

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129488#comment-16129488
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:50 PM:
--

[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense; still, overriding the Spark session shouldn't be a problem to reproduce using pyspark. My understanding now is that example 2 should work from within pyspark as well; I would consider this simple environment the reference, no?


was (Author: skonto):
[~jsnowacki] I didn't say the parenthesis is the root cause, I was just saying to correct it in the description... OK, now it makes more sense; still, overriding the Spark session shouldn't be a problem to reproduce using simple pyspark. My understanding now is that example 2 should work from within simple pyspark as well; I would consider this simple environment the reference, no?

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Jakub Nowacki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129475#comment-16129475
 ] 

Jakub Nowacki edited comment on SPARK-21752 at 8/16/17 9:46 PM:


I'm aware you cannot do it with the pyspark command, as a session is automatically created there.

We use this SparkSession creation with a Jupyter notebook or some workflow scripts (e.g. used in Airflow), so this is pretty much bare Python with pyspark being a module; much like creating a SparkSession in a Scala object's main function. I'm assuming you don't have a SparkSession running beforehand.
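
A hypothetical minimal version of that bare-Python setup (my illustration; the script name and job body are made up), where no session exists until the builder creates one:

{code}
# bare_python_job.py -- run with plain `python bare_python_job.py` (or schedule it from a workflow tool);
# nothing pre-creates a session here, so the builder below is the first and only one.
import pyspark

if __name__ == "__main__":
    spark = (pyspark.sql.SparkSession.builder
             .appName('bare-python-job')
             .master('local[*]')
             .config("spark.jars.packages",
                     "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
             .getOrCreate())
    # ... job body would go here ...
    spark.stop()
{code}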

As for the double parenthesis in the first one, yes, true, sorry. But it doesn't work regardless, as the extra parenthesis just gives you a syntax error.


was (Author: jsnowacki):
OK, so you don't need session creation with the pyspark command line. We use this SparkSession creation with a Jupyter notebook, so this is pretty much bare Python with pyspark being a module; much like creating a SparkSession in a Scala object's main function. I'm assuming you don't have a SparkSession running beforehand.

As for the double parenthesis in the first one, yes, true, sorry. But it doesn't work regardless, as the extra parenthesis just gives you a syntax error.

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.






[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129472#comment-16129472
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:41 PM:
--

@Jakub Nowacki I figured out the import; no, I mean it does not show any downloading of jars and fails the same way example one does. I am using spark-2.2.0-bin-hadoop2.7 downloaded from the official site.
Here is a gist with my run:
https://gist.github.com/skonto/31f4d43ca4e095d72a5185d8b81fa526


was (Author: skonto):
@Jakub Nowacki I figured out the import; no, I mean it does not show any downloading of jars and fails the same way example one does. I am using spark-2.2.0-bin-hadoop2.7 downloaded from the official site.




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129454#comment-16129454
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:37 PM:
--

[~jsnowacki] I couldn't reproduce the scenario with ".config(conf=conf)", 
example 2. I just copy-pasted your code and it does not work; is there anything 
special I should do?
I just ran ./pyspark, then imported pyspark and then ran the second example.
Btw, there is a typo here:
 .config("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0"))\

in the first example; you need to remove one parenthesis. 
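
For concreteness, a sketch of those reproduction steps as I understand them 
(the exact shell session is an assumption; note that the bundled shell already 
creates a {{spark}} session):
{code}
# $ ./bin/pyspark        # from the spark-2.2.0-bin-hadoop2.7 directory
import pyspark

conf = pyspark.SparkConf()
conf.set("spark.jars.packages",
         "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")

# getOrCreate() returns the session the shell already started, so setting
# spark.jars.packages at this point cannot trigger any jar download.
spark = pyspark.sql.SparkSession.builder.config(conf=conf).getOrCreate()
{code}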


was (Author: skonto):
[~jsnowacki] I couldn't reproduce the scenario with ".config(conf=conf)", 
example 2. I just copy-pasted your code and it does not work; is there anything 
special I should do?
Btw, there is a typo here:
 .config("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0"))\

in the first example; you need to remove one parenthesis. 




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129454#comment-16129454
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:35 PM:
--

[~jsnowacki] I couldn't reproduce the scenario with ".config(conf=conf)", 
example 2. I just copy-pasted your code and it does not work; is there anything 
special I should do?
Btw, there is a typo here:
 .config("spark.jars.packages", 
"org.mongodb.spark:mongo-spark-connector_2.11:2.2.0"))\

in the first example; you need to remove one parenthesis. 


was (Author: skonto):
[~jsnowacki] I couldn't reproduce the scenario with ".config(conf=conf)", 
example 2. I just copy-pasted your code and it does not work; is there anything 
special I should do?




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129454#comment-16129454
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:33 PM:
--

[~jsnowacki] I couldn't reproduce the scenario with ".config(conf=conf)", 
example 2. I just copy-pasted your code and it does not work; is there anything 
special I should do?


was (Author: skonto):
[~jsnowacki] I couldn't reproduce the scenario with ".config(conf=conf)". I just 
copy-pasted your code and it does not work; is there anything special I should do?




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129454#comment-16129454
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:33 PM:
--

[~jsnowacki] I couldn't reproduce the scenario with ".config(conf=conf)". I just 
copy-pasted your code and it does not work; is there anything special I should do?


was (Author: skonto):
[~jsnowacki] I couldn't reproduce this one: ".config(conf=conf)". I just 
copy-pasted your code and it does not work; is there anything special I should do?




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129454#comment-16129454
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:32 PM:
--

[~jsnowacki] I couldn't reproduce this one: ".config(conf=conf)". I just 
copy-pasted your code and it does not work; is there anything special I should do?


was (Author: skonto):
I couldn't reproduce this one: ".config(conf=conf)". I just copy-pasted your 
code and it does not work.




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Jakub Nowacki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129432#comment-16129432
 ] 

Jakub Nowacki edited comment on SPARK-21752 at 8/16/17 9:15 PM:


[~skonto] Not sure which one you couldn't reproduce. Using {{--packages}} works 
fine, as I explained. Using the latter (example 2), a {{SparkConf}} created before 
{{SparkSession}} and passed to the builder via {{.config(conf=conf)}}, works 
fine as well. Only the version passing key-values directly to {{config}} 
(example 1) does not work for me. I tried on different instances of Spark 2+, 
and it behaves the same.
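
A hedged diagnostic that may help narrow this down: it only shows whether the 
key reached the session's configuration, not whether the jars were actually 
fetched (package resolution happens when spark-submit launches the JVM):
{code}
import pyspark

# Example 1 style: pass the key-value directly to the builder.
spark = pyspark.sql.SparkSession.builder \
    .appName('test-mongo') \
    .master('local[*]') \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0") \
    .getOrCreate()

# The key may well be present even when no download happened.
print(spark.sparkContext.getConf().get("spark.jars.packages", "<not set>"))
{code}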


was (Author: jsnowacki):
[~skonto] Not sure which one you couldn't reproduce. Using {{--packages}} works 
fine as I explained. Using the latter (example 2) {{SparkConf}} created before 
{SparkSession}} and passing it to the builder via {{.config(conf=conf)}} works 
fine as well. Only the version with passing key-values directly to {{config}} 
(example 1) does not work for me. I tried on different instances of Spark 2+, 
and it behaves the same.




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129406#comment-16129406
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:04 PM:
--

"unclear why passing SparkConf via SparkSession config works"

[~jsnowacki] I was not able to reproduce it using pyspark with the latest Spark 
version, 2.2.0. I tried ./pyspark --packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 though, and things work as 
expected using the simple example here: 
https://docs.mongodb.com/spark-connector/master/python/write-to-mongodb/
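
Roughly the shape of that check, reconstructed from memory rather than copied 
from the linked page, so treat the sample data and the exact invocation as 
assumptions:
{code}
# Shell started as: ./bin/pyspark --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0
# with spark.mongodb.output.uri pointing at a reachable MongoDB instance.
people = spark.createDataFrame(
    [("Bilbo Baggins", 50), ("Gandalf", 1000)], ["name", "age"])

# "com.mongodb.spark.sql.DefaultSource" is the data source name used by the
# 2.x connector; the write succeeds only if the package was actually loaded.
people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()
{code}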


was (Author: skonto):
"unclear why passing SparkConf via SparkSession config works"

[~jsnowacki] I was not able to reproduce it using pyspar with latest spark 
2.2.0. I tried ./pyspark --packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 though and things work as 
expected using the simple example here: 
https://docs.mongodb.com/spark-connector/master/python/write-to-mongodb/




[jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-16 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129406#comment-16129406
 ] 

Stavros Kontopoulos edited comment on SPARK-21752 at 8/16/17 9:04 PM:
--

"unclear why passing SparkConf via SparkSession config works"

[~jsnowacki] I was not able to reproduce it using pyspar with latest spark 
2.2.0. I tried ./pyspark --packages 
org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 though and things work as 
expected using the simple example here: 
https://docs.mongodb.com/spark-connector/master/python/write-to-mongodb/


was (Author: skonto):
"unclear why passing SparkConf via SparkSession config works"

[~jsnowacki] I was not able to reproduce it using pyspark, . I tried ./pyspark 
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 though and things 
work as expected using the simple example here: 
https://docs.mongodb.com/spark-connector/master/python/write-to-mongodb/
