[ https://issues.apache.org/jira/browse/SPARK-16263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353658#comment-15353658 ]

Vladimir Feinberg commented on SPARK-16263:
-------------------------------------------

Right, I'm not arguing for multiple sessions at once, but I think it's 
reasonable to expect this global state to have some notion of idempotency. 
Whatever we do, the restrictions on the use case must be enforced by the API 
itself. If I'm really only ever allowed to create a SparkSession once, then the 
builder should raise on the second attempt (and building a session should be an 
operation independent of getOrCreate()-ing it).
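
To make that concrete, here is a rough sketch of the kind of API I mean (the 
class and method names are hypothetical, not the current PySpark API):

{code}
# Hypothetical sketch of a builder that enforces the single-session
# restriction: create() fails loudly, getOrCreate() stays idempotent.
class StrictBuilder(object):
    _active_session = None  # stand-in for SparkSession's global state

    def create(self):
        # Explicit creation: a second call while a session exists is an error.
        if StrictBuilder._active_session is not None:
            raise RuntimeError("A SparkSession is already active; "
                               "stop() it first or use getOrCreate().")
        StrictBuilder._active_session = object()  # stand-in for a real session
        return StrictBuilder._active_session

    def getOrCreate(self):
        # Idempotent path: hand back the existing session if there is one.
        if StrictBuilder._active_session is None:
            return self.create()
        return StrictBuilder._active_session
{code}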

On the other hand, if we're OK with the one-session-at-a-time restriction 
(which the code is mostly in line with already), then it's just a matter of 
clearing the global variables on shutdown.
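
A sketch of what clearing on shutdown could look like, assuming the 
{{_instantiatedContext}} attribute used in the workaround below (the wrapper is 
only there to illustrate where the reset would live):

{code}
# Sketch: have stop() reset the module-level cache itself, so the next
# builder starts from a clean slate instead of inheriting old state.
from pyspark.sql import SparkSession

_original_stop = SparkSession.stop

def _stop_and_clear(self):
    _original_stop(self)
    SparkSession._instantiatedContext = None  # clear the cached global

SparkSession.stop = _stop_and_clear
{code}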

> SparkSession caches configuration in an unintuitive global way
> --------------------------------------------------------------
>
>                 Key: SPARK-16263
>                 URL: https://issues.apache.org/jira/browse/SPARK-16263
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Vladimir Feinberg
>            Priority: Minor
>
> The following use case demonstrates the issue. Note that as a workaround to 
> SPARK-16262 I use {{reset_spark()}} to stop the current {{SparkSession}}.
> {code}
> >>> from pyspark.sql import SparkSession
> >>> def reset_spark(): global spark; spark.stop(); SparkSession._instantiatedContext = None
> ...
> >>> spark = SparkSession.builder.getOrCreate()
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/06/28 11:41:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/06/28 11:41:36 WARN Utils: Your hostname, vlad-databricks resolves to a loopback address: 127.0.1.1; using 192.168.3.166 instead (on interface enp0s31f6)
> 16/06/28 11:41:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> >>> spark.conf.get("spark.sql.retainGroupColumns")
> u'true'
> >>> reset_spark()
> >>> spark = SparkSession.builder.config("spark.sql.retainGroupColumns", "false").getOrCreate()
> >>> spark.conf.get("spark.sql.retainGroupColumns")
> u'false'
> >>> reset_spark()
> >>> spark = SparkSession.builder.getOrCreate()
> >>> spark.conf.get("spark.sql.retainGroupColumns")
> u'false'
> >>>
> {code}
> The last line should output {{u'true'}} instead: there is no expectation 
> that global config state persists across sessions. Each new session should 
> use the default configuration unless its own builder deviates from it.


