Furcy Pin created SPARK-39059:
---------------------------------

             Summary: When using multiple SparkSessions, DataFrame.resolve uses configuration from the wrong session
                 Key: SPARK-39059
                 URL: https://issues.apache.org/jira/browse/SPARK-39059
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
            Reporter: Furcy Pin
We encountered an unexpected error when using SparkSession.newSession and the "spark.sql.caseSensitive" option. I wrote a handful of examples below to illustrate the problem. From these examples, it looks like when you use _SparkSession.newSession()_ and change the configuration of that new session, _DataFrame.apply(col_name)_ uses the configuration from the initial session instead of the new one.

*Example 1.A* This fails because "spark.sql.caseSensitive" has not been set at all *(OK)*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
```

*Example 1.B* This fails because "spark.sql.caseSensitive" has not been set on s2 *(OK)*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
```

*Example 1.C* This works because "spark.sql.caseSensitive" has been set on s2 *(OK)*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
```

*Example 2.A* This fails because "spark.sql.caseSensitive" has not been set at all *(NORMAL)*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
```

*Example 2.B* This should fail because "spark.sql.caseSensitive" has not been set on s2, but it works *(NOT NORMAL)*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
```

*Example 2.C* This should work because "spark.sql.caseSensitive" has been set on s2, but it fails instead *(NOT NORMAL)*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
```
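For reference, here is a minimal, self-contained sketch that is not part of the original report (it assumes Spark 3.2.1 and a local master, like the examples above, and the object/class names are illustrative). It makes the Example 2.C observation more explicit: both sessions report the expected "spark.sql.caseSensitive" values and the DataFrame is bound to s2, yet df("a") still behaves as if s1's case-insensitive configuration applied.
```
import org.apache.spark.sql.SparkSession

object CaseSensitiveRepro {
  def main(args: Array[String]): Unit = {
    val s1 = SparkSession.builder.master("local[2]").getOrCreate()
    val s2 = s1.newSession()
    s2.conf.set("spark.sql.caseSensitive", "true")

    val df = s2.sql("select 'a' as A, 'a' as a")

    // Each session reports its own value, and the DataFrame was created from s2:
    println(s1.conf.get("spark.sql.caseSensitive")) // false (default)
    println(s2.conf.get("spark.sql.caseSensitive")) // true
    println(df.sparkSession eq s2)                  // true

    // ...yet resolving the column through DataFrame.apply fails with
    // "Reference 'a' is ambiguous, could be: a, a.", i.e. it behaves as if
    // s1's case-insensitive configuration were in effect (Example 2.C above).
    df("a")
  }
}
```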