[ https://issues.apache.org/jira/browse/SPARK-39059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Furcy Pin updated SPARK-39059:
------------------------------
Description:

We encountered an unexpected error when combining SparkSession.newSession with the "spark.sql.caseSensitive" option. The examples below illustrate the problem: when you create a new session with _SparkSession.newSession()_ and change the configuration of that new session, _DataFrame.apply(col_name)_ appears to use the configuration of the initial session instead of the new one.

*Example 1.A*
This fails because "spark.sql.caseSensitive" has not been set at all *[OK]*
{code:java}
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
{code}

*Example 1.B*
This fails because "spark.sql.caseSensitive" has not been set on s2, even though it has been set on s1 *[OK]*
{code:java}
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
{code}

*Example 1.C*
This works because "spark.sql.caseSensitive" has been set on s2 *[OK]*
{code:java}
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
{code}

*Example 2.A*
This fails because "spark.sql.caseSensitive" has not been set at all *[NORMAL]*
{code:java}
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
{code}

*Example 2.B*
This should fail because "spark.sql.caseSensitive" has not been set on s2, but it works anyway *[NOT NORMAL]*
{code:java}
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
{code}

*Example 2.C*
This should work because "spark.sql.caseSensitive" has been set on s2, but it fails instead *[NOT NORMAL]*
{code:java}
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
// > Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a, a.
{code}

> When using multiple SparkSessions, DataFrame.resolve uses configuration from
> the wrong session
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39059
>                 URL: https://issues.apache.org/jira/browse/SPARK-39059
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
> Affects Versions: 3.2.1
>         Reporter: Furcy Pin
>         Priority: Minor

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail:
issues-h...@spark.apache.org
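
Editorial note: until the resolution behaviour is fixed, a possible workaround for the *[NOT NORMAL]* cases may be to avoid _Dataset.apply_'s eager column resolution and pass an unresolved column built with _org.apache.spark.sql.functions.col_ instead, so that resolution only happens when the query is analyzed by the session that owns the DataFrame (consistent with Example 1.C working). This is an unverified sketch against the same repro, not confirmed on 3.2.1:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val s1 = SparkSession.builder.master("local[2]").getOrCreate()
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")

// df("a") resolves the column eagerly and reportedly consults the wrong
// session's configuration (Example 2.C). col("a") builds an unresolved
// Column that is only resolved during analysis, which runs with s2's
// case-sensitive configuration:
df.select(col("a")).show()
```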