[ https://issues.apache.org/jira/browse/SPARK-39059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furcy Pin updated SPARK-39059:
------------------------------
    Description: 
We encountered an unexpected error when using SparkSession.newSession together with the 
"spark.sql.caseSensitive" option.

The examples below illustrate the problem. It looks like when you create a new session with 
_SparkSession.newSession()_ and change the configuration of that new session, 
_DataFrame.apply(col_name)_ (Examples 2.A to 2.C) uses the configuration of the initial 
session instead of the new one, whereas _DataFrame.select(col_name)_ (Examples 1.A to 1.C) 
behaves as expected.

*Example 1.A*
This fails because "spark.sql.caseSensitive" has not been set at all *[OK]*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
// > Exception in thread "main" org.apache.spark.sql.AnalysisException:
// > Reference 'a' is ambiguous, could be: a, a.
```

*Example 1.B*
This fails because "spark.sql.caseSensitive" has not been set on s2, even though it has been set on s1 *[OK]*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
// > Exception in thread "main" org.apache.spark.sql.AnalysisException:
// > Reference 'a' is ambiguous, could be: a, a.
```

*Example 1.C*
This works because "spark.sql.caseSensitive" has been set on s2 *[OK]*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df.select("a").show()
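// Works as expected: no AnalysisException is thrown.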
```

*Example 2.A*
This fails because "spark.sql.caseSensitive" has not been set at all *[NORMAL]*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
// > Exception in thread "main" org.apache.spark.sql.AnalysisException:
// > Reference 'a' is ambiguous, could be: a, a.
```

*Example 2.B*
This should fail because "spark.sql.caseSensitive" has not been set on s2, but it works *[NOT NORMAL]*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
// s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
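// Works here (unexpectedly): no AnalysisException is thrown,
// even though s2 never enabled case sensitivity.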
```

*Example 2.C*
This should work because "spark.sql.caseSensitive" has been set on s2, but it 
fails instead *[NOT NORMAL]*
```
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
// s1.conf.set("spark.sql.caseSensitive", "true")
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")
df("a")
// > Exception in thread "main" org.apache.spark.sql.AnalysisException:
// > Reference 'a' is ambiguous, could be: a, a.
```
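
From these examples it looks as if _DataFrame.apply_ resolves the column against whichever 
SparkSession is currently registered as the active one (which would still be s1 after 
getOrCreate), rather than against the session that produced the DataFrame. This is only a 
guess on our part. The snippet below is a speculative way to check it using the public 
_SparkSession.getActiveSession_ / _setActiveSession_ API; whether activating s2 actually 
changes the outcome of _df("a")_ is an assumption we have not verified.

```
import org.apache.spark.sql.SparkSession

// Same setup as Example 2.C: case sensitivity is enabled on s2 only.
val s1 = SparkSession.builder.master("local[2]").getOrCreate()
val s2 = s1.newSession()
s2.conf.set("spark.sql.caseSensitive", "true")
val df = s2.sql("select 'a' as A, 'a' as a")

// After getOrCreate, the thread's active session should still be s1.
println(SparkSession.getActiveSession.map(_ eq s1)) // expected: Some(true)

// Speculative: if apply() reads the active session's configuration,
// explicitly activating s2 before resolving should make df("a") succeed.
SparkSession.setActiveSession(s2)
df("a")
```

If that guess holds, it would also explain why Example 2.B works while Example 2.C fails: 
in both cases the resolution would read s1's setting instead of s2's.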



> When using multiple SparkSessions, DataFrame.resolve uses configuration from 
> the wrong session
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39059
>                 URL: https://issues.apache.org/jira/browse/SPARK-39059
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: Furcy Pin
>            Priority: Minor
>


