[jira] [Updated] (SPARK-43522) Creating struct column occurs error 'org.apache.spark.sql.AnalysisException [DATATYPE_MISMATCH.CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING]'

2023-05-16 Thread Heedo Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heedo Lee updated SPARK-43522:
--
Description: 
When creating a struct column in a DataFrame, code that ran without problems 
in version 3.3.1 fails in version 3.4.0.

 

Example
{code:java}
val testDF = Seq("a=b,c=d,d=f").toDF
  .withColumn("key_value", split('value, ","))
  .withColumn("map_entry", transform(col("key_value"),
    x => struct(split(x, "=").getItem(0), split(x, "=").getItem(1))))
{code}
 

In 3.3.1

 
{code:java}
testDF.show()
+-----------+---------------+--------------------+
|      value|      key_value|           map_entry|
+-----------+---------------+--------------------+
|a=b,c=d,d=f|[a=b, c=d, d=f]|[{a, b}, {c, d}, ...|
+-----------+---------------+--------------------+

testDF.printSchema()
root
 |-- value: string (nullable = true)
 |-- key_value: array (nullable = true)
 |    |-- element: string (containsNull = false)
 |-- map_entry: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- col1: string (nullable = true)
 |    |    |-- col2: string (nullable = true)
{code}
 

 

In 3.4.0

 
{code:java}
org.apache.spark.sql.AnalysisException: [DATATYPE_MISMATCH.CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING] Cannot resolve "struct(split(namedlambdavariable(), =, -1)[0], split(namedlambdavariable(), =, -1)[1])" due to data type mismatch: Only foldable `STRING` expressions are allowed to appear at odd position, but they are ["0", "1"].;
'Project [value#41, key_value#45, transform(key_value#45, lambdafunction(struct(0, split(lambda x_3#49, =, -1)[0], 1, split(lambda x_3#49, =, -1)[1]), lambda x_3#49, false)) AS map_entry#48]
+- Project [value#41, split(value#41, ,, -1) AS key_value#45]
   +- LocalRelation [value#41]
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:269)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:256)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:295)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:294)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:294)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:294)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:294)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:294)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
{code}
 

The error indicates that the unnamed struct fields are turned into positional names ("0", "1") that are no longer accepted as foldable string literals. However, if you alias each struct field explicitly, you get the same result as in the previous version.

 
{code:java}
val testDF = Seq("a=b,c=d,d=f").toDF
  .withColumn("key_value", split('value, ","))
  .withColumn("map_entry", transform(col("key_value"),
    x => struct(split(x, "=").getItem(0).as("col1"), split(x, "=").getItem(1).as("col2"))))
{code}
 

 
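Since the aliased struct fields form key/value pairs, the entries array can also be collapsed into a real map column. This is a hedged sketch, not part of the original report: it assumes the aliased testDF from the workaround above and uses the standard map_from_entries function, which expects an array of two-field structs.

{code:java}
// Hypothetical follow-on: turn the array of {col1, col2} structs into a map column.
// map_from_entries(array<struct<k, v>>) builds a map from (key, value) struct entries.
val mapDF = testDF.withColumn("kv_map", map_from_entries(col("map_entry")))
mapDF.select("kv_map").show(false)
// should produce a single map value of the shape {a -> b, c -> d, d -> f}
{code}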


[jira] [Created] (SPARK-43522) Creating struct column occurs error 'org.apache.spark.sql.AnalysisException [DATATYPE_MISMATCH.CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING]'

2023-05-16 Thread Heedo Lee (Jira)
Heedo Lee created SPARK-43522:
-

 Summary: Creating struct column occurs error 
'org.apache.spark.sql.AnalysisException 
[DATATYPE_MISMATCH.CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING]'
 Key: SPARK-43522
 URL: https://issues.apache.org/jira/browse/SPARK-43522
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Heedo Lee




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35912) [SQL] JSON read behavior is different depending on the cache setting when nullable is false.

2021-06-27 Thread Heedo Lee (Jira)
Heedo Lee created SPARK-35912:
-

 Summary: [SQL] JSON read behavior is different depending on the 
cache setting when nullable is false.
 Key: SPARK-35912
 URL: https://issues.apache.org/jira/browse/SPARK-35912
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: Heedo Lee


Below is code that reproduces the issue.

 
{code:java}
import org.apache.spark.sql.Encoders

case class TestSchema(x: Int, y: Int)
case class BaseSchema(value: TestSchema)

val schema = Encoders.product[BaseSchema].schema
val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
val jsonDS = spark.read.schema(schema).json(testDS)

jsonDS.show
+---------+
|    value|
+---------+
|{1, null}|
|{2, null}|
+---------+

jsonDS.cache.show
+------+
| value|
+------+
|{1, 0}|
|{2, 0}|
+------+
{code}
 

The above result occurs when the schema contains a nested StructType whose StructFields have nullable = false: reading the JSON directly yields null for the missing field, while reading the cached Dataset yields 0.
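For context, the nullable = false here comes from the encoder: Encoders.product maps a Scala Int field to IntegerType with nullable = false. A sketch of a variant that sidesteps the cache discrepancy by making the field genuinely nullable — this is an assumption-labeled illustration, not a fix from the report:

{code:java}
// Using Option[Int] makes the derived schema mark y as nullable = true,
// so a missing JSON field stays null whether or not the Dataset is cached.
case class TestSchemaOpt(x: Int, y: Option[Int])
case class BaseSchemaOpt(value: TestSchemaOpt)

val schemaOpt = org.apache.spark.sql.Encoders.product[BaseSchemaOpt].schema
schemaOpt.printTreeString()
// value.y should now appear as: integer (nullable = true)
{code}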

 


