Bram Boogaarts created SPARK-43341:
--------------------------------------
Summary: StructType.toDDL does not pick up on non-nullability of
column in nested struct
Key: SPARK-43341
URL: https://issues.apache.org/jira/browse/SPARK-43341
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.2, 3.3.1, 3.3.0
Reporter: Bram Boogaarts
h2. The problem
Converting a StructType to a DDL string with {{.toDDL}} drops the
non-nullability of fields inside a nested StructType column: a nested field
declared with {{nullable = false}} appears in the resulting DDL string without
{{NOT NULL}}. For example:
{code:java}
import org.apache.spark.sql.types._

val testschema = StructType(List(
  StructField("key", IntegerType, false),
  StructField("value", StringType, true),
  StructField("nestedCols", StructType(List(
    StructField("nestedKey", IntegerType, false),
    StructField("nestedValue", StringType, true)
  )), false)
))
// Render the schema as DDL, then parse that DDL back into a StructType.
println(testschema.toDDL)
println(StructType.fromDDL(testschema.toDDL))
{code}
gives:
{code:java}
key INT NOT NULL,value STRING,nestedCols STRUCT<nestedKey: INT, nestedValue: STRING> NOT NULL
StructType(
  StructField(key,IntegerType,false),
  StructField(value,StringType,true),
  StructField(nestedCols,StructType(
    StructField(nestedKey,IntegerType,true),
    StructField(nestedValue,StringType,true)
  ),false)
)
{code}
This happens because {{StructType.toDDL}} calls {{StructField.toDDL}} on each
of its fields, which in turn calls {{.sql}} on the field's {{dataType}}. If
{{dataType}} is a {{StructType}}, that {{.sql}} call renders each nested field
through {{.sql}} as well, and {{.sql}}, unlike {{.toDDL}}, does not include the
field's nullability in its output.
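The call chain described above can be illustrated with a small standalone model. This is only a sketch: the types below are simplified, hypothetical stand-ins that mirror the names in the report, not Spark's actual classes, and they assume that {{.sql}} renders a nested field as {{name: TYPE}} with no place for nullability.
{code:scala}
// Toy model only: hypothetical stand-ins for Spark's types, not Spark source.
object NullabilityDemo {
  sealed trait DataType { def sql: String }
  case object IntegerType extends DataType { val sql = "INT" }
  case object StringType extends DataType { val sql = "STRING" }

  final case class StructField(name: String, dataType: DataType, nullable: Boolean) {
    // NOT NULL is appended only here, for the field toDDL is called on.
    def toDDL: String = {
      val notNull = if (nullable) "" else " NOT NULL"
      s"$name ${dataType.sql}$notNull"
    }
  }

  final case class StructType(fields: List[StructField]) extends DataType {
    // Nested fields are rendered via sql, which never looks at `nullable`.
    def sql: String =
      fields.map(f => s"${f.name}: ${f.dataType.sql}").mkString("STRUCT<", ", ", ">")

    def toDDL: String = fields.map(_.toDDL).mkString(",")
  }

  val schema: StructType = StructType(List(
    StructField("key", IntegerType, nullable = false),
    StructField("value", StringType, nullable = true),
    StructField("nestedCols", StructType(List(
      StructField("nestedKey", IntegerType, nullable = false),
      StructField("nestedValue", StringType, nullable = true)
    )), nullable = false)
  ))

  def main(args: Array[String]): Unit = {
    // Prints:
    // key INT NOT NULL,value STRING,nestedCols STRUCT<nestedKey: INT, nestedValue: STRING> NOT NULL
    println(schema.toDDL)
  }
}
{code}
In this model the top-level {{key}} keeps its {{NOT NULL}}, while {{nestedKey}} loses it because it is only ever rendered through {{sql}}.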
h2. Proposed solution
{{StructField.toDDL}} should call {{dataType.toDDL}} instead of
{{dataType.sql}} when {{dataType}} is a {{StructType}}, since {{.toDDL}}
includes the nullability of nested columns.
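Until such a change is available, a user-side helper along the following lines might serve as a workaround. This is a rough sketch, not the proposed patch: {{NestedDdl}}, {{typeDdl}} and {{notNull}} are hypothetical names that are not part of Spark, field names are emitted unquoted, structs nested inside arrays or maps are not handled, and it assumes the DDL parser accepts {{NOT NULL}} inside {{STRUCT<...>}}, which should be verified with {{StructType.fromDDL}} on the Spark version in use.
{code:scala}
import org.apache.spark.sql.types.{DataType, StructField, StructType}

// Hypothetical helper, not part of Spark: renders DDL with nested nullability.
object NestedDdl {
  private def notNull(f: StructField): String =
    if (f.nullable) "" else " NOT NULL"

  // Recurse into structs so nested fields keep NOT NULL;
  // every other type falls back to Spark's own sql rendering.
  private def typeDdl(dt: DataType): String = dt match {
    case s: StructType =>
      s.fields
        .map(f => s"${f.name}: ${typeDdl(f.dataType)}${notNull(f)}")
        .mkString("STRUCT<", ", ", ">")
    case other => other.sql
  }

  def toDdl(schema: StructType): String =
    schema.fields
      .map(f => s"${f.name} ${typeDdl(f.dataType)}${notNull(f)}")
      .mkString(",")
}
{code}
For the {{testschema}} from the example, {{NestedDdl.toDdl(testschema)}} would render {{nestedKey}} as {{nestedKey: INT NOT NULL}} inside the {{STRUCT<...>}} type.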