ksbeyer commented on code in PR #51466:
URL: https://github.com/apache/spark/pull/51466#discussion_r2237947869
##########
sql/core/src/test/resources/sql-tests/analyzer-results/charvarchar.sql.out:
##########
@@ -376,7 +376,7 @@ CreateDataSourceTableCommand
`spark_catalog`.`default`.`char_tbl4`, false
insert into char_tbl4 select c, c, v, c from str_view
-- !query analysis
InsertIntoHadoopFsRelationCommand file:[not included in
comparison]/{warehouse_dir}/char_tbl4, false, Parquet, [path=file:[not included
in comparison]/{warehouse_dir}/char_tbl4], Append,
`spark_catalog`.`default`.`char_tbl4`,
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included
in comparison]/{warehouse_dir}/char_tbl4), [c7, c8, v, s]
-+- Project
[static_invoke(CharVarcharCodegenUtils.charTypeWriteSideCheck(cast(c#x as
string), 7)) AS c7#x,
static_invoke(CharVarcharCodegenUtils.charTypeWriteSideCheck(cast(c#x as
string), 8)) AS c8#x,
static_invoke(CharVarcharCodegenUtils.varcharTypeWriteSideCheck(cast(v#x as
string), 6)) AS v#x, cast(c#x as string) AS s#x]
++- Project
[static_invoke(CharVarcharCodegenUtils.charTypeWriteSideCheck(cast(c#x as
string), 7)) AS c7#x,
static_invoke(CharVarcharCodegenUtils.charTypeWriteSideCheck(cast(c#x as
string), 8)) AS c8#x,
static_invoke(CharVarcharCodegenUtils.varcharTypeWriteSideCheck(cast(v#x as
string), 6)) AS v#x, c#x AS s#x]
Review Comment:
The cast is avoided by the change in
org.apache.spark.sql.catalyst.analysis.TableOutputResolver.checkField
from
if (isCompatible(tableAttr, queryExpr))
where isCompatible:
DataTypeUtils.sameType(tableAttr.dataType, queryExpr.dataType) &&
tableAttr.name == queryExpr.name &&
tableAttr.metadata == queryExpr.metadata
to
if (DataTypeUtils.sameType(tableAttr.dataType, queryExpr.dataType))
The name check is not required because we will make sure the name is correct
in applyColumnMetadata. The metadata check seems wrong. Why do we care if the
table side has a default value that matches the query side?
I'm guessing the concern is solely for the char/varchar metadata. For this,
the cast did help in a complicated way: it helped to keep the char/varchar
metadata from propagating to the write command, but it is a no-op cast from
String (with char/varchar metadata) to String as far as i can tell. With the
new code, applyColumnMetadata will keep the char/varchar metadata from
propagating (along with the rewrite changes to not undo its work).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]