ksbeyer commented on code in PR #51466:
URL: https://github.com/apache/spark/pull/51466#discussion_r2203743569
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -180,7 +180,7 @@ object TableOutputResolver extends SQLConfHelper with
Logging {
} else {
CharVarcharUtils.stringLengthCheck(casted, attr.dataType)
}
- Alias(exprWithStrLenCheck, attr.name)(explicitMetadata =
Some(attr.metadata))
+ Alias(exprWithStrLenCheck, attr.name)()
Review Comment:
The metadata is inconsistently set. It is not set for complex types, but is
set for simple types and for promotions. Either it needs to be set always or
never.
If we transfer all column attributes to the query, then bad things happen
with char / varchar metadata -- which is why I think the code was marking
non-promotable metadata, which I also removed.
I don't see any solid reason to move column properties onto the query
columns. I'd rather have table columns, and perhaps the entire definition,
made available to the writers. I considered making that change but it requires
making changes that affect all v1 writers. I'd like to remove the metadata use
altogether, but the change was too big. Instead, I just pushed the issue into
a very special case for JSON and fixed the inconsistent null handling. I
didn't find any more tests failing, so I'm hoping, perhaps naively, that there
isn't more (half broken!) need for the metadata. If there are more uses,
they are not working correctly with complex types.
We can go the other way and make all types return metadata, but it copies
all table attribute metadata without a clean understanding of impact. Then we
get random attempt to suppress metadata that causes problems (eg varchar
supression). It will also make the dsv2 transition harder, as we continue to
flow Column configuration via metadata. We need a clear design and
documentation on what is transferred and why.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]