ksbeyer commented on code in PR #51466:
URL: https://github.com/apache/spark/pull/51466#discussion_r2203743569


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -180,7 +180,7 @@ object TableOutputResolver extends SQLConfHelper with 
Logging {
       } else {
         CharVarcharUtils.stringLengthCheck(casted, attr.dataType)
       }
-      Alias(exprWithStrLenCheck, attr.name)(explicitMetadata = 
Some(attr.metadata))
+      Alias(exprWithStrLenCheck, attr.name)()

Review Comment:
   The metadata is inconsistently set.  It is not set for complex types, but is 
set for simple types and for promotions.  Either it needs to be set always or 
never.
   
   If we transfer all column attributes to the query, then bad things happen 
with char / varchar metadata -- which is why I think the code was marking 
non-promotable metadata, which I also removed.
   
   I don't see any solid reason to move column properties onto the query 
columns.  I'd rather have table columns, and perhaps the entire definition, 
made available to the writers.  I considered making that change but it requires 
making changes that affect all v1 writers. I'd like to remove the metadata use 
altogether, but the change was too big.   Instead, I just pushed the issue into 
a very special case for JSON and fixed the inconsistent null handling.  I 
didn't find any more tests failing, so I'm hoping, perhaps naively, that there 
isn't more (half broken!) need for the metadata.    If there are more uses, 
they are not working correctly with complex types.  
   
   We can go the other way and make all types return metadata, but it copies 
all table attribute metadata without a clean understanding of impact. Then we 
get random attempt to suppress metadata that causes problems (eg varchar 
supression). It will also make the dsv2 transition harder, as we continue to 
flow Column configuration via metadata. We need a clear design and 
documentation on what is transferred and why.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to