HeartSaVioR commented on pull request #28326:
URL: https://github.com/apache/spark/pull/28326#issuecomment-619831846


   > Each Attribute/Alias has its own metadata and can easily be hidden by the 
outer-most Alias.
   
   Yeah, I see the concern. I'm not sure column metadata was considered 
critical information when it was first introduced (was that before 
Structured Streaming was introduced?), but it has become critical for 
Structured Streaming queries. If the query fails during analysis, that is 
the happy case. If the query doesn't fail, state will grow without evicting 
anything (end users may not notice unless they watch the status from a 
streaming listener), which leads to runtime issues in production.
   
   > The only "propagation" I know is in Column.name, where we keep the column 
metadata when adding a new Alias.
   
   That sounds like event-time metadata could be lost when we apply non-Alias 
operations. I roughly remember running into this when I played with 
flatMapGroupsWithState (where I had to convert the untyped Dataset to a typed 
one, specifically to `Dataset[<case class>]`), but I just struggled 
with a workaround at that time and I don't have a reproducer as of now.
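   
   A minimal sketch of how one could check whether the event-time metadata 
survived a transformation. It assumes the watermark delay is stored in column 
metadata under the key `spark.watermarkDelayMs` (mirroring the internal 
`EventTimeWatermark.delayKey`, which is not a public API). The object and 
method names `WatermarkCheck` and `hasWatermark` are hypothetical helpers, 
not part of Spark.

```scala
import org.apache.spark.sql.types.StructType

object WatermarkCheck {
  // Assumption: mirrors the internal EventTimeWatermark.delayKey constant.
  val WatermarkDelayKey = "spark.watermarkDelayMs"

  // Returns true if the named column exists in the schema and its metadata
  // still carries the watermark delay entry.
  def hasWatermark(schema: StructType, column: String): Boolean =
    schema.fields.exists { f =>
      f.name == column && f.metadata.contains(WatermarkDelayKey)
    }
}
```

   One way to use it would be to compare 
`WatermarkCheck.hasWatermark(df.schema, "eventTime")` right after 
`df.withWatermark("eventTime", "10 seconds")` and again after the conversion 
to a typed Dataset. A `false` on the second check would point to the 
metadata having been dropped along the way.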



