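HeartSaVioR commented on pull request #28326:
URL: https://github.com/apache/spark/pull/28326#issuecomment-619831846

> Each Attribute/Alias has its own metadata and can easily be hidden by the outer-most Alias.

Yeah, I see the concern. I'm not sure column metadata was considered critical information when it was first introduced (was that before Structured Streaming existed?), but it has become critical for Structured Streaming queries. If the query fails during analysis, that is actually the happy case. If the query doesn't fail, state will grow without anything being evicted. End users may not notice this unless they watch the query status via a streaming listener, and it leads to runtime issues in production.

> The only "propagation" I know is in Column.name, where we keep the column metadata when adding a new Alias.

That sounds like event-time metadata could be lost when we apply non-Alias operations. I vaguely remember running into this when I played with flatMapGroupsWithState, where I had to convert the untyped Dataset into a typed one (specifically `Dataset[<case class>]`). At the time I just struggled through a workaround, and I don't have a reproducer now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.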
> Each Attribute/Alias has its own metadata and can easily be hidden by the outer-most Alias. Yeah I see the concern - I'm not sure the column metadata was considered as critical information for the first time it was introduced (was it before Structured Streaming was introduced?), and it becomes critical for the structured streaming queries. If the query fails while analyzing that might be happy case - if the query doesn't fail then state will grow without evicting anything (end users may not notice it if they don't watch the status from streaming listener), and incurs runtime issues in production. > The only "propagation" I know is in Column.name, where we keep the column metadata when adding a new Alias. That sounds as event-time metadata could be lost when we apply non-Alias operations. I roughly remember I've met the situation when I played with flatMapGroupsWithState (where I should convert the untyped Dataset to typed one, especially convert to `Dataset[<case class>]`) but I've just struggled with workaround at that time and I don't have the reproducer as of now. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org