[ https://issues.apache.org/jira/browse/SPARK-48587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-48587.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46941
[https://github.com/apache/spark/pull/46941]

> Avoid storage amplification when accessing sub-Variant
> ------------------------------------------------------
>
>                 Key: SPARK-48587
>                 URL: https://issues.apache.org/jira/browse/SPARK-48587
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: David Cashman
>            Assignee: David Cashman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> When a variant_get expression returns a Variant, or a nested type containing
> Variant, we just return the sub-slice of the Variant value along with the
> full metadata, even though most of the metadata is probably unnecessary to
> represent the value. This may be very inefficient if the value is then
> written to disk (e.g. shuffle file or parquet). We should instead rebuild the
> value with minimal metadata.
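The amplification described above can be illustrated with a toy model. This is a hypothetical sketch. It is not the code from pull request 46941, and it does not reproduce Spark's binary Variant layout. It only assumes the general idea that object keys are stored as integer ids into a shared metadata dictionary of key strings. The helper names `collect_key_ids`, `remap` and `rebuild_minimal`, along with the dict-of-ids representation, are invented for illustration. A sub-slice that still carries the full dictionary pays for every key in the original document. Rebuilding keeps only the keys the sub-value actually references and renumbers the ids to match.

```python
from typing import Any

# Toy model (hypothetical, NOT Spark's binary Variant encoding):
# - metadata: a list of object-key strings (the shared "dictionary")
# - value: nested Python data in which objects are dict[int, Any] whose keys
#   are integer ids into `metadata`; arrays are lists; anything else is a scalar.


def collect_key_ids(value: Any, used: set[int]) -> None:
    """Record every dictionary id referenced by object keys inside `value`."""
    if isinstance(value, dict):
        for key_id, child in value.items():
            used.add(key_id)
            collect_key_ids(child, used)
    elif isinstance(value, list):
        for child in value:
            collect_key_ids(child, used)


def remap(value: Any, old_to_new: dict[int, int]) -> Any:
    """Rewrite object key ids so they point into the compacted dictionary."""
    if isinstance(value, dict):
        return {old_to_new[k]: remap(v, old_to_new) for k, v in value.items()}
    if isinstance(value, list):
        return [remap(v, old_to_new) for v in value]
    return value


def rebuild_minimal(value: Any, metadata: list[str]) -> tuple[Any, list[str]]:
    """Return (value, metadata) where metadata keeps only the keys `value` uses."""
    used: set[int] = set()
    collect_key_ids(value, used)
    # Keep surviving keys in their original relative order.
    kept = sorted(used)
    old_to_new = {old: new for new, old in enumerate(kept)}
    return remap(value, old_to_new), [metadata[old] for old in kept]
```

Take a dictionary `["user", "id", "name", "payload", "tags"]` and a document `{0: {1: 42, 2: "ann"}, 3: {4: ["x", "y"]}}`. Extracting the `user` sub-object naively yields `{1: 42, 2: "ann"}` paired with all five keys. After `rebuild_minimal`, the same sub-object becomes `{0: 42, 1: "ann"}` with metadata `["id", "name"]`, so only those two keys would be written per row to a shuffle file or Parquet.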