Harsh Motwani created SPARK-49443:
-------------------------------------

             Summary: Implement to_variant_object expression and make schema_of_variant expressions print OBJECT for Variant Objects
                 Key: SPARK-49443
                 URL: https://issues.apache.org/jira/browse/SPARK-49443
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Harsh Motwani

Casts from structs to variant objects should not be legal, since variant objects are unordered bags of key-value pairs while structs are ordered collections of fields with fixed types. As a result, casts between structs and variant objects do not behave like casts between structs: a struct-to-struct cast matches fields by position, but that position is lost once the struct has passed through a variant.

Example (produced by Serge Rielau):

{code:java}
scala> spark.sql("SELECT cast(named_struct('c', 1, 'b', '2') as struct<b int, c int>)").show()
+------------------------+
|named_struct(c, 1, b, 2)|
+------------------------+
|                  {1, 2}|
+------------------------+

// Passing a struct into VARIANT loses the position
scala> spark.sql("SELECT cast(named_struct('c', 1, 'b', '2')::variant as struct<b int, c int>)").show()
+-----------------------------------------+
|CAST(named_struct(c, 1, b, 2) AS VARIANT)|
+-----------------------------------------+
|                                   {2, 1}|
+-----------------------------------------+
{code}

Casts from maps to variant objects should also not be legal, since the two are completely orthogonal data types:

* A map can hold a variable number of key-value pairs described by just one key type and one value type in the schema, whereas the schema of an object (as produced by the schema_of_variant expressions) carries a separate type for each value in the object.
* Objects can hold values of different types, while maps cannot.
* Objects can only have string keys, while maps can also have complex keys.

We should therefore prohibit the existing behavior of allowing explicit casts from structs and maps to variants, because the only type the variant spec currently offers that is even remotely compatible with structs and maps is the object type.

Instead, we should introduce a new expression, `to_variant_object`, that converts values whose schemas contain structs and maps to variants.

Also, the schema_of_variant and schema_of_variant_agg expressions currently print STRUCT when Variant Objects are observed. We should correct that to OBJECT.
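
The difference between the two casts in the example can be sketched in plain Scala, without Spark. This is a toy model under stated assumptions, not Spark's implementation: the names `StructVsVariantObject`, `castStructToStruct`, `toVariantObject` and `castObjectToStruct` are hypothetical, and all values are simplified to `Int` (the original example passes '2' as a string and lets the cast convert it).

```scala
// Toy model of the behavior described above; not Spark code.
// All names in this object are hypothetical and exist only for illustration.
object StructVsVariantObject {
  // A struct value: named fields in a fixed order.
  type Struct = Vector[(String, Int)]

  // Struct-to-struct cast: the target field names are assigned by position.
  def castStructToStruct(src: Struct, targetNames: Vector[String]): Struct = {
    require(src.length == targetNames.length, "field counts must match")
    targetNames.zip(src.map(_._2))
  }

  // Struct-to-variant-object conversion: only key -> value survives,
  // the original field position is dropped.
  def toVariantObject(src: Struct): Map[String, Int] = src.toMap

  // Variant-object-to-struct cast: the target fields are looked up by key.
  // Simplification: a missing key throws here instead of producing a null.
  def castObjectToStruct(obj: Map[String, Int], targetNames: Vector[String]): Struct =
    targetNames.map(name => name -> obj(name))

  def main(args: Array[String]): Unit = {
    val source: Struct = Vector("c" -> 1, "b" -> 2) // named_struct('c', 1, 'b', 2)
    val target = Vector("b", "c")                   // struct<b int, c int>

    // Positional: b takes the first value, c the second.
    println(castStructToStruct(source, target))
    // By key: b takes the value stored under "b", c the value under "c".
    println(castObjectToStruct(toVariantObject(source), target))
  }
}
```

In this model the direct cast yields b = 1, c = 2, while the round trip through the object yields b = 2, c = 1, which mirrors the {1, 2} versus {2, 1} results shown in the example.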