[
https://issues.apache.org/jira/browse/HIVE-29183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denys Kuzmenko updated HIVE-29183:
----------------------------------
Description:
A variant is a value that stores semi-structured data. The structure and data
types in a variant are not necessarily consistent across rows in a table or
data file. The variant type and binary encoding are defined in the Parquet
project, with support currently available for V1. Support for Variant is added
in Iceberg v3.
Variants are similar to JSON with a wider set of primitive values including
date, timestamp, timestamptz, binary, and decimals.
Variant values may contain nested types:
* An array is an ordered collection of variant values.
* An object is a collection of fields, each consisting of a string key and a
variant value.
As a semi-structured type, there are important differences between variant and
Iceberg's other types:
* Variant arrays are similar to lists, but may contain any variant value rather
than a fixed element type.
* Variant objects are similar to structs, but their fields may vary and are
identified by name, and each field value may be any variant value rather than
a fixed field type.
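To make the contrast with fixed-type lists and structs concrete, here is a
minimal Python sketch of the variant value model described above. It is an
illustration only: the is_variant helper, the Python types standing in for
variant primitives, and the sample rows are all assumptions made up for this
example, not part of Hive, Iceberg, or the Parquet encoding.

```python
from datetime import date, datetime
from decimal import Decimal

# Python stand-ins for variant primitives (an assumption for illustration).
PRIMITIVES = (type(None), bool, int, float, str, bytes, Decimal, date, datetime)


def is_variant(value):
    """Return True if value fits the variant model: a primitive, an array of
    variants, or an object mapping string keys to variants."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        return all(is_variant(v) for v in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_variant(v) for k, v in value.items())
    return False


# An array may mix element types, unlike a list with a fixed element type.
mixed_array = [1, "two", Decimal("3.0"), [date(2025, 1, 1)], {"k": None}]

# Objects in different rows may carry different fields, unlike a struct.
row1 = {"make": "Honda", "price": Decimal("20000.00")}
row2 = {"make": "Toyota", "sold_on": date(2024, 5, 1), "tags": ["used", 3]}
```

Both rows are valid variant values even though they share only the "make"
field, which is the flexibility a struct column with a fixed schema cannot
offer.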
Variant data types allow for the efficient binary encoding of dynamic
semi-structured data such as JSON, Avro, Parquet, etc. By encoding
semi-structured data as a variant column, we retain the flexibility of the
source data, while allowing query engines to more efficiently operate on the
data.
With Variant type support, such data can be stored internally in an efficient
binary representation for better performance. Without it, we must parse the
data from its original format, which is inefficient.
This will allow the following use cases:
* Create an Iceberg table with a Variant column:
    CREATE TABLE car_sales(record Variant);
* Insert semi-structured data into the Variant column:
    INSERT INTO car_sales SELECT PARSE_JSON(<json_string>);
* Query against the semi-structured data:
    SELECT VARIANT_GET(record, '$.dealer.ship', 'string') FROM car_sales;
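The insert and query use cases above can be mimicked with a small Python
sketch. This is not how Hive implements or will implement PARSE_JSON and
VARIANT_GET; the parse_json and variant_get functions below are hypothetical
stand-ins, the path syntax is limited to object fields ('$.a.b'), and
returning None for a missing field is an assumption about the semantics. The
sample record is made up.

```python
import json


def parse_json(text):
    """Stand-in for PARSE_JSON: decode a JSON string into nested Python values."""
    return json.loads(text)


def variant_get(value, path, as_type):
    """Stand-in for VARIANT_GET supporting only '$.field.field' paths.

    Returns None when a field along the path is missing or the value is null
    (an assumed behavior, not confirmed by the source)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = value
    for key in path[1:].split("."):
        if key == "":
            continue  # skip the empty segment produced by the leading '.'
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    if current is None:
        return None
    # Assumed type names for illustration; only three casts are sketched.
    casts = {"string": str, "int": int, "double": float}
    return casts[as_type](current)


# Hypothetical record, analogous to one row of car_sales.
record = parse_json('{"dealer": {"ship": "Example Motors"}, "price": 20000}')
dealer = variant_get(record, "$.dealer.ship", "string")
```

Here dealer holds the string stored under dealer.ship, extracted by path
without declaring a schema for the record up front.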
was:
A variant is a value that stores semi-structured data. The structure and data
types in a variant are not necessarily consistent across rows in a table or
data file. The variant type and binary encoding are defined in the Parquet
project, with support currently available for V1. Support for Variant is added
in Iceberg v3.
Variants are similar to JSON with a wider set of primitive values including
date, timestamp, timestamptz, binary, and decimals.
Variant values may contain nested types:
An array is an ordered collection of variant values.
An object is a collection of fields, each consisting of a string key and a
variant value.
As a semi-structured type, there are important differences between variant and
Iceberg's other types:
Variant arrays are similar to lists, but may contain any variant value rather
than a fixed element type.
Variant objects are similar to structs, but their fields may vary and are
identified by name, and each field value may be any variant value rather than
a fixed field type.
> Integrating Variant Type into Hive
> ----------------------------------
>
> Key: HIVE-29183
> URL: https://issues.apache.org/jira/browse/HIVE-29183
> Project: Hive
> Issue Type: New Feature
> Components: Hive, Iceberg integration, SQL
> Reporter: Denys Kuzmenko
> Priority: Major