wgtmac commented on code in PR #45375:
URL: https://github.com/apache/arrow/pull/45375#discussion_r1962879360
##########
cpp/src/parquet/types.cc:
##########
@@ -1619,6 +1629,22 @@ class LogicalType::Impl::Float16 final : public
LogicalType::Impl::Incompatible,
GENERATE_MAKE(Float16)
+class LogicalType::Impl::Variant final : public
LogicalType::Impl::Incompatible,
+ public
LogicalType::Impl::SimpleApplicable {
+ public:
+ friend class VariantLogicalType;
+
+ OVERRIDE_TOSTRING(Variant)
+ OVERRIDE_TOTHRIFT(VariantType, VARIANT)
+
+ private:
+ Variant()
+ : LogicalType::Impl(LogicalType::Type::VARIANT, SortOrder::UNKNOWN),
+ LogicalType::Impl::SimpleApplicable(parquet::Type::BYTE_ARRAY) {}
Review Comment:
Sorry that I may have not explained it clearly.
Parquet has two different kind of nodes: `primitive` and `group`.
`primitive` node is usually for a primitive physical type (e.g. int64, double,
binary, etc.) while `group` is for a complex type (e.g. struct, map, list,
etc.). Whatever the node type is, it occupies a `SchemaElement` in the Parquet
schema:
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L502
`LogicalType` can be assigned to both primitive and group node. Most logical
types annotate a primitive node, examples are DECIMAL, TIMESTAMP, etc. However,
`List/Map/Variant` are logical types that must annotate a group node.
For `List` and `Map`, their group structures are dynamic because of
different subtypes (the element type of a List or the key/value types of a
Map). For `Variant`, its group structure is also dynamic depending on whether
it is shredded and shredded value types.
> I updated the thrift definition of variant to include required metadata
and value binary members
Based on above explanation, we cannot modify `parquet.thrift`. To facilitate
the current development, maybe we can define a `VariantExtensionType` similar
to implementations at
https://github.com/apache/arrow/tree/01e3f1e6829d6fcc9021ac47aebb6350590ca134/cpp/src/arrow/extension
with a storage type of `struct<metadata:binary,value:binary>`. Once stable, we
can make it canonical following the procedure at
https://arrow.apache.org/docs/format/CanonicalExtensions.html
Do you have any opinion? @emkornfield @pitrou @mapleFU
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]