neilechao commented on code in PR #45375:
URL: https://github.com/apache/arrow/pull/45375#discussion_r1951356925
##########
cpp/src/parquet/types.cc:
##########
@@ -1619,6 +1629,22 @@ class LogicalType::Impl::Float16 final : public
LogicalType::Impl::Incompatible,
GENERATE_MAKE(Float16)
+class LogicalType::Impl::Variant final : public
LogicalType::Impl::Incompatible,
+ public
LogicalType::Impl::SimpleApplicable {
+ public:
+ friend class VariantLogicalType;
+
+ OVERRIDE_TOSTRING(Variant)
+ OVERRIDE_TOTHRIFT(VariantType, VARIANT)
+
+ private:
+ Variant()
+ : LogicalType::Impl(LogicalType::Type::VARIANT, SortOrder::UNKNOWN),
+ LogicalType::Impl::SimpleApplicable(parquet::Type::BYTE_ARRAY) {}
Review Comment:
@wgtmac @emkornfield - I initially had Variant inherit from
SimpleApplicable(BYTE_ARRAY), but Variant should actually be composed of [two
separate byte
arrays](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#variant-in-parquet)
- one for metadata (the dict) and one for values. This muddies the
applicability of VariantLogicalType to a single parquet type.
`parquet column-size variant_basic.parquet
VARIANT_COL.value-> Size In Bytes: 69 Size In Ratio: 0.52671754
VARIANT_COL.metadata-> Size In Bytes: 62 Size In Ratio: 0.47328246`
1. One possibility is to create separate VariantMetadataLogicalType and
VariantValueLogicalType, with VariantLogicalType containing both as class
members. The pros are that this reflects the storage in Parquet, where metadata
and values are stored in separate columns, and the cons are that this diverges
from parquet.thrift and potentially the other language implementations
2. Other options would be to have VariantMetadata and VariantValue present
but not as logical types
What are your thoughts on these approaches?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]