neilechao commented on code in PR #45375:
URL: https://github.com/apache/arrow/pull/45375#discussion_r1951356925


##########
cpp/src/parquet/types.cc:
##########
@@ -1619,6 +1629,22 @@ class LogicalType::Impl::Float16 final : public 
LogicalType::Impl::Incompatible,
 
 GENERATE_MAKE(Float16)
 
+class LogicalType::Impl::Variant final : public 
LogicalType::Impl::Incompatible,
+                                         public 
LogicalType::Impl::SimpleApplicable {
+ public:
+  friend class VariantLogicalType;
+
+  OVERRIDE_TOSTRING(Variant)
+  OVERRIDE_TOTHRIFT(VariantType, VARIANT)
+
+ private:
+  Variant()
+      : LogicalType::Impl(LogicalType::Type::VARIANT, SortOrder::UNKNOWN),
+        LogicalType::Impl::SimpleApplicable(parquet::Type::BYTE_ARRAY) {}

Review Comment:
   @wgtmac @emkornfield - I initially had Variant inherit from 
SimpleApplicable(BYTE_ARRAY), but Variant should actually be composed of [two 
separate byte 
arrays](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#variant-in-parquet)
 - one for metadata (the dict) and one for values. This muddies the 
applicability of VariantLogicalType to a single parquet type.
   
   `parquet column-size variant_basic.parquet
   VARIANT_COL.value-> Size In Bytes: 69 Size In Ratio: 0.52671754
   VARIANT_COL.metadata-> Size In Bytes: 62 Size In Ratio: 0.47328246`
   
   1. One possibility is to create separate VariantMetadataLogicalType and 
VariantValueLogicalType, with VariantLogicalType containing both as class 
members. The pros are that this reflects the storage in Parquet, where metadata 
and values are stored in separate columns, and the cons are that this diverges 
from parquet.thrift and potentially the other language implementations
   2. Other options would be to have VariantMetadata and VariantValue present 
but not as logical types
   
   What are your thoughts on these approaches?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to