[GitHub] [arrow] bkietz commented on a change in pull request #8367: ARROW-10099: [C++][Dataset] Simplify type inference for partition columns

GitBox Wed, 07 Oct 2020 07:26:38 -0700


bkietz commented on a change in pull request #8367:
URL: https://github.com/apache/arrow/pull/8367#discussion_r501056321




##########
File path: cpp/src/arrow/dataset/partition.h
##########
@@ -85,14 +85,11 @@ class ARROW_DS_EXPORT Partitioning {
 };
 
 struct PartitioningFactoryOptions {
-  /// When inferring a schema for partition fields, string fields may be inferred as
-  /// a dictionary type instead. This can be more efficient when materializing virtual
-  /// columns. If the number of discovered unique values of a string field exceeds
-  /// max_partition_dictionary_size, it will instead be inferred as a string.
-  ///
-  /// max_partition_dictionary_size = 0: No fields will be inferred as dictionary.
-  /// max_partition_dictionary_size = -1: All fields will be inferred as dictionary.
-  int max_partition_dictionary_size = 0;
+  /// When inferring a schema for partition fields, yield dictionary encoded types
+  /// instead of plain. This can be more efficient when materializing virtual
+  /// columns, and Expressions parsed by the finished Partitioning will include
+  /// dictionaries of all unique inspected values for each field.
+  bool inspect_dictionary = false;

Review comment:
       I'll rename to `infer_dictionary`
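
To make the documented behaviour concrete, here is a minimal standalone sketch of what inferring dictionary types for partition fields amounts to. It does not use the Arrow API: `DictionaryColumn` and `InferDictionaries` are hypothetical names introduced only for illustration, and hive-style `key=value` path segments are an assumption. The point is that each field ends up with a dictionary of all unique inspected values plus indices into it.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded partition column.
struct DictionaryColumn {
  std::vector<std::string> dictionary;  // unique values, in first-seen order
  std::vector<int32_t> indices;         // one entry per path containing the key
};

// Inspect hive-style paths ("key=value/key=value") and build one
// dictionary-encoded column per partition key (illustrative only).
std::map<std::string, DictionaryColumn> InferDictionaries(
    const std::vector<std::string>& paths) {
  std::map<std::string, DictionaryColumn> columns;
  // Per key: value -> position of that value in the key's dictionary.
  std::map<std::string, std::unordered_map<std::string, int32_t>> seen;

  for (const std::string& path : paths) {
    std::istringstream segments(path);
    std::string segment;
    while (std::getline(segments, segment, '/')) {
      const auto eq = segment.find('=');
      if (eq == std::string::npos) continue;  // skip non key=value segments

      const std::string key = segment.substr(0, eq);
      const std::string value = segment.substr(eq + 1);

      DictionaryColumn& column = columns[key];
      auto& positions = seen[key];
      auto it = positions.find(value);
      if (it == positions.end()) {
        // First time this value is inspected: append it to the dictionary.
        const auto next = static_cast<int32_t>(column.dictionary.size());
        it = positions.emplace(value, next).first;
        column.dictionary.push_back(value);
      }
      column.indices.push_back(it->second);
    }
  }
  return columns;
}
```

Calling `InferDictionaries` on a handful of paths such as `year=2019/month=jan` yields one column per key. Each unique string is stored once, which is why materializing such virtual columns can be cheaper than repeating plain strings.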




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
