[arrow-datafusion] branch master updated: fix config descriptions for OPT_COLLECT_STATISTICS and OPT_REPARTITION_WINDOWS (#4623)

agrove Wed, 14 Dec 2022 21:14:41 -0800

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git



The following commit(s) were added to refs/heads/master by this push:
     new 3611d911a fix config descriptions for OPT_COLLECT_STATISTICS and OPT_REPARTITION_WINDOWS (#4623)
3611d911a is described below

commit 3611d911a3c9f3740bb1fc0527198be39ff47bfd
Author: Andy Grove <[email protected]>
AuthorDate: Wed Dec 14 22:14:31 2022 -0700

    fix config descriptions for OPT_COLLECT_STATISTICS and OPT_REPARTITION_WINDOWS (#4623)
---
 datafusion/core/src/config.rs     | 6 +++---
 docs/source/user-guide/configs.md | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/datafusion/core/src/config.rs b/datafusion/core/src/config.rs
index 091721554..1c98c83ca 100644
--- a/datafusion/core/src/config.rs
+++ b/datafusion/core/src/config.rs
@@ -273,14 +273,14 @@ impl BuiltInConfigs {
 
             ConfigDefinition::new_bool(
                 OPT_REPARTITION_WINDOWS,
-                "Should DataFusion collect statistics after listing files",
+                "Should DataFusion repartition data using the partitions keys to execute window \
+                 functions in parallel using the provided `target_partitions` level",
                 true
             ),
 
             ConfigDefinition::new_bool(
                 OPT_COLLECT_STATISTICS,
-                "Should DataFusion repartition data using the partitions keys to execute window \
-                 functions in parallel using the provided `target_partitions` level",
+                "Should DataFusion collect statistics after listing files",
                 false
             ),
 
diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
index 81b1ef20a..039981338 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -44,7 +44,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.batch_size                           | UInt64  | 8192    | Default batch size while creating new batches, it's especially useful for buffer-in-memory batches since creating tiny batches would results in too much metadata memory consumption. |
 | datafusion.execution.coalesce_batches                     | Boolean | true    | When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting 'datafusion.execution.coalesce_target_batch_size'. |
 | datafusion.execution.coalesce_target_batch_size           | UInt64  | 4096    | Target batch size when coalescing batches. Uses in conjunction with the configuration setting 'datafusion.execution.coalesce_batches'. |
-| datafusion.execution.collect_statistics                   | Boolean | false   | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level |
+| datafusion.execution.collect_statistics                   | Boolean | false   | Should DataFusion collect statistics after listing files |
 | datafusion.execution.parquet.enable_page_index            | Boolean | false   | If true, uses parquet data page level metadata (Page Index) statistics to reduce the number of rows decoded. |
 | datafusion.execution.parquet.metadata_size_hint           | UInt64  | NULL    | If specified, the parquet reader will try and fetch the last `size_hint` bytes of the parquet file optimistically. If not specified, two read are required: One read to fetch the 8-byte parquet footer and another to fetch the metadata length encoded in the footer. |
 | datafusion.execution.parquet.pruning                      | Boolean | true    | If true, the parquet reader attempts to skip entire row groups based on the predicate in the query and the metadata (min/max values) stored in the parquet file. |
@@ -62,6 +62,6 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.optimizer.prefer_hash_join                     | Boolean | true    | When set to true, the physical plan optimizer will prefer HashJoin over SortMergeJoin. HashJoin can work more efficientlythan SortMergeJoin but consumes more memory. Defaults to true |
 | datafusion.optimizer.repartition_aggregations             | Boolean | true    | Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel using the provided `target_partitions` level |
 | datafusion.optimizer.repartition_joins                    | Boolean | true    | Should DataFusion repartition data using the join keys to execute joins in parallel using the provided `target_partitions` level |
-| datafusion.optimizer.repartition_windows                  | Boolean | true    | Should DataFusion collect statistics after listing files |
+| datafusion.optimizer.repartition_windows                  | Boolean | true    | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level |
 | datafusion.optimizer.skip_failed_rules                    | Boolean | true    | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail. |
 | datafusion.optimizer.top_down_join_key_reordering         | Boolean | true    | When set to true, the physical plan optimizer will run a top down process to reorder the join keys. Defaults to true |
