This is an automated email from the ASF dual-hosted git repository.

jakevin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 90775b4832 Minor: Update documentation for `datafusion.execution.parquet.enable_page_index` (#6342)
90775b4832 is described below

commit 90775b4832fcc066d41025bc3ab29a5d8b8fbccf
Author: Andrew Lamb <and...@nerdnetworks.org>
AuthorDate: Sat May 13 00:57:58 2023 -0400

    Minor: Update documentation for `datafusion.execution.parquet.enable_page_index` (#6342)
---
 datafusion/common/src/config.rs   | 5 +++--
 docs/source/user-guide/configs.md | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/datafusion/common/src/config.rs b/datafusion/common/src/config.rs
index 0d86131db1..c5ce3540fc 100644
--- a/datafusion/common/src/config.rs
+++ b/datafusion/common/src/config.rs
@@ -241,8 +241,9 @@ config_namespace! {
 config_namespace! {
     /// Options related to reading of parquet files
     pub struct ParquetOptions {
-        /// If true, uses parquet data page level metadata (Page Index) statistics
-        /// to reduce the number of rows decoded.
+        /// If true, reads the Parquet data page level metadata (the
+        /// Page Index), if present, to reduce the I/O and number of
+        /// rows decoded.
         pub enable_page_index: bool, default = true
 
         /// If true, the parquet reader attempts to skip entire row groups based
diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
index d64f327e06..32001b9664 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -49,7 +49,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.collect_statistics                    | false      | Should DataFusion collect statistics after listing files [...]
 | datafusion.execution.target_partitions                     | 0          | Number of partitions for query execution. Increasing partitions can increase concurrency. Defaults to the number of CPU cores on the system [...]
 | datafusion.execution.time_zone                             | +00:00     | The default time zone Some functions, e.g. `EXTRACT(HOUR from SOME_TIME)`, shift the underlying datetime according to this time zone, and then extract the hour [...]
-| datafusion.execution.parquet.enable_page_index             | true       | If true, uses parquet data page level metadata (Page Index) statistics to reduce the number of rows decoded. [...]
+| datafusion.execution.parquet.enable_page_index             | true       | If true, reads the Parquet data page level metadata (the Page Index), if present, to reduce the I/O and number of rows decoded. [...]
 | datafusion.execution.parquet.pruning                       | true       | If true, the parquet reader attempts to skip entire row groups based on the predicate in the query and the metadata (min/max values) stored in the parquet file [...]
 | datafusion.execution.parquet.skip_metadata                 | true       | If true, the parquet reader skip the optional embedded metadata that may be in the file Schema. This setting can help avoid schema conflicts when querying multiple parquet files with schemas containing compatible types but different metadata [...]
 | datafusion.execution.parquet.metadata_size_hint            | NULL       | If specified, the parquet reader will try and fetch the last `size_hint` bytes of the parquet file optimistically. If not specified, two reads are required: One read to fetch the 8-byte parquet footer and another to fetch the metadata length encoded in the footer [...]
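
The hunk header above notes that environment variables are read during `SessionConfig` initialisation. As a rough illustration of how the option touched by this commit could be overridden that way, here is a minimal standard-library sketch. It assumes the naming convention described in the user guide (upper-case the key and replace `.` with `_`). The helpers `env_var_for` and `parse_bool_setting` are hypothetical names for this example, not DataFusion APIs, and falling back to the default on an unparsable value is a simplification rather than DataFusion's actual behaviour.

```rust
/// Hypothetical helper (not a DataFusion API): derive the environment
/// variable name for a config key, assuming the "upper-case and replace
/// dots with underscores" convention.
fn env_var_for(key: &str) -> String {
    key.to_uppercase().replace('.', "_")
}

/// Hypothetical helper: interpret a raw string as a boolean setting.
/// Anything other than "true" or "false" (after trimming) yields the
/// default, which is a simplification made for this sketch.
fn parse_bool_setting(raw: Option<&str>, default: bool) -> bool {
    match raw.map(str::trim) {
        Some("true") => true,
        Some("false") => false,
        _ => default,
    }
}

fn main() {
    let key = "datafusion.execution.parquet.enable_page_index";
    let var = env_var_for(key);

    // Read the override from the environment, if one is set.
    let raw = std::env::var(&var).ok();

    // The documented default for this option is `true`.
    let enabled = parse_bool_setting(raw.as_deref(), true);
    println!("{var} -> enable_page_index = {enabled}");
}
```

Under these assumptions, exporting `DATAFUSION_EXECUTION_PARQUET_ENABLE_PAGE_INDEX=false` before the session is created would turn the Page Index read off, and leaving it unset keeps the default of `true`.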
