This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 9e8198d3115 [SPARK-40726][DOCS] Supplement undocumented orc configurations in documentation
9e8198d3115 is described below

commit 9e8198d3115848ba87b4c71b43fd7212a1b729c3
Author: Qian.Sun <qian.sun2...@gmail.com>
AuthorDate: Mon Oct 10 09:59:37 2022 -0500

    [SPARK-40726][DOCS] Supplement undocumented orc configurations in documentation

    ### What changes were proposed in this pull request?

    This PR aims to supplement undocumented ORC configurations in the documentation.

    ### Why are the changes needed?

    This helps users confirm configurations through the documentation instead of reading the code.

    ### Does this PR introduce _any_ user-facing change?

    Yes, more configurations appear in the documentation.

    ### How was this patch tested?

    Pass the GA.

    Closes #38188 from dcoliversun/SPARK-40726.

    Authored-by: Qian.Sun <qian.sun2...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 docs/sql-data-sources-orc.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index 28e237a382d..200037a7dea 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -153,6 +153,24 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
     </td>
     <td>2.3.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.orc.columnarReaderBatchSize</code></td>
+    <td><code>4096</code></td>
+    <td>
+      The number of rows to include in an ORC vectorized reader batch. The number should
+      be carefully chosen to minimize overhead and avoid OOMs when reading data.
+    </td>
+    <td>2.4.0</td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.orc.columnarWriterBatchSize</code></td>
+    <td><code>1024</code></td>
+    <td>
+      The number of rows to include in an ORC vectorized writer batch. The number should
+      be carefully chosen to minimize overhead and avoid OOMs when writing data.
+    </td>
+    <td>3.4.0</td>
+  </tr>
   <tr>
     <td><code>spark.sql.orc.enableNestedColumnVectorizedReader</code></td>
     <td><code>false</code></td>
     <td>
@@ -163,6 +181,25 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
     </td>
     <td>3.2.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.orc.filterPushdown</code></td>
+    <td><code>true</code></td>
+    <td>
+      When true, enables filter pushdown for ORC files.
+    </td>
+    <td>1.4.0</td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.orc.aggregatePushdown</code></td>
+    <td><code>false</code></td>
+    <td>
+      If true, aggregates are pushed down to ORC for optimization. MIN, MAX and COUNT are
+      supported as aggregate expressions. For MIN/MAX, boolean, integer, float and date
+      types are supported. For COUNT, all data types are supported. If statistics are
+      missing from any ORC file footer, an exception will be thrown.
+    </td>
+    <td>3.3.0</td>
+  </tr>
   <tr>
     <td><code>spark.sql.orc.mergeSchema</code></td>
     <td>false</td>
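As a usage sketch (not part of the patch itself), the configurations documented by this change are ordinary runtime SQL configs, so they can be set per session with Spark SQL's `SET` command. The values below are simply the defaults from the table above; treat the tuning comments as general guidance rather than recommendations from this commit:

```sql
-- Session-level settings for the ORC options documented above.
-- Values shown are the documented defaults; adjust per workload.
SET spark.sql.orc.filterPushdown=true;           -- push filters down to ORC readers (since 1.4.0)
SET spark.sql.orc.aggregatePushdown=false;       -- push MIN/MAX/COUNT down to ORC (since 3.3.0)
SET spark.sql.orc.columnarReaderBatchSize=4096;  -- rows per vectorized reader batch (since 2.4.0)
SET spark.sql.orc.columnarWriterBatchSize=1024;  -- rows per vectorized writer batch (since 3.4.0)
```

Smaller batch sizes reduce peak memory per batch (useful for wide rows), at the cost of more per-batch overhead; the two pushdown flags trade planner work for less data read from ORC files.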