This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 9e8198d3115 [SPARK-40726][DOCS] Supplement undocumented orc configurations in documentation
9e8198d3115 is described below

commit 9e8198d3115848ba87b4c71b43fd7212a1b729c3
Author: Qian.Sun <qian.sun2...@gmail.com>
AuthorDate: Mon Oct 10 09:59:37 2022 -0500

    [SPARK-40726][DOCS] Supplement undocumented orc configurations in documentation

    ### What changes were proposed in this pull request?

    This PR aims to supplement undocumented ORC configurations in the documentation.

    ### Why are the changes needed?

    This helps users confirm configurations through the documentation instead of reading the code.

    ### Does this PR introduce _any_ user-facing change?

    Yes, more configurations appear in the documentation.

    ### How was this patch tested?

    Pass the GA.

    Closes #38188 from dcoliversun/SPARK-40726.

    Authored-by: Qian.Sun <qian.sun2...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 docs/sql-data-sources-orc.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index 28e237a382d..200037a7dea 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -153,6 +153,24 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
     </td>
     <td>2.3.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.orc.columnarReaderBatchSize</code></td>
+    <td><code>4096</code></td>
+    <td>
+      The number of rows to include in an ORC vectorized reader batch. The number should
+      be carefully chosen to minimize overhead and avoid OOMs when reading data.
+    </td>
+    <td>2.4.0</td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.orc.columnarWriterBatchSize</code></td>
+    <td><code>1024</code></td>
+    <td>
+      The number of rows to include in an ORC vectorized writer batch. The number should
+      be carefully chosen to minimize overhead and avoid OOMs when writing data.
+    </td>
+    <td>3.4.0</td>
+  </tr>
   <tr>
     <td><code>spark.sql.orc.enableNestedColumnVectorizedReader</code></td>
     <td><code>false</code></td>
     <td>
@@ -163,6 +181,25 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
     </td>
     <td>3.2.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.orc.filterPushdown</code></td>
+    <td><code>true</code></td>
+    <td>
+      When true, enables filter pushdown for ORC files.
+    </td>
+    <td>1.4.0</td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.orc.aggregatePushdown</code></td>
+    <td><code>false</code></td>
+    <td>
+      If true, aggregates are pushed down to ORC for optimization. MIN, MAX and COUNT are
+      supported as aggregate expressions. For MIN/MAX, boolean, integer, float and date
+      types are supported. For COUNT, all data types are supported. If statistics are
+      missing from any ORC file footer, an exception will be thrown.
+    </td>
+    <td>3.3.0</td>
+  </tr>
   <tr>
     <td><code>spark.sql.orc.mergeSchema</code></td>
     <td>false</td>
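As a usage sketch (not part of the patch itself), the configurations documented by this change are ordinary runtime SQL configs, so they can be set per session with Spark SQL's `SET` command. The values below are simply the defaults from the table above; treat the tuning comments as general guidance rather than recommendations from this commit:

```sql
-- Session-level settings for the ORC options documented above.
-- Values shown are the documented defaults; adjust per workload.
SET spark.sql.orc.filterPushdown=true;           -- push filters down to ORC readers (since 1.4.0)
SET spark.sql.orc.aggregatePushdown=false;       -- push MIN/MAX/COUNT down to ORC (since 3.3.0)
SET spark.sql.orc.columnarReaderBatchSize=4096;  -- rows per vectorized reader batch (since 2.4.0)
SET spark.sql.orc.columnarWriterBatchSize=1024;  -- rows per vectorized writer batch (since 3.4.0)
```

Smaller batch sizes reduce peak memory per batch (useful for wide rows), at the cost of more per-batch overhead; the two pushdown flags trade planner work for less data read from ORC files.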