Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20484#discussion_r165576328

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1776,6 +1776,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see
     ## Upgrading From Spark SQL 2.2 to 2.3
     
    +  - Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files and Hive ORC tables. To do that, the following configurations are newly added or change their default values.
    +
    +    <table class="table">
    +      <tr>
    +        <th>
    +          <b>Property Name</b>
    +        </th>
    +        <th>
    +          <b>Default</b>
    +        </th>
    +        <th>
    +          <b>Meaning</b>
    +        </th>
    +      </tr>
    +      <tr>
    +        <td>
    +          spark.sql.orc.impl
    +        </td>
    +        <td>
    +          native
    +        </td>
    +        <td>
    +          The name of ORC implementation: 'native' means the native ORC support that is built on Apache ORC 1.4.1 instead of the ORC library in Hive 1.2.1. It is 'hive' by default prior to Spark 2.3.
    +        </td>
    +      </tr>
    +      <tr>
    +        <td>
    +          spark.sql.orc.enableVectorizedReader
    +        </td>
    +        <td>
    +          true
    +        </td>
    +        <td>
    +          Enables vectorized orc decoding in 'native' implementation. If 'false', a new non-vectorized ORC reader is used in 'native' implementation.
    --- End diff --

    Done.
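For reference, a minimal sketch of how the two properties from the quoted table could be passed on the command line, assuming the usual `spark-submit --conf key=value` form. The helper `to_conf_args` and the application name `app.py` are hypothetical, and the values shown (native implementation with the vectorized reader switched off) are just one combination picked for illustration, not a recommendation from the PR.

```python
import shlex

# Property names come from the table in the diff above; the values are an
# illustrative choice: keep the 'native' implementation but turn off the
# vectorized reader.
orc_settings = {
    "spark.sql.orc.impl": "native",
    "spark.sql.orc.enableVectorizedReader": "false",
}


def to_conf_args(settings):
    """Turn a {property: value} mapping into spark-submit --conf arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf_args = to_conf_args(orc_settings)

# 'app.py' is a hypothetical application file, used only to complete the command.
command = " ".join(shlex.quote(a) for a in ["spark-submit", *conf_args, "app.py"])
print(command)
```

The same key/value pairs could equally be placed in `spark-defaults.conf` or set on a session builder; the command-line form is used here only because it needs no Spark installation to illustrate.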