[GitHub] [spark] HyukjinKwon commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

GitBox Fri, 14 May 2021 01:53:29 -0700


HyukjinKwon commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r632384490




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -909,13 +909,10 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
    * }}}
    * The text files will be encoded as UTF-8.
    *
-   * You can set the following option(s) for writing text files:
-   * <ul>
-   * <li>`compression` (default `null`): compression codec to use when saving to file. This can be
-   * one of the known case-insensitive shorten names (`none`, `bzip2`, `gzip`, `lz4`,
-   * `snappy` and `deflate`). </li>
-   * <li>`lineSep` (default `\n`): defines the line separator that should be used for writing.</li>

Review comment:
       This isn't `orc`. It's `text`.

##########
File path: docs/sql-data-sources-orc.md
##########
@@ -172,3 +172,32 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
   <td>2.0.0</td>
   </tr>
 </table>
+
+## Data Source Option
+
+Data source options of ORC can be set via:
+* the `.option`/`.options` methods of `DataFrameReader` or `DataFrameWriter`
+* the `.option`/`.options` methods of `DataStreamReader` or `DataStreamWriter`
+
+<table class="table">
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
+  <tr>
+    <td><code>mergeSchema</code></td>
+    <td>None</td>
+    <td>sets whether we should merge schemas collected from all ORC part-files. This will override <code>spark.sql.orc.mergeSchema</code>. The default value is specified in <code>spark.sql.orc.mergeSchema</code>.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>compression</code></td>
+    <td>None</td>
+    <td>compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, snappy, zlib, lzo, and zstd). This will override <code>orc.compress</code> and <code>spark.sql.orc.compression.codec</code>. If None is set, it uses the value specified in <code>spark.sql.orc.compression.codec</code>.</td>
+    <td>write</td>
+  </tr>
+  <tr>
+    <td><code>lineSep</code></td>

Review comment:
       Can you remove this? It's a text source option, not an ORC one.
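
For context, here is a minimal sketch of how the two ORC options that remain in the table (`mergeSchema` on read, `compression` on write) could be passed through `.option`. The object name, the local master setting, and the `/tmp/orc-options-sketch` path are illustrative assumptions, not part of the pull request.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone example; names and paths are assumptions.
object OrcOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-options-sketch")
      .master("local[*]") // assumed local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    val path = "/tmp/orc-options-sketch" // hypothetical output location

    // Write scope: `compression` picks the ORC codec for this write.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    df.write
      .mode("overwrite")
      .option("compression", "zlib")
      .orc(path)

    // Read scope: `mergeSchema` asks Spark to merge schemas across part-files.
    val merged = spark.read
      .option("mergeSchema", "true")
      .orc(path)
    merged.printSchema()

    spark.stop()
  }
}
```

`lineSep` does not appear here because, as noted above, it belongs to the text source rather than ORC.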



