[GitHub] [spark] HyukjinKwon commented on a change in pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

GitBox Thu, 27 May 2021 03:53:05 -0700


HyukjinKwon commented on a change in pull request #32658:
URL: https://github.com/apache/spark/pull/32658#discussion_r640512683




##########
File path: docs/sql-data-sources-csv.md
##########
@@ -38,3 +36,223 @@ Spark SQL provides `spark.read().csv("file_name")` to read a file or directory o
 </div>
 
 </div>
+
+## Data Source Option
+
+Data source options of CSV can be set via:
+* the `.option`/`.options` methods of
+  *  `DataFrameReader`
+  *  `DataFrameWriter`
+  *  `DataStreamReader`
+  *  `DataStreamWriter`
+* the built-in functions below
+  * `from_csv`
+  * `to_csv`
+  * `schema_of_csv`
+* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+
+
+<table class="table">
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
+  <tr>
+    <td><code>sep</code></td>
+    <td>,</td>
+    <td>Sets a separator (one or more characters) for each field and value.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>encoding</code></td>
+    <td><code>UTF-8</code> for reading, not set for writing</td>
+    <td>For reading, decodes the CSV files by the given encoding type.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>quote</code></td>
+    <td>"</td>
+    <td>Sets a single character used for escaping quoted values where the separator can be part of the value. If you would like to turn off quotations, you need to set an empty string. If an empty string is set, it uses <code>u0000</code> (null character).</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>quoteAll</code></td>
+    <td>false</td>
+    <td>A flag indicating whether all values should always be enclosed in quotes. It only escapes values containing a quote character by default.</td>
+    <td>write</td>
+  </tr>
+  <tr>
+    <td><code>escape</code></td>
+    <td>\</td>
+    <td>Sets a single character used for escaping quotes inside an already quoted value.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>escapeQuotes</code></td>
+    <td>true</td>
+    <td>a flag indicating whether values containing quotes should always be enclosed in quotes. It escapes all values containing a quote character by default.</td>
+    <td>write</td>
+  </tr>
+  <tr>
+    <td><code>comment</code></td>
+    <td>""</td>
+    <td>Sets a single character used for skipping lines beginning with this character.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>header</code></td>
+    <td>false</td>
+    <td>For reading, uses the first line as names of columns. For writing, writes the names of columns as the first line. Note that if the given path is a RDD of Strings, this header option will remove all lines same with the header if exists.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>inferSchema</code></td>
+    <td>false</td>
+    <td>Infers the input schema automatically from data. It requires one extra pass over the data.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>enforceSchema</code></td>
+    <td>true</td>
+    <td>If it is set to <code>true</code>, the specified or inferred schema will be forcibly applied to datasource files, and headers in CSV files will be ignored. If the option is set to <code>false</code>, the schema will be validated against all headers in CSV files or the first header in RDD if the <code>header</code> option is set to <code>true</code>. Field names in the schema and column names in CSV headers are checked by their positions taking into account <code>spark.sql.caseSensitive</code>. Though the default value is <code>true</code>, it is recommended to disable the <code>enforceSchema</code> option to avoid incorrect results.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>ignoreLeadingWhiteSpace</code></td>
+    <td><code>false</code> for reading, <code>true</code> for writing</td>

Review comment:
       ```suggestion
       <td><code>false</code> (for reading), <code>true</code> (for writing)</td>
   ```
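
For context on the options discussed in this hunk, here is a minimal PySpark-style sketch of setting them through the `.options` method of `DataFrameReader`. The helper `read_csv` and the dict `CSV_READ_OPTIONS` are hypothetical names used only for illustration; the values are copied from the read-side defaults listed in the table above, and the sketch is not taken from the pull request.

```python
# Read-side defaults as listed in the table in this diff (illustrative subset).
CSV_READ_OPTIONS = {
    "sep": ",",
    "encoding": "UTF-8",
    "quote": '"',
    "escape": "\\",
    "header": "false",
    "inferSchema": "false",
    "enforceSchema": "true",
    "ignoreLeadingWhiteSpace": "false",  # the table gives "true" for writing
}


def read_csv(spark, path, **overrides):
    """Read a CSV file, starting from the table defaults.

    `spark` is assumed to be a SparkSession; keyword arguments override
    individual options, e.g. header="true".
    """
    options = {**CSV_READ_OPTIONS, **overrides}
    # DataFrameReader.options(**options) sets several options at once;
    # .csv(path) then loads the file(s) as a DataFrame.
    return spark.read.options(**options).csv(path)


# Hypothetical usage, assuming an active SparkSession named `spark`:
# df = read_csv(spark, "file_name", header="true", enforceSchema="false")
```

Passing the full set of defaults explicitly is redundant in practice; it is done here only to make the table's values visible next to the call that consumes them.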



