HyukjinKwon commented on a change in pull request #32658:
URL: https://github.com/apache/spark/pull/32658#discussion_r642315323
##########
File path: docs/sql-data-sources-csv.md
##########
@@ -38,3 +36,217 @@ Spark SQL provides `spark.read().csv("file_name")` to read a file or directory o
 </div>
 </div>
+
+## Data Source Option
+
+Data source options of CSV can be set via:
+* the `.option`/`.options` methods of
+  * `DataFrameReader`
+  * `DataFrameWriter`
+  * `DataStreamReader`
+  * `DataStreamWriter`
+* the built-in functions below
+  * `from_csv`
+  * `to_csv`
+  * `schema_of_csv`
+* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+
+
+<table class="table">
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
+  <tr>
+    <td><code>sep</code></td>
+    <td>,</td>
+    <td>Sets a separator for each field and value. This separator can be one or more characters.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>encoding</code></td>
+    <td><code>UTF-8</code> for reading, not set for writing</td>
+    <td>For reading, decodes the CSV files by the given encoding type. For writing, specifies encoding (charset) of saved CSV files</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>quote</code></td>
+    <td>""</td>
+    <td>Sets a single character used for escaping quoted values where the separator can be part of the value. For reading, If you would like to turn off quotations, you need to set not `null` but an empty string. This behaviour is different from <code>com.databricks.spark.csv</code>. For writing, If an empty string is set, it uses <code>u0000</code> (null character).</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>quoteAll</code></td>
+    <td>false</td>
+    <td>A flag indicating whether all values should always be enclosed in quotes. Default is to only escape values containing a quote character.</td>
+    <td>write</td>
+  </tr>
+  <tr>
+    <td><code>escape</code></td>
+    <td>\</td>
+    <td>Sets a single character used for escaping quotes inside an already quoted value.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>escapeQuotes</code></td>
+    <td>true</td>
+    <td>A flag indicating whether values containing quotes should always be enclosed in quotes. Default is to escape all values containing a quote character.</td>
+    <td>write</td>
+  </tr>
+  <tr>
+    <td><code>comment</code></td>
+    <td>empty string</td>

Review comment:
`<td>empty string</td>` -> `<td></td>`. I think we can write like this for empty strings.