[spark] branch master updated: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

gurwls223 Mon, 31 May 2021 18:59:31 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 73d4f67  [SPARK-35433][DOCS] Move CSV data source options from Python 
and Scala into a single page
73d4f67 is described below

commit 73d4f67145dd7fbad282a9608ac2ac0f31c4b385
Author: itholic <haejoon....@databricks.com>
AuthorDate: Tue Jun 1 10:58:49 2021 +0900

    [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into 
a single page
    
    ### What changes were proposed in this pull request?
    
    This PR proposes moving the CSV data source options from the Python, Scala
    and Java API documentation into a single page.
    
    ### Why are the changes needed?
    
    So far, the documentation for CSV data source options has been split across
    separate pages in each language's API documentation. This makes the many
    options hard to maintain, so it is more efficient to manage all options on a
    single page and link to that page from the API documentation of each language.
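    
    As an illustration only (not part of this patch), the "Scope" column of the
    consolidated table can be read as a rule for which options apply when reading
    and which when writing. The sketch below copies the scope of a few options
    from that table; `CSV_OPTION_SCOPE` and `options_for` are hypothetical names
    introduced here, not Spark APIs.

```python
# Scope of a few CSV options, copied from the "Scope" column of the table
# that this commit adds to docs/sql-data-sources-csv.md.
CSV_OPTION_SCOPE = {
    "sep": {"read", "write"},
    "header": {"read", "write"},
    "inferSchema": {"read"},
    "mode": {"read"},
    "quoteAll": {"write"},
    "compression": {"write"},
}


def options_for(scope, options):
    """Hypothetical helper: keep only options whose documented scope includes `scope`."""
    unknown = sorted(set(options) - set(CSV_OPTION_SCOPE))
    if unknown:
        raise KeyError(f"options not covered by this sketch: {unknown}")
    return {k: v for k, v in options.items() if scope in CSV_OPTION_SCOPE[k]}


opts = {"sep": ";", "header": True, "inferSchema": True, "quoteAll": True}
read_opts = options_for("read", opts)    # drops the write-only quoteAll
write_opts = options_for("write", opts)  # drops the read-only inferSchema

# Assumption: with PySpark installed and a session named `spark`, these dicts
# could be passed as spark.read.options(**read_opts).csv(path) and
# df.write.options(**write_opts).csv(path).
```

    The helper raises on option names it does not know, so a typo is reported
    rather than silently dropped; the real table lists many more options than
    this sketch does.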
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, the documents will be shown as below after this change:
    
    - "CSV Files" page
    <img width="970" alt="Screen Shot 2021-05-27 at 12 35 36 PM" 
src="https://user-images.githubusercontent.com/44108233/119762269-586a8c80-bee8-11eb-8443-ae5b3c7a685c.png">
    
    - Python
    <img width="785" alt="Screen Shot 2021-05-25 at 4 12 10 PM" 
src="https://user-images.githubusercontent.com/44108233/119455390-83cc6a80-bd74-11eb-9156-65785ae27db0.png">
    
    - Scala
    <img width="718" alt="Screen Shot 2021-05-25 at 4 12 39 PM" 
src="https://user-images.githubusercontent.com/44108233/119455414-89c24b80-bd74-11eb-9775-aeda549d081e.png">
    
    - Java
    <img width="667" alt="Screen Shot 2021-05-25 at 4 13 09 PM" 
src="https://user-images.githubusercontent.com/44108233/119455422-8d55d280-bd74-11eb-97e8-86c1eabeadc2.png">
    
    ### How was this patch tested?
    
    Manually built the docs and confirmed the page.
    
    Closes #32658 from itholic/SPARK-35433.
    
    Authored-by: itholic <haejoon....@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 docs/sql-data-sources-csv.md                       | 216 +++++++++++++++-
 docs/sql-data-sources-text.md                      |   2 +-
 python/pyspark/sql/readwriter.py                   | 277 +++------------------
 python/pyspark/sql/streaming.py                    | 170 ++-----------
 .../org/apache/spark/sql/DataFrameReader.scala     | 116 +--------
 .../org/apache/spark/sql/DataFrameWriter.scala     |  45 +---
 .../scala/org/apache/spark/sql/functions.scala     |  28 ++-
 .../spark/sql/streaming/DataStreamReader.scala     |  99 +-------
 8 files changed, 314 insertions(+), 639 deletions(-)

diff --git a/docs/sql-data-sources-csv.md b/docs/sql-data-sources-csv.md
index d5390e5..2fe8f77 100644
--- a/docs/sql-data-sources-csv.md
+++ b/docs/sql-data-sources-csv.md
@@ -21,8 +21,6 @@ license: |
 
 Spark SQL provides `spark.read().csv("file_name")` to read a file or directory 
of files in CSV format into Spark DataFrame, and 
`dataframe.write().csv("path")` to write to a CSV file. Function `option()` can 
be used to customize the behavior of reading or writing, such as controlling 
behavior of the header, delimiter character, character set, and so on. 
 
-<!--TODO: add `option()` document reference--> 
-
 <div class="codetabs">
 
 <div data-lang="scala"  markdown="1">
@@ -38,3 +36,217 @@ Spark SQL provides `spark.read().csv("file_name")` to read 
a file or directory o
 </div>
 
 </div>
+
+## Data Source Option
+
+Data source options of CSV can be set via:
+* the `.option`/`.options` methods of
+  *  `DataFrameReader`
+  *  `DataFrameWriter`
+  *  `DataStreamReader`
+  *  `DataStreamWriter`
+* the built-in functions below
+  * `from_csv`
+  * `to_csv`
+  * `schema_of_csv`
+* `OPTIONS` clause at [CREATE TABLE USING 
DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+
+
+<table class="table">
+  <tr><th><b>Property 
Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
+  <tr>
+    <td><code>sep</code></td>
+    <td>,</td>
+    <td>Sets a separator for each field and value. This separator can be one 
or more characters.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>encoding</code></td>
+    <td>UTF-8</td>
+    <td>For reading, decodes the CSV files by the given encoding type. For 
writing, specifies encoding (charset) of saved CSV files</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>quote</code></td>
+    <td>"</td>
+    <td>Sets a single character used for escaping quoted values where the 
separator can be part of the value. For reading, if you would like to turn off 
quotations, you need to set not <code>null</code> but an empty string. For 
writing, if an empty string is set, it uses <code>u0000</code> (null 
character).</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>quoteAll</code></td>
+    <td>false</td>
+    <td>A flag indicating whether all values should always be enclosed in 
quotes. Default is to only escape values containing a quote character.</td>
+    <td>write</td>
+  </tr>
+  <tr>
+    <td><code>escape</code></td>
+    <td>\</td>
+    <td>Sets a single character used for escaping quotes inside an already 
quoted value.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>escapeQuotes</code></td>
+    <td>true</td>
+    <td>A flag indicating whether values containing quotes should always be 
enclosed in quotes. Default is to escape all values containing a quote 
character.</td>
+    <td>write</td>
+  </tr>
+  <tr>
+    <td><code>comment</code></td>
+    <td></td>
+    <td>Sets a single character used for skipping lines beginning with this 
character. By default, it is disabled.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>header</code></td>
+    <td>false</td>
+    <td>For reading, uses the first line as names of columns. For writing, 
writes the names of columns as the first line. Note that if the given path is a 
RDD of Strings, this header option will remove all lines same with the header 
if exists.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>inferSchema</code></td>
+    <td>false</td>
+    <td>Infers the input schema automatically from data. It requires one extra 
pass over the data.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>enforceSchema</code></td>
+    <td>true</td>
+    <td>If it is set to <code>true</code>, the specified or inferred schema 
will be forcibly applied to datasource files, and headers in CSV files will be 
ignored. If the option is set to <code>false</code>, the schema will be 
validated against all headers in CSV files in the case when the 
<code>header</code> option is set to <code>true</code>. Field names in the 
schema and column names in CSV headers are checked by their positions taking 
into account <code>spark.sql.caseSensitive</code> [...]
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>ignoreLeadingWhiteSpace</code></td>
+    <td><code>false</code> (for reading), <code>true</code> (for writing)</td>
+    <td>A flag indicating whether or not leading whitespaces from values being 
read/written should be skipped.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>ignoreTrailingWhiteSpace</code></td>
+    <td><code>false</code> (for reading), <code>true</code> (for writing)</td>
+    <td>A flag indicating whether or not trailing whitespaces from values 
being read/written should be skipped.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>nullValue</code></td>
+    <td></td>
+    <td>Sets the string representation of a null value. Since 2.0.1, this 
<code>nullValue</code> param applies to all supported types including the 
string type.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>nanValue</code></td>
+    <td>NaN</td>
+    <td>Sets the string representation of a non-number value.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>positiveInf</code></td>
+    <td>Inf</td>
+    <td>Sets the string representation of a positive infinity value.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>negativeInf</code></td>
+    <td>-Inf</td>
+    <td>Sets the string representation of a negative infinity value.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>dateFormat</code></td>
+    <td>yyyy-MM-dd</td>
+    <td>Sets the string that indicates a date format. Custom date formats 
follow the formats at <a 
href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime
 Patterns</a>. This applies to date type.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>timestampFormat</code></td>
+    <td>yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]</td>
+    <td>Sets the string that indicates a timestamp format. Custom date formats 
follow the formats at <a 
href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime
 Patterns</a>. This applies to timestamp type.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>maxColumns</code></td>
+    <td>20480</td>
+    <td>Defines a hard limit of how many columns a record can have.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>maxCharsPerColumn</code></td>
+    <td>-1</td>
+    <td>Defines the maximum number of characters allowed for any given value 
being read. By default, it is -1 meaning unlimited length</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>mode</code></td>
+    <td>PERMISSIVE</td>
+    <td>Allows a mode for dealing with corrupt records during parsing. It 
supports the following case-insensitive modes. Note that Spark tries to parse 
only required columns in CSV under column pruning. Therefore, corrupt records 
can be different based on required set of fields. This behavior can be 
controlled by <code>spark.sql.csv.parser.columnPruning.enabled</code> (enabled 
by default).<br>
+    <ul>
+      <li><code>PERMISSIVE</code>: when it meets a corrupted record, puts the 
malformed string into a field configured by 
<code>columnNameOfCorruptRecord</code>, and sets malformed fields to 
<code>null</code>. To keep corrupt records, an user can set a string type field 
named <code>columnNameOfCorruptRecord</code> in an user-defined schema. If a 
schema does not have the field, it drops corrupt records during parsing. A 
record with less/more tokens than schema is not a corrupted record to [...]
+      <li><code>DROPMALFORMED</code>: ignores the whole corrupted records.</li>
+      <li><code>FAILFAST</code>: throws an exception when it meets corrupted 
records.</li>
+    </ul>
+    </td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>columnNameOfCorruptRecord</code></td>
+    <td>The value specified in 
<code>spark.sql.columnNameOfCorruptRecord</code></td>
+    <td>Allows renaming the new field having malformed string created by 
<code>PERMISSIVE</code> mode. This overrides 
<code>spark.sql.columnNameOfCorruptRecord</code>.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>multiLine</code></td>
+    <td>false</td>
+    <td>Parse one record, which may span multiple lines, per file.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>charToEscapeQuoteEscaping</code></td>
+    <td><code>escape</code> or <code>\0</code></td>
+    <td>Sets a single character used for escaping the escape for the quote 
character. The default value is escape character when escape and quote 
characters are different, <code>\0</code> otherwise.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>samplingRatio</code></td>
+    <td>1.0</td>
+    <td>Defines fraction of rows used for schema inferring.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>emptyValue</code></td>
+    <td><code></code> (for reading), <code>""</code> (for writing)</td>
+    <td>Sets the string representation of an empty value.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>locale</code></td>
+    <td>en-US</td>
+    <td>Sets a locale as language tag in IETF BCP 47 format. For instance, 
this is used while parsing dates and timestamps.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>lineSep</code></td>
+    <td><code>\r</code>, <code>\r\n</code> and <code>\n</code> (for reading), 
<code>\n</code> (for writing)</td>
+    <td>Defines the line separator that should be used for parsing/writing. 
Maximum length is 1 character.</td>
+    <td>read/write</td>
+  </tr>
+  <tr>
+    <td><code>unescapedQuoteHandling</code></td>
+    <td>STOP_AT_DELIMITER</td>
+    <td>Defines how the CsvParser will handle values with unescaped quotes.<br>
+    <ul>
+      <li><code>STOP_AT_CLOSING_QUOTE</code>: If unescaped quotes are found in 
the input, accumulate the quote character and proceed parsing the value as a 
quoted value, until a closing quote is found.</li>
+      <li><code>BACK_TO_DELIMITER</code>: If unescaped quotes are found in the 
input, consider the value as an unquoted value. This will make the parser 
accumulate all characters of the current parsed value until the delimiter is 
found. If no delimiter is found in the value, the parser will continue 
accumulating characters from the input until a delimiter or line ending is 
found.</li>
+      <li><code>STOP_AT_DELIMITER</code>: If unescaped quotes are found in the 
input, consider the value as an unquoted value. This will make the parser 
accumulate all characters until the delimiter or a line ending is found in the 
input.</li>
+      <li><code>SKIP_VALUE</code>: If unescaped quotes are found in the input, 
the content parsed for the given value will be skipped and the value set in 
nullValue will be produced instead.</li>
+      <li><code>RAISE_ERROR</code>: If unescaped quotes are found in the 
input, a TextParsingException will be thrown.</li>
+    </ul>
+    </td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td><code>compression</code></td>
+    <td>(none)</td>
+    <td>Compression codec to use when saving to file. This can be one of the 
known case-insensitive shorten names (<code>none</code>, <code>bzip2</code>, 
<code>gzip</code>, <code>lz4</code>, <code>snappy</code> and 
<code>deflate</code>).</td>
+    <td>write</td>
+  </tr>
+</table>
+Other generic options can be found in <a 
href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">Generic
 File Source Options</a>.
diff --git a/docs/sql-data-sources-text.md b/docs/sql-data-sources-text.md
index d72b543..fac874a 100644
--- a/docs/sql-data-sources-text.md
+++ b/docs/sql-data-sources-text.md
@@ -45,7 +45,7 @@ Data source options of text can be set via:
   *  `DataFrameWriter`
   *  `DataStreamReader`
   *  `DataStreamWriter`
-  *  `OPTIONS` clause at [CREATE TABLE USING 
DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+*  `OPTIONS` clause at [CREATE TABLE USING 
DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
 
 <table class="table">
   <tr><th><b>Property 
Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 7719d48..f9e3734 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -195,9 +195,11 @@ class DataFrameReader(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option>`_
  # noqa
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> df1 = spark.read.json('python/test_support/sql/people.json')
@@ -273,9 +275,11 @@ class DataFrameReader(OptionUtils):
         ----------------
         **options
             For the extra options, refer to
-            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option>`_
  # noqa
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> df = 
spark.read.parquet('python/test_support/sql/parquet_partitioned')
@@ -318,9 +322,11 @@ class DataFrameReader(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_
  # noqa
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> df = spark.read.text('python/test_support/sql/text-test.txt')
@@ -364,172 +370,15 @@ class DataFrameReader(OptionUtils):
         schema : :class:`pyspark.sql.types.StructType` or str, optional
             an optional :class:`pyspark.sql.types.StructType` for the input 
schema
             or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
-        sep : str, optional
-            sets a separator (one or more characters) for each field and 
value. If None is
-            set, it uses the default value, ``,``.
-        encoding : str, optional
-            decodes the CSV files by the given encoding type. If None is set,
-            it uses the default value, ``UTF-8``.
-        quote : str, optional
-            sets a single character used for escaping quoted values where the
-            separator can be part of the value. If None is set, it uses the 
default
-            value, ``"``. If you would like to turn off quotations, you need 
to set an
-            empty string.
-        escape : str, optional
-            sets a single character used for escaping quotes inside an already
-            quoted value. If None is set, it uses the default value, ``\``.
-        comment : str, optional
-            sets a single character used for skipping lines beginning with this
-            character. By default (None), it is disabled.
-        header : str or bool, optional
-            uses the first line as names of columns. If None is set, it uses 
the
-            default value, ``false``.
-
-            .. note:: if the given path is a RDD of Strings, this header
-                option will remove all lines same with the header if exists.
-
-        inferSchema : str or bool, optional
-            infers the input schema automatically from data. It requires one 
extra
-            pass over the data. If None is set, it uses the default value, 
``false``.
-        enforceSchema : str or bool, optional
-            If it is set to ``true``, the specified or inferred schema will be
-            forcibly applied to datasource files, and headers in CSV files 
will be
-            ignored. If the option is set to ``false``, the schema will be
-            validated against all headers in CSV files or the first header in 
RDD
-            if the ``header`` option is set to ``true``. Field names in the 
schema
-            and column names in CSV headers are checked by their positions
-            taking into account ``spark.sql.caseSensitive``. If None is set,
-            ``true`` is used by default. Though the default value is ``true``,
-            it is recommended to disable the ``enforceSchema`` option
-            to avoid incorrect results.
-        ignoreLeadingWhiteSpace : str or bool, optional
-            A flag indicating whether or not leading whitespaces from
-            values being read should be skipped. If None is set, it
-            uses the default value, ``false``.
-        ignoreTrailingWhiteSpace : str or bool, optional
-            A flag indicating whether or not trailing whitespaces from
-            values being read should be skipped. If None is set, it
-            uses the default value, ``false``.
-        nullValue : str, optional
-            sets the string representation of a null value. If None is set, it 
uses
-            the default value, empty string. Since 2.0.1, this ``nullValue`` 
param
-            applies to all supported types including the string type.
-        nanValue : str, optional
-            sets the string representation of a non-number value. If None is 
set, it
-            uses the default value, ``NaN``.
-        positiveInf : str, optional
-            sets the string representation of a positive infinity value. If 
None
-            is set, it uses the default value, ``Inf``.
-        negativeInf : str, optional
-            sets the string representation of a negative infinity value. If 
None
-            is set, it uses the default value, ``Inf``.
-        dateFormat : str, optional
-            sets the string that indicates a date format. Custom date formats
-            follow the formats at
-            `datetime pattern 
<https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_.  # noqa
-            This applies to date type. If None is set, it uses the
-            default value, ``yyyy-MM-dd``.
-        timestampFormat : str, optional
-            sets the string that indicates a timestamp format.
-            Custom date formats follow the formats at
-            `datetime pattern 
<https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_.  # noqa
-            This applies to timestamp type. If None is set, it uses the
-            default value, ``yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]``.
-        maxColumns : str or int, optional
-            defines a hard limit of how many columns a record can have. If 
None is
-            set, it uses the default value, ``20480``.
-        maxCharsPerColumn : str or int, optional
-            defines the maximum number of characters allowed for any given
-            value being read. If None is set, it uses the default value,
-            ``-1`` meaning unlimited length.
-        maxMalformedLogPerPartition : str or int, optional
-            this parameter is no longer used since Spark 2.2.0.
-            If specified, it is ignored.
-        mode : str, optional
-            allows a mode for dealing with corrupt records during parsing. If 
None is
-            set, it uses the default value, ``PERMISSIVE``. Note that Spark 
tries to
-            parse only required columns in CSV under column pruning. 
Therefore, corrupt
-            records can be different based on required set of fields. This 
behavior can
-            be controlled by ``spark.sql.csv.parser.columnPruning.enabled``
-            (enabled by default).
-
-            * ``PERMISSIVE``: when it meets a corrupted record, puts the 
malformed string \
-              into a field configured by ``columnNameOfCorruptRecord``, and 
sets malformed \
-              fields to ``null``. To keep corrupt records, an user can set a 
string type \
-              field named ``columnNameOfCorruptRecord`` in an user-defined 
schema. If a \
-              schema does not have the field, it drops corrupt records during 
parsing. \
-              A record with less/more tokens than schema is not a corrupted 
record to CSV. \
-              When it meets a record having fewer tokens than the length of 
the schema, \
-              sets ``null`` to extra fields. When the record has more tokens 
than the \
-              length of the schema, it drops extra tokens.
-            * ``DROPMALFORMED``: ignores the whole corrupted records.
-            * ``FAILFAST``: throws an exception when it meets corrupted 
records.
-
-        columnNameOfCorruptRecord : str, optional
-            allows renaming the new field having malformed string
-            created by ``PERMISSIVE`` mode. This overrides
-            ``spark.sql.columnNameOfCorruptRecord``. If None is set,
-            it uses the value specified in
-            ``spark.sql.columnNameOfCorruptRecord``.
-        multiLine : str or bool, optional
-            parse records, which may span multiple lines. If None is
-            set, it uses the default value, ``false``.
-        charToEscapeQuoteEscaping : str, optional
-            sets a single character used for escaping the escape for
-            the quote character. If None is set, the default value is
-            escape character when escape and quote characters are
-            different, ``\0`` otherwise.
-        samplingRatio : str or float, optional
-            defines fraction of rows used for schema inferring.
-            If None is set, it uses the default value, ``1.0``.
-        emptyValue : str, optional
-            sets the string representation of an empty value. If None is set, 
it uses
-            the default value, empty string.
-        locale : str, optional
-            sets a locale as language tag in IETF BCP 47 format. If None is 
set,
-            it uses the default value, ``en-US``. For instance, ``locale`` is 
used while
-            parsing dates and timestamps.
-        lineSep : str, optional
-            defines the line separator that should be used for parsing. If 
None is
-            set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
-            Maximum length is 1 character.
-        pathGlobFilter : str or bool, optional
-            an optional glob pattern to only include files with paths matching
-            the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
-            It does not change the behavior of
-            `partition discovery 
<https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_.
  # noqa
-        recursiveFileLookup : str or bool, optional
-            recursively scan a directory for files. Using this option disables
-            `partition discovery 
<https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_.
  # noqa
-
-            modification times occurring before the specified time. The 
provided timestamp
-            must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 
2020-06-01T13:00:00)
-        modifiedBefore (batch only) : an optional timestamp to only include 
files with
-            modification times occurring before the specified time. The 
provided timestamp
-            must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 
2020-06-01T13:00:00)
-        modifiedAfter (batch only) : an optional timestamp to only include 
files with
-            modification times occurring after the specified time. The 
provided timestamp
-            must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 
2020-06-01T13:00:00)
-        unescapedQuoteHandling : str, optional
-            defines how the CsvParser will handle values with unescaped 
quotes. If None is
-            set, it uses the default value, ``STOP_AT_DELIMITER``.
-
-            * ``STOP_AT_CLOSING_QUOTE``: If unescaped quotes are found in the 
input, accumulate
-              the quote character and proceed parsing the value as a quoted 
value, until a closing
-              quote is found.
-            * ``BACK_TO_DELIMITER``: If unescaped quotes are found in the 
input, consider the value
-              as an unquoted value. This will make the parser accumulate all 
characters of the current
-              parsed value until the delimiter is found. If no delimiter is 
found in the value, the
-              parser will continue accumulating characters from the input 
until a delimiter or line
-              ending is found.
-            * ``STOP_AT_DELIMITER``: If unescaped quotes are found in the 
input, consider the value
-              as an unquoted value. This will make the parser accumulate all 
characters until the
-              delimiter or a line ending is found in the input.
-            * ``SKIP_VALUE``: If unescaped quotes are found in the input, the 
content parsed
-              for the given value will be skipped and the value set in 
nullValue will be produced
-              instead.
-            * ``RAISE_ERROR``: If unescaped quotes are found in the input, a 
TextParsingException
-              will be thrown.
+
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option>`_
+            in the version you use.
+
+            .. # noqa
 
         Examples
         --------
@@ -595,9 +444,11 @@ class DataFrameReader(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_
  # noqa
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> df = spark.read.orc('python/test_support/sql/orc_partitioned')
@@ -963,9 +814,11 @@ class DataFrameWriter(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option>`_
  # noqa
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> df.write.json(os.path.join(tempfile.mkdtemp(), 'data'))
@@ -1000,9 +853,11 @@ class DataFrameWriter(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option>`_
  # noqa
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data'))
@@ -1028,9 +883,11 @@ class DataFrameWriter(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_
  # noqa
+            `Data Source Option 
<https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         The DataFrame must have only one column that is of string type.
         Each row becomes a new line in the output file.
         """
@@ -1058,68 +915,14 @@ class DataFrameWriter(OptionUtils):
             * ``error`` or ``errorifexists`` (default case): Throw an 
exception if data already \
                 exists.
 
-        compression : str, optional
-            compression codec to use when saving to file. This can be one of the
-            known case-insensitive shorten names (none, bzip2, gzip, lz4,
-            snappy and deflate).
-        sep : str, optional
-            sets a separator (one or more characters) for each field and value. If None is
-            set, it uses the default value, ``,``.
-        quote : str, optional
-            sets a single character used for escaping quoted values where the
-            separator can be part of the value. If None is set, it uses the default
-            value, ``"``. If an empty string is set, it uses ``u0000`` (null character).
-        escape : str, optional
-            sets a single character used for escaping quotes inside an already
-            quoted value. If None is set, it uses the default value, ``\``
-        escapeQuotes : str or bool, optional
-            a flag indicating whether values containing quotes should always
-            be enclosed in quotes. If None is set, it uses the default value
-            ``true``, escaping all values containing a quote character.
-        quoteAll : str or bool, optional
-            a flag indicating whether all values should always be enclosed in
-            quotes. If None is set, it uses the default value ``false``,
-            only escaping values containing a quote character.
-        header : str or bool, optional
-            writes the names of columns as the first line. If None is set, it uses
-            the default value, ``false``.
-        nullValue : str, optional
-            sets the string representation of a null value. If None is set, it uses
-            the default value, empty string.
-        dateFormat : str, optional
-            sets the string that indicates a date format. Custom date formats follow
-            the formats at
-            `datetime pattern <https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_.  # noqa
-            This applies to date type. If None is set, it uses the
-            default value, ``yyyy-MM-dd``.
-        timestampFormat : str, optional
-            sets the string that indicates a timestamp format.
-            Custom date formats follow the formats at
-            `datetime pattern <https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_.  # noqa
-            This applies to timestamp type. If None is set, it uses the
-            default value, ``yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]``.
-        ignoreLeadingWhiteSpace : str or bool, optional
-            a flag indicating whether or not leading whitespaces from
-            values being written should be skipped. If None is set, it
-            uses the default value, ``true``.
-        ignoreTrailingWhiteSpace : str or bool, optional
-            a flag indicating whether or not trailing whitespaces from
-            values being written should be skipped. If None is set, it
-            uses the default value, ``true``.
-        charToEscapeQuoteEscaping : str, optional
-            sets a single character used for escaping the escape for
-            the quote character. If None is set, the default value is
-            escape character when escape and quote characters are
-            different, ``\0`` otherwise..
-        encoding : str, optional
-            sets the encoding (charset) of saved csv files. If None is set,
-            the default UTF-8 charset will be used.
-        emptyValue : str, optional
-            sets the string representation of an empty value. If None is set, it uses
-            the default value, ``""``.
-        lineSep : str, optional
-            defines the line separator that should be used for writing. If None is
-            set, it uses the default value, ``\\n``. Maximum length is 1 character.
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option>`_
+            in the version you use.
+
+            .. # noqa
 
         Examples
         --------
@@ -1159,9 +962,11 @@ class DataFrameWriter(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_  # noqa
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> orc_df = spark.read.orc('python/test_support/sql/orc_partitioned')
diff --git a/python/pyspark/sql/streaming.py b/python/pyspark/sql/streaming.py
index f7ec69a..08c8934 100644
--- a/python/pyspark/sql/streaming.py
+++ b/python/pyspark/sql/streaming.py
@@ -484,9 +484,11 @@ class DataStreamReader(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option>`_  # noqa
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Notes
         -----
         This API is evolving.
@@ -524,9 +526,11 @@ class DataStreamReader(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_  # noqa
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> orc_sdf = spark.readStream.schema(sdf_schema).orc(tempfile.mkdtemp())
@@ -558,9 +562,11 @@ class DataStreamReader(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option>`_.  # noqa
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option>`_.
             in the version you use.
 
+            .. # noqa
+
         Examples
         --------
         >>> parquet_sdf = spark.readStream.schema(sdf_schema).parquet(tempfile.mkdtemp())
@@ -598,9 +604,11 @@ class DataStreamReader(OptionUtils):
         ----------------
         Extra options
             For the extra options, refer to
-            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_  # noqa
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_
             in the version you use.
 
+            .. # noqa
+
         Notes
         -----
         This API is evolving.
@@ -642,154 +650,18 @@ class DataStreamReader(OptionUtils):
         schema : :class:`pyspark.sql.types.StructType` or str, optional
             an optional :class:`pyspark.sql.types.StructType` for the input schema
             or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
-        sep : str, optional
-            sets a separator (one or more characters) for each field and value. If None is
-            set, it uses the default value, ``,``.
-        encoding : str, optional
-            decodes the CSV files by the given encoding type. If None is set,
-            it uses the default value, ``UTF-8``.
-        quote : str, optional sets a single character used for escaping quoted values where the
-            separator can be part of the value. If None is set, it uses the default
-            value, ``"``. If you would like to turn off quotations, you need to set an
-            empty string.
-        escape : str, optional
-            sets a single character used for escaping quotes inside an already
-            quoted value. If None is set, it uses the default value, ``\``.
-        comment : str, optional
-            sets a single character used for skipping lines beginning with this
-            character. By default (None), it is disabled.
-        header : str or bool, optional
-            uses the first line as names of columns. If None is set, it uses the
-            default value, ``false``.
-        inferSchema : str or bool, optional
-            infers the input schema automatically from data. It requires one extra
-            pass over the data. If None is set, it uses the default value, ``false``.
-        enforceSchema : str or bool, optional
-            If it is set to ``true``, the specified or inferred schema will be
-            forcibly applied to datasource files, and headers in CSV files will be
-            ignored. If the option is set to ``false``, the schema will be
-            validated against all headers in CSV files or the first header in RDD
-            if the ``header`` option is set to ``true``. Field names in the schema
-            and column names in CSV headers are checked by their positions
-            taking into account ``spark.sql.caseSensitive``. If None is set,
-            ``true`` is used by default. Though the default value is ``true``,
-            it is recommended to disable the ``enforceSchema`` option
-            to avoid incorrect results.
-        ignoreLeadingWhiteSpace : str or bool, optional
-            a flag indicating whether or not leading whitespaces from
-            values being read should be skipped. If None is set, it
-            uses the default value, ``false``.
-        ignoreTrailingWhiteSpace : str or bool, optional
-            a flag indicating whether or not trailing whitespaces from
-            values being read should be skipped. If None is set, it
-            uses the default value, ``false``.
-        nullValue : str, optional
-            sets the string representation of a null value. If None is set, it uses
-            the default value, empty string. Since 2.0.1, this ``nullValue`` param
-            applies to all supported types including the string type.
-        nanValue : str, optional
-            sets the string representation of a non-number value. If None is set, it
-            uses the default value, ``NaN``.
-        positiveInf : str, optional
-            sets the string representation of a positive infinity value. If None
-            is set, it uses the default value, ``Inf``.
-        negativeInf : str, optional
-            sets the string representation of a negative infinity value. If None
-            is set, it uses the default value, ``Inf``.
-        dateFormat : str, optional
-            sets the string that indicates a date format. Custom date formats
-            follow the formats at
-            `datetime pattern <https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_.  # noqa
-            This applies to date type. If None is set, it uses the
-            default value, ``yyyy-MM-dd``.
-        timestampFormat : str, optional
-            sets the string that indicates a timestamp format.
-            Custom date formats follow the formats at
-            `datetime pattern <https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html>`_.  # noqa
-            This applies to timestamp type. If None is set, it uses the
-            default value, ``yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]``.
-        maxColumns : str or int, optional
-            defines a hard limit of how many columns a record can have. If None is
-            set, it uses the default value, ``20480``.
-        maxCharsPerColumn : str or int, optional
-            defines the maximum number of characters allowed for any given
-            value being read. If None is set, it uses the default value,
-            ``-1`` meaning unlimited length.
-        maxMalformedLogPerPartition : str or int, optional
-            this parameter is no longer used since Spark 2.2.0.
-            If specified, it is ignored.
-        mode : str, optional
-            allows a mode for dealing with corrupt records during parsing. If None is
-            set, it uses the default value, ``PERMISSIVE``.
-
-            * ``PERMISSIVE``: when it meets a corrupted record, puts the malformed string \
-              into a field configured by ``columnNameOfCorruptRecord``, and sets malformed \
-              fields to ``null``. To keep corrupt records, an user can set a string type \
-              field named ``columnNameOfCorruptRecord`` in an user-defined schema. If a \
-              schema does not have the field, it drops corrupt records during parsing. \
-              A record with less/more tokens than schema is not a corrupted record to CSV. \
-              When it meets a record having fewer tokens than the length of the schema, \
-              sets ``null`` to extra fields. When the record has more tokens than the \
-              length of the schema, it drops extra tokens.
-            * ``DROPMALFORMED``: ignores the whole corrupted records.
-            * ``FAILFAST``: throws an exception when it meets corrupted records.
-
-        columnNameOfCorruptRecord : str, optional
-            allows renaming the new field having malformed string
-            created by ``PERMISSIVE`` mode. This overrides
-            ``spark.sql.columnNameOfCorruptRecord``. If None is set,
-            it uses the value specified in
-            ``spark.sql.columnNameOfCorruptRecord``.
-        multiLine : str or bool, optional
-            parse one record, which may span multiple lines. If None is
-            set, it uses the default value, ``false``.
-        charToEscapeQuoteEscaping : str, optional
-            sets a single character used for escaping the escape for
-            the quote character. If None is set, the default value is
-            escape character when escape and quote characters are
-            different, ``\0`` otherwise.
-        emptyValue : str, optional
-            sets the string representation of an empty value. If None is set, it uses
-            the default value, empty string.
-        locale : str, optional
-            sets a locale as language tag in IETF BCP 47 format. If None is set,
-            it uses the default value, ``en-US``. For instance, ``locale`` is used while
-            parsing dates and timestamps.
-        lineSep : str, optional
-            defines the line separator that should be used for parsing. If None is
-            set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
-            Maximum length is 1 character.
-        pathGlobFilter : str or bool, optional
-            an optional glob pattern to only include files with paths matching
-            the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
-            It does not change the behavior of
-            `partition discovery <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_.  # noqa
-        recursiveFileLookup : str or bool, optional
-            recursively scan a directory for files. Using this option disables
-            `partition discovery <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_.  # noqa
-        unescapedQuoteHandling : str, optional
-            defines how the CsvParser will handle values with unescaped quotes. If None is
-            set, it uses the default value, ``STOP_AT_DELIMITER``.
-
-            * ``STOP_AT_CLOSING_QUOTE``: If unescaped quotes are found in the input, accumulate
-              the quote character and proceed parsing the value as a quoted value, until a closing
-              quote is found.
-            * ``BACK_TO_DELIMITER``: If unescaped quotes are found in the input, consider the value
-              as an unquoted value. This will make the parser accumulate all characters of the current
-              parsed value until the delimiter is found. If no delimiter is found in the value, the
-              parser will continue accumulating characters from the input until a delimiter or line
-              ending is found.
-            * ``STOP_AT_DELIMITER``: If unescaped quotes are found in the input, consider the value
-              as an unquoted value. This will make the parser accumulate all characters until the
-              delimiter or a line ending is found in the input.
-            * ``SKIP_VALUE``: If unescaped quotes are found in the input, the content parsed
-              for the given value will be skipped and the value set in nullValue will be produced
-              instead.
-            * ``RAISE_ERROR``: If unescaped quotes are found in the input, a TextParsingException
-              will be thrown.
 
         .. versionadded:: 2.0.0
 
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option>`_
+            in the version you use.
+
+            .. # noqa
+
         Notes
         -----
         This API is evolving.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index ea84785..8a066bf 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -556,119 +556,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * is enabled. To avoid going through the entire data once, disable `inferSchema` option or
    * specify the schema explicitly using `schema`.
    *
-   * You can set the following CSV-specific options to deal with CSV files:
-   * <ul>
-   * <li>`sep` (default `,`): sets a separator for each field and value. This separator can be one
-   * or more characters.</li>
-   * <li>`encoding` (default `UTF-8`): decodes the CSV files by the given encoding
-   * type.</li>
-   * <li>`quote` (default `"`): sets a single character used for escaping quoted values where
-   * the separator can be part of the value. If you would like to turn off quotations, you need to
-   * set not `null` but an empty string. This behaviour is different from
-   * `com.databricks.spark.csv`.</li>
-   * <li>`escape` (default `\`): sets a single character used for escaping quotes inside
-   * an already quoted value.</li>
-   * <li>`charToEscapeQuoteEscaping` (default `escape` or `\0`): sets a single character used for
-   * escaping the escape for the quote character. The default value is escape character when escape
-   * and quote characters are different, `\0` otherwise.</li>
-   * <li>`comment` (default empty string): sets a single character used for skipping lines
-   * beginning with this character. By default, it is disabled.</li>
-   * <li>`header` (default `false`): uses the first line as names of columns.</li>
-   * <li>`enforceSchema` (default `true`): If it is set to `true`, the specified or inferred schema
-   * will be forcibly applied to datasource files, and headers in CSV files will be ignored.
-   * If the option is set to `false`, the schema will be validated against all headers in CSV files
-   * in the case when the `header` option is set to `true`. Field names in the schema
-   * and column names in CSV headers are checked by their positions taking into account
-   * `spark.sql.caseSensitive`. Though the default value is true, it is recommended to disable
-   * the `enforceSchema` option to avoid incorrect results.</li>
-   * <li>`inferSchema` (default `false`): infers the input schema automatically from data. It
-   * requires one extra pass over the data.</li>
-   * <li>`samplingRatio` (default is 1.0): defines fraction of rows used for schema inferring.</li>
-   * <li>`ignoreLeadingWhiteSpace` (default `false`): a flag indicating whether or not leading
-   * whitespaces from values being read should be skipped.</li>
-   * <li>`ignoreTrailingWhiteSpace` (default `false`): a flag indicating whether or not trailing
-   * whitespaces from values being read should be skipped.</li>
-   * <li>`nullValue` (default empty string): sets the string representation of a null value. Since
-   * 2.0.1, this applies to all supported types including the string type.</li>
-   * <li>`emptyValue` (default empty string): sets the string representation of an empty value.</li>
-   * <li>`nanValue` (default `NaN`): sets the string representation of a non-number" value.</li>
-   * <li>`positiveInf` (default `Inf`): sets the string representation of a positive infinity
-   * value.</li>
-   * <li>`negativeInf` (default `-Inf`): sets the string representation of a negative infinity
-   * value.</li>
-   * <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
-   * Custom date formats follow the formats at
-   * <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
-   *   Datetime Patterns</a>.
-   * This applies to date type.</li>
-   * <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]`): sets the string that
-   * indicates a timestamp format. Custom date formats follow the formats at
-   * <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
-   *   Datetime Patterns</a>.
-   * This applies to timestamp type.</li>
-   * <li>`maxColumns` (default `20480`): defines a hard limit of how many columns
-   * a record can have.</li>
-   * <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed
-   * for any given value being read. By default, it is -1 meaning unlimited length</li>
-   * <li>`unescapedQuoteHandling` (default `STOP_AT_DELIMITER`): defines how the CsvParser
-   * will handle values with unescaped quotes.
-   *   <ul>
-   *     <li>`STOP_AT_CLOSING_QUOTE`: If unescaped quotes are found in the input, accumulate
-   *     the quote character and proceed parsing the value as a quoted value, until a closing
-   *     quote is found.</li>
-   *     <li>`BACK_TO_DELIMITER`: If unescaped quotes are found in the input, consider the value
-   *     as an unquoted value. This will make the parser accumulate all characters of the current
-   *     parsed value until the delimiter is found. If no
-   *     delimiter is found in the value, the parser will continue accumulating characters from
-   *     the input until a delimiter or line ending is found.</li>
-   *     <li>`STOP_AT_DELIMITER`: If unescaped quotes are found in the input, consider the value
-   *     as an unquoted value. This will make the parser accumulate all characters until the
-   *     delimiter or a line ending is found in the input.</li>
-   *     <li>`SKIP_VALUE`: If unescaped quotes are found in the input, the content parsed
-   *     for the given value will be skipped and the value set in nullValue will be produced
-   *     instead.</li>
-   *     <li>`RAISE_ERROR`: If unescaped quotes are found in the input, a TextParsingException
-   *     will be thrown.</li>
-   *   </ul>
-   * </li>
-   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
-   *    during parsing. It supports the following case-insensitive modes. Note that Spark tries
-   *    to parse only required columns in CSV under column pruning. Therefore, corrupt records
-   *    can be different based on required set of fields. This behavior can be controlled by
-   *    `spark.sql.csv.parser.columnPruning.enabled` (enabled by default).
-   *   <ul>
-   *     <li>`PERMISSIVE` : when it meets a corrupted record, puts the malformed string into a
-   *     field configured by `columnNameOfCorruptRecord`, and sets malformed fields to `null`.
-   *     To keep corrupt records, an user can set a string type field named
-   *     `columnNameOfCorruptRecord` in an user-defined schema. If a schema does not have
-   *     the field, it drops corrupt records during parsing. A record with less/more tokens
-   *     than schema is not a corrupted record to CSV. When it meets a record having fewer
-   *     tokens than the length of the schema, sets `null` to extra fields. When the record
-   *     has more tokens than the length of the schema, it drops extra tokens.</li>
-   *     <li>`DROPMALFORMED` : ignores the whole corrupted records.</li>
-   *     <li>`FAILFAST` : throws an exception when it meets corrupted records.</li>
-   *   </ul>
-   * </li>
-   * <li>`columnNameOfCorruptRecord` (default is the value specified in
-   * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
-   * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
-   * <li>`multiLine` (default `false`): parse one record, which may span multiple lines.</li>
-   * <li>`locale` (default is `en-US`): sets a locale as language tag in IETF BCP 47 format.
-   * For instance, this is used while parsing dates and timestamps.</li>
-   * <li>`lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
-   * that should be used for parsing. Maximum length is 1 character.</li>
-   * <li>`pathGlobFilter`: an optional glob pattern to only include files with paths matching
-   * the pattern. The syntax follows <code>org.apache.hadoop.fs.GlobFilter</code>.
-   * It does not change the behavior of partition discovery.</li>
-   * <li>`modifiedBefore` (batch only): an optional timestamp to only include files with
-   * modification times  occurring before the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)</li>
-   * <li>`modifiedAfter` (batch only): an optional timestamp to only include files with
-   * modification times occurring after the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)</li>
-   * <li>`recursiveFileLookup`: recursively scan a directory for files. Using this option
-   * disables partition discovery</li>
-   * </ul>
+   * You can find the CSV-specific options for reading CSV files in
+   * <a href="https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option">
+   *   Data Source Option</a> in the version you use.
    *
    * @since 2.0.0
    */
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index cb10295..a8af7c8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -850,48 +850,9 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
    *   format("csv").save(path)
    * }}}
    *
-   * You can set the following CSV-specific option(s) for writing CSV files:
-   * <ul>
-   * <li>`sep` (default `,`): sets a single character as a separator for each
-   * field and value.</li>
-   * <li>`quote` (default `"`): sets a single character used for escaping quoted values where
-   * the separator can be part of the value. If an empty string is set, it uses `u0000`
-   * (null character).</li>
-   * <li>`escape` (default `\`): sets a single character used for escaping quotes inside
-   * an already quoted value.</li>
-   * <li>`charToEscapeQuoteEscaping` (default `escape` or `\0`): sets a single character used for
-   * escaping the escape for the quote character. The default value is escape character when escape
-   * and quote characters are different, `\0` otherwise.</li>
-   * <li>`escapeQuotes` (default `true`): a flag indicating whether values containing
-   * quotes should always be enclosed in quotes. Default is to escape all values containing
-   * a quote character.</li>
-   * <li>`quoteAll` (default `false`): a flag indicating whether all values should always be
-   * enclosed in quotes. Default is to only escape values containing a quote character.</li>
-   * <li>`header` (default `false`): writes the names of columns as the first line.</li>
-   * <li>`nullValue` (default empty string): sets the string representation of a null value.</li>
-   * <li>`emptyValue` (default `""`): sets the string representation of an empty value.</li>
-   * <li>`encoding` (by default it is not set): specifies encoding (charset) of saved csv
-   * files. If it is not set, the UTF-8 charset will be used.</li>
-   * <li>`compression` (default `null`): compression codec to use when saving to file. This can be
-   * one of the known case-insensitive shorten names (`none`, `bzip2`, `gzip`, `lz4`,
-   * `snappy` and `deflate`). </li>
-   * <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
-   * Custom date formats follow the formats at
-   * <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
-   *   Datetime Patterns</a>.
-   * This applies to date type.</li>
-   * <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]`): sets the string that
-   * indicates a timestamp format. Custom date formats follow the formats at
-   * <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
-   *   Datetime Patterns</a>.
-   * This applies to timestamp type.</li>
-   * <li>`ignoreLeadingWhiteSpace` (default `true`): a flag indicating whether or not leading
-   * whitespaces from values being written should be skipped.</li>
-   * <li>`ignoreTrailingWhiteSpace` (default `true`): a flag indicating defines whether or not
-   * trailing whitespaces from values being written should be skipped.</li>
-   * <li>`lineSep` (default `\n`): defines the line separator that should be used for writing.
-   * Maximum length is 1 character.</li>
-   * </ul>
+   * You can find the CSV-specific options for writing CSV files in
+   * <a href="https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option">
+   *   Data Source Option</a> in the version you use.
    *
    * @since 2.0.0
    */
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 8a278a5..c446d6b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4607,6 +4607,7 @@ object functions {
   @scala.annotation.varargs
   def map_concat(cols: Column*): Column = withExpr { MapConcat(cols.map(_.expr)) }
 
+  // scalastyle:off line.size.limit
   /**
    * Parses a column containing a CSV string into a `StructType` with the specified schema.
    * Returns `null`, in the case of an unparseable string.
@@ -4615,15 +4616,21 @@ object functions {
    * @param schema the schema to use when parsing the CSV string
    * @param options options to control how the CSV is parsed. accepts the same options and the
    *                CSV data source.
+   *                See
+   *                <a href=
+   *                  "https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option">
+   *                  Data Source Option</a> in the version you use.
    *
    * @group collection_funcs
    * @since 3.0.0
    */
+  // scalastyle:on line.size.limit
   def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column = withExpr {
     val replaced = CharVarcharUtils.failIfHasCharVarchar(schema).asInstanceOf[StructType]
     CsvToStructs(replaced, options, e.expr)
   }
 
+  // scalastyle:off line.size.limit
   /**
    * (Java-specific) Parses a column containing a CSV string into a `StructType`
    * with the specified schema. Returns `null`, in the case of an unparseable string.
@@ -4632,10 +4639,15 @@ object functions {
    * @param schema the schema to use when parsing the CSV string
    * @param options options to control how the CSV is parsed. accepts the same options and the
    *                CSV data source.
+   *                See
+   *                <a href=
+   *                  "https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option">
+   *                  Data Source Option</a> in the version you use.
    *
    * @group collection_funcs
    * @since 3.0.0
    */
+  // scalastyle:on line.size.limit
   def from_csv(e: Column, schema: Column, options: java.util.Map[String, String]): Column = {
     withExpr(new CsvToStructs(e.expr, schema.expr, options.asScala.toMap))
   }
@@ -4660,32 +4672,44 @@ object functions {
    */
   def schema_of_csv(csv: Column): Column = withExpr(new SchemaOfCsv(csv.expr))
 
+  // scalastyle:off line.size.limit
   /**
    * Parses a CSV string and infers its schema in DDL format using options.
    *
    * @param csv a foldable string column containing a CSV string.
    * @param options options to control how the CSV is parsed. accepts the same options and the
-   *                json data source. See [[DataFrameReader#csv]].
+   *                CSV data source.
+   *                See
+   *                <a href=
+   *                  "https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option">
+   *                  Data Source Option</a> in the version you use.
    * @return a column with string literal containing schema in DDL format.
    *
    * @group collection_funcs
    * @since 3.0.0
    */
+  // scalastyle:on line.size.limit
   def schema_of_csv(csv: Column, options: java.util.Map[String, String]): Column = {
     withExpr(SchemaOfCsv(csv.expr, options.asScala.toMap))
   }
 
+  // scalastyle:off line.size.limit
   /**
    * (Java-specific) Converts a column containing a `StructType` into a CSV string with
    * the specified schema. Throws an exception, in the case of an unsupported type.
    *
    * @param e a column containing a struct.
    * @param options options to control how the struct column is converted into a CSV string.
-   *                It accepts the same options and the json data source.
+   *                It accepts the same options and the CSV data source.
+   *                See
+   *                <a href=
+   *                  "https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option">
+   *                  Data Source Option</a> in the version you use.
    *
    * @group collection_funcs
    * @since 3.0.0
    */
+  // scalastyle:on line.size.limit
   def to_csv(e: Column, options: java.util.Map[String, String]): Column = withExpr {
     StructsToCsv(options.asScala.toMap, e.expr)
   }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
index 6c3fbaf..e6e65cd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
@@ -239,105 +239,16 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
    * is enabled. To avoid going through the entire data once, disable `inferSchema` option or
    * specify the schema explicitly using `schema`.
    *
-   * You can set the following CSV-specific options to deal with CSV files:
+   * You can set the following option(s):
    * <ul>
    * <li>`maxFilesPerTrigger` (default: no max limit): sets the maximum number of new files to be
    * considered in every trigger.</li>
-   * <li>`sep` (default `,`): sets a single character as a separator for each
-   * field and value.</li>
-   * <li>`encoding` (default `UTF-8`): decodes the CSV files by the given encoding
-   * type.</li>
-   * <li>`quote` (default `"`): sets a single character used for escaping quoted values where
-   * the separator can be part of the value. If you would like to turn off quotations, you need to
-   * set not `null` but an empty string. This behaviour is different form
-   * `com.databricks.spark.csv`.</li>
-   * <li>`escape` (default `\`): sets a single character used for escaping quotes inside
-   * an already quoted value.</li>
-   * <li>`charToEscapeQuoteEscaping` (default `escape` or `\0`): sets a single character used for
-   * escaping the escape for the quote character. The default value is escape character when escape
-   * and quote characters are different, `\0` otherwise.</li>
-   * <li>`comment` (default empty string): sets a single character used for skipping lines
-   * beginning with this character. By default, it is disabled.</li>
-   * <li>`header` (default `false`): uses the first line as names of columns.</li>
-   * <li>`inferSchema` (default `false`): infers the input schema automatically from data. It
-   * requires one extra pass over the data.</li>
-   * <li>`ignoreLeadingWhiteSpace` (default `false`): a flag indicating whether or not leading
-   * whitespaces from values being read should be skipped.</li>
-   * <li>`ignoreTrailingWhiteSpace` (default `false`): a flag indicating whether or not trailing
-   * whitespaces from values being read should be skipped.</li>
-   * <li>`nullValue` (default empty string): sets the string representation of a null value. Since
-   * 2.0.1, this applies to all supported types including the string type.</li>
-   * <li>`emptyValue` (default empty string): sets the string representation of an empty value.</li>
-   * <li>`nanValue` (default `NaN`): sets the string representation of a non-number" value.</li>
-   * <li>`positiveInf` (default `Inf`): sets the string representation of a positive infinity
-   * value.</li>
-   * <li>`negativeInf` (default `-Inf`): sets the string representation of a negative infinity
-   * value.</li>
-   * <li>`dateFormat` (default `yyyy-MM-dd`): sets the string that indicates a date format.
-   * Custom date formats follow the formats at
-   * <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
-   *   Datetime Patterns</a>.
-   * This applies to date type.</li>
-   * <li>`timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]`): sets the string that
-   * indicates a timestamp format. Custom date formats follow the formats at
-   * <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">
-   *   Datetime Patterns</a>.
-   * This applies to timestamp type.</li>
-   * <li>`maxColumns` (default `20480`): defines a hard limit of how many columns
-   * a record can have.</li>
-   * <li>`maxCharsPerColumn` (default `-1`): defines the maximum number of characters allowed
-   * for any given value being read. By default, it is -1 meaning unlimited length</li>
-   * <li>`unescapedQuoteHandling` (default `STOP_AT_DELIMITER`): defines how the CsvParser
-   * will handle values with unescaped quotes.
-   *   <ul>
-   *     <li>`STOP_AT_CLOSING_QUOTE`: If unescaped quotes are found in the input, accumulate
-   *     the quote character and proceed parsing the value as a quoted value, until a closing
-   *     quote is found.</li>
-   *     <li>`BACK_TO_DELIMITER`: If unescaped quotes are found in the input, consider the value
-   *     as an unquoted value. This will make the parser accumulate all characters of the current
-   *     parsed value until the delimiter is found. If no delimiter is found in the value, the
-   *     parser will continue accumulating characters from the input until a delimiter or line
-   *     ending is found.</li>
-   *     <li>`STOP_AT_DELIMITER`: If unescaped quotes are found in the input, consider the value
-   *     as an unquoted value. This will make the parser accumulate all characters until the
-   *     delimiter or a line ending is found in the input.</li>
-   *     <li>`SKIP_VALUE`: If unescaped quotes are found in the input, the content parsed
-   *     for the given value will be skipped and the value set in nullValue will be produced
-   *     instead.</li>
-   *     <li>`RAISE_ERROR`: If unescaped quotes are found in the input, a TextParsingException
-   *     will be thrown.</li>
-   *   </ul>
-   * </li>
-   * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
-   *    during parsing. It supports the following case-insensitive modes.
-   *   <ul>
-   *     <li>`PERMISSIVE` : when it meets a corrupted record, puts the malformed string into a
-   *     field configured by `columnNameOfCorruptRecord`, and sets malformed fields to `null`.
-   *     To keep corrupt records, an user can set a string type field named
-   *     `columnNameOfCorruptRecord` in an user-defined schema. If a schema does not have
-   *     the field, it drops corrupt records during parsing. A record with less/more tokens
-   *     than schema is not a corrupted record to CSV. When it meets a record having fewer
-   *     tokens than the length of the schema, sets `null` to extra fields. When the record
-   *     has more tokens than the length of the schema, it drops extra tokens.</li>
-   *     <li>`DROPMALFORMED` : ignores the whole corrupted records.</li>
-   *     <li>`FAILFAST` : throws an exception when it meets corrupted records.</li>
-   *   </ul>
-   * </li>
-   * <li>`columnNameOfCorruptRecord` (default is the value specified in
-   * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string
-   * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.</li>
-   * <li>`multiLine` (default `false`): parse one record, which may span multiple lines.</li>
-   * <li>`locale` (default is `en-US`): sets a locale as language tag in IETF BCP 47 format.
-   * For instance, this is used while parsing dates and timestamps.</li>
-   * <li>`lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
-   * that should be used for parsing. Maximum length is 1 character.</li>
-   * <li>`pathGlobFilter`: an optional glob pattern to only include files with paths matching
-   * the pattern. The syntax follows <code>org.apache.hadoop.fs.GlobFilter</code>.
-   * It does not change the behavior of partition discovery.</li>
-   * <li>`recursiveFileLookup`: recursively scan a directory for files. Using this option
-   * disables partition discovery</li>
    * </ul>
    *
+   * You can find the CSV-specific options for reading CSV file stream in
+   * <a href="https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option">
+   *   Data Source Option</a> in the version you use.
+   *
    * @since 2.0.0
    */
   def csv(path: String): DataFrame = format("csv").load(path)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
