Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22746#discussion_r225921716
  
    --- Diff: docs/sql-reference.md ---
    @@ -0,0 +1,641 @@
    +---
    +layout: global
    +title: Reference
    +displayTitle: Reference
    +---
    +
    +* Table of contents
    +{:toc}
    +
    +## Data Types
    +
    +Spark SQL and DataFrames support the following data types:
    +
    +* Numeric types
    +  - `ByteType`: Represents 1-byte signed integer numbers.
    +  The range of numbers is from `-128` to `127`.
    +  - `ShortType`: Represents 2-byte signed integer numbers.
    +  The range of numbers is from `-32768` to `32767`.
    +  - `IntegerType`: Represents 4-byte signed integer numbers.
    +  The range of numbers is from `-2147483648` to `2147483647`.
    +  - `LongType`: Represents 8-byte signed integer numbers.
    +  The range of numbers is from `-9223372036854775808` to 
`9223372036854775807`.
    +  - `FloatType`: Represents 4-byte single-precision floating point numbers.
    +  - `DoubleType`: Represents 8-byte double-precision floating point 
numbers.
    +  - `DecimalType`: Represents arbitrary-precision signed decimal numbers. 
Backed internally by `java.math.BigDecimal`. A `BigDecimal` consists of an 
arbitrary precision integer unscaled value and a 32-bit integer scale.
    +* String type
    +  - `StringType`: Represents character string values.
    +* Binary type
    +  - `BinaryType`: Represents byte sequence values.
    +* Boolean type
    +  - `BooleanType`: Represents boolean values.
    +* Datetime type
    +  - `TimestampType`: Represents values comprising values of fields year, 
month, day,
    +  hour, minute, and second.
    +  - `DateType`: Represents values comprising values of fields year, month, 
day.
    +* Complex types
    +  - `ArrayType(elementType, containsNull)`: Represents values comprising a 
sequence of
    +  elements with the type of `elementType`. `containsNull` is used to 
indicate if
    +  elements in a `ArrayType` value can have `null` values.
    +  - `MapType(keyType, valueType, valueContainsNull)`:
    +  Represents values comprising a set of key-value pairs. The data type of 
keys are
    +  described by `keyType` and the data type of values are described by 
`valueType`.
    +  For a `MapType` value, keys are not allowed to have `null` values. 
`valueContainsNull`
    +  is used to indicate if values of a `MapType` value can have `null` 
values.
    +  - `StructType(fields)`: Represents values with the structure described by
    +  a sequence of `StructField`s (`fields`).
    +    * `StructField(name, dataType, nullable)`: Represents a field in a 
`StructType`.
    +    The name of a field is indicated by `name`. The data type of a field 
is indicated
    +    by `dataType`. `nullable` is used to indicate if values of this fields 
can have
    +    `null` values.
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1">
    +
    +All data types of Spark SQL are located in the package 
`org.apache.spark.sql.types`.
    +You can access them by doing
    +
    +{% include_example data_types 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
    +
    +<table class="table">
    +<tr>
    +  <th style="width:20%">Data type</th>
    +  <th style="width:40%">Value type in Scala</th>
    +  <th>API to access or create a data type</th></tr>
    +<tr>
    +  <td> <b>ByteType</b> </td>
    +  <td> Byte </td>
    +  <td>
    +  ByteType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ShortType</b> </td>
    +  <td> Short </td>
    +  <td>
    +  ShortType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>IntegerType</b> </td>
    +  <td> Int </td>
    +  <td>
    +  IntegerType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>LongType</b> </td>
    +  <td> Long </td>
    +  <td>
    +  LongType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>FloatType</b> </td>
    +  <td> Float </td>
    +  <td>
    +  FloatType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DoubleType</b> </td>
    +  <td> Double </td>
    +  <td>
    +  DoubleType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DecimalType</b> </td>
    +  <td> java.math.BigDecimal </td>
    +  <td>
    +  DecimalType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StringType</b> </td>
    +  <td> String </td>
    +  <td>
    +  StringType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BinaryType</b> </td>
    +  <td> Array[Byte] </td>
    +  <td>
    +  BinaryType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BooleanType</b> </td>
    +  <td> Boolean </td>
    +  <td>
    +  BooleanType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>TimestampType</b> </td>
    +  <td> java.sql.Timestamp </td>
    +  <td>
    +  TimestampType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DateType</b> </td>
    +  <td> java.sql.Date </td>
    +  <td>
    +  DateType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ArrayType</b> </td>
    +  <td> scala.collection.Seq </td>
    +  <td>
    +  ArrayType(<i>elementType</i>, [<i>containsNull</i>])<br />
    +  <b>Note:</b> The default value of <i>containsNull</i> is <i>true</i>.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>MapType</b> </td>
    +  <td> scala.collection.Map </td>
    +  <td>
    +  MapType(<i>keyType</i>, <i>valueType</i>, [<i>valueContainsNull</i>])<br 
/>
    +  <b>Note:</b> The default value of <i>valueContainsNull</i> is 
<i>true</i>.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructType</b> </td>
    +  <td> org.apache.spark.sql.Row </td>
    +  <td>
    +  StructType(<i>fields</i>)<br />
    +  <b>Note:</b> <i>fields</i> is a Seq of StructFields. Also, two fields 
with the same
    +  name are not allowed.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructField</b> </td>
    +  <td> The value type in Scala of the data type of this field
    +  (For example, Int for a StructField with the data type IntegerType) </td>
    +  <td>
    +  StructField(<i>name</i>, <i>dataType</i>, [<i>nullable</i>])<br />
    +  <b>Note:</b> The default value of <i>nullable</i> is <i>true</i>.
    +  </td>
    +</tr>
    +</table>
    +
    +</div>
    +
    +<div data-lang="java" markdown="1">
    +
    +All data types of Spark SQL are located in the package of
    +`org.apache.spark.sql.types`. To access or create a data type,
    +please use factory methods provided in
    +`org.apache.spark.sql.types.DataTypes`.
    +
    +<table class="table">
    +<tr>
    +  <th style="width:20%">Data type</th>
    +  <th style="width:40%">Value type in Java</th>
    +  <th>API to access or create a data type</th></tr>
    +<tr>
    +  <td> <b>ByteType</b> </td>
    +  <td> byte or Byte </td>
    +  <td>
    +  DataTypes.ByteType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ShortType</b> </td>
    +  <td> short or Short </td>
    +  <td>
    +  DataTypes.ShortType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>IntegerType</b> </td>
    +  <td> int or Integer </td>
    +  <td>
    +  DataTypes.IntegerType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>LongType</b> </td>
    +  <td> long or Long </td>
    +  <td>
    +  DataTypes.LongType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>FloatType</b> </td>
    +  <td> float or Float </td>
    +  <td>
    +  DataTypes.FloatType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DoubleType</b> </td>
    +  <td> double or Double </td>
    +  <td>
    +  DataTypes.DoubleType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DecimalType</b> </td>
    +  <td> java.math.BigDecimal </td>
    +  <td>
    +  DataTypes.createDecimalType()<br />
    +  DataTypes.createDecimalType(<i>precision</i>, <i>scale</i>).
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StringType</b> </td>
    +  <td> String </td>
    +  <td>
    +  DataTypes.StringType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BinaryType</b> </td>
    +  <td> byte[] </td>
    +  <td>
    +  DataTypes.BinaryType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BooleanType</b> </td>
    +  <td> boolean or Boolean </td>
    +  <td>
    +  DataTypes.BooleanType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>TimestampType</b> </td>
    +  <td> java.sql.Timestamp </td>
    +  <td>
    +  DataTypes.TimestampType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DateType</b> </td>
    +  <td> java.sql.Date </td>
    +  <td>
    +  DataTypes.DateType
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ArrayType</b> </td>
    +  <td> java.util.List </td>
    +  <td>
    +  DataTypes.createArrayType(<i>elementType</i>)<br />
    +  <b>Note:</b> The value of <i>containsNull</i> will be <i>true</i><br />
    +  DataTypes.createArrayType(<i>elementType</i>, <i>containsNull</i>).
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>MapType</b> </td>
    +  <td> java.util.Map </td>
    +  <td>
    +  DataTypes.createMapType(<i>keyType</i>, <i>valueType</i>)<br />
    +  <b>Note:</b> The value of <i>valueContainsNull</i> will be 
<i>true</i>.<br />
    +  DataTypes.createMapType(<i>keyType</i>, <i>valueType</i>, 
<i>valueContainsNull</i>)<br />
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructType</b> </td>
    +  <td> org.apache.spark.sql.Row </td>
    +  <td>
    +  DataTypes.createStructType(<i>fields</i>)<br />
    +  <b>Note:</b> <i>fields</i> is a List or an array of StructFields.
    +  Also, two fields with the same name are not allowed.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructField</b> </td>
    +  <td> The value type in Java of the data type of this field
    +  (For example, int for a StructField with the data type IntegerType) </td>
    +  <td>
    +  DataTypes.createStructField(<i>name</i>, <i>dataType</i>, 
<i>nullable</i>)
    +  </td>
    +</tr>
    +</table>
    +
    +</div>
    +
    +<div data-lang="python"  markdown="1">
    +
    +All data types of Spark SQL are located in the package of 
`pyspark.sql.types`.
    +You can access them by doing
    +{% highlight python %}
    +from pyspark.sql.types import *
    +{% endhighlight %}
    +
    +<table class="table">
    +<tr>
    +  <th style="width:20%">Data type</th>
    +  <th style="width:40%">Value type in Python</th>
    +  <th>API to access or create a data type</th></tr>
    +<tr>
    +  <td> <b>ByteType</b> </td>
    +  <td>
    +  int or long <br />
    +  <b>Note:</b> Numbers will be converted to 1-byte signed integer numbers 
at runtime.
    +  Please make sure that numbers are within the range of -128 to 127.
    +  </td>
    +  <td>
    +  ByteType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ShortType</b> </td>
    +  <td>
    +  int or long <br />
    +  <b>Note:</b> Numbers will be converted to 2-byte signed integer numbers 
at runtime.
    +  Please make sure that numbers are within the range of -32768 to 32767.
    +  </td>
    +  <td>
    +  ShortType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>IntegerType</b> </td>
    +  <td> int or long </td>
    +  <td>
    +  IntegerType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>LongType</b> </td>
    +  <td>
    +  long <br />
    +  <b>Note:</b> Numbers will be converted to 8-byte signed integer numbers 
at runtime.
    +  Please make sure that numbers are within the range of
    +  -9223372036854775808 to 9223372036854775807.
    +  Otherwise, please convert data to decimal.Decimal and use DecimalType.
    +  </td>
    +  <td>
    +  LongType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>FloatType</b> </td>
    +  <td>
    +  float <br />
    +  <b>Note:</b> Numbers will be converted to 4-byte single-precision 
floating
    +  point numbers at runtime.
    +  </td>
    +  <td>
    +  FloatType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DoubleType</b> </td>
    +  <td> float </td>
    +  <td>
    +  DoubleType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DecimalType</b> </td>
    +  <td> decimal.Decimal </td>
    +  <td>
    +  DecimalType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StringType</b> </td>
    +  <td> string </td>
    +  <td>
    +  StringType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BinaryType</b> </td>
    +  <td> bytearray </td>
    +  <td>
    +  BinaryType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BooleanType</b> </td>
    +  <td> bool </td>
    +  <td>
    +  BooleanType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>TimestampType</b> </td>
    +  <td> datetime.datetime </td>
    +  <td>
    +  TimestampType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DateType</b> </td>
    +  <td> datetime.date </td>
    +  <td>
    +  DateType()
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ArrayType</b> </td>
    +  <td> list, tuple, or array </td>
    +  <td>
    +  ArrayType(<i>elementType</i>, [<i>containsNull</i>])<br />
    +  <b>Note:</b> The default value of <i>containsNull</i> is <i>True</i>.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>MapType</b> </td>
    +  <td> dict </td>
    +  <td>
    +  MapType(<i>keyType</i>, <i>valueType</i>, [<i>valueContainsNull</i>])<br 
/>
    +  <b>Note:</b> The default value of <i>valueContainsNull</i> is 
<i>True</i>.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructType</b> </td>
    +  <td> list or tuple </td>
    +  <td>
    +  StructType(<i>fields</i>)<br />
    +  <b>Note:</b> <i>fields</i> is a Seq of StructFields. Also, two fields 
with the same
    +  name are not allowed.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructField</b> </td>
    +  <td> The value type in Python of the data type of this field
    +  (For example, Int for a StructField with the data type IntegerType) </td>
    +  <td>
    +  StructField(<i>name</i>, <i>dataType</i>, [<i>nullable</i>])<br />
    +  <b>Note:</b> The default value of <i>nullable</i> is <i>True</i>.
    +  </td>
    +</tr>
    +</table>
    +
    +</div>
    +
    +<div data-lang="r"  markdown="1">
    +
    +<table class="table">
    +<tr>
    +  <th style="width:20%">Data type</th>
    +  <th style="width:40%">Value type in R</th>
    +  <th>API to access or create a data type</th></tr>
    +<tr>
    +  <td> <b>ByteType</b> </td>
    +  <td>
    +  integer <br />
    +  <b>Note:</b> Numbers will be converted to 1-byte signed integer numbers 
at runtime.
    +  Please make sure that numbers are within the range of -128 to 127.
    +  </td>
    +  <td>
    +  "byte"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ShortType</b> </td>
    +  <td>
    +  integer <br />
    +  <b>Note:</b> Numbers will be converted to 2-byte signed integer numbers 
at runtime.
    +  Please make sure that numbers are within the range of -32768 to 32767.
    +  </td>
    +  <td>
    +  "short"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>IntegerType</b> </td>
    +  <td> integer </td>
    +  <td>
    +  "integer"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>LongType</b> </td>
    +  <td>
    +  integer <br />
    +  <b>Note:</b> Numbers will be converted to 8-byte signed integer numbers 
at runtime.
    +  Please make sure that numbers are within the range of
    +  -9223372036854775808 to 9223372036854775807.
    +  Otherwise, please convert data to decimal.Decimal and use DecimalType.
    +  </td>
    +  <td>
    +  "long"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>FloatType</b> </td>
    +  <td>
    +  numeric <br />
    +  <b>Note:</b> Numbers will be converted to 4-byte single-precision 
floating
    +  point numbers at runtime.
    +  </td>
    +  <td>
    +  "float"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DoubleType</b> </td>
    +  <td> numeric </td>
    +  <td>
    +  "double"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DecimalType</b> </td>
    +  <td> Not supported </td>
    +  <td>
    +   Not supported
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StringType</b> </td>
    +  <td> character </td>
    +  <td>
    +  "string"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BinaryType</b> </td>
    +  <td> raw </td>
    +  <td>
    +  "binary"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>BooleanType</b> </td>
    +  <td> logical </td>
    +  <td>
    +  "bool"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>TimestampType</b> </td>
    +  <td> POSIXct </td>
    +  <td>
    +  "timestamp"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>DateType</b> </td>
    +  <td> Date </td>
    +  <td>
    +  "date"
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>ArrayType</b> </td>
    +  <td> vector or list </td>
    +  <td>
    +  list(type="array", elementType=<i>elementType</i>, 
containsNull=[<i>containsNull</i>])<br />
    +  <b>Note:</b> The default value of <i>containsNull</i> is <i>TRUE</i>.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>MapType</b> </td>
    +  <td> environment </td>
    +  <td>
    +  list(type="map", keyType=<i>keyType</i>, valueType=<i>valueType</i>, 
valueContainsNull=[<i>valueContainsNull</i>])<br />
    +  <b>Note:</b> The default value of <i>valueContainsNull</i> is 
<i>TRUE</i>.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructType</b> </td>
    +  <td> named list</td>
    +  <td>
    +  list(type="struct", fields=<i>fields</i>)<br />
    +  <b>Note:</b> <i>fields</i> is a Seq of StructFields. Also, two fields 
with the same
    +  name are not allowed.
    +  </td>
    +</tr>
    +<tr>
    +  <td> <b>StructField</b> </td>
    +  <td> The value type in R of the data type of this field
    +  (For example, integer for a StructField with the data type IntegerType) 
</td>
    +  <td>
    +  list(name=<i>name</i>, type=<i>dataType</i>, 
nullable=[<i>nullable</i>])<br />
    +  <b>Note:</b> The default value of <i>nullable</i> is <i>TRUE</i>.
    +  </td>
    +</tr>
    +</table>
    +
    +</div>
    +
    +</div>
    +
    +## NaN Semantics
    +
    +There is specially handling for not-a-number (NaN) when dealing with 
`float` or `double` types that
    +does not exactly match standard floating point semantics.
    +Specifically:
    +
    + - NaN = NaN returns true.
    + - In aggregations, all NaN values are grouped together.
    + - NaN is treated as a normal value in join keys.
    + - NaN values go last when in ascending order, larger than any other 
numeric value.
    + 
    + ## Arithmetic operations
    --- End diff --
    
    The space indent here is wrong.
    
    
![image](https://user-images.githubusercontent.com/1097932/47088617-7f04b900-d251-11e8-82db-c8762f80b9a7.png)



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to