This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b85974d  [SPARK-26651][SQL][DOC] Collapse notes related to java.time API
b85974d is described below

commit b85974db85a881a2e8aebc31cb4f578008648ab9
Author: Maxim Gekk <max.g...@gmail.com>
AuthorDate: Sat Feb 2 11:17:33 2019 +0800

    [SPARK-26651][SQL][DOC] Collapse notes related to java.time API
    
    ## What changes were proposed in this pull request?
    
    Collapsed notes about using Java 8 API for date/timestamp manipulations and 
Proleptic Gregorian calendar in the SQL migration guide.
    
    Closes #23722 from MaxGekk/collapse-notes.
    
    Authored-by: Maxim Gekk <max.g...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 docs/sql-migration-guide-upgrade.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/sql-migration-guide-upgrade.md 
b/docs/sql-migration-guide-upgrade.md
index 41f27a3..dbf9df0 100644
--- a/docs/sql-migration-guide-upgrade.md
+++ b/docs/sql-migration-guide-upgrade.md
@@ -31,14 +31,10 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.4 and earlier, the `SET` command works without any 
warnings even if the specified key is for `SparkConf` entries and it has no 
effect because the command does not update `SparkConf`, but the behavior might 
confuse users. Since 3.0, the command fails if a `SparkConf` key is used. You 
can disable such a check by setting 
`spark.sql.legacy.setCommandRejectsSparkCoreConfs` to `false`.
 
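A minimal sketch of the `SET` behavior above, assuming a Spark 3.0 session bound
to `spark` (`spark.executor.memory` stands in for any `SparkConf` key):

```scala
// Sketch of the SET behavior change described above (Spark 3.0 session assumed).
// Setting a SparkConf entry through SQL now fails instead of being silently ignored:
spark.sql("SET spark.executor.memory=2g")   // fails in Spark 3.0

// Restoring the Spark 2.4 behavior (the command is accepted but still has no effect):
spark.conf.set("spark.sql.legacy.setCommandRejectsSparkCoreConfs", "false")
spark.sql("SET spark.executor.memory=2g")
```
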
-  - Since Spark 3.0, CSV/JSON datasources use java.time API for parsing and 
generating CSV/JSON content. In Spark version 2.4 and earlier, 
java.text.SimpleDateFormat is used for the same purpose with fallbacks to the 
parsing mechanisms of Spark 2.0 and 1.x. For example, `2018-12-08 10:39:21.123` 
with the pattern `yyyy-MM-dd'T'HH:mm:ss.SSS` cannot be parsed since Spark 3.0 
because the timestamp does not match to the pattern but it can be parsed by 
earlier Spark versions due to a fallback  [...]
-
   - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV 
string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, the 
returned row can contain non-`null` fields if some of CSV column values were 
parsed and converted to desired types successfully.
 
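A minimal sketch of the malformed-CSV behavior above, with a hypothetical schema
and record (spark-shell style session assumed):

```scala
// Sketch of the PERMISSIVE-mode change for malformed CSV records (hypothetical data).
import org.apache.spark.sql.types._
import spark.implicits._

val schema = new StructType().add("id", IntegerType).add("amount", DoubleType)
val input = Seq("1,not-a-number").toDS()   // the second field is malformed

val df = spark.read.schema(schema).option("mode", "PERMISSIVE").csv(input)
// Spark 2.4 and earlier: [null, null]; Spark 3.0: [1, null] since "1" still parses.
df.show()
```
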
   - In Spark version 2.4 and earlier, JSON datasource and JSON functions like 
`from_json` convert a bad JSON record to a row with all `null`s in the 
PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, the 
returned row can contain non-`null` fields if some of JSON column values were 
parsed and converted to desired types successfully.
 
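An analogous sketch for `from_json`, again with a hypothetical schema and record:

```scala
// Sketch of the same partial-result behavior for from_json (hypothetical record).
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._

val schema = new StructType().add("id", IntegerType).add("amount", DoubleType)
val df = Seq("""{"id": 1, "amount": "oops"}""").toDF("value")
  .select(from_json($"value", schema).as("parsed"))
// Spark 2.4 and earlier: parsed = [null, null]; Spark 3.0: parsed = [1, null].
df.show(false)
```
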
-  - Since Spark 3.0, the `unix_timestamp`, `date_format`, `to_unix_timestamp`, 
`from_unixtime`, `to_date`, `to_timestamp` functions use java.time API for 
parsing and formatting dates/timestamps from/to strings by using ISO chronology 
(https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html) 
based on Proleptic Gregorian calendar. In Spark version 2.4 and earlier, 
java.text.SimpleDateFormat and java.util.GregorianCalendar (hybrid calendar 
that supports both the Julian [...]
-
  - Since Spark 3.0, JSON datasource and the JSON function `schema_of_json` infer 
TimestampType from string values if they match the pattern defined by the 
JSON option `timestampFormat`. Set the JSON option `inferTimestamp` to `false` to 
disable such type inference.
 
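A rough sketch of the timestamp inference above; the sample record and the
explicit `timestampFormat` are illustrative only:

```scala
// Sketch of TimestampType inference during JSON schema inference (hypothetical data).
import spark.implicits._

val input = Seq("""{"ts": "2019-02-02 11:17:33"}""").toDS()

// With a matching timestampFormat, Spark 3.0 infers ts as a timestamp column:
spark.read
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .json(input)
  .printSchema()

// Setting inferTimestamp to false keeps the old behavior (ts inferred as a string):
spark.read
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .option("inferTimestamp", "false")
  .json(input)
  .printSchema()
```
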
   - In PySpark, when Arrow optimization is enabled, if Arrow version is higher 
than 0.11.0, Arrow can perform safe type conversion when converting 
Pandas.Series to Arrow array during serialization. Arrow will raise errors when 
detecting unsafe type conversion like overflow. Setting 
`spark.sql.execution.pandas.arrowSafeTypeConversion` to true can enable it. The 
default setting is false. PySpark's behavior for Arrow versions is illustrated 
in the table below:
@@ -91,11 +87,15 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.4 and earlier, if 
`org.apache.spark.sql.functions.udf(Any, DataType)` gets a Scala closure with 
primitive-type argument, the returned UDF will return null if the input value 
is null. Since Spark 3.0, the UDF will return the default value of the Java 
type if the input value is null. For example, `val f = udf((x: Int) => x, 
IntegerType)`, `f($"x")` will return null in Spark 2.4 and earlier if column 
`x` is null, and return 0 in Spark 3.0. This behavior change is int [...]
 
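A short sketch of the UDF example above, assuming a session bound to `spark`:

```scala
// Sketch of the null-handling change for Scala UDFs with primitive-type arguments.
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType
import spark.implicits._

val f = udf((x: Int) => x, IntegerType)
val df = spark.range(1).selectExpr("CAST(NULL AS INT) AS x")

// Spark 2.4 and earlier: null; Spark 3.0: 0 (the default value of the Java int type).
df.select(f($"x")).show()
```
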
-  - Since Spark 3.0, the `weekofyear`, `weekday` and `dayofweek` functions use 
java.time API for calculation week number of year and day number of week based 
on Proleptic Gregorian calendar. In Spark version 2.4 and earlier, the hybrid 
calendar (Julian + Gregorian) is used for the same purpose. Results of the 
functions returned by Spark 3.0 and previous versions can be different for 
dates before October 15, 1582 (Gregorian).
+  - Since Spark 3.0, the Proleptic Gregorian calendar is used in parsing, 
formatting, and converting dates and timestamps, as well as in extracting 
sub-components such as years, days, etc. Spark 3.0 uses Java 8 API classes from 
the java.time packages that are based on ISO chronology 
(https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html). 
In Spark version 2.4 and earlier, those operations are performed using the 
hybrid calendar (Julian + Gregorian, see https://docs.orac [...]
+
+    - CSV/JSON datasources use the java.time API for parsing and generating 
CSV/JSON content. In Spark version 2.4 and earlier, java.text.SimpleDateFormat 
is used for the same purpose with fallbacks to the parsing mechanisms of Spark 
2.0 and 1.x. For example, `2018-12-08 10:39:21.123` with the pattern 
`yyyy-MM-dd'T'HH:mm:ss.SSS` cannot be parsed since Spark 3.0 because the 
timestamp does not match the pattern, but it can be parsed by earlier Spark 
versions due to a fallback to `Timestamp.v [...]
+
+    - The `unix_timestamp`, `date_format`, `to_unix_timestamp`, 
`from_unixtime`, `to_date`, and `to_timestamp` functions use the java.time API 
for parsing and formatting dates/timestamps from/to strings. The new 
implementation supports pattern formats as described at 
https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html 
and performs strict checking of its input. For example, the 
`2015-07-22 10:00:00` timestamp cannot be parsed if the pattern is `yyyy-MM-dd` 
because the parser does not consume the whole input (see the sketch after this 
list). Another example is the `31/01/2015 00:00` inpu [...]
 
-  - Since Spark 3.0, the JDBC options `lowerBound` and `upperBound` are 
converted to TimestampType/DateType values in the same way as casting strings 
to TimestampType/DateType values. The conversion is based on Proleptic 
Gregorian calendar, and time zone defined by the SQL config 
`spark.sql.session.timeZone`. In Spark version 2.4 and earlier, the conversion 
is based on the hybrid calendar (Julian + Gregorian) and on default system time 
zone.
+    - The `weekofyear`, `weekday`, `dayofweek`, `date_trunc`, 
`from_utc_timestamp`, `to_utc_timestamp`, and `unix_timestamp` functions use 
the java.time API for calculating the week number of the year and the day 
number of the week, as well as for conversion from/to TimestampType values in 
the UTC time zone.
 
-  - Since Spark 3.0, the `date_trunc`, `from_utc_timestamp`, 
`to_utc_timestamp`, and `unix_timestamp` functions use java.time API based on 
Proleptic Gregorian calendar. In Spark version 2.4 and earlier, the hybrid 
calendar (Julian + Gregorian) is used for the same purpose. Results of the 
functions returned by Spark 3.0 and previous versions can be different for 
dates before October 15, 1582 (Gregorian).
+    - The JDBC options `lowerBound` and `upperBound` are converted to 
TimestampType/DateType values in the same way as casting strings to 
TimestampType/DateType values. The conversion is based on the Proleptic 
Gregorian calendar and the time zone defined by the SQL config 
`spark.sql.session.timeZone`. In Spark version 2.4 and earlier, the conversion 
is based on the hybrid calendar (Julian + Gregorian) and on the default system 
time zone.
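
A minimal sketch of the stricter parsing mentioned in the functions item above
(illustrative input; Spark 2.4's SimpleDateFormat silently parses the leading
date, while Spark 3.0 does not accept the value because the pattern does not
consume the whole input):

```scala
// Sketch of the stricter java.time-based parsing (illustration only).
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2015-07-22 10:00:00").toDF("s")

// Spark 2.4 parses the leading date and ignores the trailing time;
// Spark 3.0 rejects the value because "yyyy-MM-dd" does not cover the whole input.
df.select(to_timestamp($"s", "yyyy-MM-dd")).show(false)
```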
 
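And a hedged sketch of where the JDBC `lowerBound`/`upperBound` options from the
last item come into play; the connection URL, table, and column names are
placeholders:

```scala
// Sketch of JDBC partitioning over a timestamp column (placeholders throughout).
val events = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")   // hypothetical connection
  .option("dbtable", "events")                            // hypothetical table
  .option("partitionColumn", "event_time")                // timestamp column
  .option("lowerBound", "2019-01-01 00:00:00")            // converted like a string-to-timestamp cast
  .option("upperBound", "2019-02-01 00:00:00")
  .option("numPartitions", "4")
  .load()
```
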
 ## Upgrading From Spark SQL 2.3 to 2.4
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
