[spark] branch branch-3.0 updated: [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f83ef7d  [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL
f83ef7d is described below

commit f83ef7d143aafbbdd1bb322567481f68db72195a
Author: gatorsmile
AuthorDate: Sun Mar 15 07:35:20 2020 +0900

    [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL

    ### What changes were proposed in this pull request?

    The current migration guide of SQL is too long for most readers to find the needed info. This PR groups the items in the migration guide of Spark SQL by the corresponding components.

    Note: this PR does not change the contents of the migration guide.

    The attached figure is a screenshot after the change.

    ![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-03-14-12_00_40](https://user-images.githubusercontent.com/11567269/76688626-d3010200-65eb-11ea-9ce7-265bc90ebb2c.png)

    ### Why are the changes needed?

    The current migration guide of SQL is too long for most readers to find the needed info.

    ### Does this PR introduce any user-facing change?

    No

    ### How was this patch tested?

    N/A

    Closes #27909 from gatorsmile/migrationGuideReorg.

    Authored-by: gatorsmile
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 4d4c00c1b564b57d3016ce8c3bfcffaa6e58f012)
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-migration-guide.md | 287 +++-
 1 file changed, 150 insertions(+), 137 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 19c744c..31d5c68 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -23,92 +23,119 @@ license: |
 {:toc}
 
 ## Upgrading from Spark SQL 2.4 to 3.0
-
-  - Since Spark 3.0, when inserting a value into a table column with a different data type, the type coercion is performed as per the ANSI SQL standard. Certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean` are disallowed. A runtime exception is thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and earlier, type conversions during table insertion are allowed as long as they are a valid `Cast`. When inse [...]
-
-  - In Spark 3.0, the deprecated methods `SQLContext.createExternalTable` and `SparkSession.createExternalTable` have been removed in favor of their replacement, `createTable`.
-
-  - In Spark 3.0, the deprecated `HiveContext` class has been removed. Use `SparkSession.builder.enableHiveSupport()` instead.
-
-  - Since Spark 3.0, the configuration `spark.sql.crossJoin.enabled` becomes an internal configuration and is true by default, so by default Spark will not raise an exception on SQL with an implicit cross join.
-
-  - In Spark version 2.4 and earlier, SQL queries such as `FROM <table>` or `FROM <table> UNION ALL FROM <table>` are supported by accident. In hive-style `FROM <table> SELECT <expr>`, the `SELECT` clause is not negligible. Neither Hive nor Presto supports this syntax. Therefore these queries are treated as invalid since Spark 3.0.
+### Dataset/DataFrame APIs
 
   - Since Spark 3.0, the Dataset and DataFrame API `unionAll` is not deprecated any more. It is an alias for `union`.
-
-  - In Spark version 2.4 and earlier, the parser of the JSON data source treats empty strings as null for some data types such as `IntegerType`. For `FloatType`, `DoubleType`, `DateType` and `TimestampType`, it fails on empty strings and throws exceptions. Since Spark 3.0, empty strings are disallowed and exceptions are thrown for all data types except `StringType` and `BinaryType`. The previous behaviour of allowing empty strings can be restored by setting `spark.sql.legacy.json.allowEmptyStrin [...]
-
-  - Since Spark 3.0, the `from_json` function supports two modes: `PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The default mode became `PERMISSIVE`. In previous versions, the behavior of `from_json` conformed to neither `PERMISSIVE` nor `FAILFAST`, especially in the processing of malformed JSON records. For example, the JSON string `{"a" 1}` with the schema `a INT` is converted to `null` by previous versions, but Spark 3.0 converts it to `Row(null)`.
-
-  - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.
-
-  - In Spark version 2.4 and earlier, users can create map values with map type key via built-in functions such as `CreateMap`, `MapFromArrays`, etc. Since Spark 3.0, it's not allowed to create map values with map type key with these built-in functions. Users can use the `map_entries` function to convert a map to `array<struct<key, value>>` as a workaround. In addition, users can still read map values with map type key f
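For readers skimming the diff, a minimal SQL sketch of the table-insertion change described in the first removed bullet (the table `t` is hypothetical and not part of the commit; exact error behavior depends on the Spark build and configuration):

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE t(i INT) USING parquet;

-- Spark 2.4 and earlier: the string is cast to `int`, producing NULL,
-- and the insert succeeds.
-- Spark 3.0: string-to-int store assignment is disallowed under the
-- ANSI-style rules, so the statement is rejected.
INSERT INTO t VALUES ('string_value');
```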