Repository: spark

Updated Branches:
  refs/heads/master faf73dcd3 -> 3caab872d
[SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the migration guide.

## What changes were proposed in this pull request?

This is a follow-up PR of #18536 and #22545 to update the migration guide.

## How was this patch tested?

Built and checked the doc locally.

Closes #22682 from ueshin/issues/SPARK-20946_25525/migration_guide.

Authored-by: Takuya UESHIN <ues...@databricks.com>
Signed-off-by: Wenchen Fan <wenc...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3caab872
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3caab872
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3caab872

Branch: refs/heads/master
Commit: 3caab872db22246c9ab5f3395498f05cb097c142
Parents: faf73dc
Author: Takuya UESHIN <ues...@databricks.com>
Authored: Wed Oct 10 21:07:59 2018 +0800
Committer: Wenchen Fan <wenc...@databricks.com>
Committed: Wed Oct 10 21:07:59 2018 +0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 6 ++++++
 1 file changed, 6 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/3caab872/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index a1d7b11..0d29357 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 # Migration Guide
 
+## Upgrading From Spark SQL 2.4 to 3.0
+
+  - In PySpark, when creating a `SparkSession` with `SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, the builder used to try to update the `SparkConf` of the existing `SparkContext` with configurations specified to the builder. However, the `SparkContext` is shared by all `SparkSession`s, so those configurations should not be updated. Since 3.0, the builder no longer updates them. This is the same behavior as the Java/Scala API in 2.3 and above. If you want to update them, you need to do so prior to creating a `SparkSession`.
+
 ## Upgrading From Spark SQL 2.3 to 2.4
 
   - In Spark version 2.3 and earlier, the second parameter to the `array_contains` function was implicitly promoted to the element type of the first, array-type parameter. This type promotion can be lossy and may cause `array_contains` to return a wrong result. This problem has been addressed in 2.4 by employing a safer type-promotion mechanism. This can cause some changes in behavior, which are illustrated in the table below.
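To make the `SparkSession.builder.getOrCreate()` change described in the new 2.4-to-3.0 entry above concrete, here is a minimal PySpark sketch; the app name and the `spark.executor.memory` values are arbitrary illustrations, not part of this commit:

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

# Configurations meant for the shared SparkContext must be set before the
# context is created; this works the same way in every Spark version.
conf = SparkConf().setAppName("migration-demo").set("spark.executor.memory", "2g")
sc = SparkContext(conf=conf)

# Pre-3.0 PySpark tried to copy builder configurations onto the existing
# SparkContext; since 3.0 the shared context's SparkConf is left untouched.
spark = (SparkSession.builder
         .config("spark.executor.memory", "4g")
         .getOrCreate())

# On 3.0 this prints "2g": the builder did not update the shared SparkConf.
print(sc.getConf().get("spark.executor.memory"))
```

Per the migration note, the same script on PySpark 2.3/2.4 would report the builder's value instead, because the builder there still rewrote the shared configuration.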
@@ -2135,6 +2139,8 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - In PySpark, `df.replace` does not allow omitting `value` when `to_replace` is not a dictionary. Previously, `value` could be omitted in the other cases and defaulted to `None`, which is counterintuitive and error-prone.
   - The semantics of un-aliased subqueries have not been well defined and showed confusing behaviors. Since Spark 2.3, we invalidate such confusing cases, for example `SELECT v.i FROM (SELECT i FROM v)`: Spark will throw an analysis exception in this case because users should not be able to use the qualifier inside a subquery. See [SPARK-20690](https://issues.apache.org/jira/browse/SPARK-20690) and [SPARK-21335](https://issues.apache.org/jira/browse/SPARK-21335) for more details.
+  - When creating a `SparkSession` with `SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, the builder used to try to update the `SparkConf` of the existing `SparkContext` with configurations specified to the builder. However, the `SparkContext` is shared by all `SparkSession`s, so those configurations should not be updated. Since 2.3, the builder no longer updates them. If you want to update them, you need to do so prior to creating a `SparkSession`.
+
 ## Upgrading From Spark SQL 2.1 to 2.2
 
   - Spark 2.1.1 introduced a new configuration key: `spark.sql.hive.caseSensitiveInferenceMode`. It had a default setting of `NEVER_INFER`, which kept behavior identical to 2.1.0. However, Spark 2.2.0 changes this setting's default value to `INFER_AND_SAVE` to restore compatibility with reading Hive metastore tables whose underlying file schemas have mixed-case column names. With the `INFER_AND_SAVE` configuration value, on first access Spark will perform schema inference on any Hive metastore table for which it has not already saved an inferred schema. Note that schema inference can be a very time-consuming operation for tables with thousands of partitions. If compatibility with mixed-case column names is not a concern, you can safely set `spark.sql.hive.caseSensitiveInferenceMode` to `NEVER_INFER` to avoid the initial overhead of schema inference. Note that with the new default `INFER_AND_SAVE` setting, the results of the schema inference are saved as a metastore key for future use. Therefore, the initial schema inference occurs only at a table's first access.
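The `spark.sql.hive.caseSensitiveInferenceMode` note above is easiest to apply as a builder configuration. A minimal sketch, assuming a Hive-enabled Spark build; the application name is a hypothetical placeholder:

```python
from pyspark.sql import SparkSession

# If mixed-case Hive column names are not a concern, opting back into
# NEVER_INFER avoids the one-time schema inference on first table access
# that the INFER_AND_SAVE default performs.
spark = (SparkSession.builder
         .appName("inference-mode-demo")  # hypothetical name
         .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
         .enableHiveSupport()
         .getOrCreate())
```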