svn commit: r30000 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_10_16_02-80813e1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Wed Oct 10 23:16:48 2018
New Revision: 30000

Log:
Apache Spark 3.0.0-SNAPSHOT-2018_10_10_16_02-80813e1 docs

[This commit notification would consist of 1482 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r29997 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_10_12_02-6df2345-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Wed Oct 10 19:16:46 2018
New Revision: 29997

Log:
Apache Spark 3.0.0-SNAPSHOT-2018_10_10_12_02-6df2345 docs

[This commit notification would consist of 1482 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6
Repository: spark
Updated Branches:
  refs/heads/master 6df234579 -> 80813e198

[SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

## What changes were proposed in this pull request?

Remove Hadoop 2.6 references and make 2.7 the default. Obviously, this is for master/3.0.0 only. After this we can also get rid of the separate test jobs for Hadoop 2.6.

## How was this patch tested?

Existing tests

Closes #22615 from srowen/SPARK-25016.

Authored-by: Sean Owen
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/80813e19
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/80813e19
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/80813e19

Branch: refs/heads/master
Commit: 80813e198033cd63cc6100ee6ffe7d1eb1dff27b
Parents: 6df2345
Author: Sean Owen
Authored: Wed Oct 10 12:07:53 2018 -0700
Committer: Sean Owen
Committed: Wed Oct 10 12:07:53 2018 -0700

----------------------------------------------------------------------
 dev/appveyor-install-dependencies.ps1           |   3 +-
 dev/create-release/release-build.sh             |  43 ++--
 dev/deps/spark-deps-hadoop-2.6                  | 198 ---
 dev/run-tests.py                                |  15 +-
 dev/test-dependencies.sh                        |   1 -
 docs/building-spark.md                          |  11 +-
 docs/index.md                                   |   3 -
 docs/running-on-yarn.md                         |   3 +-
 hadoop-cloud/pom.xml                            |  59 +++---
 pom.xml                                         |  14 +-
 .../dev/dev-run-integration-tests.sh            |   2 +-
 .../org/apache/spark/deploy/yarn/Client.scala   |  13 +-
 .../org/apache/spark/sql/hive/TableReader.scala |   2 +-
 .../sql/hive/client/IsolatedClientLoader.scala  |  11 +-
 14 files changed, 68 insertions(+), 310 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/80813e19/dev/appveyor-install-dependencies.ps1
----------------------------------------------------------------------
diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1
index 8a04b62..c918828 100644
--- a/dev/appveyor-install-dependencies.ps1
+++ b/dev/appveyor-install-dependencies.ps1
@@ -95,7 +95,8 @@ $env:MAVEN_OPTS = "-Xmx2g -XX:ReservedCodeCacheSize=512m"
 Pop-Location

 # ========================== Hadoop bin package
-$hadoopVer = "2.6.4"
+# This must match the version at https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1
+$hadoopVer = "2.7.1"
 $hadoopPath = "$tools\hadoop"
 if (!(Test-Path $hadoopPath)) {
     New-Item -ItemType Directory -Force -Path $hadoopPath | Out-Null

http://git-wip-us.apache.org/repos/asf/spark/blob/80813e19/dev/create-release/release-build.sh
----------------------------------------------------------------------
diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh
index cce5f8b..89593cf 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -191,9 +191,19 @@ if [[ "$1" == "package" ]]; then
   make_binary_release() {
     NAME=$1
     FLAGS="$MVN_EXTRA_OPTS -B $BASE_RELEASE_PROFILES $2"
+    # BUILD_PACKAGE can be "withpip", "withr", or both as "withpip,withr"
     BUILD_PACKAGE=$3
     SCALA_VERSION=$4

+    PIP_FLAG=""
+    if [[ $BUILD_PACKAGE == *"withpip"* ]]; then
+      PIP_FLAG="--pip"
+    fi
+    R_FLAG=""
+    if [[ $BUILD_PACKAGE == *"withr"* ]]; then
+      R_FLAG="--r"
+    fi
+
     # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
     # share the same Zinc server.
     ZINC_PORT=$((ZINC_PORT + 1))
@@ -217,18 +227,13 @@ if [[ "$1" == "package" ]]; then
     # Get maven home set by MVN
     MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'`

+    echo "Creating distribution"
+    ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz \
+      $PIP_FLAG $R_FLAG $FLAGS \
+      -DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
+    cd ..

-    if [ -z "$BUILD_PACKAGE" ]; then
-      echo "Creating distribution without PIP/R package"
-      ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz $FLAGS \
-        -DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
-      cd ..
-    elif [[ "$BUILD_PACKAGE" == "withr" ]]; then
-      echo "Creating distribution with R package"
-      ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz --r $FLAGS \
-        -DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
-      cd ..
-
+    if [[ -n $R_FLAG ]]; then
       echo "Copying and signing R source package"
       R_DIST_NAME=SparkR_$SPARK_VERSION.tar.gz
       cp spark-$SPARK_VERSION-bin-$NAME/R/$R_DIST_NAME .
@@ -239,12 +244,9 @@ if [[ "$1" == "package" ]]; then
       echo
spark git commit: [SPARK-25699][SQL] Partially push down conjunctive predicates in ORC
Repository: spark
Updated Branches:
  refs/heads/master 8a7872dc2 -> 6df234579

[SPARK-25699][SQL] Partially push down conjunctive predicates in ORC

## What changes were proposed in this pull request?

Inspired by https://github.com/apache/spark/pull/22574 .
We can partially push down top-level conjunctive predicates to ORC. This PR improves ORC predicate push down in both the SQL and Hive modules.

## How was this patch tested?

New unit test.

Closes #22684 from gengliangwang/pushOrcFilters.

Authored-by: Gengliang Wang
Signed-off-by: DB Tsai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6df23457
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6df23457
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6df23457

Branch: refs/heads/master
Commit: 6df2345794614c33c95fa453cabac755cf94d131
Parents: 8a7872d
Author: Gengliang Wang
Authored: Wed Oct 10 18:18:56 2018 +0000
Committer: DB Tsai
Committed: Wed Oct 10 18:18:56 2018 +0000

----------------------------------------------------------------------
 .../execution/datasources/orc/OrcFilters.scala  | 69 +++-
 .../datasources/orc/OrcFilterSuite.scala        | 37 ++-
 .../apache/spark/sql/hive/orc/OrcFilters.scala  | 69 +++-
 .../spark/sql/hive/orc/HiveOrcFilterSuite.scala | 45 -
 4 files changed, 186 insertions(+), 34 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/6df23457/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
index dbafc46..2b17b47 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
@@ -138,6 +138,23 @@ private[sql] object OrcFilters {
       dataTypeMap: Map[String, DataType],
       expression: Filter,
       builder: Builder): Option[Builder] = {
+    createBuilder(dataTypeMap, expression, builder, canPartialPushDownConjuncts = true)
+  }
+
+  /**
+   * @param dataTypeMap a map from the attribute name to its data type.
+   * @param expression the input filter predicates.
+   * @param builder the input SearchArgument.Builder.
+   * @param canPartialPushDownConjuncts whether a subset of conjuncts of predicates can be pushed
+   *                                    down safely. Pushing ONLY one side of AND down is safe to
+   *                                    do at the top level or when none of its ancestors is NOT or OR.
+   * @return the builder so far.
+   */
+  private def createBuilder(
+      dataTypeMap: Map[String, DataType],
+      expression: Filter,
+      builder: Builder,
+      canPartialPushDownConjuncts: Boolean): Option[Builder] = {
     def getType(attribute: String): PredicateLeaf.Type =
       getPredicateLeafType(dataTypeMap(attribute))

@@ -145,32 +162,52 @@ private[sql] object OrcFilters {
     expression match {
       case And(left, right) =>
-        // At here, it is not safe to just convert one side if we do not understand the
-        // other side. Here is an example used to explain the reason.
+        // Here, it is not safe to just convert one side and remove the other side
+        // if we do not understand what the parent filters are.
+        //
+        // Here is an example used to explain the reason.
         // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
         // convert b in ('1'). If we only convert a = 2, we will end up with a filter
         // NOT(a = 2), which will generate wrong results.
-        // Pushing one side of AND down is only safe to do at the top level.
-        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
-        for {
-          _ <- buildSearchArgument(dataTypeMap, left, newBuilder)
-          _ <- buildSearchArgument(dataTypeMap, right, newBuilder)
-          lhs <- buildSearchArgument(dataTypeMap, left, builder.startAnd())
-          rhs <- buildSearchArgument(dataTypeMap, right, lhs)
-        } yield rhs.end()
+        //
+        // Pushing one side of AND down is only safe to do at the top level or in the child
+        // AND before hitting NOT or OR conditions, and in this case, the unsupported predicate
+        // can be safely removed.
+        val leftBuilderOption =
+          createBuilder(dataTypeMap, left, newBuilder, canPartialPushDownConjuncts)
+        val rightBuilderOption =
+          createBuilder(dataTypeMap, right, newBuilder, canPartialPushDownConjuncts)
+        (leftBuilderOption, rightBuilderOption) match {
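Editor's note: the crux of this change is that dropping an unconvertible conjunct only *relaxes* a predicate, which is safe under AND but not once a NOT or OR sits above it. A standalone Scala sketch of that rule (an illustration with stand-in filter classes, not Spark's OrcFilters code):

```scala
// Stand-ins for org.apache.spark.sql.sources filters, for illustration only.
sealed trait Filter
case class And(left: Filter, right: Filter) extends Filter
case class Or(left: Filter, right: Filter) extends Filter
case class Not(child: Filter) extends Filter
case class Leaf(name: String, convertible: Boolean) extends Filter

// Returns the pushable filter (possibly a relaxed subset), or None.
def push(f: Filter, canPartial: Boolean): Option[Filter] = f match {
  case And(l, r) =>
    (push(l, canPartial), push(r, canPartial)) match {
      case (Some(pl), Some(pr)) => Some(And(pl, pr))
      // Dropping one conjunct relaxes the predicate; safe only while no
      // enclosing NOT/OR can turn that relaxation into extra filtering.
      case (Some(pl), None) if canPartial => Some(pl)
      case (None, Some(pr)) if canPartial => Some(pr)
      case _ => None
    }
  case Or(l, r) =>
    // Under OR, both sides must convert fully; partial conversion is unsafe.
    for {
      pl <- push(l, canPartial = false)
      pr <- push(r, canPartial = false)
    } yield Or(pl, pr)
  case Not(c)     => push(c, canPartial = false).map(Not)
  case leaf: Leaf => if (leaf.convertible) Some(leaf) else None
}
```

With this rule, NOT(a = 2 AND b IN ('1')) pushes nothing when b IN ('1') cannot be converted, instead of producing the incorrect NOT(a = 2) from the commit message's example.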
svn commit: r29994 - in /dev/spark/2.4.1-SNAPSHOT-2018_10_10_10_02-cd40655-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Wed Oct 10 17:17:08 2018
New Revision: 29994

Log:
Apache Spark 2.4.1-SNAPSHOT-2018_10_10_10_02-cd40655 docs

[This commit notification would consist of 1472 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-25636][CORE] spark-submit cuts off the failure reason when there is an error connecting to master
Repository: spark
Updated Branches:
  refs/heads/branch-2.4 71b8739fe -> cd4065596

[SPARK-25636][CORE] spark-submit cuts off the failure reason when there is an error connecting to master

## What changes were proposed in this pull request?

The cause of the error is wrapped in a SparkException; this change finds the cause inside the wrapped exception and throws the cause instead of the wrapper.

## How was this patch tested?

Verified manually by checking the cause of the error; it gives the error as shown below.

### Without the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://**
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output
```

### With the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://**
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to connect to devaraj-pc1/10.3.66.65:7077
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: devaraj-pc1/10.3.66.65:7077
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	... 1 more
Caused by: java.net.ConnectException: Connection refused
	... 11 more
```

Closes #22623 from devaraj-kavali/SPARK-25636.

Authored-by: Devaraj K
Signed-off-by: Marcelo Vanzin
(cherry picked from commit 8a7872dc254710f9b29fdfdb2915a949ef606871)
Signed-off-by: Marcelo Vanzin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cd406559
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cd406559
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cd406559

Branch: refs/heads/branch-2.4
Commit: cd40655965072051dfae65eabd979edff0e4d398
Parents: 71b8739
Author: Devaraj K
Authored: Wed Oct 10 09:24:36 2018 -0700
Committer: Marcelo Vanzin
Committed: Wed Oct 10 09:24:50 2018 -0700

----------------------------------------------------------------------
 .../org/apache/spark/deploy/SparkSubmit.scala      |  2 --
 .../org/apache/spark/deploy/SparkSubmitSuite.scala | 17 +++--
 2 files changed, 11 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/cd406559/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index cf902db..1d32d96 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -925,8 +925,6 @@ object SparkSubmit extends CommandLineUtils with Logging {
     } catch {
       case e: SparkUserAppException =>
         exitFn(e.exitCode)
-      case e: SparkException =>
-        printErrorAndExit(e.getMessage())
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/cd406559/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
index 9eae360..652c36f 100644
--- a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
@@ -74,20 +74,25 @@ trait TestPrematureExit {
     @volatile var exitedCleanly = false
     mainObject.exitFn = (_) => exitedCleanly = true

+    @volatile var exception: Exception = null
     val thread = new Thread {
       override def run() = try {
         mainObject.main(input)
       } catch {
-        // If exceptions occur after the "exit" has happened, fine to ignore them.
-        // These represent code paths not reachable during normal execution.
-        case e: Exception => if (!exitedCleanly) throw e
+        // Capture the exception to check whether the exception contains searchString or not
+        case e: Exception => exception = e
       }
     }
     thread.start()
     thread.join()
-    val
spark git commit: [SPARK-25636][CORE] spark-submit cuts off the failure reason when there is an error connecting to master
Repository: spark
Updated Branches:
  refs/heads/master 3528c08be -> 8a7872dc2

[SPARK-25636][CORE] spark-submit cuts off the failure reason when there is an error connecting to master

## What changes were proposed in this pull request?

The cause of the error is wrapped in a SparkException; this change finds the cause inside the wrapped exception and throws the cause instead of the wrapper.

## How was this patch tested?

Verified manually by checking the cause of the error; it gives the error as shown below.

### Without the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://**
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output
```

### With the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://**
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to connect to devaraj-pc1/10.3.66.65:7077
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: devaraj-pc1/10.3.66.65:7077
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	... 1 more
Caused by: java.net.ConnectException: Connection refused
	... 11 more
```

Closes #22623 from devaraj-kavali/SPARK-25636.

Authored-by: Devaraj K
Signed-off-by: Marcelo Vanzin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8a7872dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8a7872dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8a7872dc

Branch: refs/heads/master
Commit: 8a7872dc254710f9b29fdfdb2915a949ef606871
Parents: 3528c08
Author: Devaraj K
Authored: Wed Oct 10 09:24:36 2018 -0700
Committer: Marcelo Vanzin
Committed: Wed Oct 10 09:24:36 2018 -0700

----------------------------------------------------------------------
 .../org/apache/spark/deploy/SparkSubmit.scala      |  2 --
 .../org/apache/spark/deploy/SparkSubmitSuite.scala | 17 +++--
 2 files changed, 11 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/8a7872dc/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index d5f2865..61b379f 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -927,8 +927,6 @@ object SparkSubmit extends CommandLineUtils with Logging {
     } catch {
       case e: SparkUserAppException =>
         exitFn(e.exitCode)
-      case e: SparkException =>
-        printErrorAndExit(e.getMessage())
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/8a7872dc/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
index 9eae360..652c36f 100644
--- a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
@@ -74,20 +74,25 @@ trait TestPrematureExit {
     @volatile var exitedCleanly = false
     mainObject.exitFn = (_) => exitedCleanly = true

+    @volatile var exception: Exception = null
     val thread = new Thread {
       override def run() = try {
         mainObject.main(input)
       } catch {
-        // If exceptions occur after the "exit" has happened, fine to ignore them.
-        // These represent code paths not reachable during normal execution.
-        case e: Exception => if (!exitedCleanly) throw e
+        // Capture the exception to check whether the exception contains searchString or not
+        case e: Exception => exception = e
       }
     }
     thread.start()
     thread.join()
-    val joined = printStream.lineBuffer.mkString("\n")
-    if (!joined.contains(searchString)) {
-      fail(s"Search
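Editor's note: the fix simply removes the `case e: SparkException` handler, so the full exception, causes included, reaches the user as in the "With the PR change" trace above. For reference, a hedged sketch of walking a wrapped exception's cause chain to its root (an illustration, not code from this patch; `rootCause` is a made-up name):

```scala
import scala.annotation.tailrec

// Follows getCause links to the root failure, e.g. the java.net.ConnectException
// buried under the SparkException in the trace above. Illustrative only.
@tailrec
def rootCause(t: Throwable): Throwable =
  if (t.getCause == null || (t.getCause eq t)) t else rootCause(t.getCause)

// Example: rootCause(new RuntimeException(new java.net.ConnectException("refused")))
// returns the ConnectException rather than the wrapper.
```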
spark git commit: [SPARK-25611][SPARK-25612][SQL][TESTS] Improve test run time of CompressionCodecSuite
Repository: spark
Updated Branches:
  refs/heads/master eaafcd8a2 -> 3528c08be

[SPARK-25611][SPARK-25612][SQL][TESTS] Improve test run time of CompressionCodecSuite

## What changes were proposed in this pull request?

Reduced the combination of codecs from 9 to 3 to improve the test runtime.

## How was this patch tested?

This is a test fix.

Closes #22641 from dilipbiswal/SPARK-25611.

Authored-by: Dilip Biswal
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3528c08b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3528c08b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3528c08b

Branch: refs/heads/master
Commit: 3528c08bebbcad3dee7557945ddcd31c99deb50e
Parents: eaafcd8
Author: Dilip Biswal
Authored: Wed Oct 10 08:51:16 2018 -0700
Committer: Sean Owen
Committed: Wed Oct 10 08:51:16 2018 -0700

----------------------------------------------------------------------
 .../spark/sql/hive/CompressionCodecSuite.scala | 54
 1 file changed, 21 insertions(+), 33 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3528c08b/sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala
index 1bd7e52..398f4d2 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala
@@ -229,8 +229,8 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo
       tableCompressionCodecs: List[String])
       (assertionCompressionCodec: (Option[String], String, String, Long) => Unit): Unit = {
     withSQLConf(getConvertMetastoreConfName(format) -> convertMetastore.toString) {
-      tableCompressionCodecs.foreach { tableCompression =>
-        compressionCodecs.foreach { sessionCompressionCodec =>
+      tableCompressionCodecs.zipAll(compressionCodecs, null, "SNAPPY").foreach {
+        case (tableCompression, sessionCompressionCodec) =>
           withSQLConf(getSparkCompressionConfName(format) -> sessionCompressionCodec) {
             // 'tableCompression = null' means no table-level compression
             val compression = Option(tableCompression)
@@ -240,7 +240,6 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo
               compression, sessionCompressionCodec, realCompressionCodec, tableSize)
           }
         }
-      }
     }
   }
 }
@@ -262,7 +261,10 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo
     }
   }

-  def checkForTableWithCompressProp(format: String, compressCodecs: List[String]): Unit = {
+  def checkForTableWithCompressProp(
+      format: String,
+      tableCompressCodecs: List[String],
+      sessionCompressCodecs: List[String]): Unit = {
     Seq(true, false).foreach { isPartitioned =>
       Seq(true, false).foreach { convertMetastore =>
         Seq(true, false).foreach { usingCTAS =>
@@ -271,10 +273,10 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo
             isPartitioned,
             convertMetastore,
             usingCTAS,
-            compressionCodecs = compressCodecs,
-            tableCompressionCodecs = compressCodecs) {
+            compressionCodecs = sessionCompressCodecs,
+            tableCompressionCodecs = tableCompressCodecs) {
             case (tableCodec, sessionCodec, realCodec, tableSize) =>
-              val expectCodec = tableCodec.get
+              val expectCodec = tableCodec.getOrElse(sessionCodec)
               assert(expectCodec == realCodec)
               assert(checkTableSize(
                 format, expectCodec, isPartitioned, convertMetastore, usingCTAS, tableSize))
@@ -284,36 +286,22 @@ class CompressionCodecSuite extends TestHiveSingleton with ParquetTest with Befo
     }
   }

-  def checkForTableWithoutCompressProp(format: String, compressCodecs: List[String]): Unit = {
-    Seq(true, false).foreach { isPartitioned =>
-      Seq(true, false).foreach { convertMetastore =>
-        Seq(true, false).foreach { usingCTAS =>
-          checkTableCompressionCodecForCodecs(
-            format,
-            isPartitioned,
-            convertMetastore,
-            usingCTAS,
-            compressionCodecs = compressCodecs,
-            tableCompressionCodecs = List(null)) {
-            case (tableCodec, sessionCodec, realCodec, tableSize) =>
-              // Always expect session-level take effect
-              assert(sessionCodec == realCodec)
-              assert(checkTableSize(
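Editor's note: the heart of the change above is replacing the nested `foreach` (a 3 x 3 cross product) with `zipAll`, which pairs the two codec lists element-wise. A quick illustration of the semantics (the codec names are examples, not a claim about what the suite actually configures):

```scala
// zipAll pairs elements positionally, padding the shorter list with the given
// defaults, so n table codecs + n session codecs yield n cases instead of n*n.
val tableCodecs: List[String]   = List(null, "SNAPPY", "GZIP") // null = no table-level codec
val sessionCodecs: List[String] = List("UNCOMPRESSED", "SNAPPY", "GZIP")

val pairs = tableCodecs.zipAll(sessionCodecs, null, "SNAPPY")
// pairs == List((null, "UNCOMPRESSED"), ("SNAPPY", "SNAPPY"), ("GZIP", "GZIP"))
assert(pairs.size == 3) // the old nested loops would have run 9 combinations
```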
spark git commit: [SPARK-25605][TESTS] Alternate take. Run cast string to timestamp tests for a subset of timezones
Repository: spark
Updated Branches:
  refs/heads/master 3caab872d -> eaafcd8a2

[SPARK-25605][TESTS] Alternate take. Run cast string to timestamp tests for a subset of timezones

## What changes were proposed in this pull request?

Test timezones in parallel in CastSuite, instead of random sampling. See also #22631.

## How was this patch tested?

Existing test.

Closes #22672 from srowen/SPARK-25605.2.

Authored-by: Sean Owen
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eaafcd8a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eaafcd8a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eaafcd8a

Branch: refs/heads/master
Commit: eaafcd8a22db187e87f09966826dcf677c4c38ea
Parents: 3caab87
Author: Sean Owen
Authored: Wed Oct 10 08:25:12 2018 -0700
Committer: Sean Owen
Committed: Wed Oct 10 08:25:12 2018 -0700

----------------------------------------------------------------------
 .../org/apache/spark/sql/catalyst/expressions/CastSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/eaafcd8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index 90c0bf7..94dee7e 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -112,7 +112,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
   }

   test("cast string to timestamp") {
-    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
+    ALL_TIMEZONES.par.foreach { tz =>
       def checkCastStringToTimestamp(str: String, expected: Timestamp): Unit = {
         checkEvaluation(cast(Literal(str), TimestampType, Option(tz.getID)), expected)
       }
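Editor's note: `.par` turns the timezone list into a parallel collection (available without extra modules on Scala 2.12, which Spark used at the time), so every timezone can be exercised without the wall-clock cost that motivated the earlier random sampling. A hedged standalone sketch of the pattern:

```scala
import java.util.TimeZone

// Run a per-timezone check over every available timezone in parallel.
// The assertion body is a placeholder for the real checkCastStringToTimestamp calls.
val allTimeZones: Seq[TimeZone] =
  TimeZone.getAvailableIDs.toSeq.map(TimeZone.getTimeZone)

allTimeZones.par.foreach { tz =>
  assert(tz.getID.nonEmpty) // placeholder per-timezone assertion
}
```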
svn commit: r29990 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_10_08_02-3caab87-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Wed Oct 10 15:17:21 2018
New Revision: 29990

Log:
Apache Spark 3.0.0-SNAPSHOT-2018_10_10_08_02-3caab87 docs

[This commit notification would consist of 1482 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r29989 - in /dev/spark/v2.4.0-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: wenchen
Date: Wed Oct 10 14:49:52 2018
New Revision: 29989

Log:
Apache Spark v2.4.0-rc3 docs

[This commit notification would consist of 1474 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r29988 - /dev/spark/v2.4.0-rc3-bin/
Author: wenchen
Date: Wed Oct 10 14:30:18 2018
New Revision: 29988

Log:
Apache Spark v2.4.0-rc3

Added:
    dev/spark/v2.4.0-rc3-bin/
    dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz   (with props)
    dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.asc
    dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.sha512
    dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz   (with props)
    dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz.asc
    dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz.sha512
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-hadoop2.6.tgz   (with props)
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-hadoop2.6.tgz.asc
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-hadoop2.6.tgz.sha512
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-hadoop2.7.tgz   (with props)
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-hadoop2.7.tgz.asc
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-hadoop2.7.tgz.sha512
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-without-hadoop-scala-2.12.tgz   (with props)
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-without-hadoop-scala-2.12.tgz.asc
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-without-hadoop-scala-2.12.tgz.sha512
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-without-hadoop.tgz   (with props)
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-without-hadoop.tgz.asc
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0-bin-without-hadoop.tgz.sha512
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0.tgz   (with props)
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0.tgz.asc
    dev/spark/v2.4.0-rc3-bin/spark-2.4.0.tgz.sha512

Added: dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.asc
==============================================================================
--- dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.asc (added)
+++ dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.asc Wed Oct 10 14:30:18 2018
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJbvglgAAoJEGuscolPT9yKGUAP/jin9W23RsTESt5iJ1UmtKyF
+iEsSLvfjhnkA2hpbyYmWLbn7/NxW8xXpSkypOfOBht16DBTOdYF02hl4nk1Ydrsm
+pRlPaiV8IgzDptT4HKIRF3QG6m+sTntoEBwiFGjsFSjdM585YZDiIv/H5T+Y8pKH
+jzBE69MI1HcMOZlgIMpsR6H3ZxAqpZncYh2SY9nFvvlhjKrcG9fQTPfuoG+0Q62F
+FSCMW36Rzt7DusN6dtlhbCTGW66I0oXbKddT4aoK/lqRXgc3esFcIe8UyGFELRQw
+5tPdyWPy5YpgKu9fHZZjhZmh1AJQzB+/i3Szh1yAXlSkqgLdvA7wGjIIKO3cyspf
+l4FTAl5LMQKF6fhnplon3vdC1x8UX89Ip1pwhYFwHex8fOGFREyp5w/B7A2IflhR
+id/U71w1vdi9xWANoyKVhAYDTZpE9AMGEvh5ACY+jpnw14b6omlqI+zhv+/Gmibi
+dJE6FlpmrI25xxN7t48+Qj59YlXx06C+2JIUvs0LrJUT7M/yFuosJLNPHn3gTamE
+28ZjhiJ0co5JLXcCkuVUfIlnej5B5rjjqanQAN/mibil8invXSVn7Kddn2CVveyt
+vIeD2h/W7WwruzANwAsoTKlYn76S+chDD0biPtK60BfWddcOTRTTZT+PvtfD2fjp
+hrp0mF1QcvGE+c77bmnn
+=hCFv
+-----END PGP SIGNATURE-----

Added: dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.sha512
==============================================================================
--- dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.sha512 (added)
+++ dev/spark/v2.4.0-rc3-bin/SparkR_2.4.0.tar.gz.sha512 Wed Oct 10 14:30:18 2018
@@ -0,0 +1,3 @@
+SparkR_2.4.0.tar.gz: 1530EB56 B6FC9627 1CBEFA2B 918A6D1B A901299E FCC6B396
+                     74319B4D C5063ABA A91DB157 5DBD6299 E28E2D02 126EE70B
+                     A1166CA6 09C903C8 F9A14DF5 C657346E

Added: dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz.asc
==============================================================================
--- dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz.asc (added)
+++ dev/spark/v2.4.0-rc3-bin/pyspark-2.4.0.tar.gz.asc Wed Oct 10 14:30:18 2018
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJbvgahAAoJEGuscolPT9yKU04P/2N8ZrNc9OhmhqUfTrgcoP7w
+xBby+wWsr1LgT4onxToZnRCMGsMVUFsUibFYCvGj+GJknuHLFPn2C6mceXetZpim
+jYdbIZSWOFOBHfoVPwWqjZiRWhN11wMJnf5O2ZDm+LBKVd8uG1h+bzBkIJz9nlT3
+f7y5JvHf9g8F3imSMhdE1MNJttQMMhKR+4mMWbIlWnvGcMU7+R8Qf7I4ycq0Oam4
+IUdJfxFtpg0YquC12WZ1i5zbq/B/4mCa/LMb6pjYpxH3ifVgFgejIbMKMZbZ4ngQ
+3GcxZHunxD/2EYZJeDoY72m4c9xAHx2aXtgmadBq75hRrdGO2U/QDklyju5VxCnt
+O1F6jlLNGmvsJSJ7+G8IFlzYH87KcdGJMSAIuxEska5B4dPH4dlh+r8w+I4X/37k
+q6Z/sT55eDXA5URhWBe6PmZT7GYHJmkaZQtt72Pvem40btYt1Q9I9xr/elbBzt0P
+KEEzxg4UQZRge3m9s4uzPwNcstPenoELpK7lPmNFlix3cAECJqGDU4ct7bL1qVnk
+tOCLJrLfudAd86enr/Urxi04tL1eHJ1VdRHOgdolNKuw0LavN5PVFcZR2X1jJPDH
+3JPx4mM8qt9BGtkwI5HZzXp6LeLrk6/zOw68f2QBHVZeuf+EE0YzqOP83czNc/rA
+EcduOQbg7TAuPOmdaGcM
+=PSYJ
+-----END PGP SIGNATURE-----

Added:
[spark] Git Push Summary
Repository: spark
Updated Tags:  refs/tags/v2.4.0-rc3 [created] 8e4a99bd2
[2/2] spark git commit: Preparing development version 2.4.1-SNAPSHOT
Preparing development version 2.4.1-SNAPSHOT

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/71b8739f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/71b8739f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/71b8739f

Branch: refs/heads/branch-2.4
Commit: 71b8739fe0f6d63775ee799e5867295ff6637c8c
Parents: 8e4a99b
Author: Wenchen Fan
Authored: Wed Oct 10 13:26:16 2018 +0000
Committer: Wenchen Fan
Committed: Wed Oct 10 13:26:16 2018 +0000

----------------------------------------------------------------------
 R/pkg/DESCRIPTION                                      | 2 +-
 assembly/pom.xml                                       | 2 +-
 common/kvstore/pom.xml                                 | 2 +-
 common/network-common/pom.xml                          | 2 +-
 common/network-shuffle/pom.xml                         | 2 +-
 common/network-yarn/pom.xml                            | 2 +-
 common/sketch/pom.xml                                  | 2 +-
 common/tags/pom.xml                                    | 2 +-
 common/unsafe/pom.xml                                  | 2 +-
 core/pom.xml                                           | 2 +-
 docs/_config.yml                                       | 4 ++--
 examples/pom.xml                                       | 2 +-
 external/avro/pom.xml                                  | 2 +-
 external/docker-integration-tests/pom.xml              | 2 +-
 external/flume-assembly/pom.xml                        | 2 +-
 external/flume-sink/pom.xml                            | 2 +-
 external/flume/pom.xml                                 | 2 +-
 external/kafka-0-10-assembly/pom.xml                   | 2 +-
 external/kafka-0-10-sql/pom.xml                        | 2 +-
 external/kafka-0-10/pom.xml                            | 2 +-
 external/kafka-0-8-assembly/pom.xml                    | 2 +-
 external/kafka-0-8/pom.xml                             | 2 +-
 external/kinesis-asl-assembly/pom.xml                  | 2 +-
 external/kinesis-asl/pom.xml                           | 2 +-
 external/spark-ganglia-lgpl/pom.xml                    | 2 +-
 graphx/pom.xml                                         | 2 +-
 hadoop-cloud/pom.xml                                   | 2 +-
 launcher/pom.xml                                       | 2 +-
 mllib-local/pom.xml                                    | 2 +-
 mllib/pom.xml                                          | 2 +-
 pom.xml                                                | 2 +-
 python/pyspark/version.py                              | 2 +-
 repl/pom.xml                                           | 2 +-
 resource-managers/kubernetes/core/pom.xml              | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml                        | 2 +-
 resource-managers/yarn/pom.xml                         | 2 +-
 sql/catalyst/pom.xml                                   | 2 +-
 sql/core/pom.xml                                       | 2 +-
 sql/hive-thriftserver/pom.xml                          | 2 +-
 sql/hive/pom.xml                                       | 2 +-
 streaming/pom.xml                                      | 2 +-
 tools/pom.xml                                          | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/71b8739f/R/pkg/DESCRIPTION
----------------------------------------------------------------------
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index f52d785..714b6f1 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.0
+Version: 2.4.1
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/71b8739f/assembly/pom.xml
----------------------------------------------------------------------
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 63ab510..ee0de73 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.0</version>
+    <version>2.4.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>

http://git-wip-us.apache.org/repos/asf/spark/blob/71b8739f/common/kvstore/pom.xml
----------------------------------------------------------------------
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index b10e118..b89e0fe 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.0</version>
+    <version>2.4.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>

http://git-wip-us.apache.org/repos/asf/spark/blob/71b8739f/common/network-common/pom.xml
[1/2] spark git commit: Preparing Spark release v2.4.0-rc3
Repository: spark
Updated Branches:
  refs/heads/branch-2.4 404c84039 -> 71b8739fe

Preparing Spark release v2.4.0-rc3

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8e4a99bd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8e4a99bd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8e4a99bd

Branch: refs/heads/branch-2.4
Commit: 8e4a99bd201b9204fec52580f19ae70a229ed94e
Parents: 404c840
Author: Wenchen Fan
Authored: Wed Oct 10 13:26:12 2018 +0000
Committer: Wenchen Fan
Committed: Wed Oct 10 13:26:12 2018 +0000

----------------------------------------------------------------------
 R/pkg/DESCRIPTION                                      | 2 +-
 assembly/pom.xml                                       | 2 +-
 common/kvstore/pom.xml                                 | 2 +-
 common/network-common/pom.xml                          | 2 +-
 common/network-shuffle/pom.xml                         | 2 +-
 common/network-yarn/pom.xml                            | 2 +-
 common/sketch/pom.xml                                  | 2 +-
 common/tags/pom.xml                                    | 2 +-
 common/unsafe/pom.xml                                  | 2 +-
 core/pom.xml                                           | 2 +-
 docs/_config.yml                                       | 4 ++--
 examples/pom.xml                                       | 2 +-
 external/avro/pom.xml                                  | 2 +-
 external/docker-integration-tests/pom.xml              | 2 +-
 external/flume-assembly/pom.xml                        | 2 +-
 external/flume-sink/pom.xml                            | 2 +-
 external/flume/pom.xml                                 | 2 +-
 external/kafka-0-10-assembly/pom.xml                   | 2 +-
 external/kafka-0-10-sql/pom.xml                        | 2 +-
 external/kafka-0-10/pom.xml                            | 2 +-
 external/kafka-0-8-assembly/pom.xml                    | 2 +-
 external/kafka-0-8/pom.xml                             | 2 +-
 external/kinesis-asl-assembly/pom.xml                  | 2 +-
 external/kinesis-asl/pom.xml                           | 2 +-
 external/spark-ganglia-lgpl/pom.xml                    | 2 +-
 graphx/pom.xml                                         | 2 +-
 hadoop-cloud/pom.xml                                   | 2 +-
 launcher/pom.xml                                       | 2 +-
 mllib-local/pom.xml                                    | 2 +-
 mllib/pom.xml                                          | 2 +-
 pom.xml                                                | 2 +-
 python/pyspark/version.py                              | 2 +-
 repl/pom.xml                                           | 2 +-
 resource-managers/kubernetes/core/pom.xml              | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml                        | 2 +-
 resource-managers/yarn/pom.xml                         | 2 +-
 sql/catalyst/pom.xml                                   | 2 +-
 sql/core/pom.xml                                       | 2 +-
 sql/hive-thriftserver/pom.xml                          | 2 +-
 sql/hive/pom.xml                                       | 2 +-
 streaming/pom.xml                                      | 2 +-
 tools/pom.xml                                          | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/8e4a99bd/R/pkg/DESCRIPTION
----------------------------------------------------------------------
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 714b6f1..f52d785 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.1
+Version: 2.4.0
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/8e4a99bd/assembly/pom.xml
----------------------------------------------------------------------
diff --git a/assembly/pom.xml b/assembly/pom.xml
index ee0de73..63ab510 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.1-SNAPSHOT</version>
+    <version>2.4.0</version>
     <relativePath>../pom.xml</relativePath>

http://git-wip-us.apache.org/repos/asf/spark/blob/8e4a99bd/common/kvstore/pom.xml
----------------------------------------------------------------------
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index b89e0fe..b10e118 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.1-SNAPSHOT</version>
+    <version>2.4.0</version>
     <relativePath>../../pom.xml</relativePath>

http://git-wip-us.apache.org/repos/asf/spark/blob/8e4a99bd/common/network-common/pom.xml
spark git commit: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the migration guide.
Repository: spark
Updated Branches:
  refs/heads/master faf73dcd3 -> 3caab872d

[SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the migration guide.

## What changes were proposed in this pull request?

This is a follow-up PR of #18536 and #22545 to update the migration guide.

## How was this patch tested?

Build and check the doc locally.

Closes #22682 from ueshin/issues/SPARK-20946_25525/migration_guide.

Authored-by: Takuya UESHIN
Signed-off-by: Wenchen Fan

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3caab872
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3caab872
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3caab872

Branch: refs/heads/master
Commit: 3caab872db22246c9ab5f3395498f05cb097c142
Parents: faf73dc
Author: Takuya UESHIN
Authored: Wed Oct 10 21:07:59 2018 +0800
Committer: Wenchen Fan
Committed: Wed Oct 10 21:07:59 2018 +0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 6 ++
 1 file changed, 6 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3caab872/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index a1d7b11..0d29357 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 # Migration Guide

+## Upgrading From Spark SQL 2.4 to 3.0
+
+  - In PySpark, when creating a `SparkSession` with `SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, the builder was trying to update the `SparkConf` of the existing `SparkContext` with configurations specified to the builder, but the `SparkContext` is shared by all `SparkSession`s, so we should not update them. Since 3.0, the builder no longer updates the configurations. This is the same behavior as the Java/Scala API in 2.3 and above. If you want to update them, you need to do so prior to creating a `SparkSession`.
+
 ## Upgrading From Spark SQL 2.3 to 2.4

   - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
@@ -2135,6 +2139,8 @@ working with timestamps in `pandas_udf`s to get the best performance, see

   - In PySpark, `df.replace` does not allow to omit `value` when `to_replace` is not a dictionary. Previously, `value` could be omitted in the other cases and had `None` by default, which is counterintuitive and error-prone.
   - Un-aliased subquery's semantic has not been well defined with confusing behaviors. Since Spark 2.3, we invalidate such confusing cases, for example: `SELECT v.i from (SELECT i FROM v)`, Spark will throw an analysis exception in this case because users should not be able to use the qualifier inside a subquery. See [SPARK-20690](https://issues.apache.org/jira/browse/SPARK-20690) and [SPARK-21335](https://issues.apache.org/jira/browse/SPARK-21335) for more details.

+  - When creating a `SparkSession` with `SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, the builder was trying to update the `SparkConf` of the existing `SparkContext` with configurations specified to the builder, but the `SparkContext` is shared by all `SparkSession`s, so we should not update them. Since 2.3, the builder no longer updates the configurations. If you want to update them, you need to do so prior to creating a `SparkSession`.
+
 ## Upgrading From Spark SQL 2.1 to 2.2

   - Spark 2.1.1 introduced a new configuration key: `spark.sql.hive.caseSensitiveInferenceMode`. It had a default setting of `NEVER_INFER`, which kept behavior identical to 2.1.0. However, Spark 2.2.0 changes this setting's default value to `INFER_AND_SAVE` to restore compatibility with reading Hive metastore tables whose underlying file schema have mixed-case column names. With the `INFER_AND_SAVE` configuration value, on first access Spark will perform schema inference on any Hive metastore table for which it has not already saved an inferred schema. Note that schema inference can be a very time-consuming operation for tables with thousands of partitions. If compatibility with mixed-case column names is not a concern, you can safely set `spark.sql.hive.caseSensitiveInferenceMode` to `NEVER_INFER` to avoid the initial overhead of schema inference. Note that with the new default `INFER_AND_SAVE` setting, the results of the
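Editor's note: the documented behavior change, illustrated in Scala (an illustration based on the migration-guide text above, not code from the patch; the config key is made up for the example):

```scala
import org.apache.spark.sql.SparkSession

// Core configs must be set before the first SparkContext exists; once it does,
// builder-supplied configs no longer mutate the shared SparkConf.
val spark1 = SparkSession.builder()
  .config("spark.app.example.key", "v1") // hypothetical key, set pre-context: takes effect
  .getOrCreate()

val spark2 = SparkSession.builder()
  .config("spark.app.example.key", "v2") // context already exists: shared conf is NOT updated
  .getOrCreate()

// Per the guide, spark2.sparkContext.getConf.get("spark.app.example.key")
// is still expected to return "v1"; update configs before creating any session.
```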