svn commit: r27222 - in /dev/spark/2.4.0-SNAPSHOT-2018_06_01_16_01-8ef167a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Jun 1 23:16:03 2018 New Revision: 27222 Log: Apache Spark 2.4.0-SNAPSHOT-2018_06_01_16_01-8ef167a docs [This commit notification would consist of 1466 parts, which exceeds the limit of 50, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r27221 - in /dev/spark/2.3.2-SNAPSHOT-2018_06_01_14_01-21800b8-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Jun 1 21:16:28 2018 New Revision: 27221 Log: Apache Spark 2.3.2-SNAPSHOT-2018_06_01_14_01-21800b8 docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r27220 - in /dev/spark/v2.3.1-rc4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: vanzin Date: Fri Jun 1 21:12:21 2018 New Revision: 27220 Log: Apache Spark v2.3.1-rc4 docs [This commit notification would consist of 1446 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r27219 - /dev/spark/v2.3.1-rc4-bin/
Author: vanzin Date: Fri Jun 1 20:59:55 2018 New Revision: 27219 Log: Apache Spark v2.3.1-rc4 Added: dev/spark/v2.3.1-rc4-bin/ dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.6.tgz (with props) dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.6.tgz.asc dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.6.tgz.sha512 dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.7.tgz (with props) dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.7.tgz.asc dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.7.tgz.sha512 dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-without-hadoop.tgz (with props) dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-without-hadoop.tgz.asc dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-without-hadoop.tgz.sha512 dev/spark/v2.3.1-rc4-bin/spark-2.3.1.tgz (with props) dev/spark/v2.3.1-rc4-bin/spark-2.3.1.tgz.asc dev/spark/v2.3.1-rc4-bin/spark-2.3.1.tgz.sha512 Added: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc Fri Jun 1 20:59:55 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbEbIGAAoJEP2P/Uw6DVVkzN8QAJHLuR/Gq2QNKPSsF1w3GFKs +oG95fvqcTwpyoZjiXTUNKzLNRG8Q3GYaXENawYBIVA/b9dyLN8gauFrX+/rQrvSI +xL3wwimyZSCnPvU8OpO0bdmhD6elGd+Kw+CshPy7dNkt4zpFTuYclOlChygPOB5i +WH+85yQ8FM2kIzniJfvq5stAMrIgjoC03i/rogbf9Ys+K7Ll70NBNC7uKlTR5m8Q +OpZisj1mT4uu0pr5CLpk8pJLN4YhFLTXPvFHZQQjbb0wBsR2y2+Fj7aSvEgQp/eM +SA60pVcKRhM0GAvS9wDnEZA6M2HPeEzVWnVa+Vom9z0icIX8HOslRK0hVAKvMgqE +FkWYdHuH8K0MM6SoNRPNmw8eIpnshfIVrtOzD33vDggaVWPFfFP9d+ILb9k+1aff +zJJAyAYzIRy5mAQcOks/i1BOaC/lwrk6tkR1VAZGQkietFl1GUZO6YXiXTY8AbND +fm25iu+22a2nPdxd20IVvOYfR5xzy6axOoItEyCmtrLqXc32KOzl0kpp+C9Hiu/K +wEezkSDBC0BlDUFZKvZtMjJgBEjsAL7qiSuulMbpmmYeXqKmxjKHCb+gKawwKvez +ZVSiyhjiZdzfbNCg1I54gDSYlNUdTUQzdZors5WxR7BwpVZ776OwNSDWSdPa/NTZ +KxnPZLZnph2TI1QG2Krw +=kKom +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512 Fri Jun 1 20:59:55 2018 @@ -0,0 +1,3 @@ +SparkR_2.3.1.tar.gz: 0633ABE5 66B8FC97 0B27B185 6674C267 1B1BE109 7F33116A + 259549CC 8B0B34D9 0DDB6039 86C3E626 44CE725B 63DB9613 + B43F192C 3C8F6285 B57A26FD 3F56DA23 Added: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc Fri Jun 1 20:59:55 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbEbFNAAoJEP2P/Uw6DVVk4RwQAJph3d2xbZDF9xWYoS9TXnq3 +uwvaMouzP4GJqBffwDydPXOq6BjDtA7631BSOoYiH49C6SLBjUGkhbpNMDe9eWwu +3+QjYDH+IU2yytO9D9Quj+iVdZdzGmcwFrUclZhMt41uMbDZMFfercXf5aqahG+v +57DBsVIdkbHpgZz317/GeHplyZIVV/GW49U/+u9NceCU+XUWtv6VOxXF5eBZb9+E +MbF03McAaw346FRZJfaaGiOgZrTD81LxMYQL/qPsP6J+suPbC8P6DeQGmUbAidQm +ER8getpuU0jCBrmitulh/hW+n6ygUhPc+0w4QO3W57xKcWi3by+Z2sSnZohQVhoI +kTQu+GkQ3wCx3uSkY+4Tglco8SmD1KztrVz6tOc6dj7SMeRpc4zioBVbywWznUMC +SM922xetmSwnZuLB+wpDmXOULqCOjro2S7qTjMkCbOL7jxOOtkS765IijvZtYqPt +uyN/wZ2kVXSvYCu9meba37YiukRrwfbCBwhUWP7ubYQugT71IPDn/KNnluXpR6sX +yBoKPS8iBHb+U1IWO15lAKS4Bd8h4/Bt04nRnDkF1Kh7wY2CwtRW4IAWg6X0sBnG +Xm4UeCV5Os2dDrFATNA0GenkBeCUzXRT6i3QlhVifogx7FQzBI4c0R4W5RuQ76W/ +EVqoelwlLEWK5VADeJUt +=W1F5 +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512 Fri
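Each binary artifact above is accompanied by a detached PGP signature (`.asc`) and a SHA-512 checksum (`.sha512`). As an illustration of how such a checksum can be reproduced when verifying a downloaded release candidate, the JDK's standard `MessageDigest` API computes the digest; this is a hedged sketch of the verification step, not the actual release tooling (which presumably uses `gpg` and a `sha512` command-line tool):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Demo {
    // Compute a lowercase hex SHA-512 digest of some bytes, as one would do to
    // compare a downloaded artifact against its .sha512 file.
    // (Illustrative sketch only; not part of the Spark release scripts.)
    public static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Hypothetical payload standing in for the bytes of a release tarball.
        byte[] payload = "contents of spark-2.3.1.tgz would go here".getBytes(StandardCharsets.UTF_8);
        String digest = sha512Hex(payload);
        System.out.println(digest.length()); // a SHA-512 digest is 64 bytes = 128 hex chars
    }
}
```

The computed hex string is then compared (ignoring whitespace grouping) against the published `.sha512` file.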
spark git commit: [SPARK-24340][CORE] Clean up non-shuffle disk block manager files following executor exits on a Standalone cluster
Repository: spark Updated Branches: refs/heads/master 09e78c1ea -> 8ef167a5f [SPARK-24340][CORE] Clean up non-shuffle disk block manager files following executor exits on a Standalone cluster ## What changes were proposed in this pull request? Currently we only clean up the local directories when an application is removed. However, when executors die and restart repeatedly, many temp files are left untouched in the local directories, which is undesired behavior and gradually uses up disk space. We can detect executor death in the Worker and clean up the non-shuffle files (files not ending with ".index" or ".data") in the local directories; we should not touch the shuffle files, since they are expected to be used by the external shuffle service. The scope of this PR is limited to implementing the cleanup logic on a Standalone cluster; we defer to experts familiar with other cluster managers (YARN/Mesos/K8s) to determine whether it is worth adding similar support. ## How was this patch tested? Added a new test suite to cover the cleanup logic. Author: Xingbo Jiang Closes #21390 from jiangxb1987/cleanupNonshuffleFiles. 
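The cleanup rule described above — keep files ending in ".index" or ".data", delete everything else — can be sketched as a simple `FilenameFilter`. This is an illustrative sketch with a hypothetical class name, not the PR's actual code:

```java
import java.io.File;
import java.io.FilenameFilter;

public class NonShuffleFileFilter {
    // Shuffle files end in ".index" or ".data" and are still needed by the
    // external shuffle service after an executor exits; everything else in the
    // executor's local directories is safe to remove.
    public static final FilenameFilter NON_SHUFFLE_FILES =
        (dir, name) -> !(name.endsWith(".index") || name.endsWith(".data"));

    public static void main(String[] args) {
        File dir = new File("/tmp"); // placeholder for an executor's local dir
        System.out.println(NON_SHUFFLE_FILES.accept(dir, "temp_local_0a1b"));     // true: delete
        System.out.println(NON_SHUFFLE_FILES.accept(dir, "shuffle_0_1_0.data"));  // false: keep
        System.out.println(NON_SHUFFLE_FILES.accept(dir, "shuffle_0_1_0.index")); // false: keep
    }
}
```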
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8ef167a5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8ef167a5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8ef167a5 Branch: refs/heads/master Commit: 8ef167a5f9ba8a79bb7ca98a9844fe9cfcfea060 Parents: 09e78c1 Author: Xingbo Jiang Authored: Fri Jun 1 13:46:05 2018 -0700 Committer: Xiao Li Committed: Fri Jun 1 13:46:05 2018 -0700 -- .../apache/spark/network/util/JavaUtils.java| 45 ++-- .../shuffle/ExternalShuffleBlockHandler.java| 7 + .../shuffle/ExternalShuffleBlockResolver.java | 43 .../shuffle/NonShuffleFilesCleanupSuite.java| 221 +++ .../network/shuffle/TestShuffleDataContext.java | 15 ++ .../spark/deploy/ExternalShuffleService.scala | 5 + .../org/apache/spark/deploy/worker/Worker.scala | 17 +- .../spark/deploy/worker/WorkerSuite.scala | 55 - docs/spark-standalone.md| 12 + 9 files changed, 400 insertions(+), 20 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8ef167a5/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java -- diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java index afc59ef..b549708 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java +++ b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java @@ -17,10 +17,7 @@ package org.apache.spark.network.util; -import java.io.Closeable; -import java.io.EOFException; -import java.io.File; -import java.io.IOException; +import java.io.*; import java.nio.ByteBuffer; import java.nio.channels.ReadableByteChannel; import java.nio.charset.StandardCharsets; @@ -91,11 +88,24 @@ public class JavaUtils { * @throws IOException if deletion is unsuccessful */ public static void deleteRecursively(File file) throws IOException { +deleteRecursively(file, 
null); + } + + /** + * Delete a file or directory and its contents recursively. + * Don't follow directories if they are symlinks. + * + * @param file Input file / dir to be deleted + * @param filter A filename filter that make sure only files / dirs with the satisfied filenames + * are deleted. + * @throws IOException if deletion is unsuccessful + */ + public static void deleteRecursively(File file, FilenameFilter filter) throws IOException { if (file == null) { return; } // On Unix systems, use operating system command to run faster // If that does not work out, fallback to the Java IO way -if (SystemUtils.IS_OS_UNIX) { +if (SystemUtils.IS_OS_UNIX && filter == null) { try { deleteRecursivelyUsingUnixNative(file); return; @@ -105,15 +115,17 @@ public class JavaUtils { } } -deleteRecursivelyUsingJavaIO(file); +deleteRecursivelyUsingJavaIO(file, filter); } - private static void deleteRecursivelyUsingJavaIO(File file) throws IOException { + private static void deleteRecursivelyUsingJavaIO( + File file, + FilenameFilter filter) throws IOException { if (file.isDirectory() && !isSymlink(file)) { IOException savedIOException = null; - for (File child : listFilesSafely(file)) { + for (File child : listFilesSafely(file, filter)) { try { - deleteRecursively(child); + deleteRecursively(child, filter); } catch (IOException e) { //
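The quoted diff above is truncated. As a self-contained sketch of the filtered recursive deletion it introduces — simplified by omitting the symlink check and the native Unix fast path that the real `JavaUtils` has — the core pattern looks like this (hypothetical class name, not Spark's actual code):

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;

public class FilteredDeleteSketch {
    // Recursively delete `file`, restricted to entries whose names the filter
    // accepts. Note that only children matching the filter are visited; a
    // directory that still holds kept children fails its own delete() and is
    // simply left in place.
    public static void deleteRecursively(File file, FilenameFilter filter) throws IOException {
        if (file == null) return;
        if (file.isDirectory()) {
            File[] children = (filter == null) ? file.listFiles() : file.listFiles(filter);
            if (children == null) {
                throw new IOException("Failed to list files for dir: " + file);
            }
            for (File child : children) {
                deleteRecursively(child, filter);
            }
        }
        if (filter == null || filter.accept(file.getParentFile(), file.getName())) {
            file.delete(); // may legitimately fail for a still-non-empty directory
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("demo").toFile();
        new File(dir, "part.data").createNewFile();
        new File(dir, "scratch.tmp").createNewFile();
        deleteRecursively(dir, (d, name) -> !(name.endsWith(".index") || name.endsWith(".data")));
        System.out.println(new File(dir, "part.data").exists());   // true: shuffle file kept
        System.out.println(new File(dir, "scratch.tmp").exists()); // false: temp file removed
    }
}
```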
[2/2] spark git commit: Preparing development version 2.3.2-SNAPSHOT
Preparing development version 2.3.2-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21800b87 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21800b87 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21800b87 Branch: refs/heads/branch-2.3 Commit: 21800b878605598988b54bcc4ef5b24a546ba9cc Parents: 30aaa5a Author: Marcelo Vanzin Authored: Fri Jun 1 13:34:24 2018 -0700 Committer: Marcelo Vanzin Committed: Fri Jun 1 13:34:24 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION 
b/R/pkg/DESCRIPTION index 632bcb3..8df2635 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.1 +Version: 2.3.2 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index d744c8b..02bf39b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 3a41e16..646fdfb 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index f02108f..76c7dcf 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4430487..f2661fe 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -
[1/2] spark git commit: Preparing Spark release v2.3.1-rc4
Repository: spark Updated Branches: refs/heads/branch-2.3 e4e96f929 -> 21800b878 Preparing Spark release v2.3.1-rc4 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30aaa5a3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30aaa5a3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30aaa5a3 Branch: refs/heads/branch-2.3 Commit: 30aaa5a3a1076ca52439a905274b1fcf498bc562 Parents: e4e96f9 Author: Marcelo Vanzin Authored: Fri Jun 1 13:34:19 2018 -0700 Committer: Marcelo Vanzin Committed: Fri Jun 1 13:34:19 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- 
http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 8df2635..632bcb3 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.2 +Version: 2.3.1 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 02bf39b..d744c8b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 646fdfb..3a41e16 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 76c7dcf..f02108f 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index f2661fe..4430487 100644 --- a/common/network-shuffle/pom.xml +++
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.3.1-rc4 [created] 30aaa5a3a
svn commit: r27216 - in /dev/spark/2.4.0-SNAPSHOT-2018_06_01_12_01-09e78c1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Jun 1 19:16:14 2018 New Revision: 27216 Log: Apache Spark 2.4.0-SNAPSHOT-2018_06_01_12_01-09e78c1 docs [This commit notification would consist of 1466 parts, which exceeds the limit of 50, so it was shortened to this summary.]
spark git commit: [INFRA] Close stale PRs.
Repository: spark Updated Branches: refs/heads/master d2c3de7ef -> 09e78c1ea [INFRA] Close stale PRs. Closes #21444 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/09e78c1e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/09e78c1e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/09e78c1e Branch: refs/heads/master Commit: 09e78c1eaa742b9cab4564928e5a5401fe0198a9 Parents: d2c3de7 Author: Marcelo Vanzin Authored: Fri Jun 1 11:55:34 2018 -0700 Committer: Marcelo Vanzin Committed: Fri Jun 1 11:55:34 2018 -0700 -- --
svn commit: r27213 - /dev/spark/v2.3.1-rc3-bin/
Author: vanzin Date: Fri Jun 1 18:19:42 2018 New Revision: 27213 Log: Apache Spark v2.3.1-rc3 Added: dev/spark/v2.3.1-rc3-bin/ dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.6.tgz (with props) dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.6.tgz.asc dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.6.tgz.sha512 dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.7.tgz (with props) dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.7.tgz.asc dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.7.tgz.sha512 dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-without-hadoop.tgz (with props) dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-without-hadoop.tgz.asc dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-without-hadoop.tgz.sha512 dev/spark/v2.3.1-rc3-bin/spark-2.3.1.tgz (with props) dev/spark/v2.3.1-rc3-bin/spark-2.3.1.tgz.asc dev/spark/v2.3.1-rc3-bin/spark-2.3.1.tgz.sha512 Added: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc Fri Jun 1 18:19:42 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbEYzmAAoJEP2P/Uw6DVVkf6AP/1TRDdrMUZfuz+vs4/3MP6iQ +yItx3bTGn0iQVuozjv6JDnTFVKY6i+JF6/QlGWcuSyPvICpo7NwKVmvejlvUSG9p +wjjOIU6q7wI/N975fnMhCQ3WX1ePYkt3ZYM0flHqLiUIJ/wbMKookMD/IYNeRTpC +hYAa7XVx1/YK8m1OMzqA8u3vT8gBzyIndJ8yc9teKqc9ultwLMJ7sMXNqjZoSxsV +iIarWlUGw0f8GHyrV30PPCqp91aOJMuHv60UqP32d2D6RUuLuUTIxXClQjB/CaKo ++fzQcl6OYQvOePxYlPYZEekWT1cjZfPykUVElyMbZNRT7uQcxlbcvJVzlmxAT8Gh +awAjPF5U5Cy3UycmyL1a/Cd6xNhaUltPrZGxZKlOsVUXLHZKSTV/pi+SfvZgQ1Au +8Die+7y0STGXFWlIeZjYyWc3i5bfgaW0KeO96IZzYlLO+IIYGL0doRoH1spTjw+U +j0mHlQ42Gk+PpVRAUEB42fnDMka6Fx4jbx1N8Qpy8NzG9RF2iCYKOjiYmHrGShPD +9383B0zs0qxiCPFYUTYgrOfXpD6pFTb106BTiVZXH4kdhaChruv42nBMklg1Qwjg ++nf1I26OEnEwSB7fr6/MEcyPNVQdqB/Qih0yyjTfCp6iEvT+Q37jVSp9uMmWHouu ++ucIUlfxqWG4O2+JoW7p +=E5sq +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512 Fri Jun 1 18:19:42 2018 @@ -0,0 +1,3 @@ +SparkR_2.3.1.tar.gz: 939BFB6B 78FD2D8A 86D47A20 4F715846 443FAD2A AF48BB8D + 4DEF4817 23D2467E B278EC74 19E4868E BDC0E2F7 4B55F2C0 + 1ADC37CA 59CE2691 1EC5CECD 5964CCB5 Added: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc Fri Jun 1 18:19:42 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbEYwmAAoJEP2P/Uw6DVVkYXUP/0QJn+CYF1PzBPuKfYhvAxsd +AoklrmND65PwC7Tf6vmXAOzP0P7nYeRUg4kvARFMZ7H+LuWSKT4JcGnC9+HYRVV9 +j0A9Hcc+wZ1uSU6gZXG8DhmGN6/gDJ+uX3bYQ8+sp4Dz5umWgZxliXZW8u/GIkrz +9U+RStR2DJG69xnqdH290Ww2k2F2RYSMldGp1JlqHa/fPNXDYEXuhNQMXTsOHRob +u8Hn6WJFT95pRuEpAbi31oYF8zgQoc+HoxkhPTHlYZsnaUojGb90YoV6vYGxdf8B +S4eEsoPY9a8Wqi0iLf+R/5Hm8tZNd/ZkRIPSttijA/yIMlhBM4fdx0dfOlu5novZ +H7gexJ5jV4icsrKOwPQvjv0EY3ANACuwY1B6w+XyzQYitTeUeqeArtk3ygp/4m00 +dav6Mz9+a4DJmEhDGoA0B6zs3z08hwlmHisS3eDw+WgKQVu7o3gxOZWgTl6zWtIR +5f2viCMOyakoHdw9yiM2TQvNN14DUccQsXmZsmHGez/LTSdms4gehzS5uLiUSxQY +dO9vwH0bV2xilfjm3Cx0awG+Z0M27v+wD/EaLNCM4ExGwTDw6Yq0ZFhuD2AlAdWD +r1rP67gDTayFsMmLhlnbyE3wDfSQQlaT5ioL5sxkYPfr+iJ9D5iAYhP0nt9HVxG3 +f70J/OVKMEaINb41kIrt +=sA1k +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512 Fri
[1/2] spark git commit: Preparing Spark release v2.3.1-rc3
Repository: spark Updated Branches: refs/heads/branch-2.3 e56266ad7 -> 2e0c3469d Preparing Spark release v2.3.1-rc3 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1cc5f68b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1cc5f68b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1cc5f68b Branch: refs/heads/branch-2.3 Commit: 1cc5f68be73a980111ce0443413356f2b7634bd1 Parents: e56266a Author: Marcelo Vanzin Authored: Fri Jun 1 10:56:26 2018 -0700 Committer: Marcelo Vanzin Committed: Fri Jun 1 10:56:26 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- 
http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 8df2635..632bcb3 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.2 +Version: 2.3.1 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 02bf39b..d744c8b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 646fdfb..3a41e16 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 76c7dcf..f02108f 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index f2661fe..4430487 100644 --- a/common/network-shuffle/pom.xml +++
[2/2] spark git commit: Preparing development version 2.3.2-SNAPSHOT
Preparing development version 2.3.2-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e0c3469 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e0c3469 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e0c3469 Branch: refs/heads/branch-2.3 Commit: 2e0c3469d0d0d7b351d50c937b9862bd80b946ba Parents: 1cc5f68 Author: Marcelo Vanzin Authored: Fri Jun 1 10:56:29 2018 -0700 Committer: Marcelo Vanzin Committed: Fri Jun 1 10:56:29 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION 
b/R/pkg/DESCRIPTION index 632bcb3..8df2635 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.1 +Version: 2.3.2 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index d744c8b..02bf39b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 3a41e16..646fdfb 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index f02108f..76c7dcf 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4430487..f2661fe 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.3.1-rc3 [created] 1cc5f68be
spark git commit: [SPARK-24351][SS] offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode
Repository: spark Updated Branches: refs/heads/master 98909c398 -> 6039b1323 [SPARK-24351][SS] offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode ## What changes were proposed in this pull request? Compute the thresholdBatchId used to purge metadata based on the current committed epoch instead of currentBatchId in continuous processing (CP) mode, to avoid cleaning up all the committed metadata in some cases, as described in the JIRA ticket [SPARK-24351](https://issues.apache.org/jira/browse/SPARK-24351). ## How was this patch tested? Added a new unit test. Author: Huang Tengfei Closes #21400 from ivoson/branch-cp-meta. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6039b132 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6039b132 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6039b132 Branch: refs/heads/master Commit: 6039b132304cc77ed39e4ca7813850507ae0b440 Parents: 98909c3 Author: Huang Tengfei Authored: Fri Jun 1 10:47:53 2018 -0700 Committer: Shixiong Zhu Committed: Fri Jun 1 10:47:53 2018 -0700 -- .../continuous/ContinuousExecution.scala| 11 +++-- .../streaming/continuous/ContinuousSuite.scala | 46 2 files changed, 54 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6039b132/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala index d16b24c..e3d0cea 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala @@ -318,9 +318,14 @@ class ContinuousExecution( } } -if (minLogEntriesToMaintain < 
currentBatchId) { - offsetLog.purge(currentBatchId - minLogEntriesToMaintain) - commitLog.purge(currentBatchId - minLogEntriesToMaintain) +// Since currentBatchId increases independently in cp mode, the current committed epoch may +// be far behind currentBatchId. It is not safe to discard the metadata with thresholdBatchId +// computed based on currentBatchId. As minLogEntriesToMaintain is used to keep the minimum +// number of batches that must be retained and made recoverable, so we should keep the +// specified number of metadata that have been committed. +if (minLogEntriesToMaintain <= epoch) { + offsetLog.purge(epoch + 1 - minLogEntriesToMaintain) + commitLog.purge(epoch + 1 - minLogEntriesToMaintain) } awaitProgressLock.lock() http://git-wip-us.apache.org/repos/asf/spark/blob/6039b132/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala index cd1704a..4980b0c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala @@ -297,3 +297,49 @@ class ContinuousStressSuite extends ContinuousSuiteBase { CheckAnswerRowsContains(scala.Range(0, 25000).map(Row(_ } } + +class ContinuousMetaSuite extends ContinuousSuiteBase { + import testImplicits._ + + // We need to specify spark.sql.streaming.minBatchesToRetain to do the following test. 
+ override protected def createSparkSession = new TestSparkSession( +new SparkContext( + "local[10]", + "continuous-stream-test-sql-context", + sparkConf.set("spark.sql.testkey", "true") +.set("spark.sql.streaming.minBatchesToRetain", "2"))) + + test("SPARK-24351: check offsetLog/commitLog retained in the checkpoint directory") { +withTempDir { checkpointDir => + val input = ContinuousMemoryStream[Int] + val df = input.toDF().mapPartitions(iter => { +// Sleep the task thread for 300 ms to make sure epoch processing time 3 times +// longer than epoch creating interval. So the gap between last committed +// epoch and currentBatchId grows over time. +Thread.sleep(300) +iter.map(row => row.getInt(0) * 2) + }) + + testStream(df)( +StartStream(trigger = Trigger.Continuous(100), + checkpointLocation =
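The threshold arithmetic in the patch above can be sketched outside Spark. This is a minimal, hypothetical Python model of the patched purge logic; the names `purge_threshold`, `committed_epoch`, and `min_log_entries` are illustrative, not Spark API:

```python
def purge_threshold(committed_epoch, min_log_entries):
    """Batch id below which offsetLog/commitLog entries may be purged,
    keeping the newest `min_log_entries` committed entries.
    Sketch of the patched logic, not actual Spark code."""
    if min_log_entries <= committed_epoch:
        return committed_epoch + 1 - min_log_entries
    return None  # too few committed entries yet; purge nothing

# With committed epoch 10 and minBatchesToRetain = 2, entries below
# batch id 9 are purged, so epochs 9 and 10 survive.
print(purge_threshold(10, 2))  # 9

# Not enough committed entries: nothing is purged.
print(purge_threshold(1, 2))  # None
```

The old code computed the threshold from `currentBatchId` instead; in continuous mode that counter can run far ahead of the committed epoch, so the old threshold could purge every committed entry, which is the failure SPARK-24351 fixes.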
svn commit: r27210 - in /dev/spark/2.3.2-SNAPSHOT-2018_06_01_02_01-e56266a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Jun 1 09:16:28 2018 New Revision: 27210 Log: Apache Spark 2.3.2-SNAPSHOT-2018_06_01_02_01-e56266a docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r27209 - in /dev/spark/2.4.0-SNAPSHOT-2018_06_01_00_01-98909c3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Jun 1 07:16:46 2018 New Revision: 27209 Log: Apache Spark 2.4.0-SNAPSHOT-2018_06_01_00_01-98909c3 docs [This commit notification would consist of 1466 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas UDF docs to explain column assignment
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 b37e76fa4 -> e56266ad7

[SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas UDF docs to explain column assignment

## What changes were proposed in this pull request?

Added sections to pandas_udf docs, in the grouped map section, to indicate columns are assigned by position. Backported to branch-2.3.

## How was this patch tested?

NA

Author: Bryan Cutler

Closes #21478 from BryanCutler/arrow-doc-pandas_udf-column_by_pos-2_3_1-SPARK-21427.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e56266ad
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e56266ad
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e56266ad
Branch: refs/heads/branch-2.3
Commit: e56266ad719488d3887fb7ea0985b3760b3ece12
Parents: b37e76f
Author: Bryan Cutler
Authored: Fri Jun 1 14:27:10 2018 +0800
Committer: hyukjinkwon
Committed: Fri Jun 1 14:27:10 2018 +0800

 docs/sql-programming-guide.md   | 9 +
 python/pyspark/sql/functions.py | 9 -
 2 files changed, 17 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/e56266ad/docs/sql-programming-guide.md

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 14bc5e6..461806a 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1737,6 +1737,15 @@ To use `groupBy().apply()`, the user needs to define the following:
 * A Python function that defines the computation for each group.
 * A `StructType` object or a string that defines the schema of the output `DataFrame`.

+The output schema will be applied to the columns of the returned `pandas.DataFrame` in order by position,
+not by name. This means that the columns in the `pandas.DataFrame` must be indexed so that their
+position matches the corresponding field in the schema.
+
+Note that when creating a new `pandas.DataFrame` using a dictionary, the actual position of the column
+can differ from the order that it was placed in the dictionary. It is recommended in this case to
+explicitly define the column order using the `columns` keyword, e.g.
+`pandas.DataFrame({'id': ids, 'a': data}, columns=['id', 'a'])`, or alternatively use an `OrderedDict`.
+
 Note that all data for a group will be loaded into memory before the function is applied. This
 can lead to out of memory exceptions, especially if the group sizes are skewed. The configuration
 for [maxRecordsPerBatch](#setting-arrow-batch-size) is not applied on groups and it is up to the user

http://git-wip-us.apache.org/repos/asf/spark/blob/e56266ad/python/pyspark/sql/functions.py

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index cf26523..9c02982 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2216,7 +2216,8 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     A grouped map UDF defines transformation: A `pandas.DataFrame` -> A `pandas.DataFrame`
     The returnType should be a :class:`StructType` describing the schema of the returned
     `pandas.DataFrame`.
-    The length of the returned `pandas.DataFrame` can be arbitrary.
+    The length of the returned `pandas.DataFrame` can be arbitrary and the columns must be
+    indexed so that their position matches the corresponding field in the schema.

     Grouped map UDFs are used with :meth:`pyspark.sql.GroupedData.apply`.

@@ -2239,6 +2240,12 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     | 2| 1.1094003924504583|
     +---+---+

+    .. note:: If returning a new `pandas.DataFrame` constructed with a dictionary, it is
+        recommended to explicitly index the columns by name to ensure the positions are correct,
+        or alternatively use an `OrderedDict`.
+        For example, `pd.DataFrame({'id': ids, 'a': data}, columns=['id', 'a'])` or
+        `pd.DataFrame(OrderedDict([('id', ids), ('a', data)]))`.
+
     .. seealso:: :meth:`pyspark.sql.GroupedData.apply`

     .. note:: The user-defined functions are considered deterministic by default. Due to
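The positional column assignment this doc change warns about can be illustrated with a small stand-alone sketch. This is plain Python with no Spark dependency; `apply_schema_by_position` is a hypothetical helper that mimics the behaviour, not PySpark API:

```python
def apply_schema_by_position(schema_fields, df_columns, row):
    # Grouped map pandas UDFs match output columns to schema fields by
    # position, not by name (behavioural sketch, not Spark internals).
    return {field: row[col] for field, col in zip(schema_fields, df_columns)}

row = {'a': 0.5, 'id': 1}

# If the returned DataFrame's columns happen to be ordered ('a', 'id') while
# the declared schema is ('id', 'a'), values land under the wrong fields:
print(apply_schema_by_position(['id', 'a'], ['a', 'id'], row))
# {'id': 0.5, 'a': 1}

# Pinning the column order, e.g. via pd.DataFrame(..., columns=['id', 'a'])
# or an OrderedDict as the docs recommend, keeps fields and values aligned:
print(apply_schema_by_position(['id', 'a'], ['id', 'a'], row))
# {'id': 1, 'a': 0.5}
```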