svn commit: r27222 - in /dev/spark/2.4.0-SNAPSHOT-2018_06_01_16_01-8ef167a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-06-01 Thread pwendell
Author: pwendell
Date: Fri Jun  1 23:16:03 2018
New Revision: 27222

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_06_01_16_01-8ef167a docs


[This commit notification would consist of 1466 parts, 
which exceeds the limit of 50, so it was shortened to this summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r27221 - in /dev/spark/2.3.2-SNAPSHOT-2018_06_01_14_01-21800b8-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-06-01 Thread pwendell
Author: pwendell
Date: Fri Jun  1 21:16:28 2018
New Revision: 27221

Log:
Apache Spark 2.3.2-SNAPSHOT-2018_06_01_14_01-21800b8 docs


[This commit notification would consist of 1443 parts, 
which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r27220 - in /dev/spark/v2.3.1-rc4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark

2018-06-01 Thread vanzin
Author: vanzin
Date: Fri Jun  1 21:12:21 2018
New Revision: 27220

Log:
Apache Spark v2.3.1-rc4 docs


[This commit notification would consist of 1446 parts, 
which exceeds the limit of 50, so it was shortened to this summary.]




svn commit: r27219 - /dev/spark/v2.3.1-rc4-bin/

2018-06-01 Thread vanzin
Author: vanzin
Date: Fri Jun  1 20:59:55 2018
New Revision: 27219

Log:
Apache Spark v2.3.1-rc4

Added:
dev/spark/v2.3.1-rc4-bin/
dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz   (with props)
dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc
dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512
dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz   (with props)
dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc
dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.6.tgz   (with props)
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.6.tgz.asc
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.6.tgz.sha512
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.7.tgz   (with props)
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.7.tgz.asc
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-hadoop2.7.tgz.sha512
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-without-hadoop.tgz   (with props)
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-without-hadoop.tgz.asc
dev/spark/v2.3.1-rc4-bin/spark-2.3.1-bin-without-hadoop.tgz.sha512
dev/spark/v2.3.1-rc4-bin/spark-2.3.1.tgz   (with props)
dev/spark/v2.3.1-rc4-bin/spark-2.3.1.tgz.asc
dev/spark/v2.3.1-rc4-bin/spark-2.3.1.tgz.sha512

Added: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc
==
--- dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc (added)
+++ dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.asc Fri Jun  1 20:59:55 2018
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJbEbIGAAoJEP2P/Uw6DVVkzN8QAJHLuR/Gq2QNKPSsF1w3GFKs
+oG95fvqcTwpyoZjiXTUNKzLNRG8Q3GYaXENawYBIVA/b9dyLN8gauFrX+/rQrvSI
+xL3wwimyZSCnPvU8OpO0bdmhD6elGd+Kw+CshPy7dNkt4zpFTuYclOlChygPOB5i
+WH+85yQ8FM2kIzniJfvq5stAMrIgjoC03i/rogbf9Ys+K7Ll70NBNC7uKlTR5m8Q
+OpZisj1mT4uu0pr5CLpk8pJLN4YhFLTXPvFHZQQjbb0wBsR2y2+Fj7aSvEgQp/eM
+SA60pVcKRhM0GAvS9wDnEZA6M2HPeEzVWnVa+Vom9z0icIX8HOslRK0hVAKvMgqE
+FkWYdHuH8K0MM6SoNRPNmw8eIpnshfIVrtOzD33vDggaVWPFfFP9d+ILb9k+1aff
+zJJAyAYzIRy5mAQcOks/i1BOaC/lwrk6tkR1VAZGQkietFl1GUZO6YXiXTY8AbND
+fm25iu+22a2nPdxd20IVvOYfR5xzy6axOoItEyCmtrLqXc32KOzl0kpp+C9Hiu/K
+wEezkSDBC0BlDUFZKvZtMjJgBEjsAL7qiSuulMbpmmYeXqKmxjKHCb+gKawwKvez
+ZVSiyhjiZdzfbNCg1I54gDSYlNUdTUQzdZors5WxR7BwpVZ776OwNSDWSdPa/NTZ
+KxnPZLZnph2TI1QG2Krw
+=kKom
+-----END PGP SIGNATURE-----

Added: dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512
==
--- dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512 (added)
+++ dev/spark/v2.3.1-rc4-bin/SparkR_2.3.1.tar.gz.sha512 Fri Jun  1 20:59:55 2018
@@ -0,0 +1,3 @@
+SparkR_2.3.1.tar.gz: 0633ABE5 66B8FC97 0B27B185 6674C267 1B1BE109 7F33116A
+ 259549CC 8B0B34D9 0DDB6039 86C3E626 44CE725B 63DB9613
+ B43F192C 3C8F6285 B57A26FD 3F56DA23

Added: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc
==
--- dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc (added)
+++ dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.asc Fri Jun  1 20:59:55 2018
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJbEbFNAAoJEP2P/Uw6DVVk4RwQAJph3d2xbZDF9xWYoS9TXnq3
+uwvaMouzP4GJqBffwDydPXOq6BjDtA7631BSOoYiH49C6SLBjUGkhbpNMDe9eWwu
+3+QjYDH+IU2yytO9D9Quj+iVdZdzGmcwFrUclZhMt41uMbDZMFfercXf5aqahG+v
+57DBsVIdkbHpgZz317/GeHplyZIVV/GW49U/+u9NceCU+XUWtv6VOxXF5eBZb9+E
+MbF03McAaw346FRZJfaaGiOgZrTD81LxMYQL/qPsP6J+suPbC8P6DeQGmUbAidQm
+ER8getpuU0jCBrmitulh/hW+n6ygUhPc+0w4QO3W57xKcWi3by+Z2sSnZohQVhoI
+kTQu+GkQ3wCx3uSkY+4Tglco8SmD1KztrVz6tOc6dj7SMeRpc4zioBVbywWznUMC
+SM922xetmSwnZuLB+wpDmXOULqCOjro2S7qTjMkCbOL7jxOOtkS765IijvZtYqPt
+uyN/wZ2kVXSvYCu9meba37YiukRrwfbCBwhUWP7ubYQugT71IPDn/KNnluXpR6sX
+yBoKPS8iBHb+U1IWO15lAKS4Bd8h4/Bt04nRnDkF1Kh7wY2CwtRW4IAWg6X0sBnG
+Xm4UeCV5Os2dDrFATNA0GenkBeCUzXRT6i3QlhVifogx7FQzBI4c0R4W5RuQ76W/
+EVqoelwlLEWK5VADeJUt
+=W1F5
+-----END PGP SIGNATURE-----

Added: dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512
==
--- dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512 (added)
+++ dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz.sha512 Fri 

spark git commit: [SPARK-24340][CORE] Clean up non-shuffle disk block manager files following executor exits on a Standalone cluster

2018-06-01 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master 09e78c1ea -> 8ef167a5f


[SPARK-24340][CORE] Clean up non-shuffle disk block manager files following 
executor exits on a Standalone cluster

## What changes were proposed in this pull request?

Currently we only clean up the local directories when an application is removed. 
However, when executors die and restart repeatedly, many temp files are left 
untouched in the local directories, which is undesirable and can gradually use 
up disk space.

We can detect executor death in the Worker and clean up the non-shuffle files 
(files not ending with ".index" or ".data") in the local directories. We should 
not touch the shuffle files, since they are expected to be used by the external 
shuffle service.

The scope of this PR is limited to implementing the cleanup logic on a Standalone 
cluster; we defer to experts familiar with the other cluster 
managers (YARN/Mesos/K8s) to determine whether it's worth adding similar support.

## How was this patch tested?

Added a new test suite to cover the cleanup logic.

Author: Xingbo Jiang 

Closes #21390 from jiangxb1987/cleanupNonshuffleFiles.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8ef167a5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8ef167a5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8ef167a5

Branch: refs/heads/master
Commit: 8ef167a5f9ba8a79bb7ca98a9844fe9cfcfea060
Parents: 09e78c1
Author: Xingbo Jiang 
Authored: Fri Jun 1 13:46:05 2018 -0700
Committer: Xiao Li 
Committed: Fri Jun 1 13:46:05 2018 -0700

--
 .../apache/spark/network/util/JavaUtils.java|  45 ++--
 .../shuffle/ExternalShuffleBlockHandler.java|   7 +
 .../shuffle/ExternalShuffleBlockResolver.java   |  43 
 .../shuffle/NonShuffleFilesCleanupSuite.java| 221 +++
 .../network/shuffle/TestShuffleDataContext.java |  15 ++
 .../spark/deploy/ExternalShuffleService.scala   |   5 +
 .../org/apache/spark/deploy/worker/Worker.scala |  17 +-
 .../spark/deploy/worker/WorkerSuite.scala   |  55 -
 docs/spark-standalone.md|  12 +
 9 files changed, 400 insertions(+), 20 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8ef167a5/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
--
diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
index afc59ef..b549708 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
@@ -17,10 +17,7 @@
 
 package org.apache.spark.network.util;
 
-import java.io.Closeable;
-import java.io.EOFException;
-import java.io.File;
-import java.io.IOException;
+import java.io.*;
 import java.nio.ByteBuffer;
 import java.nio.channels.ReadableByteChannel;
 import java.nio.charset.StandardCharsets;
@@ -91,11 +88,24 @@ public class JavaUtils {
* @throws IOException if deletion is unsuccessful
*/
   public static void deleteRecursively(File file) throws IOException {
+deleteRecursively(file, null);
+  }
+
+  /**
+   * Delete a file or directory and its contents recursively.
+   * Don't follow directories if they are symlinks.
+   *
+   * @param file Input file / dir to be deleted
+   * @param filter A filename filter that make sure only files / dirs with the satisfied filenames
+   *   are deleted.
+   * @throws IOException if deletion is unsuccessful
+   */
+  public static void deleteRecursively(File file, FilenameFilter filter) throws IOException {
 if (file == null) { return; }
 
 // On Unix systems, use operating system command to run faster
 // If that does not work out, fallback to the Java IO way
-if (SystemUtils.IS_OS_UNIX) {
+if (SystemUtils.IS_OS_UNIX && filter == null) {
   try {
 deleteRecursivelyUsingUnixNative(file);
 return;
@@ -105,15 +115,17 @@ public class JavaUtils {
   }
 }
 
-deleteRecursivelyUsingJavaIO(file);
+deleteRecursivelyUsingJavaIO(file, filter);
   }
 
-  private static void deleteRecursivelyUsingJavaIO(File file) throws IOException {
+  private static void deleteRecursivelyUsingJavaIO(
+  File file,
+  FilenameFilter filter) throws IOException {
 if (file.isDirectory() && !isSymlink(file)) {
   IOException savedIOException = null;
-  for (File child : listFilesSafely(file)) {
+  for (File child : listFilesSafely(file, filter)) {
 try {
-  deleteRecursively(child);
+  deleteRecursively(child, filter);
 } catch (IOException e) {
   // 
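The hunk above (truncated here) threads a `FilenameFilter` through the recursive delete so that callers can skip shuffle files. A minimal, self-contained sketch of the same idea follows; this is not Spark's actual `JavaUtils`, and the class and file names are illustrative only:

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;

public class CleanupSketch {
    // Per the PR description: delete only files whose names do NOT end with
    // ".index" or ".data", so files used by the external shuffle service survive.
    static final FilenameFilter NON_SHUFFLE_FILES =
        (dir, name) -> !(name.endsWith(".index") || name.endsWith(".data"));

    // Simplified recursive delete: a directory is removed only if the filter
    // left nothing inside it.
    static void deleteRecursively(File file, FilenameFilter filter) throws IOException {
        if (file == null) return;
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child, filter);
                }
            }
            File[] remaining = file.listFiles();
            if (remaining != null && remaining.length == 0) {
                Files.delete(file.toPath());
            }
        } else if (filter == null || filter.accept(file.getParentFile(), file.getName())) {
            Files.delete(file.toPath());
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("blockmgr-demo").toFile();
        new File(dir, "shuffle_0_0_0.index").createNewFile();
        new File(dir, "shuffle_0_0_0.data").createNewFile();
        new File(dir, "temp_local_abc").createNewFile();

        deleteRecursively(dir, NON_SHUFFLE_FILES);

        System.out.println(new File(dir, "shuffle_0_0_0.index").exists()); // true
        System.out.println(new File(dir, "temp_local_abc").exists());      // false
    }
}
```

Note how the diff guards the native fast path with `SystemUtils.IS_OS_UNIX && filter == null`: the Unix `rm -rf` shortcut cannot honor a filter, so filtered deletion always goes through the Java IO path.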

[2/2] spark git commit: Preparing development version 2.3.2-SNAPSHOT

2018-06-01 Thread vanzin
Preparing development version 2.3.2-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21800b87
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21800b87
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21800b87

Branch: refs/heads/branch-2.3
Commit: 21800b878605598988b54bcc4ef5b24a546ba9cc
Parents: 30aaa5a
Author: Marcelo Vanzin 
Authored: Fri Jun 1 13:34:24 2018 -0700
Committer: Marcelo Vanzin 
Committed: Fri Jun 1 13:34:24 2018 -0700

--
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/kvstore/pom.xml| 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 hadoop-cloud/pom.xml  | 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 632bcb3..8df2635 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.3.1
+Version: 2.3.2
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index d744c8b..02bf39b 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.1
+2.3.2-SNAPSHOT
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/common/kvstore/pom.xml
--
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 3a41e16..646fdfb 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.1
+2.3.2-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index f02108f..76c7dcf 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.1
+2.3.2-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/21800b87/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 4430487..f2661fe 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-   

[1/2] spark git commit: Preparing Spark release v2.3.1-rc4

2018-06-01 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 e4e96f929 -> 21800b878


Preparing Spark release v2.3.1-rc4


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30aaa5a3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30aaa5a3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30aaa5a3

Branch: refs/heads/branch-2.3
Commit: 30aaa5a3a1076ca52439a905274b1fcf498bc562
Parents: e4e96f9
Author: Marcelo Vanzin 
Authored: Fri Jun 1 13:34:19 2018 -0700
Committer: Marcelo Vanzin 
Committed: Fri Jun 1 13:34:19 2018 -0700

--
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/kvstore/pom.xml| 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 hadoop-cloud/pom.xml  | 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 8df2635..632bcb3 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.3.2
+Version: 2.3.1
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 02bf39b..d744c8b 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.2-SNAPSHOT
+2.3.1
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/common/kvstore/pom.xml
--
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 646fdfb..3a41e16 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.2-SNAPSHOT
+2.3.1
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 76c7dcf..f02108f 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.2-SNAPSHOT
+2.3.1
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/30aaa5a3/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index f2661fe..4430487 100644
--- a/common/network-shuffle/pom.xml
+++ 

[spark] Git Push Summary

2018-06-01 Thread vanzin
Repository: spark
Updated Tags:  refs/tags/v2.3.1-rc4 [created] 30aaa5a3a




svn commit: r27216 - in /dev/spark/2.4.0-SNAPSHOT-2018_06_01_12_01-09e78c1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-06-01 Thread pwendell
Author: pwendell
Date: Fri Jun  1 19:16:14 2018
New Revision: 27216

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_06_01_12_01-09e78c1 docs


[This commit notification would consist of 1466 parts, 
which exceeds the limit of 50, so it was shortened to this summary.]




spark git commit: [INFRA] Close stale PRs.

2018-06-01 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/master d2c3de7ef -> 09e78c1ea


[INFRA] Close stale PRs.

Closes #21444


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/09e78c1e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/09e78c1e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/09e78c1e

Branch: refs/heads/master
Commit: 09e78c1eaa742b9cab4564928e5a5401fe0198a9
Parents: d2c3de7
Author: Marcelo Vanzin 
Authored: Fri Jun 1 11:55:34 2018 -0700
Committer: Marcelo Vanzin 
Committed: Fri Jun 1 11:55:34 2018 -0700

--

--






svn commit: r27213 - /dev/spark/v2.3.1-rc3-bin/

2018-06-01 Thread vanzin
Author: vanzin
Date: Fri Jun  1 18:19:42 2018
New Revision: 27213

Log:
Apache Spark v2.3.1-rc3

Added:
dev/spark/v2.3.1-rc3-bin/
dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz   (with props)
dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc
dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512
dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz   (with props)
dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc
dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.6.tgz   (with props)
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.6.tgz.asc
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.6.tgz.sha512
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.7.tgz   (with props)
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.7.tgz.asc
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-hadoop2.7.tgz.sha512
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-without-hadoop.tgz   (with props)
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-without-hadoop.tgz.asc
dev/spark/v2.3.1-rc3-bin/spark-2.3.1-bin-without-hadoop.tgz.sha512
dev/spark/v2.3.1-rc3-bin/spark-2.3.1.tgz   (with props)
dev/spark/v2.3.1-rc3-bin/spark-2.3.1.tgz.asc
dev/spark/v2.3.1-rc3-bin/spark-2.3.1.tgz.sha512

Added: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc
==
--- dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc (added)
+++ dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.asc Fri Jun  1 18:19:42 2018
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJbEYzmAAoJEP2P/Uw6DVVkf6AP/1TRDdrMUZfuz+vs4/3MP6iQ
+yItx3bTGn0iQVuozjv6JDnTFVKY6i+JF6/QlGWcuSyPvICpo7NwKVmvejlvUSG9p
+wjjOIU6q7wI/N975fnMhCQ3WX1ePYkt3ZYM0flHqLiUIJ/wbMKookMD/IYNeRTpC
+hYAa7XVx1/YK8m1OMzqA8u3vT8gBzyIndJ8yc9teKqc9ultwLMJ7sMXNqjZoSxsV
+iIarWlUGw0f8GHyrV30PPCqp91aOJMuHv60UqP32d2D6RUuLuUTIxXClQjB/CaKo
++fzQcl6OYQvOePxYlPYZEekWT1cjZfPykUVElyMbZNRT7uQcxlbcvJVzlmxAT8Gh
+awAjPF5U5Cy3UycmyL1a/Cd6xNhaUltPrZGxZKlOsVUXLHZKSTV/pi+SfvZgQ1Au
+8Die+7y0STGXFWlIeZjYyWc3i5bfgaW0KeO96IZzYlLO+IIYGL0doRoH1spTjw+U
+j0mHlQ42Gk+PpVRAUEB42fnDMka6Fx4jbx1N8Qpy8NzG9RF2iCYKOjiYmHrGShPD
+9383B0zs0qxiCPFYUTYgrOfXpD6pFTb106BTiVZXH4kdhaChruv42nBMklg1Qwjg
++nf1I26OEnEwSB7fr6/MEcyPNVQdqB/Qih0yyjTfCp6iEvT+Q37jVSp9uMmWHouu
++ucIUlfxqWG4O2+JoW7p
+=E5sq
+-----END PGP SIGNATURE-----

Added: dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512
==
--- dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512 (added)
+++ dev/spark/v2.3.1-rc3-bin/SparkR_2.3.1.tar.gz.sha512 Fri Jun  1 18:19:42 2018
@@ -0,0 +1,3 @@
+SparkR_2.3.1.tar.gz: 939BFB6B 78FD2D8A 86D47A20 4F715846 443FAD2A AF48BB8D
+ 4DEF4817 23D2467E B278EC74 19E4868E BDC0E2F7 4B55F2C0
+ 1ADC37CA 59CE2691 1EC5CECD 5964CCB5

Added: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc
==
--- dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc (added)
+++ dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.asc Fri Jun  1 18:19:42 2018
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJbEYwmAAoJEP2P/Uw6DVVkYXUP/0QJn+CYF1PzBPuKfYhvAxsd
+AoklrmND65PwC7Tf6vmXAOzP0P7nYeRUg4kvARFMZ7H+LuWSKT4JcGnC9+HYRVV9
+j0A9Hcc+wZ1uSU6gZXG8DhmGN6/gDJ+uX3bYQ8+sp4Dz5umWgZxliXZW8u/GIkrz
+9U+RStR2DJG69xnqdH290Ww2k2F2RYSMldGp1JlqHa/fPNXDYEXuhNQMXTsOHRob
+u8Hn6WJFT95pRuEpAbi31oYF8zgQoc+HoxkhPTHlYZsnaUojGb90YoV6vYGxdf8B
+S4eEsoPY9a8Wqi0iLf+R/5Hm8tZNd/ZkRIPSttijA/yIMlhBM4fdx0dfOlu5novZ
+H7gexJ5jV4icsrKOwPQvjv0EY3ANACuwY1B6w+XyzQYitTeUeqeArtk3ygp/4m00
+dav6Mz9+a4DJmEhDGoA0B6zs3z08hwlmHisS3eDw+WgKQVu7o3gxOZWgTl6zWtIR
+5f2viCMOyakoHdw9yiM2TQvNN14DUccQsXmZsmHGez/LTSdms4gehzS5uLiUSxQY
+dO9vwH0bV2xilfjm3Cx0awG+Z0M27v+wD/EaLNCM4ExGwTDw6Yq0ZFhuD2AlAdWD
+r1rP67gDTayFsMmLhlnbyE3wDfSQQlaT5ioL5sxkYPfr+iJ9D5iAYhP0nt9HVxG3
+f70J/OVKMEaINb41kIrt
+=sA1k
+-----END PGP SIGNATURE-----

Added: dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512
==
--- dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512 (added)
+++ dev/spark/v2.3.1-rc3-bin/pyspark-2.3.1.tar.gz.sha512 Fri 

[1/2] spark git commit: Preparing Spark release v2.3.1-rc3

2018-06-01 Thread vanzin
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 e56266ad7 -> 2e0c3469d


Preparing Spark release v2.3.1-rc3


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1cc5f68b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1cc5f68b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1cc5f68b

Branch: refs/heads/branch-2.3
Commit: 1cc5f68be73a980111ce0443413356f2b7634bd1
Parents: e56266a
Author: Marcelo Vanzin 
Authored: Fri Jun 1 10:56:26 2018 -0700
Committer: Marcelo Vanzin 
Committed: Fri Jun 1 10:56:26 2018 -0700

--
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/kvstore/pom.xml| 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 hadoop-cloud/pom.xml  | 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 8df2635..632bcb3 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.3.2
+Version: 2.3.1
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 02bf39b..d744c8b 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.2-SNAPSHOT
+2.3.1
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/common/kvstore/pom.xml
--
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 646fdfb..3a41e16 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.2-SNAPSHOT
+2.3.1
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 76c7dcf..f02108f 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.2-SNAPSHOT
+2.3.1
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/1cc5f68b/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index f2661fe..4430487 100644
--- a/common/network-shuffle/pom.xml
+++ 

[2/2] spark git commit: Preparing development version 2.3.2-SNAPSHOT

2018-06-01 Thread vanzin
Preparing development version 2.3.2-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e0c3469
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e0c3469
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e0c3469

Branch: refs/heads/branch-2.3
Commit: 2e0c3469d0d0d7b351d50c937b9862bd80b946ba
Parents: 1cc5f68
Author: Marcelo Vanzin 
Authored: Fri Jun 1 10:56:29 2018 -0700
Committer: Marcelo Vanzin 
Committed: Fri Jun 1 10:56:29 2018 -0700

--
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/kvstore/pom.xml| 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 hadoop-cloud/pom.xml  | 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 632bcb3..8df2635 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.3.1
+Version: 2.3.2
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index d744c8b..02bf39b 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.1
+2.3.2-SNAPSHOT
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/common/kvstore/pom.xml
--
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 3a41e16..646fdfb 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.1
+2.3.2-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index f02108f..76c7dcf 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.3.1
+2.3.2-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/2e0c3469/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 4430487..f2661fe 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-   

[spark] Git Push Summary

2018-06-01 Thread vanzin
Repository: spark
Updated Tags:  refs/tags/v2.3.1-rc3 [created] 1cc5f68be

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-24351][SS] offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode

2018-06-01 Thread zsxwing
Repository: spark
Updated Branches:
  refs/heads/master 98909c398 -> 6039b1323


[SPARK-24351][SS] offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode

## What changes were proposed in this pull request?
Compute the thresholdBatchId used to purge metadata based on the current committed epoch instead of currentBatchId in CP mode, to avoid cleaning all the committed metadata in some cases, as described in [SPARK-24351](https://issues.apache.org/jira/browse/SPARK-24351).

## How was this patch tested?
Add new unit test.
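To make the fix concrete, here is a minimal Python sketch (not Spark code; the function names and the epoch/batch numbers are illustrative) of why a purge threshold derived from currentBatchId can discard every committed epoch in continuous-processing mode, while one derived from the committed epoch retains the configured minimum:

```python
def purge_threshold_old(current_batch_id, min_entries):
    # Old behavior: the threshold is derived from currentBatchId, which
    # advances independently of what has actually been committed.
    return current_batch_id - min_entries

def purge_threshold_new(committed_epoch, min_entries):
    # Fixed behavior: keep at least `min_entries` committed epochs.
    return committed_epoch + 1 - min_entries

# Suppose epochs are created every 100 ms but each takes 300 ms to commit,
# so the last committed epoch lags far behind currentBatchId:
current_batch_id = 30   # epochs created so far
committed_epoch = 10    # last epoch actually committed
min_entries = 2         # spark.sql.streaming.minBatchesToRetain

# Old threshold purges all log entries below 28, i.e. every committed epoch
# (0..10) -- the data loss described in SPARK-24351.
print(purge_threshold_old(current_batch_id, min_entries))   # 28
# New threshold purges entries below 9, keeping committed epochs 9 and 10.
print(purge_threshold_new(committed_epoch, min_entries))    # 9
```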

Author: Huang Tengfei 

Closes #21400 from ivoson/branch-cp-meta.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6039b132
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6039b132
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6039b132

Branch: refs/heads/master
Commit: 6039b132304cc77ed39e4ca7813850507ae0b440
Parents: 98909c3
Author: Huang Tengfei 
Authored: Fri Jun 1 10:47:53 2018 -0700
Committer: Shixiong Zhu 
Committed: Fri Jun 1 10:47:53 2018 -0700

--
 .../continuous/ContinuousExecution.scala| 11 +++--
 .../streaming/continuous/ContinuousSuite.scala  | 46 
 2 files changed, 54 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6039b132/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
index d16b24c..e3d0cea 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
@@ -318,9 +318,14 @@ class ContinuousExecution(
   }
 }
 
-if (minLogEntriesToMaintain < currentBatchId) {
-  offsetLog.purge(currentBatchId - minLogEntriesToMaintain)
-  commitLog.purge(currentBatchId - minLogEntriesToMaintain)
+// Since currentBatchId increases independently in cp mode, the current committed epoch may
+// be far behind currentBatchId. It is not safe to discard the metadata with thresholdBatchId
+// computed based on currentBatchId. As minLogEntriesToMaintain is used to keep the minimum
+// number of batches that must be retained and made recoverable, we should keep the
+// specified number of metadata entries that have been committed.
+if (minLogEntriesToMaintain <= epoch) {
+  offsetLog.purge(epoch + 1 - minLogEntriesToMaintain)
+  commitLog.purge(epoch + 1 - minLogEntriesToMaintain)
 }
 
 awaitProgressLock.lock()

http://git-wip-us.apache.org/repos/asf/spark/blob/6039b132/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
index cd1704a..4980b0c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
@@ -297,3 +297,49 @@ class ContinuousStressSuite extends ContinuousSuiteBase {
   CheckAnswerRowsContains(scala.Range(0, 25000).map(Row(_
   }
 }
+
+class ContinuousMetaSuite extends ContinuousSuiteBase {
+  import testImplicits._
+
+  // We need to specify spark.sql.streaming.minBatchesToRetain to do the following test.
+  override protected def createSparkSession = new TestSparkSession(
+new SparkContext(
+  "local[10]",
+  "continuous-stream-test-sql-context",
+  sparkConf.set("spark.sql.testkey", "true")
+.set("spark.sql.streaming.minBatchesToRetain", "2")))
+
+  test("SPARK-24351: check offsetLog/commitLog retained in the checkpoint directory") {
+withTempDir { checkpointDir =>
+  val input = ContinuousMemoryStream[Int]
+  val df = input.toDF().mapPartitions(iter => {
+// Sleep the task thread for 300 ms to make the epoch processing time 3 times
+// longer than the epoch creation interval, so the gap between the last committed
+// epoch and currentBatchId grows over time.
+Thread.sleep(300)
+iter.map(row => row.getInt(0) * 2)
+  })
+
+  testStream(df)(
+StartStream(trigger = Trigger.Continuous(100),
+  checkpointLocation = 

svn commit: r27210 - in /dev/spark/2.3.2-SNAPSHOT-2018_06_01_02_01-e56266a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-06-01 Thread pwendell
Author: pwendell
Date: Fri Jun  1 09:16:28 2018
New Revision: 27210

Log:
Apache Spark 2.3.2-SNAPSHOT-2018_06_01_02_01-e56266a docs


[This commit notification would consist of 1443 parts, which exceeds the limit of 50, so it was shortened to the summary.]




svn commit: r27209 - in /dev/spark/2.4.0-SNAPSHOT-2018_06_01_00_01-98909c3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-06-01 Thread pwendell
Author: pwendell
Date: Fri Jun  1 07:16:46 2018
New Revision: 27209

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_06_01_00_01-98909c3 docs


[This commit notification would consist of 1466 parts, which exceeds the limit of 50, so it was shortened to the summary.]




spark git commit: [SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas UDF docs to explain column assignment

2018-06-01 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 b37e76fa4 -> e56266ad7


[SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas UDF docs to explain column assignment

## What changes were proposed in this pull request?
Added sections to pandas_udf docs, in the grouped map section, to indicate 
columns are assigned by position. Backported to branch-2.3.

## How was this patch tested?
NA

Author: Bryan Cutler 

Closes #21478 from BryanCutler/arrow-doc-pandas_udf-column_by_pos-2_3_1-SPARK-21427.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e56266ad
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e56266ad
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e56266ad

Branch: refs/heads/branch-2.3
Commit: e56266ad719488d3887fb7ea0985b3760b3ece12
Parents: b37e76f
Author: Bryan Cutler 
Authored: Fri Jun 1 14:27:10 2018 +0800
Committer: hyukjinkwon 
Committed: Fri Jun 1 14:27:10 2018 +0800

--
 docs/sql-programming-guide.md   | 9 +
 python/pyspark/sql/functions.py | 9 -
 2 files changed, 17 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e56266ad/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 14bc5e6..461806a 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1737,6 +1737,15 @@ To use `groupBy().apply()`, the user needs to define the following:
 * A Python function that defines the computation for each group.
 * A `StructType` object or a string that defines the schema of the output 
`DataFrame`.
 
+The output schema will be applied to the columns of the returned `pandas.DataFrame` in order by position,
+not by name. This means that the columns in the `pandas.DataFrame` must be indexed so that their
+position matches the corresponding field in the schema.
+
+Note that when creating a new `pandas.DataFrame` using a dictionary, the actual position of the column
+can differ from the order that it was placed in the dictionary. It is recommended in this case to
+explicitly define the column order using the `columns` keyword, e.g.
+`pandas.DataFrame({'id': ids, 'a': data}, columns=['id', 'a'])`, or alternatively use an `OrderedDict`.
+
 Note that all data for a group will be loaded into memory before the function is applied. This can
 lead to out of memory exceptions, especially if the group sizes are skewed. The configuration for
 [maxRecordsPerBatch](#setting-arrow-batch-size) is not applied on groups and it is up to the user
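The positional column assignment described in the added docs can be mimicked in plain Python. This is an illustrative sketch under stated assumptions, not PySpark's implementation: `apply_schema_by_position` and its inputs are hypothetical names.

```python
from collections import OrderedDict

def apply_schema_by_position(schema_fields, columns):
    # Mimic how the grouped-map output schema is applied to the returned
    # DataFrame's columns: strictly by position, ignoring column names.
    # `columns` is a list of (name, values) pairs in DataFrame column order.
    return OrderedDict(
        (field, values) for field, (_name, values) in zip(schema_fields, columns)
    )

schema = ["id", "a"]              # the StructType field order
ids, data = [1, 2], [0.1, 0.2]

# If the returned columns end up ordered ('a', 'id'), the 'id' field silently
# receives the 'a' values:
misordered = [("a", data), ("id", ids)]
print(dict(apply_schema_by_position(schema, misordered)))
# {'id': [0.1, 0.2], 'a': [1, 2]}  -- wrong assignment, no error raised

# Fixing the column order (as `columns=['id', 'a']` or an OrderedDict would)
# restores the intended mapping:
ordered = [("id", ids), ("a", data)]
print(dict(apply_schema_by_position(schema, ordered)))
# {'id': [1, 2], 'a': [0.1, 0.2]}
```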

http://git-wip-us.apache.org/repos/asf/spark/blob/e56266ad/python/pyspark/sql/functions.py
--
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index cf26523..9c02982 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2216,7 +2216,8 @@ def pandas_udf(f=None, returnType=None, functionType=None):
A grouped map UDF defines transformation: A `pandas.DataFrame` -> A `pandas.DataFrame`
The returnType should be a :class:`StructType` describing the schema of the returned
`pandas.DataFrame`.
-   The length of the returned `pandas.DataFrame` can be arbitrary.
+   The length of the returned `pandas.DataFrame` can be arbitrary and the columns must be
+   indexed so that their position matches the corresponding field in the schema.
 
Grouped map UDFs are used with :meth:`pyspark.sql.GroupedData.apply`.
 
@@ -2239,6 +2240,12 @@ def pandas_udf(f=None, returnType=None, functionType=None):
|  2| 1.1094003924504583|
+---+---+
 
+   .. note:: If returning a new `pandas.DataFrame` constructed with a dictionary, it is
+   recommended to explicitly index the columns by name to ensure the positions are correct,
+   or alternatively use an `OrderedDict`.
+   For example, `pd.DataFrame({'id': ids, 'a': data}, columns=['id', 'a'])` or
+   `pd.DataFrame(OrderedDict([('id', ids), ('a', data)]))`.
+
.. seealso:: :meth:`pyspark.sql.GroupedData.apply`
 
 .. note:: The user-defined functions are considered deterministic by default. Due to

