svn commit: r31810 - in /dev/spark/v2.2.3-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark

2019-01-07 Thread dongjoon
Author: dongjoon
Date: Tue Jan  8 06:42:46 2019
New Revision: 31810

Log:
Apache Spark 2.2.3-rc1 docs


[This commit notification would consist of 1347 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r31809 - /dev/spark/v2.2.3-rc1-bin/

2019-01-07 Thread dongjoon
Author: dongjoon
Date: Tue Jan  8 06:03:23 2019
New Revision: 31809

Log:
Apache Spark 2.2.3-rc1

Added:
dev/spark/v2.2.3-rc1-bin/
dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz   (with props)
dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc
dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512
dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz   (with props)
dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc
dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.sha512
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.6.tgz   (with props)
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.6.tgz.asc
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.6.tgz.sha
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.7.tgz   (with props)
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.7.tgz.asc
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.7.tgz.sha512
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-without-hadoop.tgz   (with props)
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-without-hadoop.tgz.asc
dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-without-hadoop.tgz.sha512
dev/spark/v2.2.3-rc1-bin/spark-2.2.3.tgz   (with props)
dev/spark/v2.2.3-rc1-bin/spark-2.2.3.tgz.asc
dev/spark/v2.2.3-rc1-bin/spark-2.2.3.tgz.sha512

Added: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream

Added: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc
==============================================================================
--- dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc (added)
+++ dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc Tue Jan  8 06:03:23 2019
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJcNDHmAAoJEO2gDOg08PxcnegQAKrCZ/otSLCrOT5x1p1WwGwj
+FVhiMDTSBsRogKhauXb8VVBVSEP/TCm5SL7ZNNkdvvCeAg9T0wv70b0r1dAmXiDy
+IPwGNxwrEF5tglx9VfPULQ/dWN4LrEUtqIMBXN9xlqPfavHqQ+FaPtBvAZbMYq9W
+Z72xgDDBqbsPI6KkVFYbNxA1fRJtFZ9fjinfeLL9EYcUmqAU6mNy/9ex4nwgwXhG
+Lzcclb0eXyjY9ze+LdoH+04o0/W8oB6Yr3TSuqva8ZpX3axBw4RviBxbssJ4k2Iu
+YqD6rNmPHLzrUdStHWHl8dVVAGpB7uVeGiJfVOVaRMFzy+zAMg6tqCwR1/NflO/B
+/2Rokewh8DotYQIm9skyrtsRIdQB3h7D5COENhTueQo3TtPnnkSyPG4S/7ixZsQF
+HdUqWwd501RE6NEnv+EDYUDDiiGf0+UdOPwvzZfKQO/LrxUzzUmpdhdVO3wZ9/5y
++RlgReoRznTFOFXwDxvExqyBBROciw8aeNfTr4fXeuK28Cf7j52haU9XmeqEHcLa
+Z8UaMEkFPiyTwooGT2PL79a4XXb2VdJK13HV1bLfFTYL7KygwgFv+pc8Wfi65DSD
+UTok6nt2xQ8cPCN7n7rCJfavD/IU9wyxAksbOBZB1Ut06bK4M35+W8VHRVVxSZx5
+/fPF8tDeFVlmkiDUSaRQ
+=tX+a
+-----END PGP SIGNATURE-----

Added: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512
==============================================================================
--- dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512 (added)
+++ dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512 Tue Jan  8 06:03:23 2019
@@ -0,0 +1,3 @@
+SparkR_2.2.3.tar.gz: C216AF56 0918FBEA BA5FFFB5 72F6FC60 2BF9E8E3 D5C77122
+ 1764D2A8 FA6AD322 D5AB1E4E A6B5CD8A DBD8F4A1 66EF023A
+ B58A7E9A 80F2357C D3859FA5 50F22465

Added: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream

Added: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc
==============================================================================
--- dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc (added)
+++ dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc Tue Jan  8 06:03:23 2019
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAlwzylwUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FwCNg/9Hx/hE9TMnTXzVMfWjuoIx4glWxJK
+sjnUS4SVffbPgC1P4cKHUuoPFoKDsU7/JCPXRY88SezoMedYbrjSxSKDZPE9VER3
+C9R+F/bNjifxGu+/2+3PXrzMvS1Y+3M4GFHU5p/iAe01un9JN00FXsi1zXJt58Qp
+eBpBR4UhoqapWLM1jqqkz5vEHCSwOHf7gFmLk+Vd1jRX1+siWiGfpmgd1qzzS9kb
+SKXIjUgQMzVb9vu/ja7j/vjEs6CjuKvO4Eruz3qJ1A2RnWfcAd1x/Z2dOENuoccs
+TBR4pVr4C0/ZO8RbaKgr1pIPpMQB5eEkzbJi2cWBQhearfTyCpgIVJFqIrpY8xer
+d5wkZziRdo4JG/cAd2GgNb3MiWi5ghHFTPEKJyMi9ZpqrS/R+CJkasyMEa8VPek7
+iQB6+GklFpr9KNCCKU3TQK4vkawuUJUIFdLBlxkO4yWRI3AcUS66YyaiSJQkar5t
+zWpAZI6NDIKIVv1ktS+QgkZl14jf8cVKSC1L4rz96Xhdecv8QId0wAEwSkuUSPaZ
+5iCHdwNgIm0N1sDGAG+I9I81GM0jiWNwxlexasMOy0YTaEG4jw7xjyZalWXPW8wo
+stNfdcDf6CfDxZr13r5nCuQEshODNV81IsaJFlVOg0cGPLg5HhyY+ZasdKOE6jfG
+U54wuQLSPOlWY8c=
+=TZPm
+-----END PGP SIGNATURE-----

Added: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.sha512
==============================================================================
--- dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.sha512 (added)
+++ 

svn commit: r31808 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_07_21_31-3ece0aa-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Tue Jan  8 05:45:36 2019
New Revision: 31808

Log:
Apache Spark 2.4.1-SNAPSHOT-2019_01_07_21_31-3ece0aa docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r31806 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_19_24-29a7d2d-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Tue Jan  8 03:36:53 2019
New Revision: 31806

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_07_19_24-29a7d2d docs


[This commit notification would consist of 1774 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-24196][SQL] Implement Spark's own GetSchemasOperation

2019-01-07 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 29a7d2d  [SPARK-24196][SQL] Implement Spark's own GetSchemasOperation
29a7d2d is described below

commit 29a7d2da44585d91a9e94bf88dc7b1f42a0e5674
Author: Yuming Wang 
AuthorDate: Mon Jan 7 18:59:43 2019 -0800

[SPARK-24196][SQL] Implement Spark's own GetSchemasOperation

## What changes were proposed in this pull request?

This PR fixes the issue that SQL client tools can't show databases, by implementing
Spark's own `GetSchemasOperation`.

## How was this patch tested?
unit tests and manual tests

![image](https://user-images.githubusercontent.com/5399861/47782885-3dd5d400-dd3c-11e8-8586-59a8c15c7020.png)

![image](https://user-images.githubusercontent.com/5399861/47782899-4928ff80-dd3c-11e8-9d2d-ba9580ba4301.png)

Closes #22903 from wangyum/SPARK-24196.

Authored-by: Yuming Wang 
Signed-off-by: gatorsmile 
---
 .../service/cli/operation/GetSchemasOperation.java |   2 +-
 .../thriftserver/SparkGetSchemasOperation.scala|  66 +
 .../server/SparkSQLOperationManager.scala  |  17 +++-
 .../thriftserver/HiveThriftServer2Suites.scala |  16 
 .../thriftserver/SparkMetadataOperationSuite.scala | 103 +
 5 files changed, 201 insertions(+), 3 deletions(-)
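
For context, a minimal client-side sketch of what this operation backs: listing databases
through the Thrift server's JDBC metadata API. The driver class, URL, and credentials below
are assumptions for a local HiveServer2-compatible endpoint, not part of this commit.

```scala
import java.sql.DriverManager

// Hypothetical local endpoint; adjust host/port/credentials as needed.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
try {
  // DatabaseMetaData.getSchemas() is what GetSchemasOperation serves on the server side.
  val schemas = conn.getMetaData.getSchemas()
  while (schemas.next()) {
    println(schemas.getString("TABLE_SCHEM")) // column name defined by the RowSet schema above
  }
} finally {
  conn.close()
}
```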

diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
index d6f6280..3516bc2 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
@@ -41,7 +41,7 @@ public class GetSchemasOperation extends MetadataOperation {
   .addStringColumn("TABLE_SCHEM", "Schema name.")
   .addStringColumn("TABLE_CATALOG", "Catalog name.");
 
-  private RowSet rowSet;
+  protected RowSet rowSet;
 
   protected GetSchemasOperation(HiveSession parentSession,
   String catalogName, String schemaName) {
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala
new file mode 100644
index 000..d585049
--- /dev/null
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import 
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType
+import org.apache.hive.service.cli._
+import org.apache.hive.service.cli.operation.GetSchemasOperation
+import 
org.apache.hive.service.cli.operation.MetadataOperation.DEFAULT_HIVE_CATALOG
+import org.apache.hive.service.cli.session.HiveSession
+
+import org.apache.spark.sql.SQLContext
+
+/**
+ * Spark's own GetSchemasOperation
+ *
+ * @param sqlContext SQLContext to use
+ * @param parentSession a HiveSession from SessionManager
+ * @param catalogName catalog name. null if not applicable.
+ * @param schemaName database name, null or a concrete database name
+ */
+private[hive] class SparkGetSchemasOperation(
+sqlContext: SQLContext,
+parentSession: HiveSession,
+catalogName: String,
+schemaName: String)
+  extends GetSchemasOperation(parentSession, catalogName, schemaName) {
+
+  override def runInternal(): Unit = {
+setState(OperationState.RUNNING)
+// Always use the latest class loader provided by executionHive's state.
+val executionHiveClassLoader = sqlContext.sharedState.jarClassLoader
+Thread.currentThread().setContextClassLoader(executionHiveClassLoader)
+
+if (isAuthV2Enabled) {
+  val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName"
+  authorizeMetaGets(HiveOperationType.GET_TABLES, null, cmdStr)
+}
+
+try {
+  

[spark] branch branch-2.4 updated: [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER

2019-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 3ece0aa  [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox 
to check HEADER
3ece0aa is described below

commit 3ece0aa479bd32732742d1d8e607de25520a9f5a
Author: Dongjoon Hyun 
AuthorDate: Mon Jan 7 17:54:05 2019 -0800

[SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER

## What changes were proposed in this pull request?

This PR uses the GitHub repository instead of GitBox because the GitHub repo 
returns the HTTP header status correctly.

## How was this patch tested?

Manual.

```
$ ./do-release-docker.sh -d /tmp/test -n
Branch [branch-2.4]:
Current branch version is 2.4.1-SNAPSHOT.
Release [2.4.1]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.1-rc1]:
```

Closes #23482 from dongjoon-hyun/SPARK-26554-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 6f35ede31cc72a81e3852b1ac7454589d1897bfc)
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/release-util.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
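
As an aside, the same HEAD-status check can be sketched outside the shell script; the
following is a hypothetical Scala equivalent (the helper name, tag value, and use of
java.net are illustrative assumptions, not part of this patch):

```scala
import java.net.{HttpURLConnection, URL}

// Returns true when GitHub answers the HEAD request with a 2xx status, i.e. the
// release tag exists; GitBox answered 200 for existing and non-existing tags alike.
def tagExists(tag: String): Boolean = {
  val conn = new URL(s"https://github.com/apache/spark/releases/tag/$tag")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("HEAD")
  try conn.getResponseCode >= 200 && conn.getResponseCode < 300
  finally conn.disconnect()
}

// Example: tagExists("v2.4.1-rc1")
```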

diff --git a/dev/create-release/release-util.sh 
b/dev/create-release/release-util.sh
index 9a34052..5486c18 100755
--- a/dev/create-release/release-util.sh
+++ b/dev/create-release/release-util.sh
@@ -21,6 +21,7 @@ DRY_RUN=${DRY_RUN:-0}
 GPG="gpg --no-tty --batch"
 ASF_REPO="https://gitbox.apache.org/repos/asf/spark.git"
 ASF_REPO_WEBUI="https://gitbox.apache.org/repos/asf?p=spark.git"
+ASF_GITHUB_REPO="https://github.com/apache/spark"
 
 function error {
   echo "$*"
@@ -73,9 +74,7 @@ function fcreate_secure {
 }
 
 function check_for_tag {
-  # Check HTML body messages instead of header status codes. Apache GitBox 
returns
-  # a header with `200 OK` status code for both existing and non-existing tag 
URLs
-  ! curl -s --fail "$ASF_REPO_WEBUI;a=commit;h=$1" | grep '404 Not Found' > 
/dev/null
+  curl -s --head --fail "$ASF_GITHUB_REPO/releases/tag/$1" > /dev/null
 }
 
 function get_release_info {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER

2019-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6f35ede  [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox 
to check HEADER
6f35ede is described below

commit 6f35ede31cc72a81e3852b1ac7454589d1897bfc
Author: Dongjoon Hyun 
AuthorDate: Mon Jan 7 17:54:05 2019 -0800

[SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER

## What changes were proposed in this pull request?

This PR uses the GitHub repository instead of GitBox because the GitHub repo 
returns the HTTP header status correctly.

## How was this patch tested?

Manual.

```
$ ./do-release-docker.sh -d /tmp/test -n
Branch [branch-2.4]:
Current branch version is 2.4.1-SNAPSHOT.
Release [2.4.1]:
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [v2.4.1-rc1]:
```

Closes #23482 from dongjoon-hyun/SPARK-26554-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/release-util.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/dev/create-release/release-util.sh 
b/dev/create-release/release-util.sh
index 9a34052..5486c18 100755
--- a/dev/create-release/release-util.sh
+++ b/dev/create-release/release-util.sh
@@ -21,6 +21,7 @@ DRY_RUN=${DRY_RUN:-0}
 GPG="gpg --no-tty --batch"
 ASF_REPO="https://gitbox.apache.org/repos/asf/spark.git"
 ASF_REPO_WEBUI="https://gitbox.apache.org/repos/asf?p=spark.git"
+ASF_GITHUB_REPO="https://github.com/apache/spark"
 
 function error {
   echo "$*"
@@ -73,9 +74,7 @@ function fcreate_secure {
 }
 
 function check_for_tag {
-  # Check HTML body messages instead of header status codes. Apache GitBox 
returns
-  # a header with `200 OK` status code for both existing and non-existing tag 
URLs
-  ! curl -s --fail "$ASF_REPO_WEBUI;a=commit;h=$1" | grep '404 Not Found' > 
/dev/null
+  curl -s --head --fail "$ASF_GITHUB_REPO/releases/tag/$1" > /dev/null
 }
 
 function get_release_info {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r31805 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_07_17_16-faa4c28-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Tue Jan  8 01:31:28 2019
New Revision: 31805

Log:
Apache Spark 2.4.1-SNAPSHOT-2019_01_07_17_16-faa4c28 docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][K8S] add missing docs for podTemplateContainerName properties

2019-01-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5fb5a02  [MINOR][K8S] add missing docs for podTemplateContainerName 
properties
5fb5a02 is described below

commit 5fb5a0292d9ced48860abe712a10cbb8e513b75a
Author: Adrian Tanase 
AuthorDate: Mon Jan 7 19:03:38 2019 -0600

[MINOR][K8S] add missing docs for podTemplateContainerName properties

## What changes were proposed in this pull request?

Adding docs for an enhancement that came in late in this PR: #22146
Currently the docs state that we're going to use the first container in a 
pod template, which was the implementation for some time, until it was improved 
with 2 new properties.

## How was this patch tested?

I tested that the properties work by combining pod templates with 
client-mode and a simple pod template.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

Closes #23155 from aditanase/k8s-readme.

Authored-by: Adrian Tanase 
Signed-off-by: Sean Owen 
---
 docs/running-on-kubernetes.md | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)
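
For reference, a minimal sketch showing the two new container-name properties next to
the existing template-file ones; the file paths and container names are the placeholder
values from the docs, not required names:

```scala
import org.apache.spark.SparkConf

// Driver and executor pod templates plus the new container-name selectors.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.podTemplateFile", "/path/to/driver-pod-template.yaml")
  .set("spark.kubernetes.driver.podTemplateContainerName", "spark-driver")
  .set("spark.kubernetes.executor.podTemplateFile", "/path/to/executor-pod-template.yaml")
  .set("spark.kubernetes.executor.podTemplateContainerName", "spark-executor")
```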

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 3172b1b..3453ee9 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -229,8 +229,11 @@ pod template that will always be overwritten by Spark. 
Therefore, users of this
 the pod template file only lets Spark start with a template pod instead of an 
empty pod during the pod-building process.
 For details, see the [full list](#pod-template-properties) of pod template 
values that will be overwritten by spark.
 
-Pod template files can also define multiple containers. In such cases, Spark 
will always assume that the first container in
-the list will be the driver or executor container.
+Pod template files can also define multiple containers. In such cases, you can 
use the spark properties
+`spark.kubernetes.driver.podTemplateContainerName` and 
`spark.kubernetes.executor.podTemplateContainerName`
+to indicate which container should be used as a basis for the driver or 
executor.
+If not specified, or if the container name is not valid, Spark will assume 
that the first container in the list
+will be the driver or executor container.
 
 ## Using Kubernetes Volumes
 
@@ -932,16 +935,32 @@ specific to Spark on Kubernetes.
   spark.kubernetes.driver.podTemplateFile
   (none)
   
-   Specify the local file that contains the driver [pod 
template](#pod-template). For example
-   
spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml`
+   Specify the local file that contains the driver pod 
template. For example
+   
spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml
+  
+
+
+  spark.kubernetes.driver.podTemplateContainerName
+  (none)
+  
+   Specify the container name to be used as a basis for the driver in the 
given pod template.
+   For example 
spark.kubernetes.driver.podTemplateContainerName=spark-driver
   
 
 
   spark.kubernetes.executor.podTemplateFile
   (none)
   
-   Specify the local file that contains the executor [pod 
template](#pod-template). For example
-   
spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml`
+   Specify the local file that contains the executor pod template. For example
+   
spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml
+  
+
+
+  spark.kubernetes.executor.podTemplateContainerName
+  (none)
+  
+   Specify the container name to be used as a basis for the executor in the 
given pod template.
+   For example 
spark.kubernetes.executor.podTemplateContainerName=spark-executor
   
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-26267][SS] Retry when detecting incorrect offsets from Kafka (2.4)

2019-01-07 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new faa4c28  [SPARK-26267][SS] Retry when detecting incorrect offsets from 
Kafka (2.4)
faa4c28 is described below

commit faa4c2823b69c1643d7678ee1cb0b7295c611334
Author: Shixiong Zhu 
AuthorDate: Mon Jan 7 16:53:07 2019 -0800

[SPARK-26267][SS] Retry when detecting incorrect offsets from Kafka (2.4)

## What changes were proposed in this pull request?

Backport #23324 to branch-2.4.

## How was this patch tested?

Jenkins

Closes #23365 from zsxwing/SPARK-26267-2.4.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
---
 .../spark/sql/kafka010/KafkaContinuousReader.scala |  4 +-
 .../spark/sql/kafka010/KafkaMicroBatchReader.scala | 20 --
 .../sql/kafka010/KafkaOffsetRangeCalculator.scala  |  2 +
 .../spark/sql/kafka010/KafkaOffsetReader.scala | 80 --
 .../apache/spark/sql/kafka010/KafkaSource.scala|  5 +-
 .../sql/kafka010/KafkaMicroBatchSourceSuite.scala  | 48 +
 6 files changed, 146 insertions(+), 13 deletions(-)
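
A self-contained sketch of the consistency check added below; the `TopicPartitionId`
case class stands in for Kafka's `TopicPartition` so the snippet compiles on its own:

```scala
// If an end (until) offset is smaller than the corresponding start (from) offset,
// Kafka has likely reported an incorrect offset and some data may have been missed.
case class TopicPartitionId(topic: String, partition: Int)

def checkOffsetRanges(
    fromOffsets: Map[TopicPartitionId, Long],
    untilOffsets: Map[TopicPartitionId, Long],
    reportDataLoss: String => Unit): Unit = {
  untilOffsets.foreach { case (tp, untilOffset) =>
    fromOffsets.get(tp).foreach { fromOffset =>
      if (untilOffset < fromOffset) {
        reportDataLoss(s"Partition $tp's offset was changed from " +
          s"$fromOffset to $untilOffset, some data may have been missed")
      }
    }
  }
}

// Example: checkOffsetRanges(Map(TopicPartitionId("t", 0) -> 10L),
//   Map(TopicPartitionId("t", 0) -> 5L), msg => println(s"WARN: $msg"))
```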

diff --git 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala
 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala
index 8ce56a2..561d501 100644
--- 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala
+++ 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala
@@ -73,7 +73,7 @@ class KafkaContinuousReader(
 offset = start.orElse {
   val offsets = initialOffsets match {
 case EarliestOffsetRangeLimit => 
KafkaSourceOffset(offsetReader.fetchEarliestOffsets())
-case LatestOffsetRangeLimit => 
KafkaSourceOffset(offsetReader.fetchLatestOffsets())
+case LatestOffsetRangeLimit => 
KafkaSourceOffset(offsetReader.fetchLatestOffsets(None))
 case SpecificOffsetRangeLimit(p) => 
offsetReader.fetchSpecificOffsets(p, reportDataLoss)
   }
   logInfo(s"Initial offsets: $offsets")
@@ -128,7 +128,7 @@ class KafkaContinuousReader(
   }
 
   override def needsReconfiguration(): Boolean = {
-knownPartitions != null && offsetReader.fetchLatestOffsets().keySet != 
knownPartitions
+knownPartitions != null && offsetReader.fetchLatestOffsets(None).keySet != 
knownPartitions
   }
 
   override def toString(): String = s"KafkaSource[$offsetReader]"
diff --git 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala
 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala
index 8cc989f..b6c8035 100644
--- 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala
+++ 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala
@@ -93,7 +93,8 @@ private[kafka010] class KafkaMicroBatchReader(
 endPartitionOffsets = Option(end.orElse(null))
 .map(_.asInstanceOf[KafkaSourceOffset].partitionToOffsets)
 .getOrElse {
-  val latestPartitionOffsets = kafkaOffsetReader.fetchLatestOffsets()
+  val latestPartitionOffsets =
+kafkaOffsetReader.fetchLatestOffsets(Some(startPartitionOffsets))
   maxOffsetsPerTrigger.map { maxOffsets =>
 rateLimit(maxOffsets, startPartitionOffsets, 
latestPartitionOffsets)
   }.getOrElse {
@@ -132,10 +133,21 @@ private[kafka010] class KafkaMicroBatchReader(
 }.toSeq
 logDebug("TopicPartitions: " + topicPartitions.mkString(", "))
 
+val fromOffsets = startPartitionOffsets ++ newPartitionInitialOffsets
+val untilOffsets = endPartitionOffsets
+untilOffsets.foreach { case (tp, untilOffset) =>
+  fromOffsets.get(tp).foreach { fromOffset =>
+if (untilOffset < fromOffset) {
+  reportDataLoss(s"Partition $tp's offset was changed from " +
+s"$fromOffset to $untilOffset, some data may have been missed")
+}
+  }
+}
+
 // Calculate offset ranges
 val offsetRanges = rangeCalculator.getRanges(
-  fromOffsets = startPartitionOffsets ++ newPartitionInitialOffsets,
-  untilOffsets = endPartitionOffsets,
+  fromOffsets = fromOffsets,
+  untilOffsets = untilOffsets,
   executorLocations = getSortedExecutorList())
 
 // Reuse Kafka consumers only when all the offset ranges have distinct 
TopicPartitions,
@@ -192,7 +204,7 @@ private[kafka010] class KafkaMicroBatchReader(
 case EarliestOffsetRangeLimit =>
   KafkaSourceOffset(kafkaOffsetReader.fetchEarliestOffsets())
 case LatestOffsetRangeLimit =>
-  KafkaSourceOffset(kafkaOffsetReader.fetchLatestOffsets())
+   

[spark] branch master updated: [SPARK-26339][SQL][FOLLOW-UP] Issue warning instead of throwing an exception for underscore files

2019-01-07 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5102ccc  [SPARK-26339][SQL][FOLLOW-UP] Issue warning instead of 
throwing an exception for underscore files
5102ccc is described below

commit 5102ccc4ab6e30caa5510131dee7098b4f3ad32e
Author: Hyukjin Kwon 
AuthorDate: Mon Jan 7 15:48:54 2019 -0800

[SPARK-26339][SQL][FOLLOW-UP] Issue warning instead of throwing an 
exception for underscore files

## What changes were proposed in this pull request?

The PR https://github.com/apache/spark/pull/23446 happened to introduce a 
behaviour change - empty dataframes can't be read anymore from underscore 
files. It looks controversial to allow or disallow this case, so this PR changes 
the fix to issue a warning instead of throwing an exception, to be more 
conservative.

**Before**

```scala
scala> spark.read.schema("a int").parquet("_tmp*").show()
org.apache.spark.sql.AnalysisException: All paths were ignored:
file:/.../_tmp
  file:/.../_tmp1;
  at 
org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570)
  at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
  at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:651)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:635)
  ... 49 elided

scala> spark.read.text("_tmp*").show()
org.apache.spark.sql.AnalysisException: All paths were ignored:
file:/.../_tmp
  file:/.../_tmp1;
  at 
org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570)
  at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
  at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:723)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:695)
  ... 49 elided
```

**After**

```scala
scala> spark.read.schema("a int").parquet("_tmp*").show()
19/01/07 15:14:43 WARN DataSource: All paths were ignored:
  file:/.../_tmp
  file:/.../_tmp1
+---+
|  a|
+---+
+---+

scala> spark.read.text("_tmp*").show()
19/01/07 15:14:51 WARN DataSource: All paths were ignored:
  file:/.../_tmp
  file:/.../_tmp1
+-+
|value|
+-+
+-+
```

## How was this patch tested?

Manually tested as above.

Closes #23481 from HyukjinKwon/SPARK-26339.

Authored-by: Hyukjin Kwon 
Signed-off-by: gatorsmile 
---
 .../spark/sql/execution/datasources/DataSource.scala |  6 +++---
 sql/core/src/test/resources/test-data/_cars.csv  |  7 ---
 .../sql/execution/datasources/csv/CSVSuite.scala | 20 
 3 files changed, 3 insertions(+), 30 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
index 2a438a5..5dad784 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
@@ -567,11 +567,11 @@ case class DataSource(
   }
   if (filteredOut.nonEmpty) {
 if (filteredIn.isEmpty) {
-  throw new AnalysisException(
-s"All paths were ignored:\n${filteredOut.mkString("\n  ")}")
+  logWarning(
+s"All paths were ignored:\n  ${filteredOut.mkString("\n  ")}")
 } else {
   logDebug(
-s"Some paths were ignored:\n${filteredOut.mkString("\n  ")}")
+s"Some paths were ignored:\n  ${filteredOut.mkString("\n  ")}")
 }
   }
 }
diff --git a/sql/core/src/test/resources/test-data/_cars.csv 
b/sql/core/src/test/resources/test-data/_cars.csv
deleted file mode 100644
index 40ded57..000
--- a/sql/core/src/test/resources/test-data/_cars.csv
+++ /dev/null
@@ -1,7 +0,0 @@
-
-year,make,model,comment,blank
-"2012","Tesla","S","No comment",
-
-1997,Ford,E350,"Go get one now they are going fast",
-2015,Chevy,Volt
-
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index fb1bedf..d9e5d7a 100644
--- 

[spark] branch master updated: [SPARK-26491][CORE][TEST] Use ConfigEntry for hardcoded configs for test categories

2019-01-07 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1a64152  [SPARK-26491][CORE][TEST] Use ConfigEntry for hardcoded 
configs for test categories
1a64152 is described below

commit 1a641525e60039cc6b10816e946cb6f44b3e2696
Author: Marco Gaido 
AuthorDate: Mon Jan 7 15:35:33 2019 -0800

[SPARK-26491][CORE][TEST] Use ConfigEntry for hardcoded configs for test 
categories

## What changes were proposed in this pull request?

The PR makes the hardcoded `spark.test` and `spark.testing` configs use 
`ConfigEntry` and puts them in the config package.

## How was this patch tested?

existing UTs

Closes #23413 from mgaido91/SPARK-26491.

Authored-by: Marco Gaido 
Signed-off-by: Marcelo Vanzin 
---
 .../apache/spark/ExecutorAllocationManager.scala   |  4 +-
 .../main/scala/org/apache/spark/SparkContext.scala |  3 +-
 .../spark/deploy/history/FsHistoryProvider.scala   |  3 +-
 .../org/apache/spark/deploy/worker/Worker.scala|  4 +-
 .../spark/executor/ProcfsMetricsGetter.scala   |  2 +-
 .../org/apache/spark/executor/TaskMetrics.scala|  3 +-
 .../org/apache/spark/internal/config/Tests.scala   | 56 ++
 .../apache/spark/memory/StaticMemoryManager.scala  |  5 +-
 .../apache/spark/memory/UnifiedMemoryManager.scala |  7 +--
 .../org/apache/spark/scheduler/DAGScheduler.scala  |  3 +-
 .../cluster/StandaloneSchedulerBackend.scala   |  3 +-
 .../org/apache/spark/util/SizeEstimator.scala  |  5 +-
 .../main/scala/org/apache/spark/util/Utils.scala   |  5 +-
 .../scala/org/apache/spark/DistributedSuite.scala  |  5 +-
 .../spark/ExecutorAllocationManagerSuite.scala |  3 +-
 .../test/scala/org/apache/spark/ShuffleSuite.scala |  5 +-
 .../scala/org/apache/spark/SparkFunSuite.scala |  3 +-
 .../history/HistoryServerArgumentsSuite.scala  |  6 +--
 .../spark/deploy/history/HistoryServerSuite.scala  |  7 +--
 .../spark/memory/StaticMemoryManagerSuite.scala|  5 +-
 .../spark/memory/UnifiedMemoryManagerSuite.scala   | 33 ++---
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  7 +--
 .../scheduler/BlacklistIntegrationSuite.scala  | 19 
 .../shuffle/sort/ShuffleExternalSorterSuite.scala  |  5 +-
 .../storage/BlockManagerReplicationSuite.scala |  9 ++--
 .../apache/spark/storage/BlockManagerSuite.scala   | 10 ++--
 .../apache/spark/storage/MemoryStoreSuite.scala|  1 -
 .../org/apache/spark/util/SizeEstimatorSuite.scala |  5 +-
 .../collection/ExternalAppendOnlyMapSuite.scala|  3 +-
 .../util/collection/ExternalSorterSuite.scala  |  3 +-
 .../integrationtest/KubernetesTestComponents.scala |  3 +-
 .../mesos/MesosCoarseGrainedSchedulerBackend.scala |  3 +-
 .../apache/spark/sql/execution/SQLExecution.scala  |  4 +-
 .../org/apache/spark/sql/BenchmarkQueryTest.scala  |  3 +-
 .../sql/execution/UnsafeRowSerializerSuite.scala   |  3 +-
 35 files changed, 165 insertions(+), 83 deletions(-)
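
A hedged sketch of what a migrated entry presumably looks like in the new `Tests`
config object; the key and default value come from the diff below, while the exact
builder chain is Spark's internal `ConfigBuilder` API and may differ slightly:

```scala
package org.apache.spark.internal.config

private[spark] object Tests {
  // Typed replacement for the old hardcoded TESTING_SCHEDULE_INTERVAL_KEY string.
  val TEST_SCHEDULE_INTERVAL =
    ConfigBuilder("spark.testing.dynamicAllocation.scheduleInterval")
      .longConf
      .createWithDefault(100)
}
```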

diff --git 
a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala 
b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index d966582..0807e65 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -27,6 +27,7 @@ import com.codahale.metrics.{Gauge, MetricRegistry}
 
 import org.apache.spark.internal.{config, Logging}
 import org.apache.spark.internal.config._
+import org.apache.spark.internal.config.Tests.TEST_SCHEDULE_INTERVAL
 import org.apache.spark.metrics.source.Source
 import org.apache.spark.scheduler._
 import org.apache.spark.storage.BlockManagerMaster
@@ -157,7 +158,7 @@ private[spark] class ExecutorAllocationManager(
 
   // Polling loop interval (ms)
   private val intervalMillis: Long = if (Utils.isTesting) {
-  conf.getLong(TESTING_SCHEDULE_INTERVAL_KEY, 100)
+  conf.get(TEST_SCHEDULE_INTERVAL)
 } else {
   100
 }
@@ -899,5 +900,4 @@ private[spark] class ExecutorAllocationManager(
 
 private object ExecutorAllocationManager {
   val NOT_SET = Long.MaxValue
-  val TESTING_SCHEDULE_INTERVAL_KEY = 
"spark.testing.dynamicAllocation.scheduleInterval"
 }
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 89be9de..3a1e1b9 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -45,6 +45,7 @@ import org.apache.spark.deploy.{LocalSparkCluster, 
SparkHadoopUtil}
 import org.apache.spark.input.{FixedLengthBinaryInputFormat, 
PortableDataStream, StreamInputFormat, WholeTextFileInputFormat}
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config._
+import 

svn commit: r31803 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_15_09-98be895-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Mon Jan  7 23:22:22 2019
New Revision: 31803

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_07_15_09-98be895 docs


[This commit notification would consist of 1774 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-26269][YARN][BRANCH-2.4] YarnAllocator should have same blacklist behaviour with YARN to maximize use of cluster resource
2019-01-07 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new b4202e7  [SPARK-26269][YARN][BRANCH-2.4] YarnAllocator should have 
same blacklist behaviour with YARN to maximize use of cluster resource
b4202e7 is described below

commit b4202e79833f3adc00afe00f43e8d9165c9c8e48
Author: wuyi 
AuthorDate: Mon Jan 7 16:22:28 2019 -0600

[SPARK-26269][YARN][BRANCH-2.4] YarnAllocator should have same blacklist 
behaviour with YARN to maximize use of cluster resource

## What changes were proposed in this pull request?

As I mentioned in JIRA 
[SPARK-26269](https://issues.apache.org/jira/browse/SPARK-26269), in order to 
maximize the use of cluster resources, this PR tries to make `YarnAllocator` have 
the same blacklist behaviour as YARN.

## How was this patch tested?

Added.

Closes #23368 from 
Ngone51/dev-YarnAllocator-should-have-same-blacklist-behaviour-with-YARN-branch-2.4.

Lead-authored-by: wuyi 
Co-authored-by: Ngone51 
Signed-off-by: Thomas Graves 
---
 .../apache/spark/deploy/yarn/YarnAllocator.scala   | 31 ++--
 .../yarn/YarnAllocatorBlacklistTracker.scala   |  4 +-
 .../yarn/YarnAllocatorBlacklistTrackerSuite.scala  |  2 +-
 .../spark/deploy/yarn/YarnAllocatorSuite.scala | 83 --
 4 files changed, 107 insertions(+), 13 deletions(-)
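
A standalone sketch of the classification being added; the constants are the YARN
`ContainerExitStatus` values listed in the diff, while the value and helper names here
are hypothetical:

```scala
import org.apache.hadoop.yarn.api.records.ContainerExitStatus

// Exit statuses that are neither the application's nor the node's fault;
// they should not count towards blacklisting the host.
val notAppAndSystemFault: Set[Int] = Set(
  ContainerExitStatus.KILLED_BY_RESOURCEMANAGER,
  ContainerExitStatus.KILLED_BY_APPMASTER,
  ContainerExitStatus.KILLED_AFTER_APP_COMPLETION,
  ContainerExitStatus.ABORTED,
  ContainerExitStatus.DISKS_FAILED)

// Only failures outside the set above mark the host as a bad node and blacklist it.
def isBadNodeFailure(exitStatus: Int): Boolean =
  !notAppAndSystemFault.contains(exitStatus)
```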

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index f4dc80a..3357084 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -578,13 +578,23 @@ private[yarn] class YarnAllocator(
 (true, memLimitExceededLogMessage(
   completedContainer.getDiagnostics,
   PMEM_EXCEEDED_PATTERN))
-  case _ =>
-// all the failures which not covered above, like:
-// disk failure, kill by app master or resource manager, ...
-allocatorBlacklistTracker.handleResourceAllocationFailure(hostOpt)
-(true, "Container marked as failed: " + containerId + onHostStr +
-  ". Exit status: " + completedContainer.getExitStatus +
-  ". Diagnostics: " + completedContainer.getDiagnostics)
+  case other_exit_status =>
+// SPARK-26269: follow YARN's blacklisting behaviour(see 
https://github
+// 
.com/apache/hadoop/blob/228156cfd1b474988bc4fedfbf7edddc87db41e3/had
+// 
oop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/ap
+// ache/hadoop/yarn/util/Apps.java#L273 for details)
+if 
(NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
+  (false, s"Container marked as failed: $containerId$onHostStr" +
+s". Exit status: ${completedContainer.getExitStatus}" +
+s". Diagnostics: ${completedContainer.getDiagnostics}.")
+} else {
+  // completed container from a bad node
+  
allocatorBlacklistTracker.handleResourceAllocationFailure(hostOpt)
+  (true, s"Container from a bad node: $containerId$onHostStr" +
+s". Exit status: ${completedContainer.getExitStatus}" +
+s". Diagnostics: ${completedContainer.getDiagnostics}.")
+}
+
 
 }
 if (exitCausedByApp) {
@@ -722,4 +732,11 @@ private object YarnAllocator {
   "Consider boosting spark.yarn.executor.memoryOverhead or " +
   "disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714."
   }
+  val NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS = Set(
+ContainerExitStatus.KILLED_BY_RESOURCEMANAGER,
+ContainerExitStatus.KILLED_BY_APPMASTER,
+ContainerExitStatus.KILLED_AFTER_APP_COMPLETION,
+ContainerExitStatus.ABORTED,
+ContainerExitStatus.DISKS_FAILED
+  )
 }
diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala
index ceac7cd..268976b 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala
@@ -120,7 +120,9 @@ private[spark] class YarnAllocatorBlacklistTracker(
 if (removals.nonEmpty) {
   logInfo(s"removing nodes from YARN application master's blacklist: 
$removals")
 }
-amClient.updateBlacklist(additions.asJava, removals.asJava)
+if (additions.nonEmpty || removals.nonEmpty) {
+ 

[spark] branch master updated: [SPARK-26065][SQL] Change query hint from a `LogicalPlan` to a field

2019-01-07 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 98be895  [SPARK-26065][SQL] Change query hint from a `LogicalPlan` to 
a field
98be895 is described below

commit 98be8953c75c026c1cb432cc8f66dd312feed0c6
Author: maryannxue 
AuthorDate: Mon Jan 7 13:59:40 2019 -0800

[SPARK-26065][SQL] Change query hint from a `LogicalPlan` to a field

## What changes were proposed in this pull request?

The existing query hint implementation relies on a logical plan node 
`ResolvedHint` to store query hints in logical plans, and on `Statistics` in 
physical plans. Since `ResolvedHint` is not really a logical operator and can 
break the pattern matching for existing and future optimization rules, it is an 
issue for the Optimizer, just as the old `AnalysisBarrier` was for the Analyzer.

Given the fact that all our query hints are either 1) a join hint, i.e., 
broadcast hint; or 2) a re-partition hint, which is indeed an operator, we only 
need to add a hint field on the Join plan and that will be a good enough 
solution for the current hint usage.

This PR is to let `Join` node have a hint for its left sub-tree and another 
hint for its right sub-tree and each hint is a merged result of all the 
effective hints specified in the corresponding sub-tree. The "effectiveness" of 
a hint, i.e., whether that hint should be propagated to the `Join` node, is 
currently consistent with the hint propagation rules originally implemented in 
the `Statistics` approach. Note that the `ResolvedHint` node still has to live 
through the analysis stage  [...]

This PR also introduces a change in how hints work with join reordering. 
Before this PR, hints would stop join reordering. For example, in 
"a.join(b).join(c).hint("broadcast").join(d)", the broadcast hint would stop d 
from participating in the cost-based join reordering while still allowing 
reordering from under the hint node. After this PR, though, the broadcast hint 
will not interfere with join reordering at all, and after reordering if a 
relation associated with a hint stays unchan [...]

## How was this patch tested?

Added new tests.
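
For orientation, the user-facing hint API whose internal representation moves onto
`Join` is unchanged; a small spark-shell style sketch (the table contents are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().master("local[*]").appName("hint-demo").getOrCreate()
import spark.implicits._

val a = Seq((1, "x"), (2, "y")).toDF("id", "va")
val b = Seq((1, "p"), (2, "q")).toDF("id", "vb")

// Both forms attach a broadcast hint that, after this change, is carried as a
// hint field on the resulting Join node rather than as a ResolvedHint plan node.
a.join(b.hint("broadcast"), "id").explain()
a.join(broadcast(b), "id").explain()
```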

Closes #23036 from maryannxue/query-hint.

Authored-by: maryannxue 
Signed-off-by: gatorsmile 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |  16 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala  |   4 +-
 .../catalyst/analysis/StreamingJoinHelper.scala|   2 +-
 .../analysis/UnsupportedOperationChecker.scala |   2 +-
 .../apache/spark/sql/catalyst/dsl/package.scala|   2 +-
 .../catalyst/optimizer/CostBasedJoinReorder.scala  |  84 ++---
 .../catalyst/optimizer/EliminateResolvedHint.scala |  59 +++
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  36 ++--
 .../optimizer/PropagateEmptyRelation.scala |   2 +-
 .../ReplaceNullWithFalseInPredicate.scala  |   2 +-
 .../spark/sql/catalyst/optimizer/expressions.scala |   3 +-
 .../spark/sql/catalyst/optimizer/joins.scala   |  27 +--
 .../spark/sql/catalyst/optimizer/subquery.scala|  14 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |   4 +-
 .../spark/sql/catalyst/planning/patterns.scala |  41 +++--
 .../plans/logical/LogicalPlanVisitor.scala |   3 -
 .../sql/catalyst/plans/logical/Statistics.scala|   7 +-
 .../plans/logical/basicLogicalOperators.scala  |  14 +-
 .../spark/sql/catalyst/plans/logical/hints.scala   |  27 ++-
 .../statsEstimation/AggregateEstimation.scala  |   3 +-
 .../statsEstimation/BasicStatsPlanVisitor.scala|   2 -
 .../logical/statsEstimation/JoinEstimation.scala   |   2 +-
 .../SizeInBytesOnlyStatsPlanVisitor.scala  |  22 +--
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |   8 +-
 .../sql/catalyst/analysis/AnalysisSuite.scala  |   4 +-
 .../sql/catalyst/analysis/ResolveHintsSuite.scala  |   2 +-
 .../catalyst/optimizer/ColumnPruningSuite.scala|   6 +-
 .../catalyst/optimizer/FilterPushdownSuite.scala   |  14 --
 .../catalyst/optimizer/JoinOptimizationSuite.scala |  28 +--
 .../sql/catalyst/optimizer/JoinReorderSuite.scala  |  83 -
 .../catalyst/optimizer/ReplaceOperatorSuite.scala  |   8 +-
 .../apache/spark/sql/catalyst/plans/PlanTest.scala |  10 +-
 .../spark/sql/catalyst/plans/SameResultSuite.scala |  16 +-
 .../BasicStatsEstimationSuite.scala|  18 --
 .../statsEstimation/FilterEstimationSuite.scala|   2 +-
 .../statsEstimation/JoinEstimationSuite.scala  |  31 ++--
 .../main/scala/org/apache/spark/sql/Dataset.scala  |  16 +-
 .../spark/sql/execution/SparkStrategies.scala  |  52 +++---
 .../sql/execution/columnar/InMemoryRelation.scala  |   9 +-
 .../org/apache/spark/sql/CachedTableSuite.scala|  21 +++
 .../org/apache/spark/sql/DataFrameJoinSuite.scala  

[spark-website] branch asf-site updated (222263d -> eb0aa14)

2019-01-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git.


 from 222263d  Hotfix site links in sitemap.xml
 add eb0aa14  Suggest new Apache repo for committers

No new revisions were added by this update.

Summary of changes:
 committers.md| 4 ++--
 site/committers.html | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-25689][YARN] Make driver, not AM, manage delegation tokens.

2019-01-07 Thread irashid
This is an automated email from the ASF dual-hosted git repository.

irashid pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 669e8a1  [SPARK-25689][YARN] Make driver, not AM, manage delegation 
tokens.
669e8a1 is described below

commit 669e8a155987995a1a5d49a96b88c05f39e41723
Author: Marcelo Vanzin 
AuthorDate: Mon Jan 7 14:40:08 2019 -0600

[SPARK-25689][YARN] Make driver, not AM, manage delegation tokens.

This change modifies the behavior of the delegation token code when running
on YARN, so that the driver controls the renewal, in both client and cluster
mode. For that, a few different things were changed:

* The AM code only runs code that needs DTs when DTs are available.

In a way, this restores the AM behavior to what it was pre-SPARK-23361, but
keeping the fix added in that bug. Basically, all the AM code is run in a
"UGI.doAs()" block; but code that needs to talk to HDFS (basically the
distributed cache handling code) was delayed to the point where the driver
is up and running, and thus when valid delegation tokens are available.

* SparkSubmit / ApplicationMaster now handle user login, not the token 
manager.

The previous AM code was relying on the token manager to keep the user
logged in when keytabs are used. This required some odd APIs in the token
manager and the AM so that the right UGI was exposed and used in the right
places.

After this change, the logged in user is handled separately from the token
manager, so the API was cleaned up, and, as explained above, the whole AM
runs under the logged in user, which also helps with simplifying some more 
code.

* Distributed cache configs are sent separately to the AM.

Because of the delayed initialization of the cached resources in the AM, it
became easier to write the cache config to a separate properties file 
instead
of bundling it with the rest of the Spark config. This also avoids having
to modify the SparkConf to hide things from the UI.

* Finally, the AM doesn't manage the token manager anymore.

The above changes allow the token manager to be completely handled by the
driver's scheduler backend code also in YARN mode (whether client or 
cluster),
making it similar to other RMs. To maintain the fix added in SPARK-23361 
also
in client mode, the AM now sends an extra message to the driver on 
initialization
to fetch delegation tokens; and although it might not really be needed, the
driver also keeps the running AM updated when new tokens are created.

Tested in a kerberized cluster with the same tests used to validate 
SPARK-23361,
in both client and cluster mode. Also tested with a non-kerberized cluster.

Closes #23338 from vanzin/SPARK-25689.

Authored-by: Marcelo Vanzin 
Signed-off-by: Imran Rashid 
---
 .../security/HadoopDelegationTokenManager.scala| 110 ++---
 .../security/HiveDelegationTokenProvider.scala |  16 ++-
 .../cluster/CoarseGrainedClusterMessage.scala  |   3 +
 .../cluster/CoarseGrainedSchedulerBackend.scala|  40 --
 .../HadoopDelegationTokenManagerSuite.scala|   8 +-
 .../features/KerberosConfDriverFeatureStep.scala   |   2 +-
 .../k8s/KubernetesClusterSchedulerBackend.scala|   7 +-
 .../mesos/MesosCoarseGrainedSchedulerBackend.scala |   7 +-
 .../spark/deploy/yarn/ApplicationMaster.scala  | 135 ++---
 .../deploy/yarn/ApplicationMasterArguments.scala   |   5 +
 .../org/apache/spark/deploy/yarn/Client.scala  | 100 ---
 .../apache/spark/deploy/yarn/YarnRMClient.scala|   8 +-
 .../org/apache/spark/deploy/yarn/config.scala  |  10 --
 .../YARNHadoopDelegationTokenManager.scala |   7 +-
 .../cluster/YarnClientSchedulerBackend.scala   |   6 +
 .../scheduler/cluster/YarnSchedulerBackend.scala   |  17 ++-
 .../YARNHadoopDelegationTokenManagerSuite.scala|   2 +-
 17 files changed, 246 insertions(+), 237 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 
b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
index f7e3dde..d97857a 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
@@ -21,7 +21,6 @@ import java.io.File
 import java.net.URI
 import java.security.PrivilegedExceptionAction
 import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
-import java.util.concurrent.atomic.AtomicReference
 
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.FileSystem
@@ -39,32 +38,24 @@ import org.apache.spark.util.ThreadUtils
 /**
  * Manager for 

svn commit: r31801 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_10_46-71183b2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Mon Jan  7 18:59:19 2019
New Revision: 31801

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_07_10_46-71183b2 docs


[This commit notification would consist of 1774 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing development version 2.2.4-SNAPSHOT

2019-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 209af563d61d985f7d95621614c34b4995db9318
Author: Dongjoon Hyun 
AuthorDate: Mon Jan 7 17:48:40 2019 +

Preparing development version 2.2.4-SNAPSHOT
---
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 38 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index ad72330..f3af8d4 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.2.3
+Version: 2.2.4
 Title: R Frontend for Apache Spark
 Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index af02caa..fc5d51d 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index cf5cb38..5be1ee5 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 9f30485..f12c239 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 9a68884..ae952a9 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 8f5ec41..9d756fd 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 6c3e22c..d11d9c7 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index 46dfab2..43db301 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/core/pom.xml b/core/pom.xml
index be29a61..05a3abf 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.3
+2.2.4-SNAPSHOT
 ../pom.xml
   
 
diff --git a/docs/_config.yml b/docs/_config.yml
index b9145cc..2cb4dd0 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -14,8 +14,8 @@ include:
 
 # 

[spark] branch branch-2.2 updated (a642a6d -> 209af56)

2019-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a642a6d  [MINOR][BUILD] Fix script name in `release-tag.sh` usage 
message
 add 4acb6ba  Preparing Spark release v2.2.3-rc1
 new 209af56  Preparing development version 2.2.4-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION | 2 +-
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 38 files changed, 39 insertions(+), 39 deletions(-)
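
The bump summarized above is purely mechanical: every module pom.xml moves its spark-parent version from 2.2.3 to 2.2.4-SNAPSHOT, and python/pyspark/version.py plus docs/_config.yml carry matching plain strings. A minimal Python sketch of that kind of edit follows; it is illustrative only (not the project's actual release-tag.sh), and the paths and version strings are assumptions for the sketch.

```
# Illustrative sketch only -- NOT the project's release-tag.sh. Assumes it runs from a
# checkout root and that each module pom.xml declares the spark-parent <version> first.
import pathlib
import re

OLD, NEW = "2.2.3", "2.2.4-SNAPSHOT"

for pom in pathlib.Path(".").rglob("pom.xml"):
    text = pom.read_text()
    # Bump only the first <version> element, which in Spark module poms is the parent version.
    bumped = re.sub(r"<version>{}</version>".format(re.escape(OLD)),
                    "<version>{}</version>".format(NEW), text, count=1)
    if bumped != text:
        pom.write_text(bumped)

# python/pyspark/version.py and docs/_config.yml hold plain version strings
# and are edited the same way by the real release tooling.
```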


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v2.2.3-rc1

2019-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to tag v2.2.3-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 4acb6ba37b94b90aac445e6546426145a5f9eba2
Author: Dongjoon Hyun 
AuthorDate: Mon Jan 7 17:48:24 2019 +

Preparing Spark release v2.2.3-rc1
---
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 2 +-
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 python/pyspark/version.py | 2 +-
 repl/pom.xml  | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 37 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index f9ec6e7..af02caa 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 55d29d5..cf5cb38 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6d84766..9f30485 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 6228be6..9a68884 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index d511d40..8f5ec41 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 1c6f6ac..6c3e22c 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index dee2b01..46dfab2 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/core/pom.xml b/core/pom.xml
index a300a59..be29a61 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.2.3-SNAPSHOT</version>
+    <version>2.2.3</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/docs/_config.yml b/docs/_config.yml
index 320b077..b9145cc 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -14,7 +14,7 @@ include:
 
 # These allow the documentation to be updated with newer releases
 # of Spark, Scala, and Mesos.
-SPARK_VERSION: 2.2.3-SNAPSHOT
+SPARK_VERSION: 2.2.3
 SPARK_VERSION_SHORT: 2.2.3
 SCALA_BINARY_VERSION: "2.11"
 SCALA_VERSION: "2.11.8"
diff --git a/examples/pom.xml b/examples/pom.xml
index 89f86f5..4acdba2 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-

[spark] tag v2.2.3-rc1 created (now 4acb6ba)

2019-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v2.2.3-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 4acb6ba  (commit)
This tag includes the following new commits:

 new 4acb6ba  Preparing Spark release v2.2.3-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-24489][ML] Check for invalid input type of weight data in ml.PowerIterationClustering

2019-01-07 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 71183b2  [SPARK-24489][ML] Check for invalid input type of weight data 
in ml.PowerIterationClustering
71183b2 is described below

commit 71183b283343a99c6fa99a41268dae412598067f
Author: Shahid 
AuthorDate: Mon Jan 7 09:15:50 2019 -0800

[SPARK-24489][ML] Check for invalid input type of weight data in 
ml.PowerIterationClustering

## What changes were proposed in this pull request?
The test case below results in the following failure; currently, ml.PIC has no check on the data type of the weight column.
 ```
 test("invalid input types for weight") {
val invalidWeightData = spark.createDataFrame(Seq(
  (0L, 1L, "a"),
  (2L, 3L, "b")
)).toDF("src", "dst", "weight")

val pic = new PowerIterationClustering()
  .setWeightCol("weight")

val result = pic.assignClusters(invalidWeightData)
  }
```
```
Job aborted due to stage failure: Task 0 in stage 8077.0 failed 1 times, 
most recent failure: Lost task 0.0 in stage 8077.0 (TID 882, localhost, 
executor driver): scala.MatchError: [0,1,null] (of class 
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
at 
org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
at 
org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
```
In this PR, a type check is added for the weight column.
## How was this patch tested?
UT added
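
For readers outside the patch, a hedged PySpark sketch of the behaviour this check enforces is shown below; the column names mirror the test case above, and it assumes a Spark build that includes this patch together with a local session.

```
# Hedged sketch, assuming a Spark build containing this patch. A string-typed weight
# column should now fail fast with "requirement failed: Column weight must be of type
# numeric ..." instead of a MatchError deep inside the job.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import PowerIterationClustering

spark = SparkSession.builder.master("local[1]").appName("pic-weight-check").getOrCreate()

bad_weights = spark.createDataFrame(
    [(0, 1, "a"), (2, 3, "b")], ["src", "dst", "weight"])  # weight is a string column

pic = PowerIterationClustering(weightCol="weight")
try:
    pic.assignClusters(bad_weights)
except Exception as e:
    print(type(e).__name__, str(e)[:120])
finally:
    spark.stop()
```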

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

Closes #21509 from shahidki31/testCasePic.

Authored-by: Shahid 
Signed-off-by: Holden Karau 
---
 .../spark/ml/clustering/PowerIterationClustering.scala|  1 +
 .../ml/clustering/PowerIterationClusteringSuite.scala | 15 +++
 2 files changed, 16 insertions(+)

diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
index d9a330f..149e99d 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
@@ -166,6 +166,7 @@ class PowerIterationClustering private[clustering] (
 val w = if (!isDefined(weightCol) || $(weightCol).isEmpty) {
   lit(1.0)
 } else {
+  SchemaUtils.checkNumericType(dataset.schema, $(weightCol))
   col($(weightCol)).cast(DoubleType)
 }
 
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
index 55b460f..0ba3ffa 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
@@ -145,6 +145,21 @@ class PowerIterationClusteringSuite extends SparkFunSuite
 assert(msg.contains("Similarity must be nonnegative"))
   }
 
+  test("check for invalid input types of weight") {
+val invalidWeightData = spark.createDataFrame(Seq(
+  (0L, 1L, "a"),
+  (2L, 3L, "b")
+)).toDF("src", "dst", "weight")
+
+val msg = intercept[IllegalArgumentException] {
+  new PowerIterationClustering()
+.setWeightCol("weight")
+.assignClusters(invalidWeightData)
+}.getMessage
+assert(msg.contains("requirement failed: Column weight must be of type 
numeric" +
+  " but was actually of type string."))
+  }
+
   test("test default weight") {
 val dataWithoutWeight = data.sample(0.5, 1L).select('src, 'dst)
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a927c764 -> 868e025)

2019-01-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a927c764 [SPARK-26559][ML][PYSPARK] ML image can't work with numpy 
versions prior to 1.9
 add 868e025  [SPARK-26383][CORE] NPE when use DataFrameReader.jdbc with 
wrong URL

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala|  7 ++-
 .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala| 13 +
 2 files changed, 19 insertions(+), 1 deletion(-)
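
A hedged repro sketch of the scenario SPARK-26383 addresses follows; the JDBC URL and table name are made up for illustration, and it assumes a local PySpark session. Before this fix an unresolvable URL could surface as a NullPointerException from JdbcUtils; with the fix the failure should be a clearer, user-facing error.

```
# Hedged sketch only; the URL and table name below are deliberately bogus.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("spark-26383-sketch").getOrCreate()
try:
    # An unsupported/unresolvable JDBC URL: previously this code path could end in an NPE.
    spark.read.jdbc("jdbc:bogus://nowhere:0/db", "some_table", properties={})
except Exception as e:
    print(type(e).__name__, str(e)[:120])
finally:
    spark.stop()
```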


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r31798 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_06_17-a927c76-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Mon Jan  7 14:30:21 2019
New Revision: 31798

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_07_06_17-a927c76 docs


[This commit notification would consist of 1774 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r31790 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_07_04_04-cb1aad6-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Mon Jan  7 12:18:44 2019
New Revision: 31790

Log:
Apache Spark 2.4.1-SNAPSHOT-2019_01_07_04_04-cb1aad6 docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9

2019-01-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new cb1aad6  [SPARK-26559][ML][PYSPARK] ML image can't work with numpy 
versions prior to 1.9
cb1aad6 is described below

commit cb1aad69b781bf9612b9b14f5338b338344365f4
Author: Liang-Chi Hsieh 
AuthorDate: Mon Jan 7 18:36:52 2019 +0800

[SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 
1.9

## What changes were proposed in this pull request?

Due to an [API change](https://github.com/numpy/numpy/pull/4257/files#diff-c39521d89f7e61d6c0c445d93b62f7dc) in numpy 1.9, the PySpark image module doesn't work with numpy versions prior to 1.9.

When running the image tests with a numpy version prior to 1.9, we see the following error:
```
test_read_images (pyspark.ml.tests.test_image.ImageReaderTest) ... ERROR
test_read_images_multiple_times 
(pyspark.ml.tests.test_image.ImageReaderTest2) ... ok

==
ERROR: test_read_images (pyspark.ml.tests.test_image.ImageReaderTest)
--
Traceback (most recent call last):
  File 
"/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/tests/test_image.py", 
line 36, in test_read_images
self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), 
first_row)
  File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/image.py", 
line 193, in toImage
data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'

--
Ran 2 tests in 29.040s

FAILED (errors=1)
```

## How was this patch tested?

Manually tested with numpy versions both prior to and after 1.9.
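
A standalone sketch of the version gate applied in the patch below is shown here; it needs only numpy, and the array shape and print statement are illustrative assumptions.

```
# Standalone sketch of the compatibility pattern used in this patch; assumes only numpy.
from distutils.version import LooseVersion

import numpy as np

array = np.arange(6, dtype=np.uint8).reshape(2, 3)

if LooseVersion(np.__version__) >= LooseVersion('1.9'):
    # numpy >= 1.9 exposes ndarray.tobytes(); convert explicitly to bytes first.
    data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
else:
    # Older numpy lacks tobytes(); bytearray() can consume the flattened uint8 array directly.
    data = bytearray(array.astype(dtype=np.uint8).ravel())

print(len(data))  # 6 bytes either way
```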

Closes #23484 from viirya/fix-pyspark-image.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Hyukjin Kwon 

(cherry picked from commit a927c764c1eee066efc1c2c713dfee411de79245)

Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/ml/image.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/image.py b/python/pyspark/ml/image.py
index edb90a3..a1aacea 100644
--- a/python/pyspark/ml/image.py
+++ b/python/pyspark/ml/image.py
@@ -28,6 +28,7 @@ import sys
 import warnings
 
 import numpy as np
+from distutils.version import LooseVersion
 
 from pyspark import SparkContext
 from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string
@@ -190,7 +191,11 @@ class _ImageSchema(object):
 # Running `bytearray(numpy.array([1]))` fails in specific Python 
versions
 # with a specific Numpy version, for example in Python 3.6.0 and NumPy 
1.13.3.
 # Here, it avoids it by converting it to bytes.
-data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+if LooseVersion(np.__version__) >= LooseVersion('1.9'):
+data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+else:
+# Numpy prior to 1.9 don't have `tobytes` method.
+data = bytearray(array.astype(dtype=np.uint8).ravel())
 
 # Creating new Row with _create_row(), because Row(name = value, ... )
 # orders fields by name, which conflicts with expected schema order


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9

2019-01-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a927c764 [SPARK-26559][ML][PYSPARK] ML image can't work with numpy 
versions prior to 1.9
a927c764 is described below

commit a927c764c1eee066efc1c2c713dfee411de79245
Author: Liang-Chi Hsieh 
AuthorDate: Mon Jan 7 18:36:52 2019 +0800

[SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 
1.9

## What changes were proposed in this pull request?

Due to an [API change](https://github.com/numpy/numpy/pull/4257/files#diff-c39521d89f7e61d6c0c445d93b62f7dc) in numpy 1.9, the PySpark image module doesn't work with numpy versions prior to 1.9.

When running the image tests with a numpy version prior to 1.9, we see the following error:
```
test_read_images (pyspark.ml.tests.test_image.ImageReaderTest) ... ERROR
test_read_images_multiple_times 
(pyspark.ml.tests.test_image.ImageReaderTest2) ... ok

==
ERROR: test_read_images (pyspark.ml.tests.test_image.ImageReaderTest)
--
Traceback (most recent call last):
  File 
"/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/tests/test_image.py", 
line 36, in test_read_images
self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), 
first_row)
  File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/image.py", 
line 193, in toImage
data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'

--
Ran 2 tests in 29.040s

FAILED (errors=1)
```

## How was this patch tested?

Manually tested with numpy versions both prior to and after 1.9.

Closes #23484 from viirya/fix-pyspark-image.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/ml/image.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/image.py b/python/pyspark/ml/image.py
index edb90a3..a1aacea 100644
--- a/python/pyspark/ml/image.py
+++ b/python/pyspark/ml/image.py
@@ -28,6 +28,7 @@ import sys
 import warnings
 
 import numpy as np
+from distutils.version import LooseVersion
 
 from pyspark import SparkContext
 from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string
@@ -190,7 +191,11 @@ class _ImageSchema(object):
 # Running `bytearray(numpy.array([1]))` fails in specific Python 
versions
 # with a specific Numpy version, for example in Python 3.6.0 and NumPy 
1.13.3.
 # Here, it avoids it by converting it to bytes.
-data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+if LooseVersion(np.__version__) >= LooseVersion('1.9'):
+data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+else:
+# Numpy prior to 1.9 don't have `tobytes` method.
+data = bytearray(array.astype(dtype=np.uint8).ravel())
 
 # Creating new Row with _create_row(), because Row(name = value, ... )
 # orders fields by name, which conflicts with expected schema order


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r31789 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_01_53-468d25e-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-07 Thread pwendell
Author: pwendell
Date: Mon Jan  7 10:05:48 2019
New Revision: 31789

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_07_01_53-468d25e docs


[This commit notification would consist of 1774 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org