svn commit: r31810 - in /dev/spark/v2.2.3-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: dongjoon Date: Tue Jan 8 06:42:46 2019 New Revision: 31810 Log: Apache Spark 2.2.3-rc1 docs [This commit notification would consist of 1347 parts, which exceeds the limit of 50, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r31809 - /dev/spark/v2.2.3-rc1-bin/
Author: dongjoon Date: Tue Jan 8 06:03:23 2019 New Revision: 31809 Log: Apache Spark 2.2.3-rc1 Added: dev/spark/v2.2.3-rc1-bin/ dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz (with props) dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512 dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz (with props) dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.sha512 dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.6.tgz (with props) dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.6.tgz.asc dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.6.tgz.sha dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.7.tgz (with props) dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.7.tgz.asc dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-hadoop2.7.tgz.sha512 dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-without-hadoop.tgz (with props) dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-without-hadoop.tgz.asc dev/spark/v2.2.3-rc1-bin/spark-2.2.3-bin-without-hadoop.tgz.sha512 dev/spark/v2.2.3-rc1-bin/spark-2.2.3.tgz (with props) dev/spark/v2.2.3-rc1-bin/spark-2.2.3.tgz.asc dev/spark/v2.2.3-rc1-bin/spark-2.2.3.tgz.sha512 Added: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc == --- dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc (added) +++ dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.asc Tue Jan 8 06:03:23 2019 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJcNDHmAAoJEO2gDOg08PxcnegQAKrCZ/otSLCrOT5x1p1WwGwj +FVhiMDTSBsRogKhauXb8VVBVSEP/TCm5SL7ZNNkdvvCeAg9T0wv70b0r1dAmXiDy +IPwGNxwrEF5tglx9VfPULQ/dWN4LrEUtqIMBXN9xlqPfavHqQ+FaPtBvAZbMYq9W +Z72xgDDBqbsPI6KkVFYbNxA1fRJtFZ9fjinfeLL9EYcUmqAU6mNy/9ex4nwgwXhG +Lzcclb0eXyjY9ze+LdoH+04o0/W8oB6Yr3TSuqva8ZpX3axBw4RviBxbssJ4k2Iu +YqD6rNmPHLzrUdStHWHl8dVVAGpB7uVeGiJfVOVaRMFzy+zAMg6tqCwR1/NflO/B +/2Rokewh8DotYQIm9skyrtsRIdQB3h7D5COENhTueQo3TtPnnkSyPG4S/7ixZsQF +HdUqWwd501RE6NEnv+EDYUDDiiGf0+UdOPwvzZfKQO/LrxUzzUmpdhdVO3wZ9/5y ++RlgReoRznTFOFXwDxvExqyBBROciw8aeNfTr4fXeuK28Cf7j52haU9XmeqEHcLa +Z8UaMEkFPiyTwooGT2PL79a4XXb2VdJK13HV1bLfFTYL7KygwgFv+pc8Wfi65DSD +UTok6nt2xQ8cPCN7n7rCJfavD/IU9wyxAksbOBZB1Ut06bK4M35+W8VHRVVxSZx5 +/fPF8tDeFVlmkiDUSaRQ +=tX+a +-END PGP SIGNATURE- Added: dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512 == --- dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512 (added) +++ dev/spark/v2.2.3-rc1-bin/SparkR_2.2.3.tar.gz.sha512 Tue Jan 8 06:03:23 2019 @@ -0,0 +1,3 @@ +SparkR_2.2.3.tar.gz: C216AF56 0918FBEA BA5FFFB5 72F6FC60 2BF9E8E3 D5C77122 + 1764D2A8 FA6AD322 D5AB1E4E A6B5CD8A DBD8F4A1 66EF023A + B58A7E9A 80F2357C D3859FA5 50F22465 Added: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc == --- dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc (added) +++ dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.asc Tue Jan 8 06:03:23 2019 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAlwzylwUHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FwCNg/9Hx/hE9TMnTXzVMfWjuoIx4glWxJK +sjnUS4SVffbPgC1P4cKHUuoPFoKDsU7/JCPXRY88SezoMedYbrjSxSKDZPE9VER3 +C9R+F/bNjifxGu+/2+3PXrzMvS1Y+3M4GFHU5p/iAe01un9JN00FXsi1zXJt58Qp +eBpBR4UhoqapWLM1jqqkz5vEHCSwOHf7gFmLk+Vd1jRX1+siWiGfpmgd1qzzS9kb +SKXIjUgQMzVb9vu/ja7j/vjEs6CjuKvO4Eruz3qJ1A2RnWfcAd1x/Z2dOENuoccs +TBR4pVr4C0/ZO8RbaKgr1pIPpMQB5eEkzbJi2cWBQhearfTyCpgIVJFqIrpY8xer +d5wkZziRdo4JG/cAd2GgNb3MiWi5ghHFTPEKJyMi9ZpqrS/R+CJkasyMEa8VPek7 +iQB6+GklFpr9KNCCKU3TQK4vkawuUJUIFdLBlxkO4yWRI3AcUS66YyaiSJQkar5t +zWpAZI6NDIKIVv1ktS+QgkZl14jf8cVKSC1L4rz96Xhdecv8QId0wAEwSkuUSPaZ +5iCHdwNgIm0N1sDGAG+I9I81GM0jiWNwxlexasMOy0YTaEG4jw7xjyZalWXPW8wo +stNfdcDf6CfDxZr13r5nCuQEshODNV81IsaJFlVOg0cGPLg5HhyY+ZasdKOE6jfG +U54wuQLSPOlWY8c= +=TZPm +-END PGP SIGNATURE- Added: dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.sha512 == --- dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz.sha512 (added) +++
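The `.asc` and `.sha512` companions uploaded alongside each artifact above exist so downloaders can verify the binaries. A hedged sketch of those checks (filenames illustrative; note the `.sha512` files in this commit use GPG's grouped-hex layout, which may need reflowing before `sha512sum -c` accepts it, so the runnable part below demonstrates the checksum round-trip on a throwaway local file instead):

```shell
# For a real download you would run (not executed here; needs the artifacts
# and the signer's public key in your keyring):
#   gpg --verify SparkR_2.2.3.tar.gz.asc SparkR_2.2.3.tar.gz
#   sha512sum -c SparkR_2.2.3.tar.gz.sha512

# The sha512 round-trip, demonstrated on a throwaway file:
printf 'release payload' > /tmp/artifact.tgz
sha512sum /tmp/artifact.tgz > /tmp/artifact.tgz.sha512
sha512sum -c /tmp/artifact.tgz.sha512
```

The last command prints `/tmp/artifact.tgz: OK` and exits 0 only if the file still matches its recorded digest.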
svn commit: r31808 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_07_21_31-3ece0aa-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 8 05:45:36 2019 New Revision: 31808 Log: Apache Spark 2.4.1-SNAPSHOT-2019_01_07_21_31-3ece0aa docs [This commit notification would consist of 1476 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r31806 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_19_24-29a7d2d-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 8 03:36:53 2019 New Revision: 31806 Log: Apache Spark 3.0.0-SNAPSHOT-2019_01_07_19_24-29a7d2d docs [This commit notification would consist of 1774 parts, which exceeds the limit of 50, so it was shortened to this summary.]
[spark] branch master updated: [SPARK-24196][SQL] Implement Spark's own GetSchemasOperation
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 29a7d2d [SPARK-24196][SQL] Implement Spark's own GetSchemasOperation 29a7d2d is described below commit 29a7d2da44585d91a9e94bf88dc7b1f42a0e5674 Author: Yuming Wang AuthorDate: Mon Jan 7 18:59:43 2019 -0800 [SPARK-24196][SQL] Implement Spark's own GetSchemasOperation ## What changes were proposed in this pull request? This PR fixes the issue that SQL client tools can't show databases, by implementing Spark's own `GetSchemasOperation`. ## How was this patch tested? Unit tests and manual tests. ![image](https://user-images.githubusercontent.com/5399861/47782885-3dd5d400-dd3c-11e8-8586-59a8c15c7020.png) ![image](https://user-images.githubusercontent.com/5399861/47782899-4928ff80-dd3c-11e8-9d2d-ba9580ba4301.png) Closes #22903 from wangyum/SPARK-24196. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../service/cli/operation/GetSchemasOperation.java | 2 +- .../thriftserver/SparkGetSchemasOperation.scala| 66 + .../server/SparkSQLOperationManager.scala | 17 +++- .../thriftserver/HiveThriftServer2Suites.scala | 16 .../thriftserver/SparkMetadataOperationSuite.scala | 103 + 5 files changed, 201 insertions(+), 3 deletions(-) diff --git a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java index d6f6280..3516bc2 100644 --- a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java +++ b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java @@ -41,7 +41,7 @@ public class GetSchemasOperation extends MetadataOperation { .addStringColumn("TABLE_SCHEM", "Schema name.") .addStringColumn("TABLE_CATALOG", "Catalog name."); - private 
RowSet rowSet; + protected RowSet rowSet; protected GetSchemasOperation(HiveSession parentSession, String catalogName, String schemaName) { diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala new file mode 100644 index 000..d585049 --- /dev/null +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.thriftserver + +import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType +import org.apache.hive.service.cli._ +import org.apache.hive.service.cli.operation.GetSchemasOperation +import org.apache.hive.service.cli.operation.MetadataOperation.DEFAULT_HIVE_CATALOG +import org.apache.hive.service.cli.session.HiveSession + +import org.apache.spark.sql.SQLContext + +/** + * Spark's own GetSchemasOperation + * + * @param sqlContext SQLContext to use + * @param parentSession a HiveSession from SessionManager + * @param catalogName catalog name. null if not applicable. 
+ * @param schemaName database name, null or a concrete database name + */ +private[hive] class SparkGetSchemasOperation( +sqlContext: SQLContext, +parentSession: HiveSession, +catalogName: String, +schemaName: String) + extends GetSchemasOperation(parentSession, catalogName, schemaName) { + + override def runInternal(): Unit = { +setState(OperationState.RUNNING) +// Always use the latest class loader provided by executionHive's state. +val executionHiveClassLoader = sqlContext.sharedState.jarClassLoader +Thread.currentThread().setContextClassLoader(executionHiveClassLoader) + +if (isAuthV2Enabled) { + val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName" + authorizeMetaGets(HiveOperationType.GET_TABLES, null, cmdStr) +} + +try { +
[spark] branch branch-2.4 updated: [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 3ece0aa [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER 3ece0aa is described below commit 3ece0aa479bd32732742d1d8e607de25520a9f5a Author: Dongjoon Hyun AuthorDate: Mon Jan 7 17:54:05 2019 -0800 [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER ## What changes were proposed in this pull request? This PR uses the GitHub repository instead of GitBox because the GitHub repo returns the HTTP header status correctly. ## How was this patch tested? Manual. ``` $ ./do-release-docker.sh -d /tmp/test -n Branch [branch-2.4]: Current branch version is 2.4.1-SNAPSHOT. Release [2.4.1]: RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [v2.4.1-rc1]: ``` Closes #23482 from dongjoon-hyun/SPARK-26554-2. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 6f35ede31cc72a81e3852b1ac7454589d1897bfc) Signed-off-by: Dongjoon Hyun --- dev/create-release/release-util.sh | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/dev/create-release/release-util.sh b/dev/create-release/release-util.sh index 9a34052..5486c18 100755 --- a/dev/create-release/release-util.sh +++ b/dev/create-release/release-util.sh @@ -21,6 +21,7 @@ DRY_RUN=${DRY_RUN:-0} GPG="gpg --no-tty --batch" ASF_REPO="https://gitbox.apache.org/repos/asf/spark.git" ASF_REPO_WEBUI="https://gitbox.apache.org/repos/asf?p=spark.git" +ASF_GITHUB_REPO="https://github.com/apache/spark" function error { echo "$*" @@ -73,9 +74,7 @@ function fcreate_secure { } function check_for_tag { - # Check HTML body messages instead of header status codes. Apache GitBox returns - # a header with `200 OK` status code for both existing and non-existing tag URLs - ! 
curl -s --fail "$ASF_REPO_WEBUI;a=commit;h=$1" | grep '404 Not Found' > /dev/null + curl -s --head --fail "$ASF_GITHUB_REPO/releases/tag/$1" > /dev/null } function get_release_info {
[spark] branch master updated: [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6f35ede [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER 6f35ede is described below commit 6f35ede31cc72a81e3852b1ac7454589d1897bfc Author: Dongjoon Hyun AuthorDate: Mon Jan 7 17:54:05 2019 -0800 [SPARK-26554][BUILD][FOLLOWUP] Use GitHub instead of GitBox to check HEADER ## What changes were proposed in this pull request? This PR uses the GitHub repository instead of GitBox because the GitHub repo returns the HTTP header status correctly. ## How was this patch tested? Manual. ``` $ ./do-release-docker.sh -d /tmp/test -n Branch [branch-2.4]: Current branch version is 2.4.1-SNAPSHOT. Release [2.4.1]: RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [v2.4.1-rc1]: ``` Closes #23482 from dongjoon-hyun/SPARK-26554-2. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/create-release/release-util.sh | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/dev/create-release/release-util.sh b/dev/create-release/release-util.sh index 9a34052..5486c18 100755 --- a/dev/create-release/release-util.sh +++ b/dev/create-release/release-util.sh @@ -21,6 +21,7 @@ DRY_RUN=${DRY_RUN:-0} GPG="gpg --no-tty --batch" ASF_REPO="https://gitbox.apache.org/repos/asf/spark.git" ASF_REPO_WEBUI="https://gitbox.apache.org/repos/asf?p=spark.git" +ASF_GITHUB_REPO="https://github.com/apache/spark" function error { echo "$*" @@ -73,9 +74,7 @@ function fcreate_secure { } function check_for_tag { - # Check HTML body messages instead of header status codes. Apache GitBox returns - # a header with `200 OK` status code for both existing and non-existing tag URLs - ! 
curl -s --fail "$ASF_REPO_WEBUI;a=commit;h=$1" | grep '404 Not Found' > /dev/null + curl -s --head --fail "$ASF_GITHUB_REPO/releases/tag/$1" > /dev/null } function get_release_info {
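The motivation for this change is easiest to see side by side: GitBox answered `200 OK` even for missing tags, so the old `check_for_tag` had to grep the HTML body, while GitHub's honest status code lets the exit status of `curl --head --fail` do the work. A minimal sketch with the two server behaviours mocked as shell functions (no network needed; the tag names and page bodies are invented for illustration):

```shell
# Mock of a GitBox-style server: always "succeeds" (200 OK); the error is
# only visible inside the HTML body.
gitbox_fetch() {
  if [ "$1" = "v2.4.1-rc1" ]; then echo '<html>commit page</html>'
  else echo '<html>404 Not Found</html>'; fi
  return 0
}

# Mock of a GitHub-style server: the exit status itself reports existence,
# just as `curl -s --head --fail URL` exits non-zero on a 404.
github_fetch() { [ "$1" = "v2.4.1-rc1" ]; }

# Old check: invert a grep over the response body.
old_check_for_tag() { ! gitbox_fetch "$1" | grep '404 Not Found' > /dev/null; }
# New check: trust the HTTP status (curl's exit code).
new_check_for_tag() { github_fetch "$1"; }

old_check_for_tag v2.4.1-rc1 && echo "old: tag exists"
new_check_for_tag v2.4.1-rc1 && echo "new: tag exists"
new_check_for_tag v9.9.9-rc1 || echo "new: tag missing"
```

Against a server that always returns 200, the body grep was the only option; once the status code is trustworthy, the check collapses to a single `curl` invocation.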
svn commit: r31805 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_07_17_16-faa4c28-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 8 01:31:28 2019 New Revision: 31805 Log: Apache Spark 2.4.1-SNAPSHOT-2019_01_07_17_16-faa4c28 docs [This commit notification would consist of 1476 parts, which exceeds the limit of 50, so it was shortened to this summary.]
[spark] branch master updated: [MINOR][K8S] add missing docs for podTemplateContainerName properties
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5fb5a02 [MINOR][K8S] add missing docs for podTemplateContainerName properties 5fb5a02 is described below commit 5fb5a0292d9ced48860abe712a10cbb8e513b75a Author: Adrian Tanase AuthorDate: Mon Jan 7 19:03:38 2019 -0600 [MINOR][K8S] add missing docs for podTemplateContainerName properties ## What changes were proposed in this pull request? Adding docs for an enhancement that came in late in this PR: #22146 Currently the docs state that we're going to use the first container in a pod template, which was the implementation for some time, until it was improved with 2 new properties. ## How was this patch tested? I tested that the properties work by combining pod templates with client-mode and a simple pod template. Please review http://spark.apache.org/contributing.html before opening a pull request. Closes #23155 from aditanase/k8s-readme. Authored-by: Adrian Tanase Signed-off-by: Sean Owen --- docs/running-on-kubernetes.md | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 3172b1b..3453ee9 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -229,8 +229,11 @@ pod template that will always be overwritten by Spark. Therefore, users of this the pod template file only lets Spark start with a template pod instead of an empty pod during the pod-building process. For details, see the [full list](#pod-template-properties) of pod template values that will be overwritten by spark. -Pod template files can also define multiple containers. In such cases, Spark will always assume that the first container in -the list will be the driver or executor container. +Pod template files can also define multiple containers. 
In such cases, you can use the spark properties +`spark.kubernetes.driver.podTemplateContainerName` and `spark.kubernetes.executor.podTemplateContainerName` +to indicate which container should be used as a basis for the driver or executor. +If not specified, or if the container name is not valid, Spark will assume that the first container in the list +will be the driver or executor container. ## Using Kubernetes Volumes @@ -932,16 +935,32 @@ specific to Spark on Kubernetes. spark.kubernetes.driver.podTemplateFile (none) - Specify the local file that contains the driver [pod template](#pod-template). For example - spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml` + Specify the local file that contains the driver pod template. For example + spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml + + + + spark.kubernetes.driver.podTemplateContainerName + (none) + + Specify the container name to be used as a basis for the driver in the given pod template. + For example spark.kubernetes.driver.podTemplateContainerName=spark-driver spark.kubernetes.executor.podTemplateFile (none) - Specify the local file that contains the executor [pod template](#pod-template). For example - spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml` + Specify the local file that contains the executor pod template. For example + spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml + + + + spark.kubernetes.executor.podTemplateContainerName + (none) + + Specify the container name to be used as a basis for the executor in the given pod template. + For example spark.kubernetes.executor.podTemplateContainerName=spark-executor
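To make the two new properties concrete, here is a hedged sketch (container names, images, and paths below are invented for illustration, not taken from the commit): a pod template that lists a sidecar first, plus the submit-time configuration that tells Spark which container to use as the driver basis.

```shell
# A hypothetical multi-container driver pod template. Without
# spark.kubernetes.driver.podTemplateContainerName, Spark would fall back to
# the *first* container (the log sidecar here), which is not what we want.
cat > /tmp/driver-pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: log-sidecar
    image: fluentd:v1.3
  - name: spark-driver
    image: my-registry/spark:2.4.0
EOF

# Submit-time configuration (shown as comments, not executed; it needs a
# real Kubernetes cluster):
#   --conf spark.kubernetes.driver.podTemplateFile=/tmp/driver-pod-template.yaml
#   --conf spark.kubernetes.driver.podTemplateContainerName=spark-driver

# Sanity check: the template really lists two candidate containers.
grep -c '^  - name:' /tmp/driver-pod-template.yaml
```

The executor side is symmetric, using the `spark.kubernetes.executor.*` pair of properties.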
[spark] branch branch-2.4 updated: [SPARK-26267][SS] Retry when detecting incorrect offsets from Kafka (2.4)
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new faa4c28 [SPARK-26267][SS] Retry when detecting incorrect offsets from Kafka (2.4) faa4c28 is described below commit faa4c2823b69c1643d7678ee1cb0b7295c611334 Author: Shixiong Zhu AuthorDate: Mon Jan 7 16:53:07 2019 -0800 [SPARK-26267][SS] Retry when detecting incorrect offsets from Kafka (2.4) ## What changes were proposed in this pull request? Backport #23324 to branch-2.4. ## How was this patch tested? Jenkins Closes #23365 from zsxwing/SPARK-26267-2.4. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu --- .../spark/sql/kafka010/KafkaContinuousReader.scala | 4 +- .../spark/sql/kafka010/KafkaMicroBatchReader.scala | 20 -- .../sql/kafka010/KafkaOffsetRangeCalculator.scala | 2 + .../spark/sql/kafka010/KafkaOffsetReader.scala | 80 -- .../apache/spark/sql/kafka010/KafkaSource.scala| 5 +- .../sql/kafka010/KafkaMicroBatchSourceSuite.scala | 48 + 6 files changed, 146 insertions(+), 13 deletions(-) diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala index 8ce56a2..561d501 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala @@ -73,7 +73,7 @@ class KafkaContinuousReader( offset = start.orElse { val offsets = initialOffsets match { case EarliestOffsetRangeLimit => KafkaSourceOffset(offsetReader.fetchEarliestOffsets()) -case LatestOffsetRangeLimit => KafkaSourceOffset(offsetReader.fetchLatestOffsets()) +case LatestOffsetRangeLimit => KafkaSourceOffset(offsetReader.fetchLatestOffsets(None)) case 
SpecificOffsetRangeLimit(p) => offsetReader.fetchSpecificOffsets(p, reportDataLoss) } logInfo(s"Initial offsets: $offsets") @@ -128,7 +128,7 @@ class KafkaContinuousReader( } override def needsReconfiguration(): Boolean = { -knownPartitions != null && offsetReader.fetchLatestOffsets().keySet != knownPartitions +knownPartitions != null && offsetReader.fetchLatestOffsets(None).keySet != knownPartitions } override def toString(): String = s"KafkaSource[$offsetReader]" diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala index 8cc989f..b6c8035 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala @@ -93,7 +93,8 @@ private[kafka010] class KafkaMicroBatchReader( endPartitionOffsets = Option(end.orElse(null)) .map(_.asInstanceOf[KafkaSourceOffset].partitionToOffsets) .getOrElse { - val latestPartitionOffsets = kafkaOffsetReader.fetchLatestOffsets() + val latestPartitionOffsets = +kafkaOffsetReader.fetchLatestOffsets(Some(startPartitionOffsets)) maxOffsetsPerTrigger.map { maxOffsets => rateLimit(maxOffsets, startPartitionOffsets, latestPartitionOffsets) }.getOrElse { @@ -132,10 +133,21 @@ private[kafka010] class KafkaMicroBatchReader( }.toSeq logDebug("TopicPartitions: " + topicPartitions.mkString(", ")) +val fromOffsets = startPartitionOffsets ++ newPartitionInitialOffsets +val untilOffsets = endPartitionOffsets +untilOffsets.foreach { case (tp, untilOffset) => + fromOffsets.get(tp).foreach { fromOffset => +if (untilOffset < fromOffset) { + reportDataLoss(s"Partition $tp's offset was changed from " + +s"$fromOffset to $untilOffset, some data may have been missed") +} + } +} + // Calculate offset ranges val offsetRanges = rangeCalculator.getRanges( - fromOffsets = 
startPartitionOffsets ++ newPartitionInitialOffsets, - untilOffsets = endPartitionOffsets, + fromOffsets = fromOffsets, + untilOffsets = untilOffsets, executorLocations = getSortedExecutorList()) // Reuse Kafka consumers only when all the offset ranges have distinct TopicPartitions, @@ -192,7 +204,7 @@ private[kafka010] class KafkaMicroBatchReader( case EarliestOffsetRangeLimit => KafkaSourceOffset(kafkaOffsetReader.fetchEarliestOffsets()) case LatestOffsetRangeLimit => - KafkaSourceOffset(kafkaOffsetReader.fetchLatestOffsets()) +
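The heart of the fix above is the new sanity check in `KafkaMicroBatchReader`: if Kafka transiently reports an end offset *behind* the batch's start offset, the reader now reports possible data loss instead of silently computing a negative range. A minimal sketch of that comparison (partition names and offsets are invented for illustration):

```shell
# $1=topic-partition  $2=fromOffset  $3=untilOffset
check_offsets() {
  if [ "$3" -lt "$2" ]; then
    # Mirrors the warning added in the patch: end offset moved backwards.
    echo "WARN Partition $1's offset was changed from $2 to $3, some data may have been missed"
  else
    echo "OK Partition $1: $(( $3 - $2 )) new records"
  fi
}

check_offsets topic-0 100 250   # normal advance
check_offsets topic-1 300 120   # went backwards: report possible data loss
```

The real code performs this per-partition comparison over `fromOffsets` and `untilOffsets` before calculating offset ranges, and separately retries the latest-offset fetch (hence the new `fetchLatestOffsets(Some(...))` parameter visible in the diff).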
[spark] branch master updated: [SPARK-26339][SQL][FOLLOW-UP] Issue warning instead of throwing an exception for underscore files
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5102ccc [SPARK-26339][SQL][FOLLOW-UP] Issue warning instead of throwing an exception for underscore files 5102ccc is described below commit 5102ccc4ab6e30caa5510131dee7098b4f3ad32e Author: Hyukjin Kwon AuthorDate: Mon Jan 7 15:48:54 2019 -0800 [SPARK-26339][SQL][FOLLOW-UP] Issue warning instead of throwing an exception for underscore files ## What changes were proposed in this pull request? The PR https://github.com/apache/spark/pull/23446 happened to introduce a behaviour change: empty dataframes can't be read anymore from underscore files. It looks controversial to allow or disallow this case, so this PR opts to issue a warning instead of throwing an exception, to be more conservative. **Before** ```scala scala> spark.read.schema("a int").parquet("_tmp*").show() org.apache.spark.sql.AnalysisException: All paths were ignored: file:/.../_tmp file:/.../_tmp1; at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:651) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:635) ... 
49 elided scala> spark.read.text("_tmp*").show() org.apache.spark.sql.AnalysisException: All paths were ignored: file:/.../_tmp file:/.../_tmp1; at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219) at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:723) at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:695) ... 49 elided ``` **After** ```scala scala> spark.read.schema("a int").parquet("_tmp*").show() 19/01/07 15:14:43 WARN DataSource: All paths were ignored: file:/.../_tmp file:/.../_tmp1 +---+ | a| +---+ +---+ scala> spark.read.text("_tmp*").show() 19/01/07 15:14:51 WARN DataSource: All paths were ignored: file:/.../_tmp file:/.../_tmp1 +-+ |value| +-+ +-+ ``` ## How was this patch tested? Manually tested as above. Closes #23481 from HyukjinKwon/SPARK-26339. 
Authored-by: Hyukjin Kwon Signed-off-by: gatorsmile --- .../spark/sql/execution/datasources/DataSource.scala | 6 +++--- sql/core/src/test/resources/test-data/_cars.csv | 7 --- .../sql/execution/datasources/csv/CSVSuite.scala | 20 3 files changed, 3 insertions(+), 30 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala index 2a438a5..5dad784 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala @@ -567,11 +567,11 @@ case class DataSource( } if (filteredOut.nonEmpty) { if (filteredIn.isEmpty) { - throw new AnalysisException( -s"All paths were ignored:\n${filteredOut.mkString("\n ")}") + logWarning( +s"All paths were ignored:\n ${filteredOut.mkString("\n ")}") } else { logDebug( -s"Some paths were ignored:\n${filteredOut.mkString("\n ")}") +s"Some paths were ignored:\n ${filteredOut.mkString("\n ")}") } } } diff --git a/sql/core/src/test/resources/test-data/_cars.csv b/sql/core/src/test/resources/test-data/_cars.csv deleted file mode 100644 index 40ded57..000 --- a/sql/core/src/test/resources/test-data/_cars.csv +++ /dev/null @@ -1,7 +0,0 @@ - -year,make,model,comment,blank -"2012","Tesla","S","No comment", - -1997,Ford,E350,"Go get one now they are going fast", -2015,Chevy,Volt - diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index fb1bedf..d9e5d7a 100644 ---
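The convention behind the change: Spark, following Hadoop, treats paths whose last name component starts with `_` or `.` as metadata and filters them out before reading. A pure-shell sketch of that filter and the new warn-instead-of-throw behaviour (file names invented; this illustrates the convention, it is not Spark code):

```shell
# Partition a list of path names into data files and filtered-out
# metadata-style names (leading "_" or "."), then either read or warn.
filter_paths() {
  kept=""; ignored=""
  for p in "$@"; do
    case "$p" in
      _*|.*) ignored="$ignored $p" ;;   # metadata-style names are skipped
      *)     kept="$kept $p" ;;
    esac
  done
  if [ -z "$kept" ]; then
    # After this follow-up: warn and proceed with an empty result,
    # instead of raising AnalysisException.
    echo "WARN DataSource: All paths were ignored:$ignored"
  else
    echo "reading:$kept"
  fi
}

filter_paths _tmp _tmp1 part-00000-c000.snappy.parquet
filter_paths _tmp _tmp1
```

The first call keeps only the parquet part file; the second, where everything is filtered out, now produces the warning shown in the "After" transcript above rather than an exception.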
[spark] branch master updated: [SPARK-26491][CORE][TEST] Use ConfigEntry for hardcoded configs for test categories
This is an automated email from the ASF dual-hosted git repository. vanzin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1a64152 [SPARK-26491][CORE][TEST] Use ConfigEntry for hardcoded configs for test categories 1a64152 is described below commit 1a641525e60039cc6b10816e946cb6f44b3e2696 Author: Marco Gaido AuthorDate: Mon Jan 7 15:35:33 2019 -0800 [SPARK-26491][CORE][TEST] Use ConfigEntry for hardcoded configs for test categories ## What changes were proposed in this pull request? The PR makes hardcoded `spark.test` and `spark.testing` configs to use `ConfigEntry` and put them in the config package. ## How was this patch tested? existing UTs Closes #23413 from mgaido91/SPARK-26491. Authored-by: Marco Gaido Signed-off-by: Marcelo Vanzin --- .../apache/spark/ExecutorAllocationManager.scala | 4 +- .../main/scala/org/apache/spark/SparkContext.scala | 3 +- .../spark/deploy/history/FsHistoryProvider.scala | 3 +- .../org/apache/spark/deploy/worker/Worker.scala| 4 +- .../spark/executor/ProcfsMetricsGetter.scala | 2 +- .../org/apache/spark/executor/TaskMetrics.scala| 3 +- .../org/apache/spark/internal/config/Tests.scala | 56 ++ .../apache/spark/memory/StaticMemoryManager.scala | 5 +- .../apache/spark/memory/UnifiedMemoryManager.scala | 7 +-- .../org/apache/spark/scheduler/DAGScheduler.scala | 3 +- .../cluster/StandaloneSchedulerBackend.scala | 3 +- .../org/apache/spark/util/SizeEstimator.scala | 5 +- .../main/scala/org/apache/spark/util/Utils.scala | 5 +- .../scala/org/apache/spark/DistributedSuite.scala | 5 +- .../spark/ExecutorAllocationManagerSuite.scala | 3 +- .../test/scala/org/apache/spark/ShuffleSuite.scala | 5 +- .../scala/org/apache/spark/SparkFunSuite.scala | 3 +- .../history/HistoryServerArgumentsSuite.scala | 6 +-- .../spark/deploy/history/HistoryServerSuite.scala | 7 +-- .../spark/memory/StaticMemoryManagerSuite.scala| 5 +- 
.../spark/memory/UnifiedMemoryManagerSuite.scala | 33 ++--- .../spark/scheduler/BarrierTaskContextSuite.scala | 7 +-- .../scheduler/BlacklistIntegrationSuite.scala | 19 .../shuffle/sort/ShuffleExternalSorterSuite.scala | 5 +- .../storage/BlockManagerReplicationSuite.scala | 9 ++-- .../apache/spark/storage/BlockManagerSuite.scala | 10 ++-- .../apache/spark/storage/MemoryStoreSuite.scala| 1 - .../org/apache/spark/util/SizeEstimatorSuite.scala | 5 +- .../collection/ExternalAppendOnlyMapSuite.scala| 3 +- .../util/collection/ExternalSorterSuite.scala | 3 +- .../integrationtest/KubernetesTestComponents.scala | 3 +- .../mesos/MesosCoarseGrainedSchedulerBackend.scala | 3 +- .../apache/spark/sql/execution/SQLExecution.scala | 4 +- .../org/apache/spark/sql/BenchmarkQueryTest.scala | 3 +- .../sql/execution/UnsafeRowSerializerSuite.scala | 3 +- 35 files changed, 165 insertions(+), 83 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala index d966582..0807e65 100644 --- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala +++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala @@ -27,6 +27,7 @@ import com.codahale.metrics.{Gauge, MetricRegistry} import org.apache.spark.internal.{config, Logging} import org.apache.spark.internal.config._ +import org.apache.spark.internal.config.Tests.TEST_SCHEDULE_INTERVAL import org.apache.spark.metrics.source.Source import org.apache.spark.scheduler._ import org.apache.spark.storage.BlockManagerMaster @@ -157,7 +158,7 @@ private[spark] class ExecutorAllocationManager( // Polling loop interval (ms) private val intervalMillis: Long = if (Utils.isTesting) { - conf.getLong(TESTING_SCHEDULE_INTERVAL_KEY, 100) + conf.get(TEST_SCHEDULE_INTERVAL) } else { 100 } @@ -899,5 +900,4 @@ private[spark] class ExecutorAllocationManager( private object ExecutorAllocationManager { val NOT_SET = Long.MaxValue - val 
TESTING_SCHEDULE_INTERVAL_KEY = "spark.testing.dynamicAllocation.scheduleInterval" } diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 89be9de..3a1e1b9 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -45,6 +45,7 @@ import org.apache.spark.deploy.{LocalSparkCluster, SparkHadoopUtil} import org.apache.spark.input.{FixedLengthBinaryInputFormat, PortableDataStream, StreamInputFormat, WholeTextFileInputFormat} import org.apache.spark.internal.Logging import org.apache.spark.internal.config._ +import
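The diff above replaces a raw string key lookup (`conf.getLong(TESTING_SCHEDULE_INTERVAL_KEY, 100)`) with a typed config entry (`conf.get(TEST_SCHEDULE_INTERVAL)`). A minimal Python sketch of that pattern — the class names and the entry below are illustrative stand-ins, not Spark's actual implementation:

```python
# Sketch: a config key bundled with its type converter and default value,
# so callers get a typed value instead of repeating string keys and defaults.
class ConfigEntry:
    def __init__(self, key, convert, default):
        self.key = key
        self.convert = convert
        self.default = default

class Conf:
    def __init__(self, settings=None):
        self._settings = dict(settings or {})

    def get(self, entry):
        raw = self._settings.get(entry.key)
        return entry.default if raw is None else entry.convert(raw)

# Hypothetical entry mirroring TEST_SCHEDULE_INTERVAL in the diff.
TEST_SCHEDULE_INTERVAL = ConfigEntry(
    "spark.testing.dynamicAllocation.scheduleInterval", int, 100)

conf = Conf({"spark.testing.dynamicAllocation.scheduleInterval": "10"})
print(conf.get(TEST_SCHEDULE_INTERVAL))   # -> 10, typed, not a raw string
print(Conf().get(TEST_SCHEDULE_INTERVAL)) # -> 100, the entry's own default
```

The benefit is that the key, its type, and its default are declared once, instead of being duplicated at every call site.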
svn commit: r31803 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_15_09-98be895-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 7 23:22:22 2019 New Revision: 31803 Log: Apache Spark 3.0.0-SNAPSHOT-2019_01_07_15_09-98be895 docs [This commit notification would consist of 1774 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-26269][YARN][BRANCH-2.4] YarnAllocator should have same blacklist behaviour with YARN to maximize use of cluster resource
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new b4202e7 [SPARK-26269][YARN][BRANCH-2.4] YarnAllocator should have same blacklist behaviour with YARN to maximize use of cluster resource b4202e7 is described below commit b4202e79833f3adc00afe00f43e8d9165c9c8e48 Author: wuyi AuthorDate: Mon Jan 7 16:22:28 2019 -0600 [SPARK-26269][YARN][BRANCH-2.4] YarnAllocator should have same blacklist behaviour with YARN to maximize use of cluster resource ## What changes were proposed in this pull request? As I mentioned in jira [SPARK-26269](https://issues.apache.org/jira/browse/SPARK-26269), in order to maximize the use of cluster resources, this PR tries to make `YarnAllocator` follow the same blacklist behaviour as YARN. ## How was this patch tested? Added tests. Closes #23368 from Ngone51/dev-YarnAllocator-should-have-same-blacklist-behaviour-with-YARN-branch-2.4. 
Lead-authored-by: wuyi Co-authored-by: Ngone51 Signed-off-by: Thomas Graves --- .../apache/spark/deploy/yarn/YarnAllocator.scala | 31 ++-- .../yarn/YarnAllocatorBlacklistTracker.scala | 4 +- .../yarn/YarnAllocatorBlacklistTrackerSuite.scala | 2 +- .../spark/deploy/yarn/YarnAllocatorSuite.scala | 83 -- 4 files changed, 107 insertions(+), 13 deletions(-) diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala index f4dc80a..3357084 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala @@ -578,13 +578,23 @@ private[yarn] class YarnAllocator( (true, memLimitExceededLogMessage( completedContainer.getDiagnostics, PMEM_EXCEEDED_PATTERN)) - case _ => -// all the failures which not covered above, like: -// disk failure, kill by app master or resource manager, ... -allocatorBlacklistTracker.handleResourceAllocationFailure(hostOpt) -(true, "Container marked as failed: " + containerId + onHostStr + - ". Exit status: " + completedContainer.getExitStatus + - ". Diagnostics: " + completedContainer.getDiagnostics) + case other_exit_status => +// SPARK-26269: follow YARN's blacklisting behaviour(see https://github +// .com/apache/hadoop/blob/228156cfd1b474988bc4fedfbf7edddc87db41e3/had +// oop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/ap +// ache/hadoop/yarn/util/Apps.java#L273 for details) +if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) { + (false, s"Container marked as failed: $containerId$onHostStr" + +s". Exit status: ${completedContainer.getExitStatus}" + +s". 
Diagnostics: ${completedContainer.getDiagnostics}.") +} else { + // completed container from a bad node + allocatorBlacklistTracker.handleResourceAllocationFailure(hostOpt) + (true, s"Container from a bad node: $containerId$onHostStr" + +s". Exit status: ${completedContainer.getExitStatus}" + +s". Diagnostics: ${completedContainer.getDiagnostics}.") +} + } if (exitCausedByApp) { @@ -722,4 +732,11 @@ private object YarnAllocator { "Consider boosting spark.yarn.executor.memoryOverhead or " + "disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714." } + val NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS = Set( +ContainerExitStatus.KILLED_BY_RESOURCEMANAGER, +ContainerExitStatus.KILLED_BY_APPMASTER, +ContainerExitStatus.KILLED_AFTER_APP_COMPLETION, +ContainerExitStatus.ABORTED, +ContainerExitStatus.DISKS_FAILED + ) } diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala index ceac7cd..268976b 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala @@ -120,7 +120,9 @@ private[spark] class YarnAllocatorBlacklistTracker( if (removals.nonEmpty) { logInfo(s"removing nodes from YARN application master's blacklist: $removals") } -amClient.updateBlacklist(additions.asJava, removals.asJava) +if (additions.nonEmpty || removals.nonEmpty) { +
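The match arm in the diff above separates container exit statuses that are not the application's fault from genuine bad-node failures, and only blacklists the host in the latter case. A hedged Python sketch of that classification — the status values here are illustrative stand-ins, not the real constants from Hadoop's `ContainerExitStatus`:

```python
# Illustrative stand-ins for Hadoop's ContainerExitStatus constants.
KILLED_BY_RESOURCEMANAGER = "rm-kill"
KILLED_BY_APPMASTER = "am-kill"
KILLED_AFTER_APP_COMPLETION = "late-kill"
ABORTED = "aborted"
DISKS_FAILED = "disks-failed"

# Statuses that should neither count as task failures nor blacklist a host.
NOT_APP_AND_SYSTEM_FAULT = {
    KILLED_BY_RESOURCEMANAGER, KILLED_BY_APPMASTER,
    KILLED_AFTER_APP_COMPLETION, ABORTED, DISKS_FAILED,
}

def classify(exit_status, container_id, host):
    """Return (exit_caused_by_app, message), like the match arm in the diff."""
    if exit_status in NOT_APP_AND_SYSTEM_FAULT:
        return (False, f"Container marked as failed: {container_id} on {host}. "
                       f"Exit status: {exit_status}.")
    # Anything else is treated as a bad-node failure, which feeds the
    # allocator's blacklist tracker in the real code.
    return (True, f"Container from a bad node: {container_id} on {host}. "
                  f"Exit status: {exit_status}.")

print(classify(ABORTED, "container_01", "host-a")[0])  # False: don't blacklist
print(classify(1, "container_02", "host-b")[0])        # True: app's fault
```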
[spark] branch master updated: [SPARK-26065][SQL] Change query hint from a `LogicalPlan` to a field
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 98be895 [SPARK-26065][SQL] Change query hint from a `LogicalPlan` to a field 98be895 is described below commit 98be8953c75c026c1cb432cc8f66dd312feed0c6 Author: maryannxue AuthorDate: Mon Jan 7 13:59:40 2019 -0800 [SPARK-26065][SQL] Change query hint from a `LogicalPlan` to a field ## What changes were proposed in this pull request? The existing query hint implementation relies on a logical plan node `ResolvedHint` to store query hints in logical plans, and on `Statistics` in physical plans. Since `ResolvedHint` is not really a logical operator and can break the pattern matching for existing and future optimization rules, it is an issue for the Optimizer, just as the old `AnalysisBarrier` was for the Analyzer. Given that all our query hints are either 1) a join hint, i.e., a broadcast hint, or 2) a re-partition hint, which is indeed an operator, we only need to add a hint field on the Join plan, and that is a good enough solution for current hint usage. This PR lets the `Join` node have a hint for its left sub-tree and another for its right sub-tree; each hint is the merged result of all the effective hints specified in the corresponding sub-tree. The "effectiveness" of a hint, i.e., whether that hint should be propagated to the `Join` node, is currently consistent with the hint propagation rules originally implemented in the `Statistics` approach. Note that the `ResolvedHint` node still has to live through the analysis stage [...] This PR also introduces a change in how hints work with join reordering. Before this PR, hints would stop join reordering. 
For example, in "a.join(b).join(c).hint("broadcast").join(d)", the broadcast hint would stop d from participating in the cost-based join reordering while still allowing reordering from under the hint node. After this PR, though, the broadcast hint will not interfere with join reordering at all, and after reordering if a relation associated with a hint stays unchan [...] ## How was this patch tested? Added new tests. Closes #23036 from maryannxue/query-hint. Authored-by: maryannxue Signed-off-by: gatorsmile --- .../spark/sql/catalyst/analysis/Analyzer.scala | 16 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 4 +- .../catalyst/analysis/StreamingJoinHelper.scala| 2 +- .../analysis/UnsupportedOperationChecker.scala | 2 +- .../apache/spark/sql/catalyst/dsl/package.scala| 2 +- .../catalyst/optimizer/CostBasedJoinReorder.scala | 84 ++--- .../catalyst/optimizer/EliminateResolvedHint.scala | 59 +++ .../spark/sql/catalyst/optimizer/Optimizer.scala | 36 ++-- .../optimizer/PropagateEmptyRelation.scala | 2 +- .../ReplaceNullWithFalseInPredicate.scala | 2 +- .../spark/sql/catalyst/optimizer/expressions.scala | 3 +- .../spark/sql/catalyst/optimizer/joins.scala | 27 +-- .../spark/sql/catalyst/optimizer/subquery.scala| 14 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 4 +- .../spark/sql/catalyst/planning/patterns.scala | 41 +++-- .../plans/logical/LogicalPlanVisitor.scala | 3 - .../sql/catalyst/plans/logical/Statistics.scala| 7 +- .../plans/logical/basicLogicalOperators.scala | 14 +- .../spark/sql/catalyst/plans/logical/hints.scala | 27 ++- .../statsEstimation/AggregateEstimation.scala | 3 +- .../statsEstimation/BasicStatsPlanVisitor.scala| 2 - .../logical/statsEstimation/JoinEstimation.scala | 2 +- .../SizeInBytesOnlyStatsPlanVisitor.scala | 22 +-- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 8 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 4 +- .../sql/catalyst/analysis/ResolveHintsSuite.scala | 2 +- .../catalyst/optimizer/ColumnPruningSuite.scala| 6 +- 
.../catalyst/optimizer/FilterPushdownSuite.scala | 14 -- .../catalyst/optimizer/JoinOptimizationSuite.scala | 28 +-- .../sql/catalyst/optimizer/JoinReorderSuite.scala | 83 - .../catalyst/optimizer/ReplaceOperatorSuite.scala | 8 +- .../apache/spark/sql/catalyst/plans/PlanTest.scala | 10 +- .../spark/sql/catalyst/plans/SameResultSuite.scala | 16 +- .../BasicStatsEstimationSuite.scala| 18 -- .../statsEstimation/FilterEstimationSuite.scala| 2 +- .../statsEstimation/JoinEstimationSuite.scala | 31 ++-- .../main/scala/org/apache/spark/sql/Dataset.scala | 16 +- .../spark/sql/execution/SparkStrategies.scala | 52 +++--- .../sql/execution/columnar/InMemoryRelation.scala | 9 +- .../org/apache/spark/sql/CachedTableSuite.scala| 21 +++ .../org/apache/spark/sql/DataFrameJoinSuite.scala
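The design described above can be sketched in a few lines: instead of a `ResolvedHint` node living inside the plan tree, each `Join` carries one merged hint per side, computed from the hints effective in the corresponding subtree. This is a toy Python model, not Spark code; the class and function names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Relation:
    name: str
    hints: set = field(default_factory=set)  # e.g. {"broadcast"}

@dataclass
class Join:
    left: object
    right: object
    left_hint: set = field(default_factory=set)
    right_hint: set = field(default_factory=set)

def collect_hints(plan):
    """Merge all hints found in a subtree."""
    if isinstance(plan, Relation):
        return set(plan.hints)
    return collect_hints(plan.left) | collect_hints(plan.right)

def make_join(left, right):
    """Build a Join whose hint fields summarize each side's subtree."""
    return Join(left, right, collect_hints(left), collect_hints(right))

# a JOIN (b[broadcast] JOIN c): the broadcast hint ends up as a field on
# the outer join's right side, so no hint node blocks pattern matching.
j = make_join(Relation("a"), make_join(Relation("b", {"broadcast"}), Relation("c")))
print(j.right_hint)  # -> {'broadcast'}
print(j.left_hint)   # -> set()
```

Because the hint is a plain field rather than a plan node, optimizer rules (including join reordering) can match on `Join` directly without a `ResolvedHint` wrapper getting in the way.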
[spark-website] branch asf-site updated (222263d -> eb0aa14)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git. from 222263d Hotfix site links in sitemap.xml add eb0aa14 Suggest new Apache repo for committers No new revisions were added by this update. Summary of changes: committers.md| 4 ++-- site/committers.html | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
[spark] branch master updated: [SPARK-25689][YARN] Make driver, not AM, manage delegation tokens.
This is an automated email from the ASF dual-hosted git repository. irashid pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 669e8a1 [SPARK-25689][YARN] Make driver, not AM, manage delegation tokens. 669e8a1 is described below commit 669e8a155987995a1a5d49a96b88c05f39e41723 Author: Marcelo Vanzin AuthorDate: Mon Jan 7 14:40:08 2019 -0600 [SPARK-25689][YARN] Make driver, not AM, manage delegation tokens. This change modifies the behavior of the delegation token code when running on YARN, so that the driver controls the renewal, in both client and cluster mode. For that, a few different things were changed: * The AM code only runs code that needs DTs when DTs are available. In a way, this restores the AM behavior to what it was pre-SPARK-23361, but keeping the fix added in that bug. Basically, all the AM code is run in a "UGI.doAs()" block; but code that needs to talk to HDFS (basically the distributed cache handling code) was delayed to the point where the driver is up and running, and thus when valid delegation tokens are available. * SparkSubmit / ApplicationMaster now handle user login, not the token manager. The previous AM code was relying on the token manager to keep the user logged in when keytabs are used. This required some odd APIs in the token manager and the AM so that the right UGI was exposed and used in the right places. After this change, the logged in user is handled separately from the token manager, so the API was cleaned up, and, as explained above, the whole AM runs under the logged in user, which also helps with simplifying some more code. * Distributed cache configs are sent separately to the AM. Because of the delayed initialization of the cached resources in the AM, it became easier to write the cache config to a separate properties file instead of bundling it with the rest of the Spark config. 
This also avoids having to modify the SparkConf to hide things from the UI. * Finally, the AM doesn't manage the token manager anymore. The above changes allow the token manager to be completely handled by the driver's scheduler backend code also in YARN mode (whether client or cluster), making it similar to other RMs. To maintain the fix added in SPARK-23361 also in client mode, the AM now sends an extra message to the driver on initialization to fetch delegation tokens; and although it might not really be needed, the driver also keeps the running AM updated when new tokens are created. Tested in a kerberized cluster with the same tests used to validate SPARK-23361, in both client and cluster mode. Also tested with a non-kerberized cluster. Closes #23338 from vanzin/SPARK-25689. Authored-by: Marcelo Vanzin Signed-off-by: Imran Rashid --- .../security/HadoopDelegationTokenManager.scala| 110 ++--- .../security/HiveDelegationTokenProvider.scala | 16 ++- .../cluster/CoarseGrainedClusterMessage.scala | 3 + .../cluster/CoarseGrainedSchedulerBackend.scala| 40 -- .../HadoopDelegationTokenManagerSuite.scala| 8 +- .../features/KerberosConfDriverFeatureStep.scala | 2 +- .../k8s/KubernetesClusterSchedulerBackend.scala| 7 +- .../mesos/MesosCoarseGrainedSchedulerBackend.scala | 7 +- .../spark/deploy/yarn/ApplicationMaster.scala | 135 ++--- .../deploy/yarn/ApplicationMasterArguments.scala | 5 + .../org/apache/spark/deploy/yarn/Client.scala | 100 --- .../apache/spark/deploy/yarn/YarnRMClient.scala| 8 +- .../org/apache/spark/deploy/yarn/config.scala | 10 -- .../YARNHadoopDelegationTokenManager.scala | 7 +- .../cluster/YarnClientSchedulerBackend.scala | 6 + .../scheduler/cluster/YarnSchedulerBackend.scala | 17 ++- .../YARNHadoopDelegationTokenManagerSuite.scala| 2 +- 17 files changed, 246 insertions(+), 237 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala 
b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala index f7e3dde..d97857a 100644 --- a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala @@ -21,7 +21,6 @@ import java.io.File import java.net.URI import java.security.PrivilegedExceptionAction import java.util.concurrent.{ScheduledExecutorService, TimeUnit} -import java.util.concurrent.atomic.AtomicReference import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.FileSystem @@ -39,32 +38,24 @@ import org.apache.spark.util.ThreadUtils /** * Manager for
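The change above moves delegation-token renewal into the driver's scheduler backend. A common pattern for Hadoop delegation tokens, and a hedged sketch of the scheduling idea, is to renew at a fixed fraction of the token's remaining lifetime; the ratio below is an assumption for illustration, not Spark's exact constant:

```python
# Assumed renewal ratio: renew when 75% of the token's lifetime has elapsed.
RENEWAL_RATIO = 0.75

def next_renewal_delay(now_ms, expiry_ms, ratio=RENEWAL_RATIO):
    """Milliseconds to wait before renewing tokens that expire at expiry_ms.

    Clamped at zero so an already-expired token triggers immediate renewal.
    """
    remaining = max(0, expiry_ms - now_ms)
    return int(remaining * ratio)

# e.g. a token valid for 24h gets scheduled for renewal after 18h:
print(next_renewal_delay(0, 24 * 3600 * 1000))  # -> 64800000
```

In the real code the driver would hand this delay to a `ScheduledExecutorService`-style timer and, per the commit message, push the refreshed tokens out to the running AM and executors.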
svn commit: r31801 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_10_46-71183b2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 7 18:59:19 2019 New Revision: 31801 Log: Apache Spark 3.0.0-SNAPSHOT-2019_01_07_10_46-71183b2 docs [This commit notification would consist of 1774 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[spark] 01/01: Preparing development version 2.2.4-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.2 in repository https://gitbox.apache.org/repos/asf/spark.git commit 209af563d61d985f7d95621614c34b4995db9318 Author: Dongjoon Hyun AuthorDate: Mon Jan 7 17:48:40 2019 + Preparing development version 2.2.4-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 38 files changed, 39 insertions(+), 39 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index ad72330..f3af8d4 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.2.3 +Version: 2.2.4 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index af02caa..fc5d51d 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index cf5cb38..5be1ee5 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 9f30485..f12c239 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 9a68884..ae952a9 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 8f5ec41..9d756fd 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index 6c3e22c..d11d9c7 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index 46dfab2..43db301 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../../pom.xml diff --git a/core/pom.xml b/core/pom.xml index be29a61..05a3abf 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.3 +2.2.4-SNAPSHOT ../pom.xml diff --git 
a/docs/_config.yml b/docs/_config.yml index b9145cc..2cb4dd0 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -14,8 +14,8 @@ include: #
[spark] branch branch-2.2 updated (a642a6d -> 209af56)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-2.2 in repository https://gitbox.apache.org/repos/asf/spark.git. from a642a6d [MINOR][BUILD] Fix script name in `release-tag.sh` usage message add 4acb6ba Preparing Spark release v2.2.3-rc1 new 209af56 Preparing development version 2.2.4-SNAPSHOT The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 38 files changed, 39 insertions(+), 39 deletions(-)
[spark] 01/01: Preparing Spark release v2.2.3-rc1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to tag v2.2.3-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 4acb6ba37b94b90aac445e6546426145a5f9eba2 Author: Dongjoon Hyun AuthorDate: Mon Jan 7 17:48:24 2019 + Preparing Spark release v2.2.3-rc1 --- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 37 files changed, 37 insertions(+), 37 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index f9ec6e7..af02caa 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 ../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 55d29d5..cf5cb38 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 
../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 6d84766..9f30485 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 6228be6..9a68884 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index d511d40..8f5ec41 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index 1c6f6ac..6c3e22c 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index dee2b01..46dfab2 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 ../../pom.xml diff --git a/core/pom.xml b/core/pom.xml index a300a59..be29a61 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.3-SNAPSHOT +2.2.3 ../pom.xml diff --git a/docs/_config.yml b/docs/_config.yml index 320b077..b9145cc 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -14,7 +14,7 @@ include: # These allow the documentation to be updated with newer releases # of Spark, Scala, and Mesos. -SPARK_VERSION: 2.2.3-SNAPSHOT +SPARK_VERSION: 2.2.3 SPARK_VERSION_SHORT: 2.2.3 SCALA_BINARY_VERSION: "2.11" SCALA_VERSION: "2.11.8" diff --git a/examples/pom.xml b/examples/pom.xml index 89f86f5..4acdba2 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -
[spark] tag v2.2.3-rc1 created (now 4acb6ba)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to tag v2.2.3-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. at 4acb6ba (commit) This tag includes the following new commits: new 4acb6ba Preparing Spark release v2.2.3-rc1 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] branch master updated: [SPARK-24489][ML] Check for invalid input type of weight data in ml.PowerIterationClustering
This is an automated email from the ASF dual-hosted git repository. holden pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 71183b2 [SPARK-24489][ML] Check for invalid input type of weight data in ml.PowerIterationClustering 71183b2 is described below commit 71183b283343a99c6fa99a41268dae412598067f Author: Shahid AuthorDate: Mon Jan 7 09:15:50 2019 -0800 [SPARK-24489][ML] Check for invalid input type of weight data in ml.PowerIterationClustering ## What changes were proposed in this pull request? The test case below results in the following failure; currently, ml.PIC performs no check on the data type of the weight column. ``` test("invalid input types for weight") { val invalidWeightData = spark.createDataFrame(Seq( (0L, 1L, "a"), (2L, 3L, "b") )).toDF("src", "dst", "weight") val pic = new PowerIterationClustering() .setWeightCol("weight") val result = pic.assignClusters(invalidWeightData) } ``` ``` Job aborted due to stage failure: Task 0 in stage 8077.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8077.0 (TID 882, localhost, executor driver): scala.MatchError: [0,1,null] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema) at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178) at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107) at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105) at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847) ``` In this PR, a type check for the weight column was added. ## How was this patch tested? UT added. Please review http://spark.apache.org/contributing.html before opening a pull request. Closes #21509 from shahidki31/testCasePic. Authored-by: Shahid Signed-off-by: Holden Karau --- .../spark/ml/clustering/PowerIterationClustering.scala| 1 + .../ml/clustering/PowerIterationClusteringSuite.scala | 15 +++ 2 files changed, 16 insertions(+) diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala index d9a330f..149e99d 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala @@ -166,6 +166,7 @@ class PowerIterationClustering private[clustering] ( val w = if (!isDefined(weightCol) || $(weightCol).isEmpty) { lit(1.0) } else { + SchemaUtils.checkNumericType(dataset.schema, $(weightCol)) col($(weightCol)).cast(DoubleType) } diff --git a/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala index 55b460f..0ba3ffa 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala @@ -145,6 +145,21 @@ class PowerIterationClusteringSuite extends SparkFunSuite assert(msg.contains("Similarity must be nonnegative")) } + test("check for invalid input types of weight") { +val invalidWeightData = spark.createDataFrame(Seq( + (0L, 1L, "a"), + (2L, 3L, "b") +)).toDF("src", "dst", "weight") + +val msg = intercept[IllegalArgumentException] { + new PowerIterationClustering() +.setWeightCol("weight") 
+.assignClusters(invalidWeightData) +}.getMessage +assert(msg.contains("requirement failed: Column weight must be of type numeric" + + " but was actually of type string.")) + } + test("test default weight") { val dataWithoutWeight = data.sample(0.5, 1L).select('src, 'dst)
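The fix above validates the weight column's type up front, so the caller gets a clear requirement-failed message instead of a `MatchError` deep inside the job. A minimal Python sketch of that validation — the dict schema and type names stand in for Spark's `StructType` and `SchemaUtils.checkNumericType`:

```python
# Types we accept as "numeric" in this toy schema model.
NUMERIC_TYPES = {"int", "long", "float", "double", "decimal"}

def check_numeric_type(schema, col_name):
    """Fail fast with a descriptive message if the column is not numeric."""
    actual = schema[col_name]
    if actual not in NUMERIC_TYPES:
        raise ValueError(
            f"requirement failed: Column {col_name} must be of type numeric"
            f" but was actually of type {actual}.")

schema = {"src": "long", "dst": "long", "weight": "string"}
try:
    check_numeric_type(schema, "weight")
except ValueError as e:
    print(e)  # clear error at validation time, not mid-job
```

Failing at schema-check time is the key design choice: the error surfaces before any distributed work starts.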
[spark] branch master updated (a927c764 -> 868e025)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a927c764 [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9 add 868e025 [SPARK-26383][CORE] NPE when use DataFrameReader.jdbc with wrong URL No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala| 7 ++- .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala| 13 + 2 files changed, 19 insertions(+), 1 deletion(-)
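The SPARK-26383 fix referenced above follows a common defensive pattern: when a JDBC URL matches no registered driver, fail with an explicit error instead of letting a null driver propagate into a `NullPointerException` later. A hedged Python sketch of the pattern — the registry and the lookup are illustrative stand-ins for `DriverManager`/`DriverRegistry`, not Spark's code:

```python
# Toy driver registry mapping URL prefixes to driver class names.
DRIVERS = {
    "jdbc:h2": "org.h2.Driver",
    "jdbc:postgresql": "org.postgresql.Driver",
}

def driver_for(url):
    """Resolve a driver for a JDBC URL, failing fast on unknown URLs."""
    for prefix, cls in DRIVERS.items():
        if url.startswith(prefix):
            return cls
    # Without this guard, a None result would surface later as an NPE
    # far from the misconfigured URL that caused it.
    raise ValueError(f"No registered driver found for url: {url}")

print(driver_for("jdbc:h2:mem:test"))  # -> org.h2.Driver
```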
svn commit: r31798 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_06_17-a927c76-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 7 14:30:21 2019
New Revision: 31798

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_07_06_17-a927c76 docs

[This commit notification would consist of 1774 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r31790 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_07_04_04-cb1aad6-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 7 12:18:44 2019
New Revision: 31790

Log:
Apache Spark 2.4.1-SNAPSHOT-2019_01_07_04_04-cb1aad6 docs

[This commit notification would consist of 1476 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[spark] branch branch-2.4 updated: [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new cb1aad6  [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9

cb1aad6 is described below

commit cb1aad69b781bf9612b9b14f5338b338344365f4
Author: Liang-Chi Hsieh
AuthorDate: Mon Jan 7 18:36:52 2019 +0800

[SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9

## What changes were proposed in this pull request?

Due to an [API change](https://github.com/numpy/numpy/pull/4257/files#diff-c39521d89f7e61d6c0c445d93b62f7dc) in 1.9, the PySpark image schema doesn't work with numpy versions prior to 1.9. When running the image test with such a numpy version, we see this error:

```
test_read_images (pyspark.ml.tests.test_image.ImageReaderTest) ... ERROR
test_read_images_multiple_times (pyspark.ml.tests.test_image.ImageReaderTest2) ... ok

==
ERROR: test_read_images (pyspark.ml.tests.test_image.ImageReaderTest)
--
Traceback (most recent call last):
  File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/tests/test_image.py", line 36, in test_read_images
    self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
  File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/image.py", line 193, in toImage
    data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'
--
Ran 2 tests in 29.040s

FAILED (errors=1)
```

## How was this patch tested?

Manually tested with numpy versions prior to and after 1.9.

Closes #23484 from viirya/fix-pyspark-image.

Authored-by: Liang-Chi Hsieh
Signed-off-by: Hyukjin Kwon
(cherry picked from commit a927c764c1eee066efc1c2c713dfee411de79245)
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/ml/image.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/image.py b/python/pyspark/ml/image.py
index edb90a3..a1aacea 100644
--- a/python/pyspark/ml/image.py
+++ b/python/pyspark/ml/image.py
@@ -28,6 +28,7 @@ import sys
 import warnings

 import numpy as np
+from distutils.version import LooseVersion

 from pyspark import SparkContext
 from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string
@@ -190,7 +191,11 @@ class _ImageSchema(object):
         # Running `bytearray(numpy.array([1]))` fails in specific Python versions
         # with a specific Numpy version, for example in Python 3.6.0 and NumPy 1.13.3.
         # Here, it avoids it by converting it to bytes.
-        data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+        if LooseVersion(np.__version__) >= LooseVersion('1.9'):
+            data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+        else:
+            # Numpy prior to 1.9 don't have `tobytes` method.
+            data = bytearray(array.astype(dtype=np.uint8).ravel())

         # Creating new Row with _create_row(), because Row(name = value, ... )
         # orders fields by name, which conflicts with expected schema order
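The patch gates on the numpy version because `ndarray.tobytes` was only introduced in numpy 1.9; on older releases, `bytearray` can consume the flattened array directly. A standalone sketch of the same version-gated fallback (using a minimal version comparison in place of the `LooseVersion` the actual patch uses):

```python
import numpy as np

def _at_least_1_9(version):
    # Minimal major.minor comparison; a stand-in for the LooseVersion
    # check the actual patch performs via distutils.
    major, minor = (int(x) for x in version.split('.')[:2])
    return (major, minor) >= (1, 9)

def to_byte_data(array):
    """Flatten an array into uint8 bytes, with a fallback for numpy < 1.9,
    where ndarray has no `tobytes` method (mirroring the patch above)."""
    flat = array.astype(dtype=np.uint8).ravel()
    if _at_least_1_9(np.__version__):
        return bytearray(flat.tobytes())
    # numpy < 1.9: bytearray() can consume the array's buffer directly.
    return bytearray(flat)

data = to_byte_data(np.array([[1, 2], [3, 255]]))
```

Both branches produce the same `bytearray`, which is why the fix is safe to apply unconditionally across numpy versions.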
[spark] branch master updated: [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a927c764  [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9

a927c764 is described below

commit a927c764c1eee066efc1c2c713dfee411de79245
Author: Liang-Chi Hsieh
AuthorDate: Mon Jan 7 18:36:52 2019 +0800

[SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9

## What changes were proposed in this pull request?

Due to an [API change](https://github.com/numpy/numpy/pull/4257/files#diff-c39521d89f7e61d6c0c445d93b62f7dc) in 1.9, the PySpark image schema doesn't work with numpy versions prior to 1.9. When running the image test with such a numpy version, we see this error:

```
test_read_images (pyspark.ml.tests.test_image.ImageReaderTest) ... ERROR
test_read_images_multiple_times (pyspark.ml.tests.test_image.ImageReaderTest2) ... ok

==
ERROR: test_read_images (pyspark.ml.tests.test_image.ImageReaderTest)
--
Traceback (most recent call last):
  File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/tests/test_image.py", line 36, in test_read_images
    self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
  File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/image.py", line 193, in toImage
    data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'
--
Ran 2 tests in 29.040s

FAILED (errors=1)
```

## How was this patch tested?

Manually tested with numpy versions prior to and after 1.9.

Closes #23484 from viirya/fix-pyspark-image.

Authored-by: Liang-Chi Hsieh
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/ml/image.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/image.py b/python/pyspark/ml/image.py
index edb90a3..a1aacea 100644
--- a/python/pyspark/ml/image.py
+++ b/python/pyspark/ml/image.py
@@ -28,6 +28,7 @@ import sys
 import warnings

 import numpy as np
+from distutils.version import LooseVersion

 from pyspark import SparkContext
 from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string
@@ -190,7 +191,11 @@ class _ImageSchema(object):
         # Running `bytearray(numpy.array([1]))` fails in specific Python versions
         # with a specific Numpy version, for example in Python 3.6.0 and NumPy 1.13.3.
         # Here, it avoids it by converting it to bytes.
-        data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+        if LooseVersion(np.__version__) >= LooseVersion('1.9'):
+            data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+        else:
+            # Numpy prior to 1.9 don't have `tobytes` method.
+            data = bytearray(array.astype(dtype=np.uint8).ravel())

         # Creating new Row with _create_row(), because Row(name = value, ... )
         # orders fields by name, which conflicts with expected schema order
svn commit: r31789 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_07_01_53-468d25e-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 7 10:05:48 2019
New Revision: 31789

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_07_01_53-468d25e docs

[This commit notification would consist of 1774 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]