[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21318 @rxin re: https://github.com/apache/spark/pull/21318#discussion_r20582 I meant to say, for instance, something like "Please refer to the SQL function documentation in the corresponding version". We don't have to bother updating, and it also makes sense. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21894: [MINOR][BUILD] Remove -Phive-thriftserver profile within...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21894 cc @felixcheung
[GitHub] spark issue #18942: [BACKPORT-2.1][SPARK-19372][SQL] Fix throwing a Java exc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18942 Can one of the admins verify this patch?
[GitHub] spark pull request #21902: [SPARK-24952][SQL] Support LZMA2 compression by A...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21902#discussion_r205934042 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1449,6 +1451,16 @@ object SQLConf { .intConf .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION) .createWithDefault(Deflater.DEFAULT_COMPRESSION) + + val AVRO_XZ_LEVEL = buildConf("spark.sql.avro.xz.level") --- End diff -- Just for clarification, @MaxGekk: this configuration is not kept for compatibility with spark-avro (as a third party), but something you are newly proposing here?
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93709/ Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test PASSed.
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r205933934 --- Diff: python/pyspark/streaming/tests.py --- @@ -206,6 +207,22 @@ def func(dstream): expected = [[len(x)] for x in input] self._test_func(input, func, expected) +def test_slice(self): --- End diff -- @holdenk Comments please on __test_slice()__ and on the test results https://github.com/apache/spark/pull/20838#issuecomment-408566860
[GitHub] spark pull request #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for bui...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21906
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #93709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93709/testReport)** for PR 21103 at commit [`6ef1f22`](https://github.com/apache/spark/commit/6ef1f22aa68d52cf0c00b21211e19d3f80bab7c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` sealed class Hasher[@specialized(Long, Int, Double, Float) T] extends Serializable ` * ` class DoubleHasher extends Hasher[Double] ` * ` class FloatHasher extends Hasher[Float] `
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21886 @gatorsmile Thank you.. I will make the changes.
[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21650
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21886 The code looks good to me. Let us improve the test cases.
[GitHub] spark pull request #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21886#discussion_r205933857 --- Diff: sql/core/src/test/resources/sql-tests/results/intersect-all.sql.out --- @@ -0,0 +1,212 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 17 + + +-- !query 0 +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3) +AS tab1(k, v) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), --- End diff -- also add another duplicate row for (1, 2)
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21906 LGTM. Merged to master
[GitHub] spark pull request #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21886#discussion_r205933861 --- Diff: sql/core/src/test/resources/sql-tests/results/intersect-all.sql.out --- @@ -0,0 +1,212 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 17 + + +-- !query 0 +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), --- End diff -- also add another duplicate row (1, 3)
[GitHub] spark pull request #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21886#discussion_r205933866 --- Diff: sql/core/src/test/resources/sql-tests/results/intersect-all.sql.out --- @@ -0,0 +1,212 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 17 + + +-- !query 0 +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3) +AS tab1(k, v) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(2, 3) --- End diff -- add one more row (3, 4)
[GitHub] spark pull request #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21886#discussion_r205933837 --- Diff: sql/core/src/test/resources/sql-tests/results/intersect-all.sql.out --- @@ -0,0 +1,212 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 17 + + +-- !query 0 +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3) +AS tab1(k, v) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(2, 3) +AS tab2(k, v) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 +-- !query 2 schema +struct +-- !query 2 output +1 2 +2 3 + + +-- !query 3 +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab1 WHERE k = 1 +-- !query 3 schema +struct +-- !query 3 output +1 2 +1 2 +1 3 + + +-- !query 4 +SELECT * FROM tab1 WHERE k > 2 +INTERSECT ALL +SELECT * FROM tab2 +-- !query 4 schema +struct +-- !query 4 output + + + +-- !query 5 +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 WHERE k > 2 +-- !query 5 schema +struct +-- !query 5 output + + + +-- !query 6 +SELECT * FROM tab1 +INTERSECT ALL +SELECT CAST(1 AS BIGINT), CAST(2 AS BIGINT) +-- !query 6 schema +struct +-- !query 6 output +1 2 + + +-- !query 7 +SELECT * FROM tab1 +INTERSECT ALL +SELECT array(1), 2 +-- !query 7 schema +struct<> +-- !query 7 output +org.apache.spark.sql.AnalysisException +IntersectAll can only be performed on tables with the compatible column types. array <> int at the first column of the second table; + + +-- !query 8 +SELECT c1 FROM tab1 +INTERSECT ALL +SELECT c1, c2 FROM tab2 --- End diff -- use `k` and `v`
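The expected outputs above follow multiset (bag) semantics: `INTERSECT ALL` keeps each row as many times as it appears in both inputs, i.e. min(count in left, count in right). A minimal Python sketch of that semantics — illustrative only, not Spark's actual aggregate-based implementation:

```python
from collections import Counter

def intersect_all(left, right):
    """Multiset intersection: keep each row min(left-count, right-count) times."""
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1
            result.append(row)
    return result

tab1 = [(1, 2), (1, 2), (1, 3), (2, 3)]
tab2 = [(1, 2), (2, 3)]
print(intersect_all(tab1, tab2))  # query 2 above: [(1, 2), (2, 3)]
# query 3 above: duplicates in both sides survive
print(intersect_all(tab1, [r for r in tab1 if r[0] == 1]))  # [(1, 2), (1, 2), (1, 3)]
```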
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21650 LGTM. Merged to master.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21906 Merged build finished. Test PASSed.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21906 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93712/ Test PASSed.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21906 **[Test build #93712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93712/testReport)** for PR 21906 at commit [`6d703e8`](https://github.com/apache/spark/commit/6d703e8661070a90eee8edd932dfd628bd9982f6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21886#discussion_r205933692 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1934,6 +1934,23 @@ class Dataset[T] private[sql]( Intersect(planWithBarrier, other.planWithBarrier) } + /** + * Returns a new Dataset containing rows only in both this Dataset and another Dataset while + * preserving the duplicates. + * This is equivalent to `INTERSECT ALL` in SQL. + * + * @note Equality checking is performed directly on the encoded representation of the data + * and thus is not affected by a custom `equals` function defined on `T`. Also as standard + * in SQL, this function resolves columns by position (not by name). + * + * @group typedrel + * @since 2.4.0 + */ + def intersectAll(other: Dataset[T]): Dataset[T] = withSetOperator { +Intersect(planWithBarrier, other.planWithBarrier, isAll = true) --- End diff -- yes. Please do it too.
[GitHub] spark pull request #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21886#discussion_r205933642 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1934,6 +1934,23 @@ class Dataset[T] private[sql]( Intersect(planWithBarrier, other.planWithBarrier) } + /** + * Returns a new Dataset containing rows only in both this Dataset and another Dataset while + * preserving the duplicates. + * This is equivalent to `INTERSECT ALL` in SQL. + * + * @note Equality checking is performed directly on the encoded representation of the data + * and thus is not affected by a custom `equals` function defined on `T`. Also as standard + * in SQL, this function resolves columns by position (not by name). + * + * @group typedrel + * @since 2.4.0 + */ + def intersectAll(other: Dataset[T]): Dataset[T] = withSetOperator { +Intersect(planWithBarrier, other.planWithBarrier, isAll = true) --- End diff -- @gatorsmile Sure.. how about exceptAll that was checked in today?
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93708/ Test PASSed.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21906 **[Test build #93712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93712/testReport)** for PR 21906 at commit [`6d703e8`](https://github.com/apache/spark/commit/6d703e8661070a90eee8edd932dfd628bd9982f6).
[GitHub] spark pull request #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21886#discussion_r205933541 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1934,6 +1934,23 @@ class Dataset[T] private[sql]( Intersect(planWithBarrier, other.planWithBarrier) } + /** + * Returns a new Dataset containing rows only in both this Dataset and another Dataset while + * preserving the duplicates. + * This is equivalent to `INTERSECT ALL` in SQL. + * + * @note Equality checking is performed directly on the encoded representation of the data + * and thus is not affected by a custom `equals` function defined on `T`. Also as standard + * in SQL, this function resolves columns by position (not by name). + * + * @group typedrel + * @since 2.4.0 + */ + def intersectAll(other: Dataset[T]): Dataset[T] = withSetOperator { +Intersect(planWithBarrier, other.planWithBarrier, isAll = true) --- End diff -- could you use logicalPlan?
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Merged build finished. Test PASSed.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21886 **[Test build #93708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93708/testReport)** for PR 21886 at commit [`67b15ee`](https://github.com/apache/spark/commit/67b15ee535765769ce04b4baf194d5d823344374). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21906 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1437/ Test PASSed.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21906 Merged build finished. Test PASSed.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21906 cc @gengliangwang
[GitHub] spark pull request #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for bui...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/21906 [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in Avro data source ## What changes were proposed in this pull request? Add one more test case for `com.databricks.spark.avro`. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark avro Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21906.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21906 commit 6d703e8661070a90eee8edd932dfd628bd9982f6 Author: Xiao Li Date: 2018-07-28T00:02:39Z one more test case.
[GitHub] spark issue #21906: [SPARK-24924][SQL][FOLLOW-UP] Add mapping for built-in A...
Github user holdensmagicalunicorn commented on the issue: https://github.com/apache/spark/pull/21906 @gatorsmile, thanks! I am a bot who has found some folks who might be able to help with the review: @HyukjinKwon
[GitHub] spark issue #21905: [SPARK-24956][Build][test-maven] Upgrade maven version t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21905 hmm, interesting suggestions https://github.com/apache/spark/pull/21905#issuecomment-408580441
[GitHub] spark pull request #21902: [SPARK-24952][SQL] Support LZMA2 compression by A...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21902#discussion_r205933175 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1449,6 +1451,16 @@ object SQLConf { .intConf .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION) .createWithDefault(Deflater.DEFAULT_COMPRESSION) + + val AVRO_XZ_LEVEL = buildConf("spark.sql.avro.xz.level") +.doc("Compression level for the XZ codec used in writing of AVRO files. " + + "Valid value must be in the range of from 0 to 9 inclusive: " + + "0-3 for fast with medium compression, 4-6 are fairly slow levels with high compression. " + + "The levels 7-9 are like the level 6 but use bigger dictionaries and have higher " + + "compressor and decompressor memory requirements. Default level is 6.") --- End diff -- use `LZMA2Options.PRESET_DEFAULT` in place of literal `6`
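Python's standard-library `lzma` module exposes the same XZ/LZMA2 preset scale (0 to 9, with 6 as the library default via `lzma.PRESET_DEFAULT`, analogous to the `LZMA2Options.PRESET_DEFAULT` suggested above), which makes the speed/ratio trade-off described in the doc string easy to try. This is a stand-alone illustration, not the Avro writer code path:

```python
import lzma

# PRESET_DEFAULT is 6, the same default level documented in the reviewed config.
assert lzma.PRESET_DEFAULT == 6

data = b"spark avro xz level demo " * 4000

fast = lzma.compress(data, preset=0)                   # fastest, weakest compression
default = lzma.compress(data, preset=lzma.PRESET_DEFAULT)
best = lzma.compress(data, preset=9)                   # slowest, biggest dictionary

for name, blob in (("preset 0", fast), ("preset 6", default), ("preset 9", best)):
    print(f"{name}: {len(blob)} bytes (from {len(data)})")

# The level affects only size/speed; every level round-trips losslessly.
assert lzma.decompress(best) == data
```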
[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20028 shouldn't this say version 2.3.0?
[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20272 still need to run tests https://github.com/apache/spark/pull/20272#pullrequestreview-108271893
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21748#discussion_r205932789 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.integrationtest + +import org.scalatest.concurrent.Eventually +import scala.collection.JavaConverters._ + +import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{k8sTestTag, INTERVAL, TIMEOUT} + +trait ClientModeTestsSuite { k8sSuite: KubernetesSuite => + + test("Run in client mode.", k8sTestTag) { +val labels = Map("spark-app-selector" -> driverPodName) +val driverPort = 7077 +val blockManagerPort = 1 +val driverService = testBackend + .getKubernetesClient + .services() + .inNamespace(kubernetesTestComponents.namespace) + .createNew() +.withNewMetadata() + .withName(s"$driverPodName-svc") + .endMetadata() +.withNewSpec() + .withClusterIP("None") + .withSelector(labels.asJava) + .addNewPort() +.withName("driver-port") +.withPort(driverPort) +.withNewTargetPort(driverPort) +.endPort() + .addNewPort() +.withName("block-manager") +.withPort(blockManagerPort) +.withNewTargetPort(blockManagerPort) +.endPort() + .endSpec() +.done() +try { + val driverPod = testBackend +.getKubernetesClient
.pods() +.inNamespace(kubernetesTestComponents.namespace) +.createNew() + .withNewMetadata() + .withName(driverPodName) + .withLabels(labels.asJava) + .endMetadata() +.withNewSpec() + .withServiceAccountName("default") --- End diff -- is there a JIRA?
[GitHub] spark issue #21905: [SPARK-24956][Build][test-maven] Upgrade maven version t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21905 **[Test build #93711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93711/testReport)** for PR 21905 at commit [`a8a545c`](https://github.com/apache/spark/commit/a8a545cb6b4187dcf88e06eb67b14f743fb5ad55).
[GitHub] spark issue #21905: [SPARK-24956][Build][test-maven] Upgrade maven version t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21905 Merged build finished. Test PASSed.
[GitHub] spark issue #21905: [SPARK-24956][Build][test-maven] Upgrade maven version t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21905 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1436/ Test PASSed.
[GitHub] spark issue #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` if prev...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21904 Merged build finished. Test PASSed.
[GitHub] spark issue #21905: [SPARK-24956][Build][test-maven] Upgrade maven version t...
Github user holdensmagicalunicorn commented on the issue: https://github.com/apache/spark/pull/21905 @kiszk, thanks! I am a bot who has found some folks who might be able to help with the review: @tomdz, @pwendell and @marmbrus
[GitHub] spark issue #21905: [SPARK-24956][Build][test-maven] Upgrade maven version t...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21905 cc @srowen @HyukjinKwon
[GitHub] spark issue #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` if prev...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21904 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93705/ Test PASSed.
[GitHub] spark pull request #21905: [SPARK-24956][Build][test-maven] Upgrade maven ve...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/21905 [SPARK-24956][Build][test-maven] Upgrade maven version to 3.5.4 ## What changes were proposed in this pull request? This PR updates the maven version from 3.3.9 to 3.5.4. The current build process uses mvn 3.3.9, which was released in 2015 and is pretty old. We hit [an issue](https://issues.apache.org/jira/browse/SPARK-24895) that requires maven 3.5.2 or later. The release notes for 3.5.4 are [here](https://maven.apache.org/docs/3.5.4/release-notes.html). From [the release notes for 3.5.0](https://maven.apache.org/docs/3.5.0/release-notes.html), the following are new features: 1. ANSI color logging for improved output visibility 1. add support for module name != artifactId in every calculated URLs (project, SCM, site): special project.directory property 1. create a slf4j-simple provider extension that supports level color rendering 1. ModelResolver interface enhancement: addition of resolveModel(Dependency) supporting version ranges ## How was this patch tested? Existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-24956 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21905.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21905 commit a8a545cb6b4187dcf88e06eb67b14f743fb5ad55 Author: Kazuaki Ishizaki Date: 2018-07-28T03:49:10Z update maven.version to 3.5.4
[GitHub] spark issue #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` if prev...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21904 **[Test build #93705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93705/testReport)** for PR 21904 at commit [`4ac3d1e`](https://github.com/apache/spark/commit/4ac3d1efdec05d56f5051901ca209b67b23be28b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test FAILed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93706/ Test FAILed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #93706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93706/testReport)** for PR 21898 at commit [`5c5db85`](https://github.com/apache/spark/commit/5c5db85723e10a1c507c83ec2e370996202a77fe). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21902: [SPARK-24952][SQL] Support LZMA2 compression by Avro dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21902 Merged build finished. Test PASSed.
[GitHub] spark issue #21902: [SPARK-24952][SQL] Support LZMA2 compression by Avro dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21902 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93704/ Test PASSed.
[GitHub] spark issue #21902: [SPARK-24952][SQL] Support LZMA2 compression by Avro dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21902 **[Test build #93704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93704/testReport)** for PR 21902 at commit [`3e1139a`](https://github.com/apache/spark/commit/3e1139af293cb2e06e125edfd443a5b5a0265b84). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21901: [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and m...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21901 LGTM, too
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21883 cc @gatorsmile @gengliangwang
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #93710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93710/testReport)** for PR 21608 at commit [`7a2aff4`](https://github.com/apache/spark/commit/7a2aff409d9cb1cbf654a4fcb4e8ea5980d59343).
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 retest this please.
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r205930801 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -272,7 +272,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag]( private def nextPowerOf2(n: Int): Int = { if (n == 0) { - 1 + 2 --- End diff -- Oh, good catch.
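The diff quoted above changes `nextPowerOf2(0)` to return 2 instead of 1, so the hash set never allocates a capacity-1 backing array. A minimal Python sketch of that rounding logic (a hypothetical stand-in for the Scala helper, not Spark's actual code):

```python
def next_power_of_2(n: int) -> int:
    """Round n up to the next power of two; the smallest allowed result is 2."""
    if n == 0:
        return 2  # the fix discussed above: a capacity of 1 leaves no room to grow
    # Equivalent of Integer.highestOneBit: the top set bit of n.
    highest = 1 << (n.bit_length() - 1)
    return highest if highest == n else highest << 1

print(next_power_of_2(0))  # 2
print(next_power_of_2(3))  # 4
print(next_power_of_2(8))  # 8
```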
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r205930794 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3805,3 +3801,339 @@ object ArrayUnion { new GenericArrayData(arrayBuffer) } } + +/** + * Returns an array of the elements in the intersect of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ + _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and +array2, without duplicates. + """, + examples = """ +Examples:Fun + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(1, 3) + """, + since = "2.4.0") +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike { + override def dataType: DataType = ArrayType(elementType, +left.dataType.asInstanceOf[ArrayType].containsNull && + right.dataType.asInstanceOf[ArrayType].containsNull) + + var hsInt: OpenHashSet[Int] = _ + var hsResultInt: OpenHashSet[Int] = _ + var hsLong: OpenHashSet[Long] = _ + var hsResultLong: OpenHashSet[Long] = _ + + def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { +val elem = array.getInt(idx) +if (hsInt.contains(elem) && !hsResultInt.contains(elem)) { + if (resultArray != null) { +resultArray.setInt(pos, elem) + } + hsResultInt.add(elem) + true +} else { + false +} + } + + def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { +val elem = array.getLong(idx) +if (hsLong.contains(elem) && !hsResultLong.contains(elem)) { + if (resultArray != null) { +resultArray.setLong(pos, elem) + } + hsResultLong.add(elem) + true +} else { + false +} + } + + def evalIntLongPrimitiveType( + array1: ArrayData, + array2: ArrayData, + resultArray: ArrayData, + initFoundNullElement: Boolean, + isLongType: Boolean): (Int, Boolean) = { +// store elements into resultArray +var i = 0 +var foundNullElement = initFoundNullElement 
+if (resultArray == null) { + // hsInt or hsLong is updated only once since it is not changed + while (i < array1.numElements()) { --- End diff -- You are right, fixed.
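The two hash sets in the `ArrayIntersect` diff quoted above (`hsInt` for membership in one input, `hsResultInt` for deduplicating the output) implement "in both arrays, emitted once". A rough Python sketch of that contract, ignoring Spark's primitive-type specializations and null handling (illustrative names, not Spark's API):

```python
def array_intersect(a, b):
    """Elements of `a` that also appear in `b`, in first-occurrence order, no duplicates."""
    in_b = set(b)     # plays the role of hsInt: membership in the other array
    emitted = set()   # plays the role of hsResultInt: dedup of the output
    result = []
    for elem in a:
        if elem in in_b and elem not in emitted:
            result.append(elem)
            emitted.add(elem)
    return result

print(array_intersect([1, 2, 3], [1, 3, 5]))  # [1, 3]
```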
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20636 cc @cloud-fan
[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21899 Merged build finished. Test PASSed.
[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93703/ Test PASSed.
[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21899 **[Test build #93703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93703/testReport)** for PR 21899 at commit [`ca94f64`](https://github.com/apache/spark/commit/ca94f64afa2cb64ccac73814c9e941a0a3c960a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205930539 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3968,3 +3964,242 @@ object ArrayUnion { new GenericArrayData(arrayBuffer) } } + +/** + * Returns an array of the elements in the intersect of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ + _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2, +without duplicates. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(2) + """, + since = "2.4.0") +case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike +with ComplexTypeMergingExpression { + override def dataType: DataType = { +dataTypeCheck +left.dataType + } + + @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = { +if (elementTypeSupportEquals) { + (array1, array2) => +val hs = new OpenHashSet[Any] --- End diff -- Yeah, using `byte[]` for the binary type looks awkward and inefficient. It would be good to introduce a new class for binary type like `UTF8String`. On the other hand, it is not a task for Spark 2.4, since it requires a lot of changes.
[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21754#discussion_r205929427 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala --- @@ -89,23 +97,42 @@ case class ReuseExchange(conf: SQLConf) extends Rule[SparkPlan] { if (!conf.exchangeReuseEnabled) { return plan } + // Build a hash map using schema of exchanges to avoid O(N*N) sameResult calls. val exchanges = mutable.HashMap[StructType, ArrayBuffer[Exchange]]() + +def tryReuseExchange(exchange: Exchange, filterCondition: Exchange => Boolean): SparkPlan = { + // the exchanges that have same results usually also have same schemas (same column names). + val sameSchema = exchanges.getOrElseUpdate(exchange.schema, ArrayBuffer[Exchange]()) + val samePlan = sameSchema.filter(filterCondition).find { e => +exchange.sameResult(e) + } + if (samePlan.isDefined) { +// Keep the output of this exchange, the following plans require that to resolve +// attributes. +ReusedExchangeExec(exchange.output, samePlan.get) + } else { +sameSchema += exchange +exchange + } +} + plan.transformUp { + // For coordinated exchange + case exchange @ ShuffleExchangeExec(_, _, Some(coordinator)) => +tryReuseExchange(exchange, { + // We can reuse an exchange with the same coordinator only + case ShuffleExchangeExec(_, _, Some(c)) => coordinator == c --- End diff -- shall we just include `coordinator` in `ShuffleExchange#sameResult`?
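The rule quoted above buckets exchanges by schema so that the expensive `sameResult` check only runs within a bucket, and the new `filterCondition` restricts reuse to exchanges with the same coordinator. A minimal Python sketch of that lookup pattern (hypothetical names and plain dict equality standing in for Spark's `sameResult`):

```python
from collections import defaultdict

def try_reuse(cache, schema, plan, same_result, condition=lambda other: True):
    """Return a previously cached equivalent plan, or register `plan` as new.

    cache maps a cheap key (the schema) to the plans already seen, so the
    expensive same_result comparison avoids O(N*N) work across all plans.
    """
    bucket = cache[schema]
    for other in bucket:
        if condition(other) and same_result(plan, other):
            return other, True  # reuse the earlier, equivalent plan
    bucket.append(plan)
    return plan, False

cache = defaultdict(list)
p1, reused1 = try_reuse(cache, ("id", "name"), {"id": 1}, lambda a, b: a == b)
p2, reused2 = try_reuse(cache, ("id", "name"), {"id": 1}, lambda a, b: a == b)
print(reused1, reused2)  # False True
```

Passing a `condition` that checks the coordinator, as the diff does, is one way to scope reuse; the review question above asks whether folding the coordinator into `sameResult` itself would be simpler.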
[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21754#discussion_r205929402 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala --- @@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchan // Ignore this wrapper for canonicalizing. override def doCanonicalize(): SparkPlan = child.canonicalized + override protected def doPrepare(): Unit = { +child match { + case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) => +coordinator.registerExchange(shuffleExchange) --- End diff -- Why is this needed? Do we forget to register the shuffle exchange in some cases?
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMap in ex...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21772 good catch! LGTM
[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21772#discussion_r205929311 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala --- @@ -278,6 +278,36 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext { map.free() } + test("SPARK-24809: Serializing LongToUnsafeRowMap in executor may result in data error") { +val unsafeProj = UnsafeProjection.create(Array[DataType](LongType)) +val originalMap = new LongToUnsafeRowMap(mm, 1) + +val key1 = 1L +val value1 = 4852306286022334418L + +val key2 = 2L +val value2 = 8813607448788216010L + +originalMap.append(key1, unsafeProj(InternalRow(value1))) +originalMap.append(key2, unsafeProj(InternalRow(value2))) +originalMap.optimize() + +val ser = new KryoSerializer( --- End diff -- we can write `sparkContext.env.serializer.newInstance()`
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205929135 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3968,3 +3964,242 @@ object ArrayUnion { new GenericArrayData(arrayBuffer) } } + +/** + * Returns an array of the elements in the intersect of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ + _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2, +without duplicates. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(2) + """, + since = "2.4.0") +case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike +with ComplexTypeMergingExpression { + override def dataType: DataType = { +dataTypeCheck +left.dataType + } + + @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = { +if (elementTypeSupportEquals) { + (array1, array2) => +val hs = new OpenHashSet[Any] --- End diff -- This makes me think that it's a bad idea to make `byte[]` the internal data class for binary type. IIRC we have a lot of places that work around the equality issue of `byte[]`, and it introduces an extra copy when reading a binary type column from off-heap. We should consider something like `UTF8String` for binary type. This is not related to this PR, but is something we can do in Spark 3.0. cc @rxin @hvanhovell
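The `elementTypeSupportEquals` branch in the diff quoted above takes a hash-set fast path only when elements have usable equality; binary (`byte[]`) elements, the problem case discussed here, need a pairwise fallback. A hedged Python sketch of that two-path shape, using hashability in place of Spark's type check (function name and structure are illustrative, not Spark's code):

```python
def array_except(a, b):
    """Elements of `a` absent from `b`, in first-occurrence order, no duplicates."""
    try:
        # Fast path (analogue of elementTypeSupportEquals): elements are
        # hashable with value equality, so sets give O(n + m) lookups.
        exclude = set(b)
        seen = set()
        result = []
        for elem in a:
            if elem not in exclude and elem not in seen:
                result.append(elem)
                seen.add(elem)
        return result
    except TypeError:
        # Slow path (Spark's byte[] case): no usable hash or equality for a
        # set, so fall back to O(n * m) pairwise comparison.
        result = []
        for elem in a:
            if elem not in b and elem not in result:
                result.append(elem)
        return result

print(array_except([1, 2, 3], [1, 3, 5]))  # [2]
print(array_except([[1], [2]], [[1]]))     # [[2]] via the slow path
```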
[GitHub] spark issue #21901: [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and m...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21901 the spark-master-test-sbt build failed on ubuntu, but the important bit is that the DateTimeUtilsSuite tests passed! i think this PR should be g2g for merging into master. thanks @d80tb7 !
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205928966 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3968,3 +3964,317 @@ object ArrayUnion { new GenericArrayData(arrayBuffer) } } + +/** + * Returns an array of the elements in the intersect of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ + _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2, +without duplicates. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(2) + """, + since = "2.4.0") +case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike +with ComplexTypeMergingExpression { + override def dataType: DataType = { +dataTypeCheck --- End diff -- My motivation is to let these array functions go through the type coercion rules for `ComplexTypeMergingExpression`. This check is like an `assert`, and I don't think we need it if we override `dataType`
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205928882 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3651,14 +3651,9 @@ case class ArrayDistinct(child: Expression) } /** - * Will become common base class for [[ArrayUnion]], ArrayIntersect, and ArrayExcept. + * Will become common base class for [[ArrayUnion]], ArrayIntersect, and [[ArrayExcept]]. */ abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast { --- End diff -- I think the question is, shall we apply type coercion rules to these array functions?
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205928822 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3651,14 +3651,9 @@ case class ArrayDistinct(child: Expression) } /** - * Will become common base class for [[ArrayUnion]], ArrayIntersect, and ArrayExcept. + * Will become common base class for [[ArrayUnion]], ArrayIntersect, and [[ArrayExcept]]. */ abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast { --- End diff -- shall we extend `ComplexTypeMergingExpression` here?
[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21732 This PR is just special handling for `Option[Product]` in Aggregator; I think we won't need it once we have the more general solution, right?
[GitHub] spark pull request #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21584#discussion_r205928623 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -205,7 +218,7 @@ private[spark] object Config extends Logging { .createWithDefault(0.1) val PYSPARK_MAJOR_PYTHON_VERSION = -ConfigBuilder("spark.kubernetes.pyspark.pythonversion") +ConfigBuilder("spark.kubernetes.pyspark.pythonVersion") --- End diff -- It's idiomatic for configs to use camelCase.
[GitHub] spark issue #21901: [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93697/ Test PASSed.
[GitHub] spark issue #21901: [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21901 Merged build finished. Test PASSed.
[GitHub] spark issue #21901: [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21901 **[Test build #93697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93697/testReport)** for PR 21901 at commit [`5867e62`](https://github.com/apache/spark/commit/5867e62531fd9d5027e7af726711fc1b7d5d282f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #93709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93709/testReport)** for PR 21103 at commit [`6ef1f22`](https://github.com/apache/spark/commit/6ef1f22aa68d52cf0c00b21211e19d3f80bab7c6).
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test PASSed.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1435/ Test PASSed.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1434/ Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21103 retest this please
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21886 **[Test build #93708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93708/testReport)** for PR 21886 at commit [`67b15ee`](https://github.com/apache/spark/commit/67b15ee535765769ce04b4baf194d5d823344374).
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21886 @gatorsmile Rebased.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93702/ Test FAILed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Merged build finished. Test FAILed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #93702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93702/testReport)** for PR 21608 at commit [`15ff68d`](https://github.com/apache/spark/commit/15ff68dd3290ad67530aa87e41dc7ef6c9117b91). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21318: [minor] Update docs for functions.scala to make i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21318
[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21318 Thanks! Merged to master.
[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21904#discussion_r205925217 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,29 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, _) => +val newBranches = branches.foldLeft(List[(Expression, Expression)]()) { + case (newBranches, branch) => +if (newBranches.exists(_._1.semanticEquals(branch._1))) { + // If a condition in a branch is previously seen, this branch can be pruned. + // TODO: In fact, if a condition is a sub-condition of the previous one, + // TODO: it can be pruned. This is less strict and can be implemented + // TODO: by decomposing seen conditions. + newBranches --- End diff -- Seems this can also cover #21852?
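The optimizer rule quoted above folds over the `CASE WHEN` branches and drops any branch whose condition was already seen, since the earlier identical condition always fires first. A small Python sketch of that pruning, using plain tuple equality where Spark uses `semanticEquals` (illustrative only):

```python
from functools import reduce

def prune_duplicate_branches(branches):
    """branches: list of (condition, value) pairs in CASE WHEN order.
    A branch is unreachable if an earlier branch has the same condition."""
    def step(kept, branch):
        cond, _ = branch
        if any(c == cond for c, _ in kept):
            return kept  # condition already seen: this branch can never fire
        return kept + [branch]
    return reduce(step, branches, [])

branches = [("x > 1", "a"), ("x > 2", "b"), ("x > 1", "c")]
print(prune_duplicate_branches(branches))
# [('x > 1', 'a'), ('x > 2', 'b')]
```

As the quoted TODO notes, this only removes exact duplicates; a condition implied by an earlier one (a sub-condition) could in principle be pruned too.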
[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21318 Merged build finished. Test PASSed.
[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93695/ Test PASSed.
[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21318 **[Test build #93695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93695/testReport)** for PR 21318 at commit [`db44914`](https://github.com/apache/spark/commit/db449140fd38ce7bfdf6bb699f15443ad3e50ab3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21847 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93707/ Test FAILed.
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21847 **[Test build #93707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93707/testReport)** for PR 21847 at commit [`874d2a8`](https://github.com/apache/spark/commit/874d2a86369083ba271e737b36f19baf2710ee01). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21847 Merged build finished. Test FAILed.
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21847 **[Test build #93707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93707/testReport)** for PR 21847 at commit [`874d2a8`](https://github.com/apache/spark/commit/874d2a86369083ba271e737b36f19baf2710ee01).
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.