[GitHub] spark pull request #21107: [DO-NOT-MERGE][WIP] Explicitly print out skipped ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21107#discussion_r183226283

--- Diff: python/run-tests.py ---
@@ -152,65 +172,17 @@ def parse_opts():
     return opts
 
-def _check_dependencies(python_exec, modules_to_test):
-    if "COVERAGE_PROCESS_START" in os.environ:
-        # Make sure if coverage is installed.
-        try:
-            subprocess_check_output(
-                [python_exec, "-c", "import coverage"],
-                stderr=open(os.devnull, 'w'))
-        except:
-            print_red("Coverage is not installed in Python executable '%s' "
-                      "but 'COVERAGE_PROCESS_START' environment variable is set, "
-                      "exiting." % python_exec)
-            sys.exit(-1)
-
-    # If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and
-    # explicitly prints out. See SPARK-23300.
-    if pyspark_sql in modules_to_test:
-        # TODO(HyukjinKwon): Relocate and deduplicate these version specifications.
-        minimum_pyarrow_version = '0.8.0'
--- End diff --

We are now relying on the existing checks in the tests. For example:

https://github.com/apache/spark/blob/ab7b961a4fe96ca02b8352d16b0fa80c972b67fc/python/pyspark/sql/tests.py#L63-L69
https://github.com/apache/spark/blob/ab7b961a4fe96ca02b8352d16b0fa80c972b67fc/python/pyspark/sql/tests.py#L3121-L3123

These print out a skip message like:

```
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
```

which I am capturing here with a regex pattern.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
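The sketch below shows one way to capture such skip messages from unittest's verbose output. It assumes each skipped test is reported on a single line in the format shown above. The pattern `SKIPPED_RE` and the helper `collect_skipped` are hypothetical names used for illustration. They are not the regex actually used in `python/run-tests.py`.

```python
import re

# Hypothetical pattern for unittest's verbose output, e.g.
#   test_x (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; ...'
SKIPPED_RE = re.compile(
    r"^(?P<test>\S+) \((?P<suite>[\w.]+)\) \.\.\. skipped '(?P<reason>.*)'$",
    re.MULTILINE,
)


def collect_skipped(output):
    """Return (suite, test, reason) tuples for every skipped test found in `output`."""
    return [(m.group("suite"), m.group("test"), m.group("reason"))
            for m in SKIPPED_RE.finditer(output)]


sample = (
    "test_createDataFrame (pyspark.sql.tests.SQLTests) ... ok\n"
    "test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... "
    "skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'\n"
)

# Lines that passed ("... ok") do not match; only the skipped test is reported.
for suite, test, reason in collect_skipped(sample):
    print("Skipped %s.%s: %s" % (suite, test, reason))
```

Matching on the whole line, rather than just the word `skipped`, keeps the suite and test names together with the reason, so a summary can be grouped by suite if needed.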
[GitHub] spark issue #21107: [DO-NOT-MERGE][WIP] Explicitly print out skipped tests f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21107 @BryanCutler, will check and update after testing out.
[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 @BryanCutler, would you mind clarifying in the PR description what happens in end-to-end cases (e.g., before and after, with the reasons)? The change looks small but is possibly a breaking change for end-to-end cases, although I think for now we are restoring the correct, expected behaviour.
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Github user dedunumax commented on the issue: https://github.com/apache/spark/pull/21120 cc @rxin @cloud-fan @gatorsmile
[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 BTW, from a very quick look, I believe it's not so easy to pass a configuration, because the exception would usually be thrown in a Python worker process.
[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 If the renaming scenario works as expected in most cases, I think it'd be worthwhile to have a configuration. However, the previous behaviour actually looks odd, because it only works under certain weird conditions: when the fields in `Row` and the fields in the given schema are in the same alphabetical order (https://github.com/apache/spark/pull/20280#discussion_r182569705). Otherwise, this case already fails as well. The test case modified in https://github.com/apache/spark/pull/20280#discussion_r182569705 actually works only because `key` and `value` in `Row` and `a` and `b` in the schema are in the same order. I think the test case is invalid. I thought about this for a while and failed to describe what the configuration would do. It sounded like describing a bug as if it were proper behaviour that can be controlled by a configuration. So far, this sounds more like a bug fix to me. The workaround should be relatively easy. Would it be good enough to describe the workaround in the guide instead? I think it should be fine if we just use a map to convert `Row` to something like a tuple.
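The following is a simplified pure-Python model of the ordering problem described above; it is not PySpark code. It assumes two things:

* As in Spark 2.x, `Row(**kwargs)` sorts its fields by name.
* Values are then matched to the schema by position.

The helpers `row_from_kwargs`, `apply_schema_by_position` and `to_tuple` are hypothetical. `to_tuple` stands in for the suggested workaround of mapping each `Row` to a tuple in an explicit order, which in PySpark would be something like `rdd.map(lambda r: (r.key, r.value))`.

```python
def row_from_kwargs(**kwargs):
    """Model of Spark 2.x Row(**kwargs): field names are sorted alphabetically."""
    names = sorted(kwargs)
    return names, tuple(kwargs[n] for n in names)


def apply_schema_by_position(values, schema_names):
    """Hypothetical: match values to schema fields purely by position."""
    return dict(zip(schema_names, values))


def to_tuple(names, values, order):
    """Hypothetical workaround: pick Row fields by name, in an explicit order."""
    by_name = dict(zip(names, values))
    return tuple(by_name[n] for n in order)


# Renaming case from the test: 'key' < 'value' and 'a' < 'b', so it happens to work.
names, values = row_from_kwargs(value=1, key="k")
print(apply_schema_by_position(values, ["a", "b"]))  # {'a': 'k', 'b': 1}

# Orders disagree: sorted fields are ('age', 'name'), so values land in the wrong columns.
names, values = row_from_kwargs(name="Alice", age=5)
print(apply_schema_by_position(values, ["name", "age"]))  # {'name': 5, 'age': 'Alice'}

# Workaround: convert to a tuple in the schema's order first.
fixed = to_tuple(names, values, ["name", "age"])
print(apply_schema_by_position(fixed, ["name", "age"]))  # {'name': 'Alice', 'age': 5}
```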
[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20930 > because we can get the MapStatus, but get a 'null'. If I'm not mistaken, this also because the ExecutorLost trigger removeOutputsOnExecutor If there's a `null` MapStatus for stage 2, how can it retry 4 times without any tasks? IIUC, a `null` MapStatus leads to a missing partition, which means there will be some tasks to submit. As for stage 3's shuffle ID, that's really weird. Hope you can fix it! @xuanyuanking
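Here is a toy model of the reasoning above. It assumes each map output status is represented by the id of the executor holding it, and `None` stands for a `null` MapStatus. The function names echo Spark's `removeOutputsOnExecutor` and its missing-partition logic, but they are hypothetical Python stand-ins, not Spark's scheduler code. Once an executor is lost, its outputs become `None` and those partitions count as missing, so a stage retry would have tasks to submit.

```python
def remove_outputs_on_executor(statuses, executor_id):
    """Hypothetical model: drop (set to None) every map output hosted on a lost executor."""
    return [None if s == executor_id else s for s in statuses]


def find_missing_partitions(statuses):
    """Partitions whose map output is None still need a (re-)submitted task."""
    return [i for i, s in enumerate(statuses) if s is None]


# Each entry models one map partition's MapStatus by the executor that wrote it.
statuses = ["exec-1", "exec-2", "exec-1", "exec-3"]
after_loss = remove_outputs_on_executor(statuses, "exec-1")
print(after_loss)                           # [None, 'exec-2', None, 'exec-3']
print(find_missing_partitions(after_loss))  # [0, 2]: a retry has tasks to run
```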
[GitHub] spark pull request #21116: [SPARK-24038][SS] Refactor continuous writing to ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21116#discussion_r183224838

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/WriteToContinuousDataSourceExec.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.continuous
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.{SparkEnv, SparkException, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2.{DataWritingSparkTask, InternalRowDataWriterFactory}
+import org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask.{logError, logInfo}
+import org.apache.spark.sql.execution.streaming.StreamExecution
+import org.apache.spark.sql.sources.v2.writer._
+import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
+import org.apache.spark.util.Utils
+
+/**
+ * The physical plan for writing data into a continuous processing [[StreamWriter]].
+ */
+case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)
+    extends SparkPlan with Logging {
+  override def children: Seq[SparkPlan] = Seq(query)
+  override def output: Seq[Attribute] = Nil
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val writerFactory = writer match {
+      case w: SupportsWriteInternalRow => w.createInternalRowWriterFactory()
+      case _ => new InternalRowDataWriterFactory(writer.createWriterFactory(), query.schema)
+    }
+
+    val rdd = query.execute()
+    val messages = new Array[WriterCommitMessage](rdd.partitions.length)
+
+    logInfo(s"Start processing data source writer: $writer. " +
+      s"The input RDD has ${messages.length} partitions.")
+    // Let the epoch coordinator know how many partitions the write RDD has.
+    EpochCoordinatorRef.get(
+        sparkContext.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
+        sparkContext.env)
+      .askSync[Unit](SetWriterPartitions(rdd.getNumPartitions))
+
+    try {
+      // Force the RDD to run so continuous processing starts; no data is actually being collected
+      // to the driver, as ContinuousWriteRDD outputs nothing.
+      sparkContext.runJob(
+        rdd,
+        (context: TaskContext, iter: Iterator[InternalRow]) =>
+          WriteToContinuousDataSourceExec.run(writerFactory, context, iter),
+        rdd.partitions.indices)
+    } catch {
+      case _: InterruptedException =>
+        // Interruption is how continuous queries are ended, so accept and ignore the exception.
+      case cause: Throwable =>
+        cause match {
+          // Do not wrap interruption exceptions that will be handled by streaming specially.
+          case _ if StreamExecution.isInterruptionException(cause) => throw cause
+          // Only wrap non fatal exceptions.
+          case NonFatal(e) => throw new SparkException("Writing job aborted.", e)
+          case _ => throw cause
+        }
+    }
+
+    sparkContext.emptyRDD
+  }
+}
+
+object WriteToContinuousDataSourceExec extends Logging {
+  def run(
+      writeTask: DataWriterFactory[InternalRow],
+      context: TaskContext,
+      iter: Iterator[InternalRow]): Unit = {
+    val epochCoordinator = EpochCoordinatorRef.get(
+      context.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
+      SparkEnv.get)
+    val currentMsg: WriterCommitMessage = null
--- End diff --

Is `currentMsg` no longer needed?
[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20280 I'm kinda worried that the example you give above is actually fairly common: construct with kwargs, and then (re-)name the columns. Perhaps it's worthwhile to consider a config switch?
[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21071 Yep... HTrace is [retired](http://mail-archives.apache.org/mod_mbox/htrace-dev/201804.mbox/%3Cpony-b7497055821405926d63668ab1112e0f108e2346-2561e81afc434e2d237bbeb5b5921941503445e4%40dev.htrace.apache.org%3E).
[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20940 **[Test build #89685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89685/testReport)** for PR 20940 at commit [`ae8a388`](https://github.com/apache/spark/commit/ae8a388405d8d3402b5b6a45a7c7855d90538edb).
[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 ![image](https://user-images.githubusercontent.com/4833765/39091106-ff11d0a6-461f-11e8-968f-7fcbe6652bb3.png) Stages 0/1/2/3 correspond to 20/21/22/23 in this screenshot; stage 2's shuffleId is 1 but stage 3's is 0, which can't happen. Good description of the scenario: we can't get a FetchFailed because we can get the MapStatus, but it is `null`. If I'm not mistaken, this is also because the ExecutorLost triggers `removeOutputsOnExecutor`. Happy to discuss with everyone, and sorry that I can't give more detailed logs after checking the root cause; this happened in Baidu's online environment, and we can't keep all logs for a month. I'll keep working on fixing the case and capture detailed logs as much as possible.
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89682/ Test FAILed.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89684/ Test PASSed.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21052 Merged build finished. Test PASSed.
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Merged build finished. Test FAILed.
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21082 **[Test build #89682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89682/testReport)** for PR 21082 at commit [`657a6a5`](https://github.com/apache/spark/commit/657a6a5ababbf816db8bbd19475b8e3e5f4aa2ae).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21052 **[Test build #89684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89684/testReport)** for PR 21052 at commit [`8369cbc`](https://github.com/apache/spark/commit/8369cbcd5eab3686c78365e1b1f906a3e8136731).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89681/ Test PASSed.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Merged build finished. Test PASSed.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21122 **[Test build #89681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89681/testReport)** for PR 21122 at commit [`c62bba1`](https://github.com/apache/spark/commit/c62bba1ed024c7d1d91da8f3d8035de8dc169302).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait ExternalCatalog `
  * ` // Returns the underlying catalog class (e.g., HiveExternalCatalog).`
  * `class ExternalCatalogWithListener(delegate: ExternalCatalog)`
[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20787#discussion_r183221673

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1117,11 +1117,21 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
 }
 
 /**
- * Returns number of months between dates date1 and date2.
+ * Returns number of months between dates `timestamp1` and `timestamp2`.
+ * If `timestamp` is later than `timestamp2`, then the result is positive.
--- End diff --

Nit: `timestamp` -> `timestamp1`. Same below.

Nit: These are called `date1` and `date2` in Python, and also here in the Scala code. Worth being consistent?
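To illustrate the sign convention being documented, here is a simplified Python model of `months_between` for plain dates. It assumes Spark's documented rules:

* The result is a whole number of months when both dates fall on the same day of the month, or when both are month ends.
* Otherwise the day difference is scaled by a 31-day month and the result is rounded to 8 digits.

The model ignores the time-of-day component and time zones, so it is a sketch rather than Spark's implementation.

```python
import calendar
from datetime import date


def _is_month_end(d):
    # calendar.monthrange returns (weekday of the 1st, number of days in the month).
    return d.day == calendar.monthrange(d.year, d.month)[1]


def months_between(date1, date2):
    """Simplified model: positive when date1 is later than date2 (time of day ignored)."""
    months = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    if date1.day == date2.day or (_is_month_end(date1) and _is_month_end(date2)):
        return float(months)
    # Otherwise scale the remaining day difference by a 31-day month.
    return round(months + (date1.day - date2.day) / 31.0, 8)


print(months_between(date(1997, 2, 28), date(1996, 10, 30)))  # 3.93548387
print(months_between(date(1996, 10, 30), date(1997, 2, 28)))  # -3.93548387
```

Swapping the arguments only flips the sign, which is the direction that the doc comment is meant to pin down.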
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Merged build finished. Test FAILed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121 **[Test build #89683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89683/testReport)** for PR 21121 at commit [`a599544`](https://github.com/apache/spark/commit/a599544b134d5c14936d76d607466adf1529370e).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89683/ Test FAILed.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21052 **[Test build #89684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89684/testReport)** for PR 21052 at commit [`8369cbc`](https://github.com/apache/spark/commit/8369cbcd5eab3686c78365e1b1f906a3e8136731).
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121 **[Test build #89683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89683/testReport)** for PR 21121 at commit [`a599544`](https://github.com/apache/spark/commit/a599544b134d5c14936d76d607466adf1529370e).
[GitHub] spark pull request #21121: [SPARK-24042][SQL] Collection function: zip_with_...
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21121#discussion_r183220685

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -883,3 +884,139 @@ case class Concat(children: Seq[Expression]) extends Expression {
 
   override def sql: String = s"concat(${children.map(_.sql).mkString(", ")})"
 }
+
+/**
+ * Returns the maximum value in the array.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(array[, indexFirst]) - Transforms the input array by encapsulating elements into pairs with indexes indicating the order.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array("d", "a", null, "b"));
+       [("d",0),("a",1),(null,2),("b",3)]
+      > SELECT _FUNC_(array("d", "a", null, "b"), true);
+       [(0,"d"),(1,"a"),(2,null),(3,"b")]
+  """,
+  since = "2.4.0")
--- End diff --

Done.
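The semantics in the `examples` string above can be modelled in a few lines of Python. This sketch assumes 0-based indexes, as the examples show, and that a null input array yields null. The real expression's null handling is not shown in the diff, and `zip_with_index` here is a plain Python function, not the Spark expression.

```python
def zip_with_index(arr, index_first=False):
    """Pair each element with its 0-based index; index_first flips the pair order."""
    if arr is None:  # assumption: a null array gives a null result
        return None
    if index_first:
        return [(i, x) for i, x in enumerate(arr)]
    return [(x, i) for i, x in enumerate(arr)]


print(zip_with_index(["d", "a", None, "b"]))
# [('d', 0), ('a', 1), (None, 2), ('b', 3)]
print(zip_with_index(["d", "a", None, "b"], index_first=True))
# [(0, 'd'), (1, 'a'), (2, None), (3, 'b')]
```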
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user mshtelma commented on the issue: https://github.com/apache/spark/pull/21052 @gatorsmile I have removed `explain()` and changed the formatting.
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user mshtelma commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r183220650

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("TBL1", "TBL") {
+        import org.apache.spark.sql.functions._
+        val df = spark.range(1000L).select('id,
+          'id * 2 as "FLD1",
+          'id * 12 as "FLD2",
+          lit("aaa") + 'id as "fld3")
+        df.write
+          .mode(SaveMode.Overwrite)
+          .bucketBy(10, "id", "FLD1", "FLD2")
+          .sortBy("id", "FLD1", "FLD2")
+          .saveAsTable("TBL")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+        val df2 = spark.sql(
+          """
+             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+             FROM tbl t1
+             JOIN tbl t2 on t1.id=t2.id
+             WHERE t1.fld3 IN (-123.23,321.23)
+          """.stripMargin)
+        df2.createTempView("TBL2")
+        sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe') ").explain()
--- End diff --

done
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user mshtelma commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r183220647

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("TBL1", "TBL") {
+        import org.apache.spark.sql.functions._
+        val df = spark.range(1000L).select('id,
+          'id * 2 as "FLD1",
+          'id * 12 as "FLD2",
+          lit("aaa") + 'id as "fld3")
+        df.write
+          .mode(SaveMode.Overwrite)
+          .bucketBy(10, "id", "FLD1", "FLD2")
+          .sortBy("id", "FLD1", "FLD2")
+          .saveAsTable("TBL")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+        val df2 = spark.sql(
+          """
+             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+             FROM tbl t1
+             JOIN tbl t2 on t1.id=t2.id
+             WHERE t1.fld3 IN (-123.23,321.23)
+          """.stripMargin)
--- End diff --

done
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Merged build finished. Test PASSed.
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2563/ Test PASSed.
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21082 **[Test build #89682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89682/testReport)** for PR 21082 at commit [`657a6a5`](https://github.com/apache/spark/commit/657a6a5ababbf816db8bbd19475b8e3e5f4aa2ae).
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r183220435

--- Diff: python/pyspark/sql/tests.py ---
@@ -5156,6 +5156,15 @@ def test_retain_group_columns(self):
         expected1 = df.groupby(df.id).agg(sum(df.v))
         self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
 
+    def test_array_type(self):
--- End diff --

This is related, but I figured it shouldn't hurt to add an array test in `GroupedAggPandasUDFTests`.
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r183220392

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
@@ -149,7 +149,7 @@ class AnalysisErrorSuite extends AnalysisTest {
           UnresolvedAttribute("a") :: Nil,
           SortOrder(UnresolvedAttribute("b"), Ascending) :: Nil,
           UnspecifiedFrame)).as('window)),
-    "not supported within a window function" :: Nil)
+    "does not have any window functions" :: Nil)
--- End diff --

This is because an analysis exception is now thrown earlier, by the rule `ExtractWindowExpressions`.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21052 LGTM except for two minor comments.
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r183219812

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("TBL1", "TBL") {
+        import org.apache.spark.sql.functions._
+        val df = spark.range(1000L).select('id,
+          'id * 2 as "FLD1",
+          'id * 12 as "FLD2",
+          lit("aaa") + 'id as "fld3")
+        df.write
+          .mode(SaveMode.Overwrite)
+          .bucketBy(10, "id", "FLD1", "FLD2")
+          .sortBy("id", "FLD1", "FLD2")
+          .saveAsTable("TBL")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+        val df2 = spark.sql(
+          """
+             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+             FROM tbl t1
+             JOIN tbl t2 on t1.id=t2.id
+             WHERE t1.fld3 IN (-123.23,321.23)
+          """.stripMargin)
--- End diff --

Nit:
```Scala
        """
          |SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
          |FROM tbl t1
          |JOIN tbl t2 on t1.id=t2.id
          |WHERE t1.fld3 IN (-123.23,321.23)
        """.stripMargin)
```
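The `|` margins matter because Scala's `stripMargin` only removes leading blanks or control characters that are followed by the margin character `|`. Lines without it are left untouched, so the version without pipes keeps all of its indentation inside the SQL string. Below is a small Python model of that rule. `strip_margin` is a hypothetical helper, not a Spark or Scala API, and it only handles `\n` line endings.

```python
import re

# Blanks or control characters (code points <= ' ') followed by the margin character '|'.
_MARGIN = re.compile(r"^[\x00-\x20]*\|")


def strip_margin(text):
    """Model of Scala's stripMargin: remove the margin prefix only on lines that have a '|'."""
    return "\n".join(_MARGIN.sub("", line, count=1) for line in text.split("\n"))


without_pipes = "\n      SELECT t1.id\n      FROM tbl t1\n    "
with_pipes = "\n      |SELECT t1.id\n      |FROM tbl t1\n    "
print(repr(strip_margin(without_pipes)))  # unchanged: indentation is kept
print(repr(strip_margin(with_pipes)))     # '\nSELECT t1.id\nFROM tbl t1\n    '
```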
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r183219803

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     }
   }
+
+  test("Simple queries must be working, if CBO is turned on") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("TBL1", "TBL") {
+        import org.apache.spark.sql.functions._
+        val df = spark.range(1000L).select('id,
+          'id * 2 as "FLD1",
+          'id * 12 as "FLD2",
+          lit("aaa") + 'id as "fld3")
+        df.write
+          .mode(SaveMode.Overwrite)
+          .bucketBy(10, "id", "FLD1", "FLD2")
+          .sortBy("id", "FLD1", "FLD2")
+          .saveAsTable("TBL")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
+        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
+        val df2 = spark.sql(
+          """
+             SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
+             FROM tbl t1
+             JOIN tbl t2 on t1.id=t2.id
+             WHERE t1.fld3 IN (-123.23,321.23)
+          """.stripMargin)
+        df2.createTempView("TBL2")
+        sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe') ").explain()
--- End diff --

Please do not use `explain()`. It will output the strings to the console. You can just do this:
```
sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe')").queryExecution.executedPlan
```
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2562/ Test PASSed.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Merged build finished. Test PASSed.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21122 **[Test build #89681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89681/testReport)** for PR 21122 at commit [`c62bba1`](https://github.com/apache/spark/commit/c62bba1ed024c7d1d91da8f3d8035de8dc169302).
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21122 retest this please
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Merged build finished. Test FAILed.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89680/ Test FAILed.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21122

**[Test build #89680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89680/testReport)** for PR 21122 at commit [`c62bba1`](https://github.com/apache/spark/commit/c62bba1ed024c7d1d91da8f3d8035de8dc169302).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait ExternalCatalog `
   * ` // Returns the underlying catalog class (e.g., HiveExternalCatalog).`
   * `class ExternalCatalogWithListener(delegate: ExternalCatalog)`
[GitHub] spark issue #12154: [SPARK-12133][STREAMING] Streaming dynamic allocation
Github user sugix commented on the issue: https://github.com/apache/spark/pull/12154 @tdas - Why can't we find this in the documentation? Also, I am not sure whether AWS EMR supports this feature.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Merged build finished. Test FAILed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89679/ Test FAILed.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121

**[Test build #89679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89679/testReport)** for PR 21121 at commit [`551d04d`](https://github.com/apache/spark/commit/551d04d672686339af3dc5a26b6669a3e996d763).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21121 @gatorsmile I'm not aware of any. From a user-experience perspective, I strongly feel that such a function is missing, especially once the [transform](https://issues.apache.org/jira/browse/SPARK-23908) function is introduced.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21056 Merged build finished. Test PASSed.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89678/ Test PASSed.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21056

**[Test build #89678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89678/testReport)** for PR 21056 at commit [`fdeac84`](https://github.com/apache/spark/commit/fdeac84f5b6fe2e25b32cbed4d1771e7c85887cc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2561/ Test FAILed.
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21122 Merged build finished. Test FAILed.
[GitHub] spark pull request #21115: [SPARK-24033] [SQL] Fix Mismatched of Window Fram...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21115
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21122 **[Test build #89680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89680/testReport)** for PR 21122 at commit [`c62bba1`](https://github.com/apache/spark/commit/c62bba1ed024c7d1d91da8f3d8035de8dc169302).
[GitHub] spark issue #21115: [SPARK-24033] [SQL] Fix Mismatched of Window Frame speci...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21115 Thanks! Merged to master/2.3
[GitHub] spark pull request #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to b...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/21122

[SPARK-24017] [SQL] Refactor ExternalCatalog to be an interface

## What changes were proposed in this pull request?

This refactors the external catalog to be an interface, which makes future work on catalog federation easier. After the refactoring, `ExternalCatalog` is much cleaner because the listener event generation logic is no longer mixed into it.

## How was this patch tested?

The existing tests.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark refactorExternalCatalog

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21122.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #21122

commit c62bba1ed024c7d1d91da8f3d8035de8dc169302
Author: gatorsmile
Date: 2018-04-21T17:36:20Z

fix
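The refactoring above separates the catalog contract from listener-event generation. The build report for this PR lists a `trait ExternalCatalog` and a `class ExternalCatalogWithListener(delegate: ExternalCatalog)`, which suggests a delegation (decorator) design. Below is a minimal Python sketch of that pattern. It is not Spark's code: the two-method interface, `InMemoryCatalog`, `add_listener`, and the event-name strings are hypothetical simplifications chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

# Hypothetical listener signature: (event_name, database_name) -> None.
Listener = Callable[[str, str], None]


class ExternalCatalog(ABC):
    """Catalog contract only: no event plumbing lives here."""

    @abstractmethod
    def create_database(self, name: str) -> None: ...

    @abstractmethod
    def list_databases(self) -> List[str]: ...


class InMemoryCatalog(ExternalCatalog):
    """A trivial implementation of the contract."""

    def __init__(self) -> None:
        self._databases: Dict[str, dict] = {}

    def create_database(self, name: str) -> None:
        if name in self._databases:
            raise ValueError(f"Database '{name}' already exists")
        self._databases[name] = {}

    def list_databases(self) -> List[str]:
        return sorted(self._databases)


class ExternalCatalogWithListener(ExternalCatalog):
    """Wraps any catalog and posts events around mutating calls."""

    def __init__(self, delegate: ExternalCatalog) -> None:
        self.delegate = delegate
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _post(self, event: str, name: str) -> None:
        for listener in self._listeners:
            listener(event, name)

    def create_database(self, name: str) -> None:
        self._post("CreateDatabasePreEvent", name)
        self.delegate.create_database(name)  # if this raises, no post-event is sent
        self._post("CreateDatabaseEvent", name)

    def list_databases(self) -> List[str]:
        # Read-only calls are forwarded without emitting events.
        return self.delegate.list_databases()
```

With this split, any catalog implementation gets event support by being wrapped, rather than each implementation generating events itself.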
[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21122 cc @rxin @cloud-fan
[GitHub] spark pull request #21121: [SPARK-24042][SQL] Collection function: zip_with_...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21121#discussion_r183214860

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -883,3 +884,139 @@ case class Concat(children: Seq[Expression]) extends Expression {
 
   override def sql: String = s"concat(${children.map(_.sql).mkString(", ")})"
 }
+
+/**
+ * Returns the maximum value in the array.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(array[, indexFirst]) - Transforms the input array by encapsulating elements into pairs with indexes indicating the order.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array("d", "a", null, "b"));
+       [("d",0),("a",1),(null,2),("b",3)]
+      > SELECT _FUNC_(array("d", "a", null, "b"), true);
+       [(0,"d"),(1,"a"),(2,null),(3,"b")]
+  """,
+  since = "2.4.0")
+case class ZipWithIndex(child: Expression, indexFirst: Expression)
+  extends UnaryExpression with ExpectsInputTypes {
+
+  def this(e: Expression) = this(e, Literal.FalseLiteral)
+
+  val indexFirstValue: Boolean = indexFirst match {
+    case Literal(v: Boolean, BooleanType) => v
+    case _ => throw new AnalysisException("The second argument has to be a boolean constant.")
+  }
+
+  private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+
+  lazy val childArrayType: ArrayType = child.dataType.asInstanceOf[ArrayType]
+
+  override def dataType: DataType = {
+    val elementField = StructField("value", childArrayType.elementType, childArrayType.containsNull)
+    val indexField = StructField("index", IntegerType, false)
+
+    val fields = if (indexFirstValue) Seq(indexField, elementField) else Seq(elementField, indexField)
+
+    ArrayType(StructType(fields), false)
+  }
+
+  override protected def nullSafeEval(input: Any): Any = {
+    val array = input.asInstanceOf[ArrayData].toObjectArray(childArrayType.elementType)
+
+    val makeStruct =
+      (v: Any, i: Int) => if (indexFirstValue) InternalRow(i, v) else InternalRow(v, i)
+    val resultData = array.zipWithIndex.map{case (v, i) => makeStruct(v, i)}
+
+    new GenericArrayData(resultData)
+  }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(ctx, ev, c => {
+      if (CodeGenerator.isPrimitiveType(childArrayType.elementType)) {
+        genCodeForPrimitiveElements(ctx, c, ev.value)
+      } else {
+        genCodeForNonPrimitiveElements(ctx, c, ev.value)
+      }
+    })
+  }
+
+  private def genCodeForPrimitiveElements(
+      ctx: CodegenContext,
+      childVariableName: String,
+      arrayData: String): String = {
+    val numElements = ctx.freshName("numElements")
+    val byteArraySize = ctx.freshName("byteArraySize")
+    val data = ctx.freshName("byteArray")
+    val unsafeRow = ctx.freshName("unsafeRow")
+    val structSize = ctx.freshName("structSize")
+    val unsafeArrayData = ctx.freshName("unsafeArrayData")
+    val structsOffset = ctx.freshName("structsOffset")
+    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+    val baseOffset = Platform.BYTE_ARRAY_OFFSET
+    val longSize = LongType.defaultSize
+    val primitiveValueTypeName = CodeGenerator.primitiveTypeName(childArrayType.elementType)
+    val valuePosition = if (indexFirstValue) "1" else "0"
+    val indexPosition = if (indexFirstValue) "0" else "1"
--- End diff --

nit: How about `val (valuePosition, indexPosition) = if (indexFirstValue) ("1", "0") else ("0", "1")`?
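The nit above replaces two parallel conditionals with one conditional that returns a pair, which is then destructured in a single step. Python supports the same idiom. This sketch uses hypothetical helper names and only illustrates the pattern. It also checks that the chosen ordinals agree with the struct field order that the PR's `dataType` builds: `value` before `index`, unless `indexFirst` is true.

```python
from typing import List, Tuple


def struct_positions(index_first: bool) -> Tuple[str, str]:
    """Return (value_position, index_position) as string ordinals, as in the codegen."""
    # One conditional picks both ordinals and tuple unpacking names them:
    # the Python analogue of the suggested Scala one-liner.
    value_position, index_position = ("1", "0") if index_first else ("0", "1")
    return value_position, index_position


def struct_field_names(index_first: bool) -> List[str]:
    """Field order of the result struct, following the PR's dataType."""
    return ["index", "value"] if index_first else ["value", "index"]


for flag in (False, True):
    names = struct_field_names(flag)
    value_pos, index_pos = struct_positions(flag)
    # The ordinals must point at the matching struct fields.
    assert names[int(value_pos)] == "value"
    assert names[int(index_pos)] == "index"
```

Deriving both ordinals from a single conditional means they cannot drift out of sync.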
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21121 Which database has this function?
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89677/ Test PASSed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959

**[Test build #89677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89677/testReport)** for PR 20959 at commit [`0737bf7`](https://github.com/apache/spark/commit/0737bf7717f6b1f253c9d78013065e7147279607).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21121: [SPARK-24042][SQL] Collection function: zip_with_...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21121#discussion_r183214185

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -883,3 +884,139 @@ case class Concat(children: Seq[Expression]) extends Expression {
 
   override def sql: String = s"concat(${children.map(_.sql).mkString(", ")})"
 }
+
+/**
+ * Returns the maximum value in the array.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(array[, indexFirst]) - Transforms the input array by encapsulating elements into pairs with indexes indicating the order.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array("d", "a", null, "b"));
+       [("d",0),("a",1),(null,2),("b",3)]
+      > SELECT _FUNC_(array("d", "a", null, "b"), true);
+       [(0,"d"),(1,"a"),(2,null),(3,"b")]
+  """,
+  since = "2.4.0")
--- End diff --

nit: `// scalastyle:on line.size.limit`
[GitHub] spark pull request #21121: [SPARK-24042][SQL] Collection function: zip_with_...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21121#discussion_r183214315

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3340,6 +3340,17 @@ object functions {
    */
   def reverse(e: Column): Column = withExpr { Reverse(e.expr) }
 
+  /**
+   * Transforms the input array by encapsulating elements into pairs
+   * with indexes indicating the order.
+   *
+   * @group collection_funcs
+   * @since 2.4.0
+   */
+  def zip_with_index(e: Column, indexFirst: Boolean = false): Column = withExpr {
--- End diff --

Let's avoid using a default value in APIs. It doesn't work in Java.
[GitHub] spark pull request #21121: [SPARK-24042][SQL] Collection function: zip_with_...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21121#discussion_r183214167

--- Diff: python/pyspark/sql/functions.py ---
@@ -2191,6 +2191,24 @@ def reverse(col):
     return Column(sc._jvm.functions.reverse(_to_java_column(col)))
 
 
+@since(2.4)
+def zip_with_index(col, indexFirst=False):
+    """
+    Collection function: transforms the input array by encapsulating elements into pairs
+    with indexes indicating the order.
+
+    :param col: name of column or expression
+
+    >>> df = spark.createDataFrame([([2, 5, 3],), ([],)], ['data'])
+    >>> df.select(zip_with_index(df.data).alias('r')).collect()
+    [Row(r=[[value=2, index=0], [value=5, index=1], [value=3, index=2]]), Row(r=[])]
+    >>> df.select(zip_with_index(df.data, indexFirst=True).alias('r')).collect()
+    [Row(r=[[index=0, value=2], [index=1, value=5], [index=2, value=3]]), Row(r=[])]
+     """
--- End diff --

nit: there's one more leading space here.
[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 Right. Will triple check for sure but I am with you for now. Yup, something in the migration guide makes much more sense to me too.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21121 **[Test build #89679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89679/testReport)** for PR 21121 at commit [`551d04d`](https://github.com/apache/spark/commit/551d04d672686339af3dc5a26b6669a3e996d763).
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21121 ok to test
[GitHub] spark pull request #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen so...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21110
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21052 Merged build finished. Test PASSed.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89675/ Test PASSed.
[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21110 Merged to master.
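SPARK-24029 sets `SO_REUSEADDR` on listen sockets. The Spark change itself is not shown in this thread. Purely to illustrate the socket option, here is a Python sketch using the standard `socket` module; the helper name `open_listen_socket` is hypothetical. The option must be set before `bind()`. It lets a restarted server rebind a local port that still has connections lingering in `TIME_WAIT`.

```python
import socket


def open_listen_socket(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a TCP listening socket with SO_REUSEADDR enabled (port 0 = any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the option before bind(); setting it afterwards does not affect this bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    server = open_listen_socket()
    print("listening on", server.getsockname())
    server.close()
```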
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21052

**[Test build #89675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89675/testReport)** for PR 21052 at commit [`8d21488`](https://github.com/apache/spark/commit/8d2148814e52a2db1e14592c91467013565c310a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89674/ Test PASSed.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21056 Merged build finished. Test PASSed.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21056

**[Test build #89674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89674/testReport)** for PR 21056 at commit [`f96134c`](https://github.com/apache/spark/commit/f96134c39adf643148c87f9bf7f0d5340b0219a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21056 **[Test build #89678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89678/testReport)** for PR 21056 at commit [`fdeac84`](https://github.com/apache/spark/commit/fdeac84f5b6fe2e25b32cbed4d1771e7c85887cc).
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89677/testReport)** for PR 20959 at commit [`0737bf7`](https://github.com/apache/spark/commit/0737bf7717f6b1f253c9d78013065e7147279607).
[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 oops, I missed this. will take a look shortly.
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21121 cc @gatorsmile @ueshin @kiszk
[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20930

Hi, @xuanyuanking, thanks for your patient explanation, sincerely. With regard to your latest explanation:

> stage 2's shuffleID is 1, but stage 3 failed by missing an output for shuffle '0'! So here the stage 2's skip cause stage 3 got an error shuffleId.

However, I don't think skipping stage 2 would cause stage 3 to get a wrong shuffleId, because we have already created all `ShuffleDependencies` (constructed with certain ids) for the `ShuffleMapStages` before any stage of a job is submitted.

After struggling to understand this issue for a while, I finally came up with my own inference. (Assume the two ShuffleMapTasks below belong to stage 2, and stage 2 has two partitions on the map side. Stage 2 has a parent stage, stage 1, and a child stage, stage 3.)

1. ShuffleMapTask 0.0 runs on ExecutorB, writes its map output on ExecutorB, and succeeds normally. At this point there is only '1' available map output registered on `MapOutputTrackerMaster`.
2. ShuffleMapTask 1.0 is running on ExecutorA; it fetches data from ExecutorA and writes its map output on ExecutorA, too.
3. ExecutorA is lost for an unknown reason after sending the driver a `StatusUpdate` message reporting ShuffleMapTask 1.0's success. All map outputs on ExecutorA are lost, including ShuffleMapTask 1.0's map output.
4. The driver launches a speculative ShuffleMapTask 1.1 before it receives the `StatusUpdate` message, and ShuffleMapTask 1.1 gets a FetchFailed immediately.
5. `DAGScheduler` handles the FetchFailed ShuffleMapTask 1.1 first and marks stage 2 and its parent stage 1 as failed. Stage 1 and stage 2 are now waiting to be resubmitted.
6. `DAGScheduler` handles the successful ShuffleMapTask 1.0 before stage 1 and stage 2 are resubmitted, which triggers `MapOutputTrackerMaster.registerMapOutput`. Now there are '2' available map outputs registered on `MapOutputTrackerMaster`, even though ShuffleMapTask 1.0's map output on ExecutorA has been lost.
7. Stage 1 is resubmitted and succeeds normally.
8. Stage 2 is resubmitted. Because stage 2 has '2' available map outputs registered on `MapOutputTrackerMaster`, it has no missing partitions, and therefore no missing tasks to submit.
9. Then stage 3 is submitted. Because stage 2's map output file on ExecutorA is lost, stage 3 must eventually get a FetchFailed. Stage 2 and stage 3 are then resubmitted, and we get into a loop until stage 3 is aborted.

But if the issue is what I described above, we should get a `FetchFailedException` instead of the `MetadataFetchFailedException` shown in the screenshot, so this part does not make sense to me. Please feel free to point out where I am wrong. Anyway, thanks again.
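The failure sequence in steps 1 to 9 can be replayed with a toy model of the map-output bookkeeping. This is a hypothetical Python simplification, not Spark's `MapOutputTrackerMaster` or `DAGScheduler`. The tracker maps each map partition to the executor that holds its output. The replay shows how a late success registration (step 6) leaves a stale location, so the resubmitted stage 2 sees no missing partitions (step 8) while its output is actually unreadable (step 9).

```python
from typing import Dict, Optional, Set, Tuple


class ToyMapOutputTracker:
    """Hypothetical stand-in: map partition id -> executor holding its output."""

    def __init__(self, num_partitions: int) -> None:
        self.locations: Dict[int, Optional[str]] = {p: None for p in range(num_partitions)}

    def register(self, partition: int, executor: str) -> None:
        self.locations[partition] = executor

    def unregister_executor(self, executor: str) -> None:
        for partition, location in self.locations.items():
            if location == executor:
                self.locations[partition] = None

    def missing_partitions(self) -> Set[int]:
        return {p for p, loc in self.locations.items() if loc is None}


def replay() -> Tuple[Set[int], Set[int]]:
    """Replay the steps for stage 2 (two map partitions).

    Returns (partitions the resubmitted stage 2 would rerun,
             partitions stage 3 cannot actually read).
    """
    tracker = ToyMapOutputTracker(num_partitions=2)
    live_executors = {"ExecutorA", "ExecutorB"}

    tracker.register(0, "ExecutorB")          # step 1: task 0.0 succeeds on ExecutorB
    live_executors.discard("ExecutorA")       # step 3: ExecutorA dies, its map output files vanish
    tracker.unregister_executor("ExecutorA")  # step 5: even if cleared here, nothing is registered on A yet
    tracker.register(1, "ExecutorA")          # step 6: late success of task 1.0 registers a stale location
    rerun = tracker.missing_partitions()      # step 8: stage 2 finds no missing partitions
    unreadable = {p for p, loc in tracker.locations.items() if loc not in live_executors}  # step 9
    return rerun, unreadable
```

In this model, `replay()` yields an empty rerun set but one unreadable partition, which is the inconsistency that would drive the resubmission loop described in step 9.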
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Can one of the admins verify this patch?
[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21121 Can one of the admins verify this patch?
[GitHub] spark pull request #21121: [SPARK-24042][SQL] Collection function: zip_with_...
GitHub user mn-mikke opened a pull request: https://github.com/apache/spark/pull/21121

[SPARK-24042][SQL] Collection function: zip_with_index

## What changes were proposed in this pull request?

This PR implements the function zip_with_index(array[, indexFirst]), which transforms the input array by encapsulating elements into pairs with indexes indicating the order.
```
zip_with_index(array("d", "a", null, "b")) => [("d",0),("a",1),(null,2),("b",3)]
zip_with_index(array("d", "a", null, "b"), true) => [(0,"d"),(1,"a"),(2,null),(3,"b")]
```

## How was this patch tested?

New tests added into:
- CollectionExpressionSuite
- DataFrameFunctionsSuite

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AbsaOSS/spark feature/array-api-zip_with_index-to-master

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21121.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #21121

commit 9f090309b8d13e37efaf7824b6d960a6f61ca79f
Author: mn-mikke
Date: 2018-04-18T08:00:27Z

[SPARK-24042][SQL] Collection function: zip_with_index
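To make the proposed semantics concrete, here is a pure-Python model of the examples above. `zip_with_index` in this sketch is a hypothetical helper, not the PySpark function from the PR. It uses Python tuples in place of Spark structs and `None` for SQL null. It also assumes a null input array yields null, mirroring the null-safe evaluation in the Catalyst expression under review.

```python
from typing import Any, List, Optional, Tuple


def zip_with_index(array: Optional[List[Any]],
                   index_first: bool = False) -> Optional[List[Tuple[Any, Any]]]:
    """Pair each element with its 0-based position; optionally put the index first."""
    if array is None:
        # Null-safe: a null array maps to null rather than raising.
        return None
    if index_first:
        return [(i, v) for i, v in enumerate(array)]
    # Null elements are kept and still receive an index.
    return [(v, i) for i, v in enumerate(array)]


print(zip_with_index(["d", "a", None, "b"]))
print(zip_with_index(["d", "a", None, "b"], index_first=True))
```

The two calls mirror the two examples in the PR description, with `None` standing in for `null`.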
[GitHub] spark issue #20350: [SPARK-23179][SQL] Support option to throw exception if ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20350 Merged build finished. Test PASSed.
[GitHub] spark issue #20350: [SPARK-23179][SQL] Support option to throw exception if ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20350 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89673/ Test PASSed.
[GitHub] spark issue #20350: [SPARK-23179][SQL] Support option to throw exception if ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20350

**[Test build #89673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89673/testReport)** for PR 20350 at commit [`aa84034`](https://github.com/apache/spark/commit/aa84034bd60413057738500564a9714dfa4b4192).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Merged build finished. Test PASSed.
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89676/ Test PASSed.
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20997 **[Test build #89676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89676/testReport)** for PR 20997 at commit [`2c45388`](https://github.com/apache/spark/commit/2c453883869921c99024c02f0a29aac395c82341). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 In the meantime I found a small glitch in the SQL part: if a reattempt happens, this line https://github.com/apache/spark/blob/1d758dc73b54e802fdc92be204185fe7414e6553/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala#L445 removes the consumer from the cache, which ends up producing this log message: ``` 13:27:07.556 INFO org.apache.spark.sql.kafka010.KafkaDataConsumer: Released a supposedly cached consumer that was not found in the cache ``` I've solved this here by removing only the closed consumer from the cache; a consumer that is marked for close will be removed in `release`.
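To make the glitch concrete, here is a toy Python model of the cache behaviour described in the comment above. It is not Spark's `KafkaDataConsumer`: the names `InternalConsumer`, `ConsumerCache`, the `fixed` flag and `run_scenario` are hypothetical, and the scenario assumes the cached consumer is still held by another task when the reattempt arrives. With `fixed=False` the reattempt invalidates the cache entry unconditionally, so the later `release` cannot find the consumer and logs the message; with `fixed=True` only an actually closed consumer is dropped, and a consumer marked for close leaves the cache in `release`.

```python
NOT_FOUND_MSG = ("Released a supposedly cached consumer "
                 "that was not found in the cache")


class InternalConsumer:
    """Hypothetical stand-in for a cached Kafka consumer."""

    def __init__(self):
        self.in_use = False
        self.marked_for_close = False
        self.closed = False

    def close(self):
        self.closed = True


class ConsumerCache:
    """Toy per-key consumer cache; `fixed` toggles the proposed change."""

    def __init__(self, fixed):
        self.fixed = fixed
        self.cache = {}
        self.log = []

    def acquire(self, key, reattempt):
        """Return (consumer, is_cached)."""
        existing = self.cache.get(key)
        if reattempt:
            if existing is not None:
                if existing.in_use:
                    # Another task still holds it: defer closing to release().
                    existing.marked_for_close = True
                else:
                    existing.close()
            if self.fixed:
                # Drop the entry only if the cached consumer was actually closed.
                if existing is not None and existing.closed:
                    del self.cache[key]
            else:
                # Original behaviour: invalidate the entry unconditionally.
                self.cache.pop(key, None)
            # A reattempt always starts with a fresh, uncached consumer.
            return InternalConsumer(), False
        if existing is None:
            existing = InternalConsumer()
            self.cache[key] = existing
        if existing.in_use:
            # Cached consumer is busy: hand out an uncached one instead.
            return InternalConsumer(), False
        existing.in_use = True
        return existing, True

    def release(self, key, consumer, is_cached):
        if not is_cached:
            consumer.close()
            return
        if self.cache.get(key) is consumer:
            if consumer.marked_for_close:
                # Deferred close: the marked consumer leaves the cache here.
                consumer.close()
                del self.cache[key]
            else:
                consumer.in_use = False
        else:
            consumer.close()
            self.log.append(NOT_FOUND_MSG)


def run_scenario(fixed):
    cache = ConsumerCache(fixed)
    key = "topic-0"
    # One task holds the cached consumer ...
    first, first_cached = cache.acquire(key, reattempt=False)
    # ... while a reattempt for the same key acquires a consumer.
    retry, retry_cached = cache.acquire(key, reattempt=True)
    cache.release(key, retry, retry_cached)
    cache.release(key, first, first_cached)
    return cache.log, key in cache.cache, first.closed
```

In this model `run_scenario(fixed=False)` records the "not found in the cache" message, while `run_scenario(fixed=True)` records nothing; in both cases the held consumer ends up closed and out of the cache.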
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20997 **[Test build #89676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89676/testReport)** for PR 20997 at commit [`2c45388`](https://github.com/apache/spark/commit/2c453883869921c99024c02f0a29aac395c82341).
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21052 **[Test build #89675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89675/testReport)** for PR 21052 at commit [`8d21488`](https://github.com/apache/spark/commit/8d2148814e52a2db1e14592c91467013565c310a).
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user mshtelma commented on the issue: https://github.com/apache/spark/pull/21052 @maropu thank you for the suggestions! I have implemented them and pushed the changes.
[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21056 **[Test build #89674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89674/testReport)** for PR 21056 at commit [`f96134c`](https://github.com/apache/spark/commit/f96134c39adf643148c87f9bf7f0d5340b0219a3).
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user mshtelma commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r183206908 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala --- @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared } } } + + test("Simple queries must be working, if CBO is turned on") { +withSQLConf(("spark.sql.cbo.enabled", "true")) { + withTable("TBL1", "TBL") { +import org.apache.spark.sql.functions._ +val df = spark.range(1000L).select('id, + 'id * 2 as "FLD1", + 'id * 12 as "FLD2", + lit("aaa") + 'id as "fld3") +df.write + .mode(SaveMode.Overwrite) + .bucketBy(10, "id", "FLD1", "FLD2") + .sortBy("id", "FLD1", "FLD2") + .saveAsTable("TBL") +spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ") --- End diff -- done