[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21661 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21930 I think that's a binary-incompatible, breaking API change, right? E.g. https://github.com/apache/spark/pull/21930/files#diff-2b8f0f66fe5397b169d0f754e99da8d5R64
[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21936#discussion_r206769869

    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -571,7 +571,12 @@ class SparkContext(config: SparkConf) extends Logging {
         _shutdownHookRef = ShutdownHookManager.addShutdownHook(
           ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
           logInfo("Invoking stop() from shutdown hook")
    -      stop()
    +      try {
    +        stop()
    +      } catch {
    +        case e: Throwable =>
    +          logWarning("Ignoring Exception while stoping SparkContext. Exception: " + e)
    --- End diff --

    `stoping` -> `stopping`
[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21936#discussion_r206770131

    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -571,7 +571,12 @@ class SparkContext(config: SparkConf) extends Logging {
         _shutdownHookRef = ShutdownHookManager.addShutdownHook(
           ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
           logInfo("Invoking stop() from shutdown hook")
    -      stop()
    +      try {
    +        stop()
    +      } catch {
    +        case e: Throwable =>
    +          logWarning("Ignoring Exception while stoping SparkContext. Exception: " + e)
    --- End diff --

    use this format `logWarning("", exception)`
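The two review comments above (fix the typo, pass the exception as a second argument) can be sketched in plain Scala as follows. This is only an illustration under stated assumptions: `logWarning` here is a hypothetical stand-in for Spark's `Logging.logWarning(msg, throwable)`, `stopQuietly` is an invented helper name, and `sys.addShutdownHook` replaces Spark's `ShutdownHookManager`.

```scala
import scala.collection.mutable.ArrayBuffer

object ShutdownHookSketch {
  // Hypothetical stand-in for Spark's Logging.logWarning(msg, throwable):
  // it records the warning so the behaviour can be inspected.
  val warnings: ArrayBuffer[(String, Throwable)] = ArrayBuffer.empty

  def logWarning(msg: String, e: Throwable): Unit = {
    warnings += ((msg, e))
    Console.err.println(s"WARN $msg: $e")
  }

  // Runs `stop` and logs, rather than rethrows, any failure.
  // The exception is passed as its own argument instead of being
  // concatenated into the message, so a real logger keeps the stack trace.
  def stopQuietly(stop: () => Unit): Unit =
    try {
      stop()
    } catch {
      case e: Throwable =>
        logWarning("Ignoring exception while stopping SparkContext", e)
    }

  def main(args: Array[String]): Unit = {
    // Plain JVM shutdown hook; Spark itself goes through ShutdownHookManager.
    sys.addShutdownHook {
      stopQuietly(() => throw new IllegalStateException("boom"))
    }
  }
}
```

The diff under review catches `Throwable`, and the sketch mirrors that; whether a narrower match such as `scala.util.control.NonFatal` would be preferable is a separate design question that the thread does not settle.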
[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21661 Merged build finished. Test FAILed.
[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21661 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93860/ Test FAILed.
[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21661 **[Test build #93860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93860/testReport)** for PR 21661 at commit [`1db4ab8`](https://github.com/apache/spark/commit/1db4ab8d1781036278329ae313cb7b1bf2c201c7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21941 **[Test build #93872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93872/testReport)** for PR 21941 at commit [`47cbc5a`](https://github.com/apache/spark/commit/47cbc5a8d77c949674ff97c5763936a8425b0f00).
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21941 Merged build finished. Test PASSed.
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21941 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1553/ Test PASSed.
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206768063

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1451,6 +1451,15 @@ object SQLConf {
         .intConf
         .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
         .createWithDefault(Deflater.DEFAULT_COMPRESSION)
    +
    +  val SETOPS_PRECEDENCE_ENFORCED =
    +    buildConf("spark.sql.setops.precedence.enforced")
    --- End diff --

    @gatorsmile Sure.
[GitHub] spark pull request #21622: [SPARK-24637][SS] Add metrics regarding state and...
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21622#discussion_r206766835

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala ---
    @@ -39,6 +42,23 @@ class MetricsReporter(
       registerGauge("processingRate-total", _.processedRowsPerSecond, 0.0)
       registerGauge("latency", _.durationMs.get("triggerExecution").longValue(), 0L)

    +  private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
    +  timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
    +
    +  registerGauge("eventTime-watermark",
    +    progress => convertStringDateToMillis(progress.eventTime.get("watermark")), 0L)
    +
    +  registerGauge("states-rowsTotal", _.stateOperators.map(_.numRowsTotal).sum, 0L)
    +  registerGauge("states-usedBytes", _.stateOperators.map(_.memoryUsedBytes).sum, 0L)
    +
    --- End diff --

    Thanks for the input! I'll keep the patch as it is. Could you suggest an approach for extending the maintained metrics? I would like to add more, and newer ones might come from custom metrics (for example from sources and sinks), so it might be worth having an extension point.
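The diff above registers the watermark gauge through a `convertStringDateToMillis` helper whose body is not quoted in this thread. A minimal sketch of what such a conversion could look like is given below; it assumes the intended pattern is the ISO 8601 form `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` in UTC, uses `java.util.TimeZone` rather than Spark's `DateTimeUtils`, and the null handling is a guess, not the PR's behaviour.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

object WatermarkMillisSketch {
  // SimpleDateFormat is not thread-safe, so this sketch builds a fresh
  // instance per call instead of sharing one field across threads.
  private def newFormat(): SimpleDateFormat = {
    val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
    format.setTimeZone(TimeZone.getTimeZone("UTC"))
    format
  }

  // Returns epoch milliseconds for a UTC timestamp string.
  // Assumption: a missing (null) watermark maps to the gauge default 0L.
  def convertStringDateToMillis(isoUtc: String): Long =
    if (isoUtc == null) 0L else newFormat().parse(isoUtc).getTime
}
```

For example, `WatermarkMillisSketch.convertStringDateToMillis("1970-01-01T00:00:01.000Z")` parses one second past the epoch in UTC.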
[GitHub] spark issue #21756: [SPARK-24764] [CORE] Add ServiceLoader implementation fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21756 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93856/ Test PASSed.
[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2
Github user jbax commented on the issue: https://github.com/apache/spark/pull/21892 Thanks @MaxGekk, I've fixed the error and also made the parser run faster than before when processing fields that were not selected. Can you please retest with the latest SNAPSHOT build and let me know how it goes?
[GitHub] spark issue #21756: [SPARK-24764] [CORE] Add ServiceLoader implementation fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21756 Merged build finished. Test PASSed.
[GitHub] spark issue #21756: [SPARK-24764] [CORE] Add ServiceLoader implementation fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21756 **[Test build #93856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93856/testReport)** for PR 21756 at commit [`6b9edca`](https://github.com/apache/spark/commit/6b9edca76579cd1adfb42eb4085b604b050b552c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206764090

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -535,14 +535,14 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
           case logical.Intersect(left, right, true) =>
             throw new IllegalStateException(
               "logical intersect operator should have been replaced by union, aggregate" +
    -            "and generate operators in the optimizer")
    +            " and generate operators in the optimizer")
           case logical.Except(left, right, false) =>
             throw new IllegalStateException(
               "logical except operator should have been replaced by anti-join in the optimizer")
           case logical.Except(left, right, true) =>
             throw new IllegalStateException(
               "logical except (all) operator should have been replaced by union, aggregate" +
    -            "and generate operators in the optimizer")
    +            " and generate operators in the optimizer")
    --- End diff --

    This is not related to the current PR. This addresses a comment from @HyukjinKwon in [21886](https://github.com/apache/spark/pull/21886)
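The one-character change in the diff above fixes a message that was split across two string literals without a separating space. The effect can be reproduced with the two literals quoted in the hunk; the object and value names below are made up for illustration.

```scala
object MessageConcatSketch {
  // Before the fix: the second literal starts without a space,
  // so the two words run together at the join.
  val before: String =
    "logical intersect operator should have been replaced by union, aggregate" +
      "and generate operators in the optimizer"

  // After the fix: a leading space in the second literal restores the gap.
  val after: String =
    "logical intersect operator should have been replaced by union, aggregate" +
      " and generate operators in the optimizer"
}
```

Printing `MessageConcatSketch.before` shows the run-together word `aggregateand` in the middle of the message.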
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206764069

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -535,14 +535,14 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
           case logical.Intersect(left, right, true) =>
             throw new IllegalStateException(
               "logical intersect operator should have been replaced by union, aggregate" +
    -            "and generate operators in the optimizer")
    +            " and generate operators in the optimizer")
    --- End diff --

    This is not related to the current PR. This addresses a comment from @HyukjinKwon in [21886](https://github.com/apache/spark/pull/21886)
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206764004

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -165,9 +165,9 @@ object SetOperation {
     }

     case class Intersect(
    -  left: LogicalPlan,
    -  right: LogicalPlan,
    -  isAll: Boolean = false) extends SetOperation(left, right) {
    +    left: LogicalPlan,
    --- End diff --

    This is not related to the current PR. This addresses a comment from @HyukjinKwon in [21886](https://github.com/apache/spark/pull/21886)
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206763936

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1451,6 +1451,15 @@ object SQLConf {
         .intConf
         .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
         .createWithDefault(Deflater.DEFAULT_COMPRESSION)
    +
    +  val SETOPS_PRECEDENCE_ENFORCED =
    +    buildConf("spark.sql.setops.precedence.enforced")
    --- End diff --

    let me think about the name of conf
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206763732

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1451,6 +1451,15 @@ object SQLConf {
         .intConf
         .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
         .createWithDefault(Deflater.DEFAULT_COMPRESSION)
    +
    +  val SETOPS_PRECEDENCE_ENFORCED =
    +    buildConf("spark.sql.setops.precedence.enforced")
    +      .doc("When set to true and order of evaluation is not specified by parentheses, " +
    +        "INTERSECT operations are performed before any UNION or EXCEPT operations. " +
    --- End diff --

    also include MINUS
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206763501

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
    @@ -676,4 +677,42 @@ class PlanParserSuite extends AnalysisTest {
           OneRowRelation().select('rtrim.function("c&^,.", "bc...,,,&&&ccc"))
         )
       }
    +
    +  test("precedence of set operations") {
    +    val a = table("a").select(star())
    +    val b = table("b").select(star())
    +    val c = table("c").select(star())
    +    val d = table("d").select(star())
    +
    +    val query1 =
    +      """
    +        |SELECT * FROM a
    +        |UNION
    +        |SELECT * FROM b
    +        |EXCEPT
    +        |SELECT * FROM c
    +        |INTERSECT
    +        |SELECT * FROM d
    +      """.stripMargin
    +
    +    val query2 =
    +      """
    +        |SELECT * FROM a
    +        |UNION
    +        |SELECT * FROM b
    +        |EXCEPT ALL
    +        |SELECT * FROM c
    +        |INTERSECT ALL
    +        |SELECT * FROM d
    +      """.stripMargin
    +
    +    assertEqual(query1, Distinct(a.union(b)).except(c.intersect(d)))
    --- End diff --

    also add `withSQLConf(SQLConf.SETOPS_PRECEDENCE_ENFORCED.key -> "true") {`
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21941#discussion_r206763358

    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -17,6 +17,12 @@ grammar SqlBase;

     @members {
    +  /**
    +   * When true, INTERSECT is given precedence over UNION and EXCEPT set operations as per
    --- End diff --

    > When true, INTERSECT is given precedence over UNION and EXCEPT set operations as per

    ->

    > When true, INTERSECT is given the greater precedence over the other set operations (UNION, EXCEPT and MINUS) as per
[GitHub] spark pull request #19084: [SPARK-20711][ML]MultivariateOnlineSummarizer/Sum...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/19084
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/21563 @mengxr I noticed that you opened a ticket for supporting integer type labels in ClusteringEvaluator. Would you like to shepherd this PR too?
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21622 retest this please
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21941 **[Test build #93871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93871/testReport)** for PR 21941 at commit [`c0821b6`](https://github.com/apache/spark/commit/c0821b6dd8e713edf2bd1ddd9a27f170d8f8).
[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19449#discussion_r206760031

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala ---
    @@ -82,4 +84,22 @@ class ExecutorSideSQLConfSuite extends SparkFunSuite with SQLTestUtils {
           assert(checks.forall(_ == true))
         }
       }
    +
    +  test("SPARK-22219: refactor to control to generate comment") {
    +    withSQLConf(StaticSQLConf.CODEGEN_COMMENTS.key -> "false") {
    +      val res = codegenStringSeq(spark.range(10).groupBy(col("id") * 2).count()
    +        .queryExecution.executedPlan)
    +      assert(res.length == 2)
    +      assert(res.forall{ case (_, code) =>
    +        !code.contains("* Codegend pipeline") && !code.contains("// input[")})
    +    }
    +
    +    withSQLConf(StaticSQLConf.CODEGEN_COMMENTS.key -> "true") {
    +      val res = codegenStringSeq(spark.range(10).groupBy(col("id") * 2).count()
    +        .queryExecution.executedPlan)
    +      assert(res.length == 2)
    +      assert(res.forall{ case (_, code) =>
    +        code.contains("* Codegend pipeline") && code.contains("// input[")})
    +    }
    --- End diff --

    combine these two?
    ```
    Seq(true, false).foreach { flag =>
      ...
      if (flag) {
        ...
      } else {
        ...
      }
    }
    ```
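The suggestion above leaves the loop body elided. A self-contained sketch of the pattern is shown below, with a hypothetical `generate` function standing in for `codegenStringSeq` and the SQL conf; none of these names come from the PR, and the real test would still wrap the body in `withSQLConf`.

```scala
object FlagLoopSketch {
  // Hypothetical stand-in for code generation: a comment line is emitted
  // only when comments are enabled.
  def generate(commentsEnabled: Boolean): String = {
    val comment = if (commentsEnabled) "// input[0]\n" else ""
    comment + "val x = 1"
  }

  // One loop covers both settings; only the expectation differs per flag.
  def checkBothSettings(): Unit =
    Seq(true, false).foreach { flag =>
      val code = generate(flag)
      if (flag) {
        assert(code.contains("// input["))
      } else {
        assert(!code.contains("// input["))
      }
    }

  def main(args: Array[String]): Unit = checkBothSettings()
}
```

When the two branches differ only by negation, as here, the `if`/`else` could be collapsed further into `assert(code.contains("// input[") == flag)`; the explicit branches are kept to match the shape of the reviewer's suggestion.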
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21941 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1552/ Test PASSed.
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21941 Merged build finished. Test PASSed.
[GitHub] spark pull request #21622: [SPARK-24637][SS] Add metrics regarding state and...
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21622#discussion_r206761192

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala ---
    @@ -39,6 +42,23 @@ class MetricsReporter(
       registerGauge("processingRate-total", _.processedRowsPerSecond, 0.0)
       registerGauge("latency", _.durationMs.get("triggerExecution").longValue(), 0L)

    +  private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
    +  timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
    +
    +  registerGauge("eventTime-watermark",
    +    progress => convertStringDateToMillis(progress.eventTime.get("watermark")), 0L)
    +
    +  registerGauge("states-rowsTotal", _.stateOperators.map(_.numRowsTotal).sum, 0L)
    +  registerGauge("states-usedBytes", _.stateOperators.map(_.memoryUsedBytes).sum, 0L)
    +
    --- End diff --

    Those are custom metrics, which may or may not be present depending on the implementation of the state store. I don't recommend adding them here directly.
[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...
Github user holdensmagicalunicorn commented on the issue: https://github.com/apache/spark/pull/21941 @dilipbiswal, thanks! I am a bot who has found some folks who might be able to help with the review: @gatorsmile, @rxin and @hvanhovell
[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...
GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/21941

    [SPARK-24966][SQL] Implement precedence rules for set operations.

    ## What changes were proposed in this pull request?

    Currently the set operations INTERSECT, UNION and EXCEPT are assigned the same precedence. This PR fixes the problem by giving INTERSECT higher precedence than UNION and EXCEPT. UNION and EXCEPT operators are evaluated in the order in which they appear in the query, from left to right.

    This changes behavior, because the order in which the set operators in a query are evaluated changes. The old behavior is still preserved under a newly added config parameter.

    Query `:`
    ```
    SELECT * FROM t1
    UNION
    SELECT * FROM t2
    EXCEPT
    SELECT * FROM t3
    INTERSECT
    SELECT * FROM t4
    ```
    Parsed plan before the change `:`
    ```
    == Parsed Logical Plan ==
    'Intersect false
    :- 'Except false
    :  :- 'Distinct
    :  :  +- 'Union
    :  :     :- 'Project [*]
    :  :     :  +- 'UnresolvedRelation `t1`
    :  :     +- 'Project [*]
    :  :        +- 'UnresolvedRelation `t2`
    :  +- 'Project [*]
    :     +- 'UnresolvedRelation `t3`
    +- 'Project [*]
       +- 'UnresolvedRelation `t4`
    ```
    Parsed plan after the change `:`
    ```
    == Parsed Logical Plan ==
    'Except false
    :- 'Distinct
    :  +- 'Union
    :     :- 'Project [*]
    :     :  +- 'UnresolvedRelation `t1`
    :     +- 'Project [*]
    :        +- 'UnresolvedRelation `t2`
    +- 'Intersect false
       :- 'Project [*]
       :  +- 'UnresolvedRelation `t3`
       +- 'Project [*]
          +- 'UnresolvedRelation `t4`
    ```
    ## How was this patch tested?

    Added tests in PlanParserSuite, SQLQueryTestSuite.

    Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark SPARK-24966

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21941

commit c0821b6dd8e713edf2bd1ddd9a27f170d8f8
Author: Dilip Biswal
Date:   2018-07-30T05:10:29Z

    [SPARK-24966] Implement precedence rules for set operations.
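The difference between the two parsed plans in the PR description can be modelled without Spark. The sketch below is a toy, not the ANTLR grammar change from the PR: `Plan`, `parseLegacy` and `parseEnforced` are invented names, the query is given as a first table plus a list of (operator, table) pairs, and the `Distinct` that Spark wraps around `Union` is left out.

```scala
object SetOpPrecedenceSketch {
  sealed trait Plan
  case class Table(name: String) extends Plan
  case class Union(left: Plan, right: Plan) extends Plan
  case class Except(left: Plan, right: Plan) extends Plan
  case class Intersect(left: Plan, right: Plan) extends Plan

  private def combine(op: String, l: Plan, r: Plan): Plan = op match {
    case "UNION"     => Union(l, r)
    case "EXCEPT"    => Except(l, r)
    case "INTERSECT" => Intersect(l, r)
    case other       => throw new IllegalArgumentException(s"unknown operator: $other")
  }

  // Old behaviour: all operators share one precedence level and are
  // applied strictly left to right.
  def parseLegacy(first: String, rest: List[(String, String)]): Plan =
    rest.foldLeft(Table(first): Plan) { case (acc, (op, t)) =>
      combine(op, acc, Table(t))
    }

  // New behaviour: INTERSECT binds tighter; UNION and EXCEPT are still
  // applied left to right.
  def parseEnforced(first: String, rest: List[(String, String)]): Plan = {
    // Pass 1: merge each INTERSECT into the operand immediately to its left.
    // Entries are (operator preceding the operand, operand), newest first.
    val operands = rest.foldLeft(List(("", Table(first): Plan))) {
      case ((prevOp, prev) :: older, ("INTERSECT", t)) =>
        (prevOp, Intersect(prev, Table(t))) :: older
      case (acc, (op, t)) =>
        (op, Table(t)) :: acc
    }.reverse
    // Pass 2: apply the remaining UNION / EXCEPT operators left to right.
    operands.tail.foldLeft(operands.head._2) { case (acc, (op, p)) =>
      combine(op, acc, p)
    }
  }
}
```

For the query in the description, `parseLegacy("t1", List("UNION" -> "t2", "EXCEPT" -> "t3", "INTERSECT" -> "t4"))` puts `Intersect` at the root, while `parseEnforced` on the same input puts `Except` at the root with `Intersect(t3, t4)` as its right child, matching the shape of the before and after plans.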
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #93870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93870/testReport)** for PR 21103 at commit [`93e7979`](https://github.com/apache/spark/commit/93e7979a1c3fb82c47ecae5b3ed539b31cb99e19).
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1551/ Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21103 retest this please
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21222 @zsxwing Kind reminder.
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21622 Pinging @tdas and @zsxwing for review. It's a small one.
[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21934
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #93869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93869/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @tdas Thanks for the review! Addressed review comments.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman, sounds good. I'll get this PR updated with your latest changes as soon as I can.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21883 **[Test build #93868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93868/testReport)** for PR 21883 at commit [`536346e`](https://github.com/apache/spark/commit/536346e60ed24ee447f991aacf58cafe9415a020).
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883 Merged build finished. Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1550/ Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21883 retest this please.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93866/ Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #93866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93866/testReport)** for PR 21561 at commit [`1a93c34`](https://github.com/apache/spark/commit/1a93c3432f95713e9a086a39e2f605ea4953619a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r206755595

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
    @@ -48,12 +49,24 @@ class StateOperatorProgress private[sql](
       def prettyJson: String = pretty(render(jsonValue))

       private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress =
    -    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes)
    +    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes, customMetrics)

       private[sql] def jsonValue: JValue = {
    -    ("numRowsTotal" -> JInt(numRowsTotal)) ~
    -    ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    -    ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +    def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => JValue): JValue = {
    +      if (map.isEmpty) return JNothing
    +      val keys = map.keySet.asScala.toSeq.sorted
    +      keys.map { k => k -> valueToJValue(map.get(k)) : JObject }.reduce(_ ~ _)
    +    }
    +
    +    val jsonVal = ("numRowsTotal" -> JInt(numRowsTotal)) ~
    +      ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    +      ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +
    +    if (!customMetrics.isEmpty) {
    --- End diff --

    Actually didn't notice that. Thanks for letting me know! Will simplify.
[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r206755538 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -48,12 +49,24 @@ class StateOperatorProgress private[sql]( def prettyJson: String = pretty(render(jsonValue)) private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress = -new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes) +new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes, customMetrics) private[sql] def jsonValue: JValue = { -("numRowsTotal" -> JInt(numRowsTotal)) ~ -("numRowsUpdated" -> JInt(numRowsUpdated)) ~ -("memoryUsedBytes" -> JInt(memoryUsedBytes)) +def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => JValue): JValue = { --- End diff -- I first tried to reuse `StreamingQueryProgress.safeMapToJValue`, but couldn't find a proper place to move it to so that both classes could share it, so I simply copied it. Will simplify the code block and inline it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
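For readers following the `safeMapToJValue` discussion, here is a minimal sketch of the idea in the diff: return nothing for an empty map, otherwise render the entries with keys in sorted order so the output is deterministic. This is not the code from the PR. It uses plain strings instead of json4s so that it runs without extra dependencies, and `SortedMetricsSketch`, `renderSorted` and the metric names are hypothetical.

```scala
import java.{util => ju}
import scala.collection.JavaConverters._

object SortedMetricsSketch {
  // Hypothetical stand-in for the json4s-based helper in the diff: renders a
  // Java map as a JSON-like string with keys in sorted order, or None when
  // the map is empty (the diff returns JNothing in that case).
  def renderSorted[T](map: ju.Map[String, T], render: T => String): Option[String] = {
    if (map.isEmpty) {
      None
    } else {
      val keys = map.keySet.asScala.toSeq.sorted
      val fields = keys.map(k => "\"" + k + "\":" + render(map.get(k)))
      Some(fields.mkString("{", ",", "}"))
    }
  }

  def main(args: Array[String]): Unit = {
    val metrics = new ju.HashMap[String, java.lang.Long]()
    metrics.put("metricB", 2048L) // hypothetical metric names
    metrics.put("metricA", 7L)
    println(renderSorted[java.lang.Long](metrics, _.toString))
  }
}
```

Sorting the keys is what keeps the rendered progress stable across runs, since a `java.util.HashMap` gives no ordering guarantee.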
[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r206754359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -81,10 +81,10 @@ class SQLMetric(val metricType: String, initValue: Long = 0L) extends Accumulato } object SQLMetrics { - private val SUM_METRIC = "sum" - private val SIZE_METRIC = "size" - private val TIMING_METRIC = "timing" - private val AVERAGE_METRIC = "average" + val SUM_METRIC = "sum" + val SIZE_METRIC = "size" + val TIMING_METRIC = "timing" + val AVERAGE_METRIC = "average" --- End diff -- It was there to handle an exceptional case while aggregating custom metrics, specifically filtering out average metrics since they are not aggregated correctly. Since we are removing the custom average metric, we no longer need to filter them out. Will revert the change as well as the relevant logic. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
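The comment above does not say how the custom metrics were aggregated. Assuming they were combined by simple addition across tasks, the following sketch shows why an average metric would come out wrong under that scheme. All names and numbers here are hypothetical and only illustrate the arithmetic.

```scala
object AverageAggregationSketch {
  // Each pair is (sum of values, number of values) reported by one hypothetical task.
  val perTask: Seq[(Long, Long)] = Seq((10L, 1L), (30L, 3L))

  // Naive aggregation: add up the per-task averages as if they were sums.
  def sumOfAverages(parts: Seq[(Long, Long)]): Double =
    parts.map { case (sum, count) => sum.toDouble / count }.sum

  // Correct aggregation: combine sums and counts first, then divide once.
  def trueAverage(parts: Seq[(Long, Long)]): Double = {
    val totalSum = parts.map(_._1).sum
    val totalCount = parts.map(_._2).sum
    totalSum.toDouble / totalCount
  }

  def main(args: Array[String]): Unit = {
    println(sumOfAverages(perTask)) // 10.0 + 10.0
    println(trueAverage(perTask))   // 40 / 4
  }
}
```

Both tasks report an average of 10, so adding them gives 20 while the true average over all four values is 10. Sum, size and timing metrics do not have this problem because addition is the right way to combine them.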
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93855/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21883 **[Test build #93855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93855/testReport)** for PR 21883 at commit [`536346e`](https://github.com/apache/spark/commit/536346e60ed24ee447f991aacf58cafe9415a020). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93851/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21722: Spark-24742: Fix NullPointerexception in Field Metadata
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21722 **[Test build #4228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4228/testReport)** for PR 21722 at commit [`088e2d7`](https://github.com/apache/spark/commit/088e2d789dad707bd657a72afa8933e957641536). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #93851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93851/testReport)** for PR 21103 at commit [`93e7979`](https://github.com/apache/spark/commit/93e7979a1c3fb82c47ecae5b3ed539b31cb99e19). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21357 @tdas The rationale for this patch is to group the functions that deal with delta and snapshot files in one place, so that the difference between a delta file and a snapshot file is clearly shown (there is actually no difference other than allowing a TOMBSTONE value in delta files) and so that these files are easy to document. It's also easier to add tests for delta / snapshot files. Indeed, my underlying rationale is to make the class easier for newcomers to understand (I actually found it helpful to group them logically to understand the code better), but the file has been getting enough love from various contributors, so it may not be worth the effort to make it easier. I respect the rules of the Spark project, and I'm happy to close this if we don't feel it is beneficial to go on. Let's close it and revisit it if someone else feels it would be beneficial. Thanks for providing your voice on this! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStorePr...
Github user HeartSaVioR closed the pull request at: https://github.com/apache/spark/pull/21357 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93852/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19449 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19449 **[Test build #93852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93852/testReport)** for PR 19449 at commit [`afe889d`](https://github.com/apache/spark/commit/afe889d7cd05f7a293f76103616cd62106b91305). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93863/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21563 **[Test build #93863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93863/testReport)** for PR 21563 at commit [`9064e7b`](https://github.com/apache/spark/commit/9064e7bde92f206602ebde9b3d99a861b2a90f8a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21911: [SPARK-24940][SQL] Coalesce Hint for SQL Queries
Github user jzhuge commented on the issue: https://github.com/apache/spark/pull/21911 @gatorsmile Oracle's [PARALLEL Hint](https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/Comments.html#GUID-D25225CE-2DCE-4D9F-8E82-401839690A6E) is the closest I can find. And [SET CURRENT DEGREE](https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_setcurrentdegree.html) for parallel processing in DB2. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21933: [SPARK-24917] make chunk size configurable
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21933 **[Test build #93867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93867/testReport)** for PR 21933 at commit [`0251bd5`](https://github.com/apache/spark/commit/0251bd517e7fd3e695cb8366ffa03de8c9e2900b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21933: [SPARK-24917] make chunk size configurable
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21933 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21940: Pin tag 210
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21940 @zhangchj1990, this looks like it was opened by mistake. Please close it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/19186 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #93866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93866/testReport)** for PR 21561 at commit [`1a93c34`](https://github.com/apache/spark/commit/1a93c3432f95713e9a086a39e2f605ea4953619a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20918: [SPARK-23805][ML][WIP] Features alg support vecto...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/20918 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1549/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21935#discussion_r206748626 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala --- @@ -114,7 +121,10 @@ object SchemaConverters { case ByteType | ShortType | IntegerType => builder.intType() case LongType => builder.longType() case DateType => builder.longType() - case TimestampType => builder.longType() + case TimestampType => +// To be consistent with the previous behavior of writing Timestamp type with Avro 1.7, --- End diff -- For now I think writing out timestamp micros should be good --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21752: [SPARK-24788][SQL] fixed UnresolvedException when toStri...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21752 ping @c-horn --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93865/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #93865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93865/testReport)** for PR 21561 at commit [`2e48282`](https://github.com/apache/spark/commit/2e48282825a6fb46a50f4497491c550963f2c634). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #93865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93865/testReport)** for PR 21561 at commit [`2e48282`](https://github.com/apache/spark/commit/2e48282825a6fb46a50f4497491c550963f2c634). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206748200 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java --- @@ -38,15 +38,16 @@ * If this method fails (by throwing an exception), the action will fail and no Spark job will be * submitted. * - * @param jobId A unique string for the writing job. It's possible that there are many writing - * jobs running at the same time, and the returned {@link DataSourceWriter} can - * use this job id to distinguish itself from other jobs. + * @param writeUUID A unique string for the writing job. It's possible that there are many writing + * jobs running at the same time, and the returned {@link DataSourceWriter} can + * use this job id to distinguish itself from other jobs. * @param schema the schema of the data to be written. * @param mode the save mode which determines what to do when the data are already in this data * source, please refer to {@link SaveMode} for more details. * @param options the options for the returned data source writer, which is an immutable *case-insensitive string-to-string map. + * @return a writer to append data to this data source --- End diff -- non-append cases also call this `createWriter`, shall we remove this line? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1548/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18589: [SPARK-16872][ML] Add Gaussian NB
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/18589 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21934 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18389: [SPARK-14174][ML] Add minibatch kmeans
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/18389 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20636#discussion_r206748015 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala --- @@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite val argsForSparkSubmit = Seq( "--class", BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"), "--name", "SPARK-2", - "--master", "local-cluster[2,1,1024]", - "--driver-memory", "4g", + "--master", "local-cluster[1,1,7168]", --- End diff -- I think we support this for debugging purposes since, IIRC, it launches separate processes for the workers. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
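As a reading aid for the `--master` change in this diff: a `local-cluster[...]` master string carries three numbers, namely the number of workers, the cores per worker, and the memory per worker in MB. The sketch below parses such a string. It is only an illustration, not Spark's own parser, and `LocalClusterMasterSketch` and `ClusterSpec` are hypothetical names.

```scala
object LocalClusterMasterSketch {
  // Pattern for master strings of the form
  // local-cluster[workers, coresPerWorker, memoryPerWorkerMb].
  private val LocalCluster =
    """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*\]""".r

  // Hypothetical holder for the three parsed values.
  final case class ClusterSpec(workers: Int, coresPerWorker: Int, memoryPerWorkerMb: Int)

  // Returns None when the string is not a local-cluster master.
  def parse(master: String): Option[ClusterSpec] = master match {
    case LocalCluster(w, c, m) => Some(ClusterSpec(w.toInt, c.toInt, m.toInt))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("local-cluster[2,1,1024]")) // setting removed by the diff
    println(parse("local-cluster[1,1,7168]")) // setting added by the diff
  }
}
```

Read this way, the diff goes from two workers with 1024 MB each to a single worker with 7168 MB.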
[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21934 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93849/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21934 **[Test build #93849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93849/testReport)** for PR 21934 at commit [`514fd77`](https://github.com/apache/spark/commit/514fd77501194e43e8029734e4a3669f12fbf749). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r206747528 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2217,6 +2218,100 @@ class Analyzer( } } + /** + * Resolves columns of an output table from the data in a logical plan. This rule will: + * + * - Reorder columns when the write is by name + * - Insert safe casts when data types do not match + * - Insert aliases when column names do not match + * - Detect plans that are not compatible with the output table and throw AnalysisException + */ + object ResolveOutputRelation extends Rule[LogicalPlan] { +override def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case append @ AppendData(table, query, isByName) + if table.resolved && query.resolved && !append.resolved => +val projection = resolveOutputColumns(table.name, table.output, query, isByName) + +if (projection != query) { + append.copy(query = projection) +} else { + append +} +} + +def resolveOutputColumns( +tableName: String, +expected: Seq[Attribute], +query: LogicalPlan, +byName: Boolean): LogicalPlan = { + + if (expected.size < query.output.size) { +throw new AnalysisException( + s"""Cannot write to '$tableName', too many data columns: + |Table columns: ${expected.map(_.name).mkString(", ")} + |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin) + } + + val errors = new mutable.ArrayBuffer[String]() + val resolved: Seq[NamedExpression] = if (byName) { +expected.flatMap { outAttr => + query.resolveQuoted(outAttr.name, resolver) match { +case Some(inAttr) if inAttr.nullable && !outAttr.nullable => + errors += s"Cannot write nullable values to non-null column '${outAttr.name}'" + None + +case Some(inAttr) if !DataType.canWrite(outAttr.dataType, inAttr.dataType, resolver) => + Some(upcast(inAttr, outAttr)) + +case Some(inAttr) => + Some(inAttr) // matches nullability, datatype, and name + +case _ => + errors 
+= s"Cannot find data for output column '${outAttr.name}'" + None + } +} + + } else { +if (expected.size > query.output.size) { + throw new AnalysisException( +s"""Cannot write to '$tableName', not enough data columns: + |Table columns: ${expected.map(_.name).mkString(", ")} + |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin) +} + +query.output.zip(expected).flatMap { + case (inAttr, outAttr) if inAttr.nullable && !outAttr.nullable => +errors += s"Cannot write nullable values to non-null column '${outAttr.name}'" +None + + case (inAttr, outAttr) +if !DataType.canWrite(inAttr.dataType, outAttr.dataType, resolver) || --- End diff -- can't we always do upCast? if it can write, the upCast will be a no-op and removed by optimizer. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
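To make the "always upcast" suggestion concrete, here is a simplified sketch of by-position resolution that keeps the nullability check but emits a cast for every column, relying on the cast being a no-op when the types already match. It does not use Catalyst. `Col`, `resolveByPosition` and the string-based projections are hypothetical stand-ins for `Attribute`, `resolveOutputColumns` and real expressions.

```scala
object ByPositionResolveSketch {
  // Hypothetical, simplified stand-in for a Catalyst attribute.
  final case class Col(name: String, dataType: String, nullable: Boolean)

  // Returns either the list of problems or the projected output columns, where
  // every input column is cast to the table's type and renamed.
  def resolveByPosition(table: Seq[Col], data: Seq[Col]): Either[Seq[String], Seq[String]] = {
    if (table.size != data.size) {
      Left(Seq(s"Expected ${table.size} data columns but found ${data.size}"))
    } else {
      val results: Seq[Either[String, String]] = data.zip(table).map {
        case (in, out) if in.nullable && !out.nullable =>
          Left(s"Cannot write nullable values to non-null column '${out.name}'")
        case (in, out) =>
          // Always emit the cast; when the types already match it changes nothing.
          Right(s"CAST(${in.name} AS ${out.dataType}) AS ${out.name}")
      }
      val errors = results.collect { case Left(e) => e }
      if (errors.nonEmpty) Left(errors) else Right(results.collect { case Right(p) => p })
    }
  }
}
```

Collecting all errors before failing mirrors the `errors` buffer in the diff, so a user sees every mismatched column at once instead of one per attempt.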
[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21935#discussion_r206747402 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala --- @@ -35,6 +36,12 @@ object SchemaConverters { * This function takes an avro schema and returns a sql schema. */ def toSqlType(avroSchema: Schema): SchemaType = { +avroSchema.getLogicalType match { --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21935#discussion_r206747243 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala --- @@ -71,7 +72,15 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) { private def newWriter( avroType: Schema, catalystType: DataType, - path: List[String]): (CatalystDataUpdater, Int, Any) => Unit = + path: List[String]): (CatalystDataUpdater, Int, Any) => Unit = { +(avroType.getLogicalType, catalystType) match { --- End diff -- Can we do it like this: ```scala case (LONG, TimestampType) => avroType.getLogicalType match { case _: TimestampMillis => (updater, ordinal, value) => updater.setLong(ordinal, value.asInstanceOf[Long] * 1000) case _: TimestampMicros => (updater, ordinal, value) => updater.setLong(ordinal, value.asInstanceOf[Long]) case _ => (updater, ordinal, value) => updater.setLong(ordinal, value.asInstanceOf[Long] * 1000) } ``` ? It looks like they are Avro long types anyway. I think this is easier to read, and actually safer and more correct. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
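The unit handling in the suggestion above can be tried out without Avro on the classpath. The sketch below models only the arithmetic: Catalyst stores timestamps as microseconds since the epoch, so millisecond values are multiplied by 1000 and microsecond values pass through. The `LongLogicalType` hierarchy is a hypothetical stand-in for Avro's logical types, not the real classes.

```scala
object TimestampUnitSketch {
  // Hypothetical stand-ins for the logical type attached to an Avro long field.
  sealed trait LongLogicalType
  case object TimestampMillis extends LongLogicalType
  case object TimestampMicros extends LongLogicalType
  case object NoLogicalType extends LongLogicalType

  // Converts a raw Avro long to Catalyst's microsecond representation.
  def toCatalystMicros(logicalType: LongLogicalType, value: Long): Long =
    logicalType match {
      case TimestampMillis => value * 1000
      case TimestampMicros => value
      // As in the suggestion, a long without a logical type is read as milliseconds.
      case NoLogicalType => value * 1000
    }

  def main(args: Array[String]): Unit = {
    println(toCatalystMicros(TimestampMillis, 1500L))
    println(toCatalystMicros(TimestampMicros, 1500000L))
  }
}
```

Matching on the logical type inside the `(LONG, TimestampType)` case keeps the fallback explicit, which is the readability point the comment makes.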
[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...
Github user lindblombr commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r206746980 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -165,16 +182,118 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable: result } - private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = { -if (nullable) { + // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against + // a ["null", "long"] should return a schema of type Schema.Type.LONG + // This function also handles resolving a DataType against unions of 2 or more types, i.e. + // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of + // type Schema.Type.LONG + private def resolveUnionType(avroType: Schema, catalystType: DataType, + nullable: Boolean): Schema = { +if (avroType.getType == Type.UNION) { // avro uses union to represent nullable type. - val fields = avroType.getTypes.asScala - assert(fields.length == 2) - val actualType = fields.filter(_.getType != NULL) - assert(actualType.length == 1) + val fieldTypes = avroType.getTypes.asScala + + // If we're nullable, we need to have at least two types. 
Cases with more than two types + // are captured in test("read read-write, read-write w/ schema, read") w/ test.avro input + if (nullable && fieldTypes.length < 2) { +throw new IncompatibleSchemaException( + s"Cannot resolve nullable ${catalystType} against union type ${avroType}") + } + + val actualType = catalystType match { +case NullType => fieldTypes.filter(_.getType == Type.NULL) +case BooleanType => fieldTypes.filter(_.getType == Type.BOOLEAN) +case ByteType => fieldTypes.filter(_.getType == Type.INT) +case BinaryType => + val at = fieldTypes.filter(x => x.getType == Type.BYTES || x.getType == Type.FIXED) + if (at.length > 1) { +throw new IncompatibleSchemaException( + s"Cannot resolve schema of ${catalystType} against union ${avroType.toString}") + } else { +at + } +case ShortType | IntegerType => fieldTypes.filter(_.getType == Type.INT) +case LongType => fieldTypes.filter(_.getType == Type.LONG) +case FloatType => fieldTypes.filter(_.getType == Type.FLOAT) +case DoubleType => fieldTypes.filter(_.getType == Type.DOUBLE) +case d: DecimalType => fieldTypes.filter(_.getType == Type.STRING) +case StringType => fieldTypes + .filter(x => x.getType == Type.STRING || x.getType == Type.ENUM) +case DateType => fieldTypes.filter(x => x.getType == Type.INT || x.getType == Type.LONG) +case TimestampType => fieldTypes.filter(_.getType == Type.LONG) +case ArrayType(et, containsNull) => + // Find array that matches the element type specified + fieldTypes.filter(x => x.getType == Type.ARRAY +&& typeMatchesSchema(et, x.getElementType)) +case st: StructType => // Find the matching record! + val recordTypes = fieldTypes.filter(x => x.getType == Type.RECORD) + if (recordTypes.length > 1) { +throw new IncompatibleSchemaException( + "Unions of multiple record types are NOT supported with user-specified schema") + } + recordTypes +case MapType(kt, vt, valueContainsNull) => + // Find the map that matches the value type. 
Maps in Avro are always key type string + fieldTypes.filter(x => x.getType == Type.MAP && typeMatchesSchema(vt, x.getValueType)) --- End diff -- In `SchemaConverters.toAvro`, the expectation is that Maps are keyed only with `StringType`: case MapType(StringType, vt, valueContainsNull) => builder.map().values(toAvroType(vt, valueContainsNull, recordName, prevNameSpace)) When you attempt this trivial test case, we fail ``` test("SPARK-24855: Maps with kv not string") { withTempPath { dir => val someData = Seq( Row("a", Map( 1 -> "foo", 2 -> "bar", 3 -> "baz" ) ), Row("b", Map( 1 -> "foo", 2 -> "bar", 3 -> "baz" ) ) ) val someSchema = StructType(Seq( StructField("id", StringType, true), StructField("map", MapType(IntegerType, StringType), true) ) )
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r206746905 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import org.apache.spark.annotation.{Experimental, Since} + +/** A [[TaskContext]] with extra info and tooling for a barrier stage. */ +trait BarrierTaskContext extends TaskContext { --- End diff -- Please check the generated JavaDoc. I think it becomes a Java interface with only two methods defined here. We might want to define `class BarrierTaskContext` directly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r206746478

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2217,6 +2218,100 @@ class Analyzer(
     }
   }
 
+  /**
+   * Resolves columns of an output table from the data in a logical plan. This rule will:
+   *
+   * - Reorder columns when the write is by name
+   * - Insert safe casts when data types do not match
+   * - Insert aliases when column names do not match
+   * - Detect plans that are not compatible with the output table and throw AnalysisException
+   */
+  object ResolveOutputRelation extends Rule[LogicalPlan] {
+    override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case append @ AppendData(table, query, isByName)
+          if table.resolved && query.resolved && !append.resolved =>
+        val projection = resolveOutputColumns(table.name, table.output, query, isByName)
+
+        if (projection != query) {
+          append.copy(query = projection)
+        } else {
+          append
+        }
+    }
+
+    def resolveOutputColumns(
+        tableName: String,
+        expected: Seq[Attribute],
+        query: LogicalPlan,
+        byName: Boolean): LogicalPlan = {
+
+      if (expected.size < query.output.size) {
+        throw new AnalysisException(
+          s"""Cannot write to '$tableName', too many data columns:
+             |Table columns: ${expected.map(_.name).mkString(", ")}
+             |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin)
+      }
+
+      val errors = new mutable.ArrayBuffer[String]()
+      val resolved: Seq[NamedExpression] = if (byName) {
+        expected.flatMap { outAttr =>
+          query.resolveQuoted(outAttr.name, resolver) match {
+            case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
--- End diff --

Shall we check the nullability of nested fields as well?
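A rough sketch of what a nested nullability check could look like follows. It uses a hypothetical, simplified type model (`DType`, `Field`, `StructT`, `NestedNullability`) rather than Spark's `DataType` and `Attribute` classes, and it is not the code from this PR; it only shows recursing into struct fields and reporting each path where nullable input would be written to a non-nullable target.

```scala
// Hypothetical, simplified stand-ins for a schema model; not Spark's API.
sealed trait DType
case object IntT extends DType
case object StrT extends DType
final case class Field(name: String, dtype: DType, nullable: Boolean)
final case class StructT(fields: Seq[Field]) extends DType

object NestedNullability {
  /** Returns dotted paths where the input is nullable but the target is not. */
  def violations(in: Field, out: Field, prefix: String = ""): Seq[String] = {
    val path = if (prefix.isEmpty) out.name else s"$prefix.${out.name}"
    // Top-level check, equivalent to `inAttr.nullable && !outAttr.nullable`.
    val here = if (in.nullable && !out.nullable) Seq(path) else Seq.empty[String]
    // Recurse only when both sides are structs; fields are matched by name.
    val nested = (in.dtype, out.dtype) match {
      case (StructT(inFields), StructT(outFields)) =>
        outFields.flatMap { o =>
          inFields.find(_.name == o.name).toSeq.flatMap(i => violations(i, o, path))
        }
      case _ => Seq.empty[String]
    }
    here ++ nested
  }
}
```

For example, if both sides have a non-nullable struct column `point` but the input's inner field `x` is nullable while the target's is not, the top-level check passes and only the recursive step reports `point.x`.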
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r206746383

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -352,6 +351,36 @@ case class Join(
   }
 }
 
+/**
+ * Append data to an existing table.
+ */
+case class AppendData(
+    table: NamedRelation,
+    query: LogicalPlan,
+    isByName: Boolean) extends LogicalPlan {
+  override def children: Seq[LogicalPlan] = Seq(query)
--- End diff --

Why is `table` not a child? If it isn't, we can't transform the table relation.
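The point about `children` can be seen in a toy tree. The sketch below uses hypothetical `Plan`, `Relation`, and `Append` types with a simplified top-down `transform`; it is not Catalyst's `TreeNode` implementation, only an illustration that a rewrite which recurses through `children` never reaches a field left out of that list.

```scala
// Hypothetical miniature plan tree; not Catalyst's TreeNode.
sealed trait Plan {
  def children: Seq[Plan]
  def withNewChildren(newChildren: Seq[Plan]): Plan

  // Top-down rewrite: apply the rule to this node, then recurse into children only.
  def transform(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten = if (rule.isDefinedAt(this)) rule(this) else this
    rewritten.withNewChildren(rewritten.children.map(_.transform(rule)))
  }
}

final case class Relation(name: String) extends Plan {
  def children: Seq[Plan] = Seq.empty
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// Mirrors the shape under review: `table` is a field but not a child.
final case class Append(table: Plan, query: Plan) extends Plan {
  def children: Seq[Plan] = Seq(query)
  def withNewChildren(newChildren: Seq[Plan]): Plan = copy(query = newChildren.head)
}
```

With a rule such as `{ case Relation(n) => Relation(n.toUpperCase) }`, transforming `Append(Relation("t"), Relation("src"))` rewrites only the query side; the `table` relation is left untouched because the traversal never visits it.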
[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21854 Merged build finished. Test PASSed.