[GitHub] [spark] imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework
imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework
URL: https://github.com/apache/spark/pull/27095#discussion_r363073621

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
## @@ -730,6 +730,8 @@ class Analyzer(
   case class ResolveNamespace(catalogManager: CatalogManager)
     extends Rule[LogicalPlan] with LookupCatalog {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case UnresolvedNamespace(Seq()) =>
+        ResolvedNamespace(currentCatalog.asNamespaceCatalog, Seq.empty[String])

Review comment:
The conflict here is that `SHOW NAMESPACES` treats `None` as `Nil`, but `SHOW TABLES` treats `None` as the current namespace, which causes the ambiguity. To make `SHOW TABLES` work with the `Nil` approach (the current one), I have to do the following:

```scala
case ShowTablesStatement(NonSessionCatalogAndNamespace(catalog, ns), pattern) =>
  val namespace = if (ns.isEmpty && currentCatalog.name.equals(catalog.name)) {
    catalogManager.currentNamespace.toSeq
  } else {
    ns
  }
  ShowTables(catalog.asTableCatalog, namespace, pattern)
```

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf
viirya commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf URL: https://github.com/apache/spark/pull/26437#issuecomment-570876682 Is the second SQL query wrong (`COL1 > 1` -> `COL1 > 10`) in the PR description?
[GitHub] [spark] SparkQA removed a comment on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf
SparkQA removed a comment on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf URL: https://github.com/apache/spark/pull/26437#issuecomment-570866651 **[Test build #4985 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4985/testReport)** for PR 26437 at commit [`8c6060a`](https://github.com/apache/spark/commit/8c6060a1a395c81cbd08d0afc25490b533493b69).
[GitHub] [spark] SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf
SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf
URL: https://github.com/apache/spark/pull/26437#issuecomment-570873680

**[Test build #4985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4985/testReport)** for PR 26437 at commit [`8c6060a`](https://github.com/apache/spark/commit/8c6060a1a395c81cbd08d0afc25490b533493b69).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework
cloud-fan commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework
URL: https://github.com/apache/spark/pull/27095#discussion_r363071473

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
## @@ -730,6 +730,8 @@ class Analyzer(
   case class ResolveNamespace(catalogManager: CatalogManager)
     extends Rule[LogicalPlan] with LookupCatalog {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case UnresolvedNamespace(Seq()) =>
+        ResolvedNamespace(currentCatalog.asNamespaceCatalog, Seq.empty[String])

Review comment:
`CatalogAndNamespace` doesn't look up the namespace; it only looks up the catalog. I think it can handle `Nil` by resolving the catalog to the current catalog and returning `Nil` as the namespace identifier.
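A minimal sketch of the behaviour suggested in the comment above, assuming simplified stand-in types. `Catalog`, `CatalogRegistry`, and `CatalogAndNamespaceSketch` are hypothetical names used only for illustration and are not Spark's actual classes. The fallback for an unregistered first name part (keep the whole sequence as a namespace in the current catalog) is also an assumption.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
final case class Catalog(name: String)

final class CatalogRegistry(current: Catalog, registered: Map[String, Catalog]) {
  def currentCatalog: Catalog = current
  def lookup(name: String): Option[Catalog] = registered.get(name)
}

// Extractor that resolves only the catalog and passes the remaining
// name parts through untouched as the namespace identifier.
final class CatalogAndNamespaceSketch(registry: CatalogRegistry) {
  def unapply(nameParts: Seq[String]): Option[(Catalog, Seq[String])] = nameParts match {
    // Empty input: current catalog, Nil namespace.
    case Seq() => Some((registry.currentCatalog, Nil))
    case head +: tail =>
      registry.lookup(head) match {
        // First part names a registered catalog: the rest is the namespace.
        case Some(catalog) => Some((catalog, tail))
        // Assumed fallback: everything is a namespace in the current catalog.
        case None => Some((registry.currentCatalog, nameParts))
      }
  }
}
```

With this shape, a command such as `SHOW TABLES` would still have to decide for itself whether an empty namespace means "root" or "current namespace"; the extractor stays neutral on that point.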
[GitHub] [spark] cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode
cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode
URL: https://github.com/apache/spark/pull/26933#discussion_r363071095

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
## @@ -482,6 +482,15 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
   // LongConverter
   private[this] def castToLong(from: DataType): Any => Any = from match {
+    case StringType if ansiEnabled =>

Review comment:
I'd like to see something like

```
case StringType if ansiEnabled =>
  buildCast[UTF8String](_, _.toLongExact())
case StringType =>
  val result = new LongWrapper()
  buildCast[UTF8String](_, s => if (s.toLong(result)) result.value else null)
```

and in codegen

```
val casting = if (ansi) {
  s"$evPrim = $c.toLongExact();"
} else {
  s"""
    if ($c.toLong($wrapper)) {
      $evPrim = $wrapper.value;
    } else {
      $evNull = true;
    }
  """
}
code"""
  UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper();
  $casting
  $wrapper = null;
```
[GitHub] [spark] cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode
cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode
URL: https://github.com/apache/spark/pull/26933#discussion_r363071095

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
## @@ -482,6 +482,15 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
   // LongConverter
   private[this] def castToLong(from: DataType): Any => Any = from match {
+    case StringType if ansiEnabled =>

Review comment:
I'd like to see something like

```
case StringType if ansiEnabled =>
  buildCast[UTF8String](_, _.toLongExact())
case StringType =>
  val result = new LongWrapper()
  buildCast[UTF8String](_, s => if (s.toLong(result)) result.value else null)
```

and in codegen

```
val casting = if (ansi) {
  s"$evPrim = $c.toIntExact();"
} else {
  s"""
    if ($c.toInt($wrapper)) {
      $evPrim = $wrapper.value;
    } else {
      $evNull = true;
    }
  """
}
code"""
  UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper();
  $casting
  $wrapper = null;
```
[GitHub] [spark] SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570868214 **[Test build #116117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116117/testReport)** for PR 27078 at commit [`d6e519a`](https://github.com/apache/spark/commit/d6e519aa09330cf5688e1013fbbfa93a76c68abe).
[GitHub] [spark] cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode
cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode
URL: https://github.com/apache/spark/pull/26933#discussion_r363071095

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
## @@ -482,6 +482,15 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
   // LongConverter
   private[this] def castToLong(from: DataType): Any => Any = from match {
+    case StringType if ansiEnabled =>

Review comment:
I'd like to see something like

```
case StringType if ansiEnabled =>
  val result = new LongWrapper()
  buildCast[UTF8String](_, s => {
    s.toLongExact(result)
    result.value
  })
case StringType =>
  val result = new LongWrapper()
  buildCast[UTF8String](_, s => if (s.toLong(result)) result.value else null)
```

and in codegen

```
val method = if (ansi) "toIntExact" else "toInt"
code"""
  UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper();
  if ($c.$method($wrapper)) {
    $evPrim = $wrapper.value;
  } else {
    $evNull = true;
  }
  $wrapper = null;
```
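The distinction the comments above turn on (ANSI mode throws on an invalid string, non-ANSI mode yields null) can be sketched in plain Scala. This is only an illustration under stated assumptions: `CastSketch`, `toLongLenient`, `toLongExact`, and the error message are hypothetical, it operates on `String` rather than Spark's `UTF8String`, and it is not the code proposed in the PR.

```scala
// Hypothetical helper names; Spark's real implementation works on UTF8String.
object CastSketch {
  // Lenient (non-ANSI) behaviour: invalid input becomes None (null in SQL).
  def toLongLenient(s: String): Option[Long] =
    try Some(s.toLong) catch { case _: NumberFormatException => None }

  // Strict (ANSI) behaviour: invalid input raises an exception.
  def toLongExact(s: String): Long =
    toLongLenient(s).getOrElse(
      throw new NumberFormatException(s"invalid input for a long value: $s"))

  // Picks the cast function once, based on the mode, instead of
  // checking the flag for every row.
  def castToLong(ansiEnabled: Boolean): String => Any =
    if (ansiEnabled) (s: String) => toLongExact(s)
    else (s: String) => toLongLenient(s).map(Long.box).orNull
}
```

Choosing the function up front mirrors the `case StringType if ansiEnabled` guard in the review suggestion: the mode check happens when the cast is built, not on each value.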
[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570867562 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570867568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20909/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570867568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20909/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570867562 Merged build finished. Test PASSed.
[GitHub] [spark] wangyum closed pull request #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty
wangyum closed pull request #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty URL: https://github.com/apache/spark/pull/22721
[GitHub] [spark] SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf
SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf URL: https://github.com/apache/spark/pull/26437#issuecomment-570866651 **[Test build #4985 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4985/testReport)** for PR 26437 at commit [`8c6060a`](https://github.com/apache/spark/commit/8c6060a1a395c81cbd08d0afc25490b533493b69).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570859143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116116/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570859142 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570859142 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570859143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116116/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
SparkQA removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570849671 **[Test build #116116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116116/testReport)** for PR 27093 at commit [`b0c01c4`](https://github.com/apache/spark/commit/b0c01c4a219fe27104977501545e1829394a9d7a).
[GitHub] [spark] SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570859057

**[Test build #116116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116116/testReport)** for PR 27093 at commit [`b0c01c4`](https://github.com/apache/spark/commit/b0c01c4a219fe27104977501545e1829394a9d7a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363069147

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
## @@ -75,6 +77,10 @@ object CommandUtils extends Logging {
         }.sum
       }
     }
+    val partInfo = if (partitions.nonEmpty) s" with ${partitions.length} partitions" else ""
+    logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" +

Review comment:
@maropu @srowen Maybe I could change back to the initial version, which prints a log with partition info in the branch for partitioned tables?

```
logInfo(s"Starting to calculate sizes for ${partitions.length} partitions.")
```

This way we keep the "partitioned table" logic only in that branch. The final log then applies to both non-partitioned and partitioned tables.

```
logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" +
  s" the total size for table ${catalogTable.identifier}.")
```
[GitHub] [spark] wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363068727

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
## @@ -75,6 +77,10 @@ object CommandUtils extends Logging {
         }.sum
       }
     }
+    val partInfo = if (partitions.nonEmpty) s" with ${partitions.length} partitions" else ""
+    logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" +

Review comment:
If I put two different logs in the branches, I would need two `totalSize` values, one per branch, and return them after the logs. Besides, most of the two log messages would still be the same (except for the partition info). So the code may look redundant that way... what do you think?
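The single-message approach discussed above can be sketched as a small pure function, which makes the trade-off easier to see: one code path builds the message and the partition detail is appended only when it applies. `SizeLogSketch`, `summary`, and the exact message wording are assumptions for illustration; the tail of the real log message is truncated in the diff hunk and is not reproduced here.

```scala
// Hypothetical sketch; names and message wording are illustrative only.
object SizeLogSketch {
  // Builds one summary message used for both partitioned and
  // non-partitioned tables, so no branch needs its own log call.
  def summary(table: String, partitionCount: Int, elapsedNanos: Long): String = {
    val partInfo = if (partitionCount > 0) s" with $partitionCount partitions" else ""
    val elapsedMs = elapsedNanos / (1000 * 1000) // integer division: whole milliseconds
    s"It took $elapsedMs ms to calculate the total size for table $table$partInfo."
  }
}
```

For example, `SizeLogSketch.summary("db.t", 3, 5000000L)` produces a message that mentions 3 partitions, while a partition count of 0 leaves that clause out entirely.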
[GitHub] [spark] wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363068298

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
## @@ -124,8 +128,8 @@ object CommandUtils extends Logging {
         0L
       }
     }.getOrElse(0L)
-    val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
-    logInfo(s"It took $durationInMs ms to calculate the total file size under path $locationUri.")
+    val durationInMs = (System.nanoTime() - startTime) / 1e6
+    logDebug(s"It took $durationInMs ms to calculate the total file size under path $locationUri.")

Review comment:
@srowen yes, it could be called in the "else" branch above, and one partition per log would be too much if the number of partitions is very large.

```
partitions.map { p =>
  calculateLocationSize(sessionState, catalogTable.identifier, p.storage.locationUri)
}.sum
```
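To illustrate why the per-location message moves to debug level, here is a self-contained sketch that counts log lines. `RecordingLog` and `PartitionSizeSketch` are hypothetical stand-ins (an in-memory logger and a size lookup backed by a `Map`), not Spark's `Logging` trait or `CommandUtils`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory logger so the sketch runs without a logging framework.
final class RecordingLog {
  val info: ArrayBuffer[String] = ArrayBuffer.empty[String]
  val debug: ArrayBuffer[String] = ArrayBuffer.empty[String]
}

object PartitionSizeSketch {
  // Per-location detail goes to debug; this is called once per partition.
  def locationSize(log: RecordingLog, path: String, sizes: Map[String, Long]): Long = {
    val size = sizes.getOrElse(path, 0L)
    log.debug += s"Size under path $path is $size bytes."
    size
  }

  // One info-level summary for the whole table, regardless of partition count.
  def totalSize(log: RecordingLog, paths: Seq[String], sizes: Map[String, Long]): Long = {
    val total = paths.map(p => locationSize(log, p, sizes)).sum
    log.info += s"Total size across ${paths.length} locations is $total bytes."
    total
  }
}
```

With N partitions this emits N debug lines but a single info line, so the default (info) log output no longer grows with the partition count.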
[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570849870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20908/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570849865 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570849870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20908/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570849865 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints URL: https://github.com/apache/spark/pull/27093#issuecomment-570849671 **[Test build #116116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116116/testReport)** for PR 27093 at commit [`b0c01c4`](https://github.com/apache/spark/commit/b0c01c4a219fe27104977501545e1829394a9d7a).
[GitHub] [spark] huaxingao commented on a change in pull request #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
huaxingao commented on a change in pull request #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#discussion_r363067570

## File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala
## @@ -204,14 +204,8 @@ class FMClassifier @Since("3.0.0") (
     instr.logNumFeatures(numFeatures)
     val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-    val data: RDD[(Double, OldVector)] =
-      dataset.select(col($(labelCol)), col($(featuresCol))).rdd.map {
-        case Row(label: Double, features: Vector) =>
-          require(label == 0 || label == 1, s"FMClassifier was given" +
-            s" dataset with invalid label $label. Labels must be in {0,1}; note that" +
-            s" FMClassifier currently only supports binary classification.")
-          (label, features)
-      }
+    val labeledPoint = extractLabeledPoints (dataset, numClasses)

Review comment:
Removed. Thanks!
[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: SPARK-28148: repartition after join is not optimized away
AmplabJenkins removed a comment on issue #27096: SPARK-28148: repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-570842530 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is not optimized away
AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-570843855 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is not optimized away
AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-570842530 Can one of the admins verify this patch?
[GitHub] [spark] bmarcott opened a new pull request #27096: SPARK-28148: repartition after join is not optimized away
bmarcott opened a new pull request #27096: SPARK-28148: repartition after join is not optimized away
URL: https://github.com/apache/spark/pull/27096

### What changes were proposed in this pull request?

Extra shuffling was not eliminated after inner joins because they produce PartitioningCollection partitioning and the current logic only matched on HashPartitioning.

Nothing was present in EnsureRequirements to eliminate parent sorting (within partitions), which was unnecessary when the same sort order was introduced by SortMergeJoin.

Copied from jira:

Partitioning & sorting is usually retained after join.

```
spark.conf.set('spark.sql.shuffle.partitions', '42')

df1 = spark.range(500, numPartitions=5)
df2 = spark.range(1000, numPartitions=5)
df3 = spark.range(2000, numPartitions=5)

# Reuse previous partitions & sort.
df1.join(df2, on='id').join(df3, on='id').explain()
# == Physical Plan ==
# *(8) Project [id#367L]
# +- *(8) SortMergeJoin [id#367L], [id#374L], Inner
#    :- *(5) Project [id#367L]
#    :  +- *(5) SortMergeJoin [id#367L], [id#369L], Inner
#    :     :- *(2) Sort [id#367L ASC NULLS FIRST], false, 0
#    :     :  +- Exchange hashpartitioning(id#367L, 42)
#    :     :     +- *(1) Range (0, 500, step=1, splits=5)
#    :     +- *(4) Sort [id#369L ASC NULLS FIRST], false, 0
#    :        +- Exchange hashpartitioning(id#369L, 42)
#    :           +- *(3) Range (0, 1000, step=1, splits=5)
#    +- *(7) Sort [id#374L ASC NULLS FIRST], false, 0
#       +- Exchange hashpartitioning(id#374L, 42)
#          +- *(6) Range (0, 2000, step=1, splits=5)
```

However here: Partitions persist through left join, sort doesn't.

```
df1.join(df2, on='id', how='left').repartition('id').sortWithinPartitions('id').explain()
# == Physical Plan ==
# *(5) Sort [id#367L ASC NULLS FIRST], false, 0
# +- *(5) Project [id#367L]
#    +- SortMergeJoin [id#367L], [id#369L], LeftOuter
#       :- *(2) Sort [id#367L ASC NULLS FIRST], false, 0
#       :  +- Exchange hashpartitioning(id#367L, 42)
#       :     +- *(1) Range (0, 500, step=1, splits=5)
#       +- *(4) Sort [id#369L ASC NULLS FIRST], false, 0
#          +- Exchange hashpartitioning(id#369L, 42)
#             +- *(3) Range (0, 1000, step=1, splits=5)
```

Also here: Partitions do not persist through inner join.

```
df1.join(df2, on='id').repartition('id').sortWithinPartitions('id').explain()
# == Physical Plan ==
# *(6) Sort [id#367L ASC NULLS FIRST], false, 0
# +- Exchange hashpartitioning(id#367L, 42)
#    +- *(5) Project [id#367L]
#       +- *(5) SortMergeJoin [id#367L], [id#369L], Inner
#          :- *(2) Sort [id#367L ASC NULLS FIRST], false, 0
#          :  +- Exchange hashpartitioning(id#367L, 42)
#          :     +- *(1) Range (0, 500, step=1, splits=5)
#          +- *(4) Sort [id#369L ASC NULLS FIRST], false, 0
#             +- Exchange hashpartitioning(id#369L, 42)
#                +- *(3) Range (0, 1000, step=1, splits=5)
```
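The PR description attributes the extra shuffle to logic that matched only on HashPartitioning while inner joins report a PartitioningCollection. The toy model below illustrates that mismatch in plain Python. The class and function names are simplified stand-ins invented for this sketch, not Spark's actual classes or the code changed by the PR.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HashPartitioning:
    keys: Tuple[str, ...]
    num_partitions: int


@dataclass(frozen=True)
class PartitioningCollection:
    # Toy stand-in: a join's output may be described by several partitionings.
    partitionings: tuple


def satisfies_hash_only(child, required: HashPartitioning) -> bool:
    """Recognises only a bare HashPartitioning (the behaviour described as the bug)."""
    return isinstance(child, HashPartitioning) and child == required


def satisfies(child, required: HashPartitioning) -> bool:
    """Also looks inside a PartitioningCollection for a matching member."""
    if isinstance(child, PartitioningCollection):
        return any(satisfies(p, required) for p in child.partitionings)
    return isinstance(child, HashPartitioning) and child == required


# Output of a toy inner join on id: both sides are hash partitioned into 42 partitions.
join_output = PartitioningCollection(
    (HashPartitioning(("id#367",), 42), HashPartitioning(("id#369",), 42))
)
required = HashPartitioning(("id#367",), 42)

needs_shuffle_old = not satisfies_hash_only(join_output, required)
needs_shuffle_new = not satisfies(join_output, required)
```

In this model the hash-only check reports that a shuffle is needed even though one member of the collection already matches the requirement, which mirrors the redundant `Exchange` in the last plan above.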
[GitHub] [spark] srowen closed pull request #26891: Fix issue where `newFilesOnly` does nothing
srowen closed pull request #26891: Fix issue where `newFilesOnly` does nothing URL: https://github.com/apache/spark/pull/26891
[GitHub] [spark] srowen commented on a change in pull request #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
srowen commented on a change in pull request #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#discussion_r363062505

## File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala ##
@@ -204,14 +204,8 @@ class FMClassifier @Since("3.0.0") (
     instr.logNumFeatures(numFeatures)
     val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-    val data: RDD[(Double, OldVector)] =
-      dataset.select(col($(labelCol)), col($(featuresCol))).rdd.map {
-        case Row(label: Double, features: Vector) =>
-          require(label == 0 || label == 1, s"FMClassifier was given" +
-            s" dataset with invalid label $label. Labels must be in {0,1}; note that" +
-            s" FMClassifier currently only supports binary classification.")
-          (label, features)
-      }
+    val labeledPoint = extractLabeledPoints (dataset, numClasses)

Review comment: Nit: remove extra space before method call
[GitHub] [spark] srowen closed pull request #22758: [SPARK-25332][SQL] select broadcast join instead of sortMergeJoin for the small size table even query fired via new session/context
srowen closed pull request #22758: [SPARK-25332][SQL] select broadcast join instead of sortMergeJoin for the small size table even query fired via new session/context URL: https://github.com/apache/spark/pull/22758
[GitHub] [spark] github-actions[bot] closed pull request #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling patch
github-actions[bot] closed pull request #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling patch URL: https://github.com/apache/spark/pull/23008
[GitHub] [spark] github-actions[bot] closed pull request #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit in LimitPushDown rule
github-actions[bot] closed pull request #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit in LimitPushDown rule URL: https://github.com/apache/spark/pull/23104
[GitHub] [spark] github-actions[bot] commented on issue #22758: [SPARK-25332][SQL] select broadcast join instead of sortMergeJoin for the small size table even query fired via new session/context
github-actions[bot] commented on issue #22758: [SPARK-25332][SQL] select broadcast join instead of sortMergeJoin for the small size table even query fired via new session/context URL: https://github.com/apache/spark/pull/22758#issuecomment-570831370 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] closed pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
github-actions[bot] closed pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983
[GitHub] [spark] github-actions[bot] closed pull request #22964: [SPARK-25963] Optimize generate followed by window
github-actions[bot] closed pull request #22964: [SPARK-25963] Optimize generate followed by window URL: https://github.com/apache/spark/pull/22964
[GitHub] [spark] github-actions[bot] commented on issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column count for a given schema
github-actions[bot] commented on issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column count for a given schema URL: https://github.com/apache/spark/pull/22905#issuecomment-570831364 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] closed pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
github-actions[bot] closed pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108
[GitHub] [spark] github-actions[bot] closed pull request #23042: [SPARK-26070][SQL] add rule for implicit type coercion for decimal(x, 0)
github-actions[bot] closed pull request #23042: [SPARK-26070][SQL] add rule for implicit type coercion for decimal(x,0) URL: https://github.com/apache/spark/pull/23042
[GitHub] [spark] github-actions[bot] closed pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by JDBC writer for table names
github-actions[bot] closed pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by JDBC writer for table names URL: https://github.com/apache/spark/pull/23094
[GitHub] [spark] github-actions[bot] commented on issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and orderings
github-actions[bot] commented on issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and orderings URL: https://github.com/apache/spark/pull/22957#issuecomment-570831353 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] closed pull request #23074: [SPARK-19798][SQL] Refresh table does not have effect on other sessions than the issuing one
github-actions[bot] closed pull request #23074: [SPARK-19798][SQL] Refresh table does not have effect on other sessions than the issuing one URL: https://github.com/apache/spark/pull/23074
[GitHub] [spark] github-actions[bot] commented on issue #22878: [SPARK-25789][SQL] Support for Dataset of Avro
github-actions[bot] commented on issue #22878: [SPARK-25789][SQL] Support for Dataset of Avro URL: https://github.com/apache/spark/pull/22878#issuecomment-570831366 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] closed pull request #23032: [WIP][SPARK-26061][SQL][MINOR] Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
github-actions[bot] closed pull request #23032: [WIP][SPARK-26061][SQL][MINOR] Reduce the number of unused UnsafeRowWriters created in whole-stage codegen URL: https://github.com/apache/spark/pull/23032
[GitHub] [spark] github-actions[bot] commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed
github-actions[bot] commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed URL: https://github.com/apache/spark/pull/25795#issuecomment-570831321 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] commented on issue #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertTrue non-deterministic
github-actions[bot] commented on issue #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertTrue non-deterministic URL: https://github.com/apache/spark/pull/22947#issuecomment-570831356 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] commented on issue #22774: [SPARK-25780][CORE]Scheduling the tasks which have no higher level locality first
github-actions[bot] commented on issue #22774: [SPARK-25780][CORE]Scheduling the tasks which have no higher level locality first URL: https://github.com/apache/spark/pull/22774#issuecomment-570831368 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty
github-actions[bot] commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty URL: https://github.com/apache/spark/pull/22721#issuecomment-570831372 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] github-actions[bot] commented on issue #22945: [SPARK-24066][SQL]Add new optimization rule to eliminate unnecessary sort by exchanged adjacent Window expressions
github-actions[bot] commented on issue #22945: [SPARK-24066][SQL]Add new optimization rule to eliminate unnecessary sort by exchanged adjacent Window expressions URL: https://github.com/apache/spark/pull/22945#issuecomment-570831361 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it!
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r363016268

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -1375,6 +1375,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)

+  val FALL_BACK_TO_HDFS_FOR_STATS_MAX_PART_NUM =
+    buildConf("spark.sql.statistics.fallBackToHdfs.maxPartitionNum")
+    .doc("If the number of table partitions exceed this value, falling back to hdfs " +
+      "for statistics calculation is not allowed. This is used to avoid calculating " +
+      "the size of a large number of partitions through hdfs, which is very time consuming." +
+      "Setting this value to 0 or negative will disable falling back to hdfs for " +
+      "partition statistic calculation.")

Review comment: Yes, PruneFileSourcePartitions may also end up calculating the size of a large number of partitions through HDFS. I will create a follow-up PR to refine it after this PR is finished.
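The doc string in the diff above describes a threshold rule for the proposed `spark.sql.statistics.fallBackToHdfs.maxPartitionNum` setting. Read literally, it can be sketched as the small Python predicate below. The function name `may_fall_back_to_hdfs` is hypothetical, and the sketch follows only the quoted doc text, not the Scala code that was eventually merged, which may behave differently.

```python
def may_fall_back_to_hdfs(num_partitions: int, max_partition_num: int) -> bool:
    """Decide whether size estimation may fall back to HDFS for a partitioned table.

    Follows the quoted doc string: a threshold of 0 or below disables the
    fallback, and exceeding the threshold forbids it.
    """
    if max_partition_num <= 0:
        # 0 or negative: fallback for partition statistics is disabled entirely.
        return False
    # Fallback is allowed only while the partition count does not exceed the limit.
    return num_partitions <= max_partition_num
```

For example, with a limit of 100, a table with 100 partitions may still fall back while one with 101 may not.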
[GitHub] [spark] srowen closed pull request #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
srowen closed pull request #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091
[GitHub] [spark] srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570822844 Merged to master
[GitHub] [spark] ayudovin commented on issue #27030: [SPARK-30244][SQL][Catalyst] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener
ayudovin commented on issue #27030: [SPARK-30244][SQL][Catalyst] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener URL: https://github.com/apache/spark/pull/27030#issuecomment-570820515 @hvanhovell, Could you please review this pull request?
[GitHub] [spark] ayudovin commented on issue #27034: [SPARK-30122][Resource-Manager][Kubernetes] - Allow setting serviceAccountName for executor pods
ayudovin commented on issue #27034: [SPARK-30122][Resource-Manager][Kubernetes] - Allow setting serviceAccountName for executor pods URL: https://github.com/apache/spark/pull/27034#issuecomment-570820424 @liyinan926, Could you please review this pull request?
[GitHub] [spark] huaxingao commented on issue #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
huaxingao commented on issue #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
URL: https://github.com/apache/spark/pull/27094#issuecomment-570819435

I thought it over. I didn't implement this correctly: the FeaturesType could be Vector too. Even though the Vector features are converted to Double before train and predict, it is not correct for me to use type Double in
```
class IsotonicRegression extends Regressor[Double, IsotonicRegression, IsotonicRegressionModel]
```
I tried a type parameter just now but had trouble with it. I looked at the history and found that this is the reason why IsotonicRegression doesn't inherit from Regressor. I will take a look at the other regression algorithms to see if there are any reasons they don't inherit from Regressor. I will be more cautious before submitting a PR next time. Sorry.
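The comment above notes that the features column may hold either a Double or a Vector, with vectors reduced to a Double before training and prediction. A rough Python sketch of that reduction follows. `feature_as_double` is a hypothetical helper, and the `feature_index` argument is an assumption modeled on IsotonicRegression's `featureIndex` parameter; this is not Spark's code.

```python
from typing import Sequence, Union


def feature_as_double(
    features: Union[float, Sequence[float]], feature_index: int = 0
) -> float:
    """Reduce a scalar or vector-like feature value to a single float."""
    if isinstance(features, (int, float)):
        # Scalar column: use the value directly.
        return float(features)
    # Vector-like column: pick the element at the configured index.
    return float(features[feature_index])
```

Because the input type varies at runtime, fixing the features type parameter to Double in the class signature would misdescribe the vector case, which is the problem the comment raises.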
[GitHub] [spark] huaxingao closed pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
huaxingao closed pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor URL: https://github.com/apache/spark/pull/27094
[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570816944 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570816944 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570816946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116114/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570816946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116114/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
SparkQA removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570799046 **[Test build #116114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116114/testReport)** for PR 27091 at commit [`3a84864`](https://github.com/apache/spark/commit/3a84864b2863aa083455e662fc546d7db3b5681e).
[GitHub] [spark] SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570816809 **[Test build #116114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116114/testReport)** for PR 27091 at commit [`3a84864`](https://github.com/apache/spark/commit/3a84864b2863aa083455e662fc546d7db3b5681e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] superbobry commented on issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling patch
superbobry commented on issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling patch URL: https://github.com/apache/spark/pull/23008#issuecomment-570815761 @HyukjinKwon I think you might still want to merge this eventually. Closing the PR will only make the issue harder to discover.
[GitHub] [spark] srowen commented on a change in pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
srowen commented on a change in pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
URL: https://github.com/apache/spark/pull/27094#discussion_r363051568

## File path: project/MimaExcludes.scala ##

@@ -465,7 +465,15 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.SparkHadoopUtil.appendS3AndSparkHadoopConfigurations"),

     // [SPARK-29348] Add observable metrics.
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this"),
+
+    // [SPARK-30419][ML] Make IsotonicRegression extend Regressor
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.fit"),
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setFeaturesCol"),

Review comment:
Hm, weird, because Learner is the type IsotonicRegression here. So it shouldn't be a real change. I wonder if it's just a MiMa problem. So as far as you know, this isn't changing any APIs, right?
[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570813182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116115/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570813182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116115/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570813180 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570813180 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570812994 **[Test build #116115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116115/testReport)** for PR 27078 at commit [`c26164a`](https://github.com/apache/spark/commit/c26164a6cce5cbd3c21b1668e617518320ad97c4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
SparkQA removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570801625 **[Test build #116115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116115/testReport)** for PR 27078 at commit [`c26164a`](https://github.com/apache/spark/commit/c26164a6cce5cbd3c21b1668e617518320ad97c4).
[GitHub] [spark] erikerlandson commented on a change in pull request #25024: [SPARK-27296][SQL] Allows Aggregator to be registered as a UDF
erikerlandson commented on a change in pull request #25024: [SPARK-27296][SQL] Allows Aggregator to be registered as a UDF
URL: https://github.com/apache/spark/pull/25024#discussion_r363046965

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala ##

@@ -450,3 +454,88 @@ case class ScalaUDAF(

   override def nodeName: String = udaf.getClass.getSimpleName
 }
+
+case class ScalaAggregator[IN, BUF, OUT](
+    children: Seq[Expression],
+    agg: Aggregator[IN, BUF, OUT],
+    inputEncoder: ExpressionEncoder[IN],
+    isNullable: Boolean = true,
+    isDeterministic: Boolean = true,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends TypedImperativeAggregate[BUF]
+  with NonSQLExpression
+  with UserDefinedExpression
+  with ImplicitCastInputTypes
+  with Logging {
+
+  private[this] lazy val bufferEncoder = agg.bufferEncoder.asInstanceOf[ExpressionEncoder[BUF]]
+  private[this] lazy val outputEncoder = agg.outputEncoder.asInstanceOf[ExpressionEncoder[OUT]]
+
+  def dataType: DataType = outputEncoder.objSerializer.dataType
+
+  def inputTypes: Seq[DataType] = inputEncoder.schema.map(_.dataType)
+
+  def nullable: Boolean = isNullable
+
+  override lazy val deterministic: Boolean = isDeterministic
+
+  def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ScalaAggregator[IN, BUF, OUT] =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ScalaAggregator[IN, BUF, OUT] =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  private[this] lazy val childrenSchema: StructType = {
+    val inputFields = children.zipWithIndex.map {
+      case (child, index) =>
+        StructField(s"input$index", child.dataType, child.nullable, Metadata.empty)
+    }
+    StructType(inputFields)
+  }
+
+  private[this] lazy val inputProjection = {
+    val inputAttributes = childrenSchema.toAttributes
+    log.debug(
+      s"Creating MutableProj: $children, inputSchema: $inputAttributes.")
+    UnsafeProjection.create(children, inputAttributes)
+  }
+
+  def createAggregationBuffer(): BUF = agg.zero
+
+  def update(buffer: BUF, input: InternalRow): BUF = {
+    val proj = inputProjection(input)
+    val a = inputEncoder.fromRow(proj)
+    agg.reduce(buffer, a)
+  }
+
+  def merge(buffer: BUF, input: BUF): BUF = agg.merge(buffer, input)
+
+  private[this] lazy val outputToCatalystConverter: Any => Any = {
+    CatalystTypeConverters.createToCatalystConverter(dataType)
+  }
+
+  def eval(buffer: BUF): Any = {
+    val row = outputEncoder.toRow(agg.finish(buffer))
+    if (outputEncoder.isSerializedAsStruct) row else row.get(0, dataType)
+  }
+
+  private[this] lazy val bufferSerializer = bufferEncoder.namedExpressions
+  private[this] lazy val bufferDeserializer = bufferEncoder.resolveAndBind().deserializer
+  private[this] lazy val bufferObjToRow = UnsafeProjection.create(bufferSerializer)
+  private[this] lazy val bufferRow = new UnsafeRow(bufferSerializer.length)
+  private[this] lazy val bufferRowToObject =
+    GenerateSafeProjection.generate(bufferDeserializer :: Nil)
+
+  def serialize(agg: BUF): Array[Byte] = bufferObjToRow(InternalRow(agg)).getBytes

Review comment:
:+1:
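The diff above wires an `Aggregator` into the imperative-aggregate lifecycle: `createAggregationBuffer` delegates to `zero`, `update` to `reduce`, `merge` to `merge`, and `eval` to `finish`. As a rough, Spark-free sketch of that contract, the trait `SimpleAggregator`, the `Mean` instance, and the `run` helper below are hypothetical names invented for illustration and are not part of the PR:

```scala
object AggregatorSketch {
  // Hypothetical stand-in that mirrors only the zero/reduce/merge/finish
  // shape of an aggregator; encoders and rows are deliberately left out.
  trait SimpleAggregator[IN, BUF, OUT] {
    def zero: BUF
    def reduce(buffer: BUF, input: IN): BUF
    def merge(left: BUF, right: BUF): BUF
    def finish(buffer: BUF): OUT
  }

  // Running mean: the buffer holds (sum, count).
  object Mean extends SimpleAggregator[Double, (Double, Long), Double] {
    def zero: (Double, Long) = (0.0, 0L)
    def reduce(b: (Double, Long), x: Double): (Double, Long) = (b._1 + x, b._2 + 1)
    def merge(l: (Double, Long), r: (Double, Long)): (Double, Long) =
      (l._1 + r._1, l._2 + r._2)
    def finish(b: (Double, Long)): Double =
      if (b._2 == 0) Double.NaN else b._1 / b._2
  }

  // Fold each partition into a partial buffer, merge the partials, then finish.
  def run[IN, BUF, OUT](agg: SimpleAggregator[IN, BUF, OUT], partitions: Seq[Seq[IN]]): OUT = {
    val partials = partitions.map(_.foldLeft(agg.zero)(agg.reduce))
    agg.finish(partials.foldLeft(agg.zero)(agg.merge))
  }
}
```

For instance, `AggregatorSketch.run(AggregatorSketch.Mean, Seq(Seq(1.0, 2.0), Seq(3.0)))` folds two partitions separately and merges them before computing the mean.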
[GitHub] [spark] amanomer commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights.
amanomer commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights. URL: https://github.com/apache/spark/pull/27052#issuecomment-570805218 Thanks @srowen
[GitHub] [spark] huaxingao commented on a change in pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
huaxingao commented on a change in pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
URL: https://github.com/apache/spark/pull/27094#discussion_r363046502

## File path: project/MimaExcludes.scala ##

@@ -465,7 +465,15 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.SparkHadoopUtil.appendS3AndSparkHadoopConfigurations"),

     // [SPARK-29348] Add observable metrics.
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this"),
+
+    // [SPARK-30419][ML] Make IsotonicRegression extend Regressor
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.fit"),
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setFeaturesCol"),

Review comment:
I was confused as well when I first saw the MiMa errors, lol. Here are the MiMa errors:

```
[error]  * method fit(org.apache.spark.sql.Dataset)org.apache.spark.ml.regression.IsotonicRegressionModel in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Model rather than org.apache.spark.ml.regression.IsotonicRegressionModel
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.fit")
[error]  * method setFeaturesCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegression in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Predictor rather than org.apache.spark.ml.regression.IsotonicRegression
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setFeaturesCol")
[error]  * method setLabelCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegression in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Predictor rather than org.apache.spark.ml.regression.IsotonicRegression
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setLabelCol")
[error]  * method setPredictionCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegression in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Predictor rather than org.apache.spark.ml.regression.IsotonicRegression
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setPredictionCol")
[error]  * method setFeaturesCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegressionModel in class org.apache.spark.ml.regression.IsotonicRegressionModel has a different result type in current version, where it is org.apache.spark.ml.PredictionModel rather than org.apache.spark.ml.regression.IsotonicRegressionModel
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegressionModel.setFeaturesCol")
[error]  * method setPredictionCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegressionModel in class org.apache.spark.ml.regression.IsotonicRegressionModel has a different result type in current version, where it is org.apache.spark.ml.PredictionModel rather than org.apache.spark.ml.regression.IsotonicRegressionModel
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegressionModel.setPredictionCol")
```

The APIs are still the same, but the return types are different.
Before the change, setXXX is in `IsotonicRegression` and the return type is `IsotonicRegression`:

```
def setFeaturesCol(value: String): this.type = set(featuresCol, value)
```

After the change, setXXX is in the super class `Predictor` and the return type is `Predictor`:

```
def setFeaturesCol(value: String): Learner = set(featuresCol, value).asInstanceOf[Learner]
```
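One way to read the comment above: a setter declared with a type-parameter return (`Learner`) still hands callers the concrete subtype at the source level, but after erasure its bytecode return type is the parameter's upper bound, and a binary-compatibility checker such as MiMa compares bytecode signatures. The sketch below illustrates this with hypothetical classes (`Base`, `Concrete`); it is not the Spark class hierarchy, only a minimal model of the same pattern:

```scala
object SetterReturnTypes {
  // Hypothetical base class: setters return the concrete subtype through a
  // self-referential type parameter, loosely modelled on the quoted snippet.
  abstract class Base[Learner <: Base[Learner]] {
    private var featuresCol: String = "features"
    def getFeaturesCol: String = featuresCol

    // Source-level return type is Learner; the erased JVM return type is Base.
    def setFeaturesCol(value: String): Learner = {
      featuresCol = value
      this.asInstanceOf[Learner]
    }
  }

  final class Concrete extends Base[Concrete]

  // Reflection exposes the erased signature that a bytecode-level tool sees.
  def erasedReturnsBase: Boolean = {
    val m = classOf[Concrete].getMethod("setFeaturesCol", classOf[String])
    m.getReturnType == classOf[Base[_]]
  }

  // At the source level, the result is still typed as Concrete.
  def chained: Concrete = new Concrete().setFeaturesCol("x")
}
```

Under these assumptions, source code that chains setters keeps compiling unchanged, while already-compiled callers that link against the old, more specific return type are what a result-type incompatibility report is about.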
[GitHub] [spark] imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework
imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework
URL: https://github.com/apache/spark/pull/27095#discussion_r363023439

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##

@@ -730,6 +730,8 @@ class Analyzer(
   case class ResolveNamespace(catalogManager: CatalogManager)
     extends Rule[LogicalPlan] with LookupCatalog {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case UnresolvedNamespace(Seq()) =>
+        ResolvedNamespace(currentCatalog.asNamespaceCatalog, Seq.empty[String])

Review comment:
This may not be a good idea, since an empty `Seq` can mean either `None` or `Nil`. How about we add `Option` as follows, since the namespace can be optional in a command:

```scala
case class ResolvedNamespace(catalog: SupportsNamespaces, namespace: Option[Seq[String]])
  extends LeafNode {
  override def output: Seq[Attribute] = Nil
}

case class UnresolvedNamespace(multipartIdentifier: Option[Seq[String]]) extends LeafNode {
  override lazy val resolved: Boolean = false
  override def output: Seq[Attribute] = Nil
}
```
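To make the ambiguity concrete: with a bare `Seq[String]`, "no namespace given" and "explicitly empty namespace" collapse into the same value, whereas `Option[Seq[String]]` keeps them apart and lets each command pick its own default. A minimal sketch, assuming a hypothetical helper `resolve` and a `default` argument standing in for whatever a command treats as its fallback (neither is from the PR):

```scala
object NamespaceArg {
  // Hypothetical helper: `default` is what a command falls back to when the
  // user did not write a namespace at all (e.g. the current namespace, or Nil).
  def resolve(ns: Option[Seq[String]], default: Seq[String]): Seq[String] =
    ns match {
      case None        => default // namespace omitted in the command
      case Some(parts) => parts   // namespace given explicitly, possibly empty
    }
}
```

With this shape, `resolve(None, Seq("db"))` yields `Seq("db")`, while `resolve(Some(Nil), Seq("db"))` stays empty, so the two cases are no longer confused.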
[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570801735 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570801737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20907/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570801737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20907/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570801735 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks
SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks URL: https://github.com/apache/spark/pull/27078#issuecomment-570801625 **[Test build #116115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116115/testReport)** for PR 27078 at commit [`c26164a`](https://github.com/apache/spark/commit/c26164a6cce5cbd3c21b1668e617518320ad97c4).
[GitHub] [spark] srowen commented on issue #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command
srowen commented on issue #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command URL: https://github.com/apache/spark/pull/26759#issuecomment-570799515 Ping @PavithraRamachandran
[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570799158 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570799164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20906/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570799158 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570799164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20906/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570799046 **[Test build #116114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116114/testReport)** for PR 27091 at commit [`3a84864`](https://github.com/apache/spark/commit/3a84864b2863aa083455e662fc546d7db3b5681e).
[GitHub] [spark] srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc
srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc URL: https://github.com/apache/spark/pull/27091#issuecomment-570798805 Jenkins test this please
[GitHub] [spark] srowen commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
srowen commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs URL: https://github.com/apache/spark/pull/27079#discussion_r363043345 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ## @@ -124,8 +128,8 @@ object CommandUtils extends Logging { 0L } }.getOrElse(0L) -val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000) -logInfo(s"It took $durationInMs ms to calculate the total file size under path $locationUri.") +val durationInMs = (System.nanoTime() - startTime) / 1e6 +logDebug(s"It took $durationInMs ms to calculate the total file size under path $locationUri.") Review comment: Do you mean to change to debug level here?
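Besides the log level, the hunk above also changes the divisor from `(1000 * 1000)` to `1e6`, which turns `durationInMs` from a `Long` into a `Double`. The following is a minimal sketch of that difference, not code from the PR; the object name, method names, and the sample elapsed time are hypothetical.

```scala
object DurationSketch {
  // Long divided by an Int literal stays integral: the result is a Long,
  // truncated to whole milliseconds.
  def wholeMillis(elapsedNanos: Long): Long = elapsedNanos / (1000 * 1000)

  // Long divided by the Double literal 1e6 is promoted to Double,
  // so the fractional part of the millisecond count is kept.
  def fractionalMillis(elapsedNanos: Long): Double = elapsedNanos / 1e6

  def main(args: Array[String]): Unit = {
    val elapsedNanos = 2500000L // hypothetical elapsed time of 2.5 ms
    println(s"It took ${wholeMillis(elapsedNanos)} ms")
    println(s"It took ${fractionalMillis(elapsedNanos)} ms")
  }
}
```

With the `1e6` divisor, the interpolated log message would show a fractional value rather than a whole number of milliseconds.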
[GitHub] [spark] srowen commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs
srowen commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs URL: https://github.com/apache/spark/pull/27079#discussion_r363043362 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ## @@ -75,6 +77,10 @@ object CommandUtils extends Logging { }.sum } } +val partInfo = if (partitions.nonEmpty) s" with ${partitions.length} partitions" else "" +logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" + Review comment: Because the two branches above differ in several ways, it might be cleaner to just put two different log statements in the branches above instead of this
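The suggestion above could look roughly like the sketch below, where each branch produces its own complete message instead of sharing one statement that splices in an optional `partInfo` suffix. This is only an illustration under assumptions: the object, the method signature, the branch condition, and the message wording after "It took ... ms to calculate" are hypothetical and are not taken from `CommandUtils`.

```scala
object BranchLoggingSketch {
  // Hypothetical helper: elapsed time in whole milliseconds.
  private def elapsedMs(startNanos: Long, nowNanos: Long): Long =
    (nowNanos - startNanos) / (1000 * 1000)

  // Each branch builds its own full message, so the two code paths can
  // diverge freely without a shared, conditionally assembled string.
  def sizeLogMessage(
      table: String,
      partitionCount: Int,
      startNanos: Long,
      nowNanos: Long): String = {
    val ms = elapsedMs(startNanos, nowNanos)
    if (partitionCount == 0) {
      // Non-partitioned path (assumed).
      s"It took $ms ms to calculate the total size for table $table."
    } else {
      // Partitioned path (assumed).
      s"It took $ms ms to calculate the total size for table $table " +
        s"with $partitionCount partitions."
    }
  }

  def main(args: Array[String]): Unit = {
    // In real code each branch would call its logger directly;
    // println stands in for that here.
    println(sizeLogMessage("t", 0, 0L, 3000000L))
    println(sizeLogMessage("t", 12, 0L, 3000000L))
  }
}
```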
[GitHub] [spark] srowen closed pull request #27059: [SPARK-30398][ML] PCA/RegressionMetrics/RowMatrix avoid unnecessary computation
srowen closed pull request #27059: [SPARK-30398][ML] PCA/RegressionMetrics/RowMatrix avoid unnecessary computation URL: https://github.com/apache/spark/pull/27059
[GitHub] [spark] srowen commented on issue #27059: [SPARK-30398][ML] PCA/RegressionMetrics/RowMatrix avoid unnecessary computation
srowen commented on issue #27059: [SPARK-30398][ML] PCA/RegressionMetrics/RowMatrix avoid unnecessary computation URL: https://github.com/apache/spark/pull/27059#issuecomment-570798581 Merged to master
[GitHub] [spark] brkyvz commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
brkyvz commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-570798373 @cloud-fan Any more comments on this? Shall we merge this?
[GitHub] [spark] MaxGekk commented on issue #27092: [SPARK-30416][SQL] Log a warning for deprecated SQL config in `set()` and `unset()`
MaxGekk commented on issue #27092: [SPARK-30416][SQL] Log a warning for deprecated SQL config in `set()` and `unset()` URL: https://github.com/apache/spark/pull/27092#issuecomment-570798353 @HyukjinKwon Please, have a look at the PR.
[GitHub] [spark] srowen closed pull request #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights.
srowen closed pull request #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights. URL: https://github.com/apache/spark/pull/27052
[GitHub] [spark] srowen commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights.
srowen commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights. URL: https://github.com/apache/spark/pull/27052#issuecomment-570797957 We can change this further, but this is an improvement and less of a change than anything else we'd do. I'll merge it.