[GitHub] [spark] AmplabJenkins commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
AmplabJenkins commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839915821 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-839915832 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42981/
[GitHub] [spark] AmplabJenkins commented on pull request #32503: [WIP] better error message
AmplabJenkins commented on pull request #32503: URL: https://github.com/apache/spark/pull/32503#issuecomment-839915826 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42979/
[GitHub] [spark] SparkQA commented on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
SparkQA commented on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-839914752 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42977/
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-839913239
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839912309 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42978/
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-839911339
[GitHub] [spark] yaooqinn commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
yaooqinn commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631194854 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensionsProvider.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql + +import org.apache.spark.annotation.{DeveloperApi, Experimental, Unstable} + +// scalastyle:off line.size.limit +/** + * :: Experimental :: + * + * Base trait for implementations used by [[SparkSessionExtensions]] + * + * + * For example, now we have an external function named `Age` to register as an extension for SparkSession: + * + * + * {{{ + * package org.apache.spark.examples.extensions + * + * import org.apache.spark.sql.catalyst.expressions.{CurrentDate, Expression, RuntimeReplaceable, SubtractDates} + * + * case class Age(birthday: Expression, child: Expression) extends RuntimeReplaceable { + * + * def this(birthday: Expression) = this(birthday, SubtractDates(CurrentDate(), birthday)) + * override def exprsReplaced: Seq[Expression] = Seq(birthday) + * override protected def withNewChildInternal(newChild: Expression): Expression = copy(newChild) + * } + * }}} + * + * We need to create our extension which inherits [[SparkSessionExtensionsProvider]] + * Example: + * + * {{{ + * package org.apache.spark.examples.extensions + * + * import org.apache.spark.sql.{SparkSessionExtensions, SparkSessionExtensionsProvider} + * import org.apache.spark.sql.catalyst.FunctionIdentifier + * import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo} + * + * class MyExtensions extends SparkSessionExtensionsProvider { + * override def apply(v1: SparkSessionExtensions): Unit = { + * v1.injectFunction( + * (new FunctionIdentifier("age"), + * new ExpressionInfo(classOf[Age].getName, + * "age"), (children: Seq[Expression]) => new Age(children.head))) + * } + * } + * }}} + * + * We can inject `MyExtensions` in three ways, + * + * + * [[SparkSession.Builder.withExtensions]] + * Config - spark.sql.extensions + * [[java.util.ServiceLoader]] - Add to src/main/resources/META-INF/services/org.apache.spark.sql.SparkSessionExtensionsProvider + * + * + * @since 3.2.0 + * + * @note We make NO guarantee about the stability regarding 
binary compatibility and source compatibility of methods here. + * It's experimental and intended for developers Review comment: nice suggestion
[GitHub] [spark] srowen commented on pull request #32505: [SPARK-35373][BUILD] Check Maven artifact checksum in build/mvn
srowen commented on pull request #32505: URL: https://github.com/apache/spark/pull/32505#issuecomment-839908551 I may merge this to master first to see if there are issues, then back-port it if it seems OK. We probably do want this as part of maintenance branches.
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839907834 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42978/
[GitHub] [spark] yaooqinn commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
yaooqinn commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839907616 Thanks @dongjoon-hyun, I've updated the PR with your comments addressed.
[GitHub] [spark] yaooqinn commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
yaooqinn commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631191285 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ## @@ -74,7 +74,7 @@ import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan} * .config("spark.sql.extensions", "org.example.MyExtensions") * .getOrCreate() * - * class MyExtensions extends Function1[SparkSessionExtensions, Unit] { + * class MyExtensions extends SparkSessionExtensionsProvider { Review comment: sgtm
[GitHub] [spark] SparkQA commented on pull request #32503: [WIP] better error message
SparkQA commented on pull request #32503: URL: https://github.com/apache/spark/pull/32503#issuecomment-839905998
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32439: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
dongjoon-hyun edited a comment on pull request #32439: URL: https://github.com/apache/spark/pull/32439#issuecomment-839898915 Hi, @gengliangwang and @sigmod. The last commit seems to break Java 11 consistently. Could you take a look at this? Thanks! https://user-images.githubusercontent.com/9700541/118007771-4b9f4380-b301-11eb-9b85-d0f7286be0ed.png
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32439: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
dongjoon-hyun edited a comment on pull request #32439: URL: https://github.com/apache/spark/pull/32439#issuecomment-839898915 Hi, @gengliangwang and @sigmod. The last commit seems to break Java 11. Could you take a look at this? Thanks! https://user-images.githubusercontent.com/9700541/118007771-4b9f4380-b301-11eb-9b85-d0f7286be0ed.png
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32439: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
dongjoon-hyun edited a comment on pull request #32439: URL: https://github.com/apache/spark/pull/32439#issuecomment-839898915 Hi, @gengliangwang and @sigmod. The last commit seems to break Java 11. Could you take a look at this? Thanks!
[GitHub] [spark] dongjoon-hyun commented on pull request #32439: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
dongjoon-hyun commented on pull request #32439: URL: https://github.com/apache/spark/pull/32439#issuecomment-839898915 Hi, @gengliangwang and @sigmod. The last commit seems to break Java 11. Could you take a look at this?
[GitHub] [spark] SparkQA removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA removed a comment on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839697574 **[Test build #138452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138452/testReport)** for PR 32515 at commit [`459eab0`](https://github.com/apache/spark/commit/459eab0282d1b7fc93817c8e19fa78adffdc35e2).
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839892464 **[Test build #138452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138452/testReport)** for PR 32515 at commit [`459eab0`](https://github.com/apache/spark/commit/459eab0282d1b7fc93817c8e19fa78adffdc35e2). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
dongjoon-hyun commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631171488 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala ## @@ -46,16 +46,17 @@ import org.apache.spark.unsafe.types.UTF8String * Test cases for the [[SparkSessionExtensions]]. */ class SparkSessionExtensionSuite extends SparkFunSuite { - type ExtensionsBuilder = SparkSessionExtensions => Unit - private def create(builder: ExtensionsBuilder): Seq[ExtensionsBuilder] = Seq(builder) + private def create( + builder: SparkSessionExtensionsProvider): Seq[SparkSessionExtensionsProvider] = Seq(builder) Review comment: Do we still have all the existing test coverage for the old code path, @yaooqinn? The old code path should be protected by test coverage according to Apache Spark's recent API management policy.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
dongjoon-hyun commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631175068 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -1203,4 +1204,22 @@ object SparkSession extends Logging { } extensions } + + /** + * Load extensions from [[ServiceLoader]] and active them + */ + private def loadExtensions(extensions: SparkSessionExtensions): Unit = { +val loader = ServiceLoader.load(classOf[SparkSessionExtensionsProvider], + Utils.getContextOrSparkClassLoader) +val loadedExts = loader.iterator() + +while (loadedExts.hasNext) { + try { +val ext = loadedExts.next() +ext(extensions) + } catch { +case e: Throwable => logWarning("Failed to loader session extension", e) Review comment: `loader` -> `load`
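The loading loop in the hunk above can be sketched in isolation. This is only an illustration under stated assumptions: `Registry` and `ExtensionsProvider` are hypothetical stand-ins for `SparkSessionExtensions` and `SparkSessionExtensionsProvider`, the context class loader stands in for `Utils.getContextOrSparkClassLoader`, and the sketch catches `NonFatal` where the PR catches `Throwable`. The point it shows is that the `try` sits inside the loop, so one failing provider is logged and skipped rather than aborting the remaining ones.

```scala
import java.util.ServiceLoader
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-ins for SparkSessionExtensions and its provider trait.
final class Registry { val functions: ArrayBuffer[String] = ArrayBuffer.empty[String] }
trait ExtensionsProvider extends (Registry => Unit)

object ExtensionLoading {
  // Applies each provider to the registry. A provider that throws is reported
  // and skipped, so the remaining providers still run. Returns the failure count.
  def applyAll(providers: java.util.Iterator[ExtensionsProvider], registry: Registry): Int = {
    var failures = 0
    while (providers.hasNext) {
      try {
        val provider = providers.next()
        provider(registry)
      } catch {
        case NonFatal(e) =>
          failures += 1
          Console.err.println(s"Failed to load session extension: ${e.getMessage}")
      }
    }
    failures
  }

  // Discovers providers listed under META-INF/services/<fully qualified trait name>
  // on the given class loader and applies them.
  def loadFromServiceLoader(registry: Registry): Int = {
    val loader = ServiceLoader.load(
      classOf[ExtensionsProvider], Thread.currentThread().getContextClassLoader)
    applyAll(loader.iterator(), registry)
  }
}
```

Keeping the discovery step (`loadFromServiceLoader`) separate from the apply loop (`applyAll`) is a choice made for this sketch so the error handling can be exercised without a `META-INF/services` file; the PR itself does both in a single method.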
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
dongjoon-hyun commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631174479 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -1203,4 +1204,22 @@ object SparkSession extends Logging { } extensions } + + /** + * Load extensions from [[ServiceLoader]] and active them Review comment: `active` -> `activate`? Or, we can use `use` instead of `activate`.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
dongjoon-hyun commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631171488 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala ## @@ -46,16 +46,17 @@ import org.apache.spark.unsafe.types.UTF8String * Test cases for the [[SparkSessionExtensions]]. */ class SparkSessionExtensionSuite extends SparkFunSuite { - type ExtensionsBuilder = SparkSessionExtensions => Unit - private def create(builder: ExtensionsBuilder): Seq[ExtensionsBuilder] = Seq(builder) + private def create( + builder: SparkSessionExtensionsProvider): Seq[SparkSessionExtensionsProvider] = Seq(builder) Review comment: Do we still have test coverage for the old code path, @yaooqinn? The old code path should be protected by test coverage according to Apache Spark's recent API management policy.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
dongjoon-hyun commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631169547 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensionsProvider.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql + +import org.apache.spark.annotation.{DeveloperApi, Experimental, Unstable} + +// scalastyle:off line.size.limit +/** + * :: Experimental :: + * + * Base trait for implementations used by [[SparkSessionExtensions]] + * + * + * For example, now we have an external function named `Age` to register as an extension for SparkSession: + * + * + * {{{ + * package org.apache.spark.examples.extensions + * + * import org.apache.spark.sql.catalyst.expressions.{CurrentDate, Expression, RuntimeReplaceable, SubtractDates} + * + * case class Age(birthday: Expression, child: Expression) extends RuntimeReplaceable { + * + * def this(birthday: Expression) = this(birthday, SubtractDates(CurrentDate(), birthday)) + * override def exprsReplaced: Seq[Expression] = Seq(birthday) + * override protected def withNewChildInternal(newChild: Expression): Expression = copy(newChild) + * } + * }}} + * + * We need to create our extension which inherits [[SparkSessionExtensionsProvider]] + * Example: + * + * {{{ + * package org.apache.spark.examples.extensions + * + * import org.apache.spark.sql.{SparkSessionExtensions, SparkSessionExtensionsProvider} + * import org.apache.spark.sql.catalyst.FunctionIdentifier + * import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo} + * + * class MyExtensions extends SparkSessionExtensionsProvider { + * override def apply(v1: SparkSessionExtensions): Unit = { + * v1.injectFunction( + * (new FunctionIdentifier("age"), + * new ExpressionInfo(classOf[Age].getName, + * "age"), (children: Seq[Expression]) => new Age(children.head))) + * } + * } + * }}} + * + * We can inject `MyExtensions` in three ways, + * + * + * [[SparkSession.Builder.withExtensions]] + * Config - spark.sql.extensions + * [[java.util.ServiceLoader]] - Add to src/main/resources/META-INF/services/org.apache.spark.sql.SparkSessionExtensionsProvider + * + * + * @since 3.2.0 + * + * @note We make NO guarantee about the stability regarding 
binary compatibility and source compatibility of methods here. + * It's experimental and intended for developers Review comment: I know that this was well intended, but `@Experimental` and `@Unstable` mean exactly this. Let's remove this for consistency, because Apache Spark in general does not repeat a `@note` when we already have the `@Experimental` and `@Unstable` annotations. Keeping it simple will be enough. ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensionsProvider.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.annotation.{DeveloperApi, Experimental, Unstable} + +// scalastyle:off line.size.limit +/** + * :: Experimental :: + * + * Base trait for implementations used by [[SparkSessionExtensions]] + * + * + * For example, now we have an external function named `Age` to register as an extension for SparkSession: + * + * + * {{{ + * package org.apache.spark.examples.extensions + * + *
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
dongjoon-hyun commented on a change in pull request #32515: URL: https://github.com/apache/spark/pull/32515#discussion_r631166233 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ## @@ -74,7 +74,7 @@ import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan} * .config("spark.sql.extensions", "org.example.MyExtensions") * .getOrCreate() * - * class MyExtensions extends Function1[SparkSessionExtensions, Unit] { + * class MyExtensions extends SparkSessionExtensionsProvider { Review comment: I know that this is based on the existing comments. Can we add a new example instead of removing the old example, please?
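Why both the old and the new example can remain valid is sketched below. The sketch assumes that the provider trait is itself a subtype of `Function1`, which the `apply(v1: SparkSessionExtensions)` signature in the quoted Scaladoc suggests. `Extensions` and `Provider` are hypothetical stand-ins, not the real Spark classes.

```scala
// Hypothetical stand-ins; not the real Spark classes.
final class Extensions { var injected: List[String] = Nil }

// Assumption for this sketch: the provider trait is just a named Function1.
trait Provider extends (Extensions => Unit)

// Old style: implements the function type directly.
class OldStyleExtensions extends (Extensions => Unit) {
  override def apply(e: Extensions): Unit = e.injected = e.injected :+ "old"
}

// New style: implements the dedicated provider trait.
class NewStyleExtensions extends Provider {
  override def apply(e: Extensions): Unit = e.injected = e.injected :+ "new"
}

object Compat {
  // Code written against the function type accepts both styles unchanged.
  def applyAll(builders: Seq[Extensions => Unit]): Extensions = {
    val extensions = new Extensions
    builders.foreach(builder => builder(extensions))
    extensions
  }
}
```

Under that assumption, a test suite can exercise the old path and the new one with the same helper, which is what the review comments about keeping the old example and the old test coverage ask for.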
[GitHub] [spark] SparkQA removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA removed a comment on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839693179 **[Test build #138450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138450/testReport)** for PR 32515 at commit [`1aece6a`](https://github.com/apache/spark/commit/1aece6a5d43dccf04cc357ccfb759fa046d14d2f).
[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
shahidki31 commented on a change in pull request #32498: URL: https://github.com/apache/spark/pull/32498#discussion_r631162814 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -759,6 +759,10 @@ case class Range( } override def computeStats(): Statistics = { +if (!conf.cboEnabled) { + return Statistics(sizeInBytes = LongType.defaultSize * numElements) +} + Review comment: I removed the condition. I had added it earlier because, when CBO is not enabled, other operators do not return other statistics. But yes, in this case we have all the statistics readily available. Do we also need to remove the histogram check? ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -767,14 +771,44 @@ case class Range( } else { (start + (numElements - 1) * step, start) } - val colStat = ColumnStat( + var colStat = ColumnStat( Review comment: Done ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -767,14 +771,44 @@ case class Range( } else { (start + (numElements - 1) * step, start) } - val colStat = ColumnStat( + var colStat = ColumnStat( distinctCount = Some(numElements), max = Some(maxVal), min = Some(minVal), nullCount = Some(0), avgLen = Some(LongType.defaultSize), maxLen = Some(LongType.defaultSize)) + if (conf.histogramEnabled) { + Review comment: Removed ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -767,14 +771,44 @@ case class Range( } else { (start + (numElements - 1) * step, start) } - val colStat = ColumnStat( + var colStat = ColumnStat( distinctCount = Some(numElements), max = Some(maxVal), min = Some(minVal), nullCount = Some(0), avgLen = Some(LongType.defaultSize), maxLen = Some(LongType.defaultSize)) + if (conf.histogramEnabled) { + +def getRangeValue(index: Int): Long = { 
Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
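For readers following the `Range` statistics discussion, here is a minimal, self-contained sketch of how the element count and the min/max column statistics of a range can be derived from `start`, `end`, and `step`. It is an illustration only and not the code under review: the object name `RangeStatsSketch` and both helper functions are hypothetical. Only the expression `start + (numElements - 1) * step` is taken from the quoted hunk.

```scala
// Hypothetical sketch; names are illustrative and not part of Spark.
object RangeStatsSketch {
  // Number of values produced by a range [start, end) with a non-zero step.
  // BigInt is used so that extreme Long bounds cannot overflow.
  def numElements(start: Long, end: Long, step: Long): BigInt = {
    require(step != 0L, "step must be non-zero")
    val span = BigInt(end) - BigInt(start)
    val stride = BigInt(step)
    // An empty span, or a step pointing away from `end`, yields no values.
    if (span.signum != stride.signum) BigInt(0)
    else (span.abs + stride.abs - 1) / stride.abs // ceiling division
  }

  // (min, max) of the produced values, or None for an empty range.
  def minMax(start: Long, end: Long, step: Long): Option[(Long, Long)] = {
    val n = numElements(start, end, step)
    if (n.signum == 0) None
    else {
      // Same expression as in the quoted hunk: start + (numElements - 1) * step.
      val last = (BigInt(start) + (n - 1) * BigInt(step)).toLong
      Some(if (step > 0) (start, last) else (last, start))
    }
  }
}
```

Because every value in a range is distinct and non-null, the distinct count equals the element count and the null count is zero, which is why the reviewers note that all column statistics are readily available here regardless of the CBO setting.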
[GitHub] [spark] dongjoon-hyun commented on pull request #32519: [SPARK-35347][SQL][FOLLOWUP] Throw exception when cannot find the method
dongjoon-hyun commented on pull request #32519: URL: https://github.com/apache/spark/pull/32519#issuecomment-839879568 BTW, please revise the PR title, @viirya. `RuntimeException` is also an exception.
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839879369
**[Test build #138450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138450/testReport)** for PR 32515 at commit [`1aece6a`](https://github.com/apache/spark/commit/1aece6a5d43dccf04cc357ccfb759fa046d14d2f).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class AgeExample(birthday: Expression, child: Expression) extends RuntimeReplaceable`
  * `class SessionExtensionsWithLoader extends SparkSessionExtensionsProvider`
  * `class SessionExtensionsWithoutLoader extends SparkSessionExtensionsProvider`
[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
shahidki31 commented on a change in pull request #32498: URL: https://github.com/apache/spark/pull/32498#discussion_r631162814

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ##
@@ -759,6 +759,10 @@ case class Range(
}
override def computeStats(): Statistics = {
+if (!conf.cboEnabled) {
+ return Statistics(sizeInBytes = LongType.defaultSize * numElements)
+}
+
Review comment: I removed the condition, as we have all the statistics available for the operator. I had added it earlier because, when CBO is not enabled, other operators do not return the other statistics. Do we also need to remove the histogram check?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
dongjoon-hyun commented on a change in pull request #32520: URL: https://github.com/apache/spark/pull/32520#discussion_r631160558

## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala ##
@@ -83,12 +83,6 @@ class TPCDSQueryTestSuite extends QueryTest with TPCDSBase with SQLQueryTestHelp
.toFile.getAbsolutePath
}
- override val tpcdsQueries = {
-// SPARK-35327: Filters out the TPC-DS queries that can cause flaky test results
Review comment: Please move this to the new location too.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
dongjoon-hyun commented on a change in pull request #32520: URL: https://github.com/apache/spark/pull/32520#discussion_r631160095

## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala ##
@@ -36,6 +36,12 @@ trait TPCDSBase extends SharedSparkSession with TPCDSSchema {
"q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90",
"q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
+ // Since `tpcdsQueriesV2_7_0` has almost the same queries with these oens below,
+ // we skip them in the TPCDS-related tests.
Review comment: This hides the previous reasoning. I believe it's worth keeping the SPARK-35327 comment explicitly.
```
SPARK-35327: Filters out the TPC-DS queries that can cause flaky test results
```
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-839867454 **[Test build #138460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138460/testReport)** for PR 32498 at commit [`ccd8f06`](https://github.com/apache/spark/commit/ccd8f068b604b085f086ee7a2861ddc1e2f5219c).
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
dongjoon-hyun edited a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-839841587
> LGTM. Should we eventually do this in Hadoop, cc @steveloughran and @dongjoon-hyun ?

Thank you for the review, @dbtsai. The following two are Spark configurations pointing to Spark-internal classes, `org.apache.spark.internal.*`.
```
spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
```
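As a rough illustration of what "inferring missing configs" could look like for the two Spark settings quoted above, the sketch below fills them in only when the user has not set them. It uses a plain `Map` instead of `SparkConf` so that it runs on its own. The object name, the function name `fillMissing`, and the trigger key `spark.hadoop.fs.s3a.committer.magic.enabled` are assumptions made for this example and are not taken from the thread; the actual PR may use different conditions and set additional keys.

```scala
// Hypothetical sketch; not the implementation from the PR.
object MagicCommitterConfSketch {
  // Assumed trigger key: treated here as the signal that the magic committer is wanted.
  val TriggerKey = "spark.hadoop.fs.s3a.committer.magic.enabled"

  // The two Spark-side settings quoted in the comment above.
  val Defaults: Seq[(String, String)] = Seq(
    "spark.sql.parquet.output.committer.class" ->
      "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass" ->
      "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")

  // Adds each default only if the key is absent, so user-provided values win.
  def fillMissing(conf: Map[String, String]): Map[String, String] =
    if (conf.get(TriggerKey).contains("true")) {
      Defaults.foldLeft(conf) { case (acc, (key, value)) =>
        if (acc.contains(key)) acc else acc + (key -> value)
      }
    } else {
      conf
    }
}
```

The set-if-missing shape matters because it lets explicit user configuration override the inferred values instead of being silently replaced.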
[GitHub] [spark] SparkQA commented on pull request #32503: [WIP] better error message
SparkQA commented on pull request #32503: URL: https://github.com/apache/spark/pull/32503#issuecomment-839864333 **[Test build #138458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138458/testReport)** for PR 32503 at commit [`56d1833`](https://github.com/apache/spark/commit/56d18336f28c944e30ff1264326bf635ad5a3cdf).
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-839864385 **[Test build #138459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138459/testReport)** for PR 32494 at commit [`c4ab7fc`](https://github.com/apache/spark/commit/c4ab7fc94750abb9327d208abe57fc3fca56207a).
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-839864289 **[Test build #138457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138457/testReport)** for PR 32515 at commit [`e9b05cd`](https://github.com/apache/spark/commit/e9b05cdfb997de20ff2a7ef4c8566e712df4c92f).
[GitHub] [spark] SparkQA commented on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
SparkQA commented on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-839864189 **[Test build #138456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138456/testReport)** for PR 32518 at commit [`2f9f408`](https://github.com/apache/spark/commit/2f9f4080f517defdb84a7d7414070e0a1502fa3d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
AmplabJenkins removed a comment on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-839861221 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138449/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
AmplabJenkins removed a comment on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839861223
[GitHub] [spark] AmplabJenkins commented on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
AmplabJenkins commented on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-839861221 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138449/
[GitHub] [spark] AmplabJenkins commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
AmplabJenkins commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839861223
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
dongjoon-hyun commented on a change in pull request #32518: URL: https://github.com/apache/spark/pull/32518#discussion_r631119466

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ##
@@ -396,6 +396,8 @@ class SparkContext(config: SparkConf) extends Logging {
if (!_conf.contains("spark.app.name")) {
throw new SparkException("An application name must be set in your configuration")
}
+// This should be set as early as possible.
+SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
Review comment: Yes, it's possible to handle this there for SparkSubmit command parameters. However, Hadoop configurations can also be handed to `SparkSession` inside Spark apps after submission.
[GitHub] [spark] shahidki31 commented on a change in pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
shahidki31 commented on a change in pull request #32494: URL: https://github.com/apache/spark/pull/32494#discussion_r631128836

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/UnionEstimation.scala ##
@@ -88,9 +88,18 @@ object UnionEstimation {
case (attrs, outputIndex) =>
val dataType = unionOutput(outputIndex).dataType
val statComparator = createStatComparator(dataType)
- val minMaxValue = attrs.zipWithIndex.foldLeft[(Option[Any], Option[Any])]((None, None)) {
- case ((minVal, maxVal), (attr, childIndex)) =>
+ val colStatValues = attrs.zipWithIndex
+.foldLeft[(Option[Any], Option[Any], Option[BigInt])]((None, None, None)) {
+ case ((minVal, maxVal, totalNullCount), (attr, childIndex)) =>
val colStat = union.children(childIndex).stats.attributeStats(attr)
+// Update null count
+val nullCount = if (totalNullCount.isDefined && colStat.nullCount.isDefined) {
+ Some(totalNullCount.get + colStat.nullCount.get)
+} else if (colStat.nullCount.isDefined) {
+ colStat.nullCount
Review comment: Thanks. I updated the code and added a new UT to test the scenario.
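The quoted hunk is truncated, so the final rule for combining null counts is not visible here. As a hedged illustration, the sketch below shows one conservative way to aggregate per-child null counts for a union column: the total is reported only when every child reports a null count, since a partial sum would understate the true number. The object and function names are hypothetical, and this may differ from what the PR ultimately merged.

```scala
// Hypothetical sketch; not the code from UnionEstimation.
object NullCountSketch {
  // Sums the null counts of all union children for one output column.
  // Returns None if there are no children or if any child lacks a null count.
  def totalNullCount(childNullCounts: Seq[Option[BigInt]]): Option[BigInt] =
    if (childNullCounts.isEmpty) None
    else childNullCounts.foldLeft(Option(BigInt(0))) { (acc, next) =>
      // The for-comprehension yields None as soon as either side is undefined.
      for (total <- acc; count <- next) yield total + count
    }
}
```

Folding over `Option` values this way avoids the `isDefined`/`get` pairs seen in the hunk while keeping the "all or nothing" behavior explicit.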
[GitHub] [spark] SparkQA removed a comment on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
SparkQA removed a comment on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-839659078 **[Test build #138449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138449/testReport)** for PR 32482 at commit [`16b8a22`](https://github.com/apache/spark/commit/16b8a22cb2d9952b68e2e9a09e56e015ed9781d1).
[GitHub] [spark] SparkQA commented on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
SparkQA commented on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-839847895
**[Test build #138449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138449/testReport)** for PR 32482 at commit [`16b8a22`](https://github.com/apache/spark/commit/16b8a22cb2d9952b68e2e9a09e56e015ed9781d1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
SparkQA commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839846869 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42976/
[GitHub] [spark] dongjoon-hyun commented on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
dongjoon-hyun commented on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-839841587
> LGTM. Should we eventually do this in Hadoop, cc @steveloughran and @dongjoon-hyun ?

Thank you for the review, @dbtsai. The following two are Spark configurations.
```
spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
dongjoon-hyun commented on a change in pull request #32518: URL: https://github.com/apache/spark/pull/32518#discussion_r631119466

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ##
@@ -396,6 +396,8 @@ class SparkContext(config: SparkConf) extends Logging {
if (!_conf.contains("spark.app.name")) {
throw new SparkException("An application name must be set in your configuration")
}
+// This should be set as early as possible.
+SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
Review comment: Sure, let me move this there.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
dongjoon-hyun commented on a change in pull request #32518: URL: https://github.com/apache/spark/pull/32518#discussion_r631119318

## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ##
@@ -1237,6 +1238,53 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
}
}
}
+
+ test("SPARK-XXX: Fill missing S3A magic committer configs if needed") {
Review comment: Oops. Thanks!
[GitHub] [spark] SparkQA removed a comment on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
SparkQA removed a comment on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839732396 **[Test build #138453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138453/testReport)** for PR 32292 at commit [`9b671f1`](https://github.com/apache/spark/commit/9b671f16011a34b769fa6193ca502e03dea45855).
[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
SparkQA commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839833251
**[Test build #138453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138453/testReport)** for PR 32292 at commit [`9b671f1`](https://github.com/apache/spark/commit/9b671f16011a34b769fa6193ca502e03dea45855).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
SparkQA commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839817350 **[Test build #138455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138455/testReport)** for PR 32292 at commit [`d1c4e92`](https://github.com/apache/spark/commit/d1c4e923e66ab1725c4676e466ef86489ce38587).
[GitHub] [spark] AmplabJenkins commented on pull request #32452: [SPARK-35243][SQL] Support columnar execution on ANSI interval types
AmplabJenkins commented on pull request #32452: URL: https://github.com/apache/spark/pull/32452#issuecomment-839813460 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138448/
[GitHub] [spark] AmplabJenkins commented on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
AmplabJenkins commented on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-839813464 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42975/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
AmplabJenkins removed a comment on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-839813464 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42975/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32452: [SPARK-35243][SQL] Support columnar execution on ANSI interval types
AmplabJenkins removed a comment on pull request #32452: URL: https://github.com/apache/spark/pull/32452#issuecomment-839813460 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138448/
[GitHub] [spark] SparkQA commented on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
SparkQA commented on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-839807672
[GitHub] [spark] cloud-fan closed pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
cloud-fan closed pull request #32476: URL: https://github.com/apache/spark/pull/32476
[GitHub] [spark] cloud-fan commented on pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
cloud-fan commented on pull request #32476: URL: https://github.com/apache/spark/pull/32476#issuecomment-839804392 thanks, merging to master!
[GitHub] [spark] gengliangwang commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
gengliangwang commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839801210 @cloud-fan @maropu Sorry for the delay. This one is ready for review now.
[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-839796652 Merged to master. Thanks again @luhenry for hanging in there; I just wanted to be pretty sure about the change. It's a good one. @zhengruifeng this change is in.
[GitHub] [spark] srowen closed pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0
srowen closed pull request #32415: URL: https://github.com/apache/spark/pull/32415
[GitHub] [spark] srowen commented on pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4
srowen commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-839795206 Merged to master
[GitHub] [spark] srowen closed pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4
srowen closed pull request #32455: URL: https://github.com/apache/spark/pull/32455
[GitHub] [spark] srowen commented on pull request #32485: [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities
srowen commented on pull request #32485: URL: https://github.com/apache/spark/pull/32485#issuecomment-839794229 Merged to master
[GitHub] [spark] srowen closed pull request #32485: [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities
srowen closed pull request #32485: URL: https://github.com/apache/spark/pull/32485
[GitHub] [spark] srowen commented on pull request #31776: [SPARK-34661][SQL] Clean up `OriginalType` and `DecimalMetadata ` usage in Parquet related code
srowen commented on pull request #31776: URL: https://github.com/apache/spark/pull/31776#issuecomment-839792934 Unless @wangyum has comments, I can merge to master
[GitHub] [spark] cloud-fan commented on a change in pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
cloud-fan commented on a change in pull request #32520:
URL: https://github.com/apache/spark/pull/32520#discussion_r631062829

## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
## @@ -36,6 +36,12 @@ trait TPCDSBase extends SharedSparkSession with TPCDSSchema {
     "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90",
     "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")

+  // Since `tpcdsQueriesV2_7_0` has almost the same queries with these oens below,
+  // we skip them in the TPCDS-related tests.
+  private val excludedTpcdsQueries: Set[String] = Set("q6", "q34", "q64", "q74", "q75", "q78")
+
+  val tpcdsQueries: Seq[String] = tpcdsAllQueries.filterNot(excludedTpcdsQueries.contains)

Review comment:
another idea:
```
val tpcdsQueries: Seq[String] = 1.to(99).map("q" + _).flatMap { q =>
  if (Seq("q14", "q23", "q24", "q39").contains(q)) Seq(q + "a", q + "b") else Seq(q)
}.filterNot { q =>
  // ...
  Seq("q6", "q34", "q64", "q74", "q75", "q78").contains(q)
}
```
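For readers following the suggestion above, here is a minimal standalone sketch of the generate-then-filter idea, combined with the `Set`-based exclusion from the diff. It is not the code that was merged: the object name `TpcdsQueryList` and the helper names `splitQueries` and `excludedQueries` are hypothetical, and the sketch assumes (as the review snippet implies) that q14, q23, q24 and q39 each exist only as an "a" and a "b" variant.

```scala
object TpcdsQueryList {
  // Assumption taken from the review snippet: these queries exist only as
  // "a"/"b" variants, so the bare name is replaced by two entries.
  private val splitQueries: Set[String] = Set("q14", "q23", "q24", "q39")

  // Queries the PR proposes to skip because the v2.7 set has near-identical ones.
  private val excludedQueries: Set[String] =
    Set("q6", "q34", "q64", "q74", "q75", "q78")

  // Generate q1..q99, expand the split queries, then drop the excluded ones.
  val tpcdsQueries: Seq[String] =
    (1 to 99).map(n => s"q$n").flatMap { q =>
      if (splitQueries.contains(q)) Seq(q + "a", q + "b") else Seq(q)
    }.filterNot(excludedQueries.contains)

  def main(args: Array[String]): Unit = {
    // 99 generated + 4 extra variants - 6 excluded = 97 queries
    println(tpcdsQueries.size)
    println(tpcdsQueries.take(8).mkString(", "))
  }
}
```

Using a `Set` for both lookups keeps each membership check constant-time and lets `filterNot` take `excludedQueries.contains` directly, which is the same shape as the `filterNot(excludedTpcdsQueries.contains)` call in the diff.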
[GitHub] [spark] SparkQA removed a comment on pull request #32452: [SPARK-35243][SQL] Support columnar execution on ANSI interval types
SparkQA removed a comment on pull request #32452: URL: https://github.com/apache/spark/pull/32452#issuecomment-839620790 **[Test build #138448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138448/testReport)** for PR 32452 at commit [`ff94039`](https://github.com/apache/spark/commit/ff94039550c1fb3f2fa3e187f784dda926ece252).
[GitHub] [spark] SparkQA commented on pull request #32452: [SPARK-35243][SQL] Support columnar execution on ANSI interval types
SparkQA commented on pull request #32452:
URL: https://github.com/apache/spark/pull/32452#issuecomment-839789596

**[Test build #138448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138448/testReport)** for PR 32452 at commit [`ff94039`](https://github.com/apache/spark/commit/ff94039550c1fb3f2fa3e187f784dda926ece252).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] tgravescs commented on pull request #32504: [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
tgravescs commented on pull request #32504: URL: https://github.com/apache/spark/pull/32504#issuecomment-839785104 changes look fine, assume it was manually tested
[GitHub] [spark] cloud-fan closed pull request #32499: [SPARK-29145][SQL][FOLLOWUP] Clean up code about support sub-queries in join conditions
cloud-fan closed pull request #32499: URL: https://github.com/apache/spark/pull/32499
[GitHub] [spark] cloud-fan commented on pull request #32499: [SPARK-29145][SQL][FOLLOWUP] Clean up code about support sub-queries in join conditions
cloud-fan commented on pull request #32499: URL: https://github.com/apache/spark/pull/32499#issuecomment-839785448 thanks, merging to master!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32504: [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
AmplabJenkins removed a comment on pull request #32504: URL: https://github.com/apache/spark/pull/32504#issuecomment-839780679 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138451/
[GitHub] [spark] AmplabJenkins commented on pull request #32504: [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
AmplabJenkins commented on pull request #32504: URL: https://github.com/apache/spark/pull/32504#issuecomment-839780679 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138451/
[GitHub] [spark] SparkQA removed a comment on pull request #32504: [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
SparkQA removed a comment on pull request #32504: URL: https://github.com/apache/spark/pull/32504#issuecomment-839693253 **[Test build #138451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138451/testReport)** for PR 32504 at commit [`a26b180`](https://github.com/apache/spark/commit/a26b180e5924ef80d499d58bc2fec770f2f1).
[GitHub] [spark] SparkQA commented on pull request #32504: [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
SparkQA commented on pull request #32504:
URL: https://github.com/apache/spark/pull/32504#issuecomment-839779483

**[Test build #138451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138451/testReport)** for PR 32504 at commit [`a26b180`](https://github.com/apache/spark/commit/a26b180e5924ef80d499d58bc2fec770f2f1).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32292: [WIP][SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
AmplabJenkins removed a comment on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839769164 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42974/
[GitHub] [spark] AmplabJenkins commented on pull request #32292: [WIP][SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
AmplabJenkins commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839769164 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42974/
[GitHub] [spark] SparkQA commented on pull request #32292: [WIP][SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
SparkQA commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-839769113
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
AmplabJenkins removed a comment on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-839768172 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138441/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins removed a comment on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-839768168 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138445/
[GitHub] [spark] SparkQA commented on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
SparkQA commented on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-839768710 **[Test build #138454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138454/testReport)** for PR 32520 at commit [`9fe4b42`](https://github.com/apache/spark/commit/9fe4b4240bf13f3f7b2762e0db763091dfeed952).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero
AmplabJenkins removed a comment on pull request #32496: URL: https://github.com/apache/spark/pull/32496#issuecomment-839768169 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138447/
[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-839768168 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138445/
[GitHub] [spark] AmplabJenkins commented on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero
AmplabJenkins commented on pull request #32496: URL: https://github.com/apache/spark/pull/32496#issuecomment-839768169 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138447/
[GitHub] [spark] AmplabJenkins commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
AmplabJenkins commented on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-839768172 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138441/
[GitHub] [spark] maropu commented on a change in pull request #32454: [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results
maropu commented on a change in pull request #32454:
URL: https://github.com/apache/spark/pull/32454#discussion_r631021871

## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
## @@ -24,7 +24,7 @@ import org.apache.spark.sql.test.SharedSparkSession
 trait TPCDSBase extends SharedSparkSession with TPCDSSchema {

   // The TPCDS queries below are based on v1.4
-  val tpcdsQueries = Seq(
+  def tpcdsQueries: Seq[String] = Seq(
     "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11",

Review comment:
See https://github.com/apache/spark/pull/32520
[GitHub] [spark] maropu opened a new pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
maropu opened a new pull request #32520:
URL: https://github.com/apache/spark/pull/32520

### What changes were proposed in this pull request?

This PR proposes to skip the "q6", "q34", "q64", "q74", "q75", and "q78" queries in the TPCDS-related tests because the TPCDS v2.7 query set contains nearly identical versions of them; the only differences are the ORDER BY columns.

### Why are the changes needed?

To improve test performance.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

Existing tests.
[GitHub] [spark] SparkQA commented on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero
SparkQA commented on pull request #32496:
URL: https://github.com/apache/spark/pull/32496#issuecomment-839752937

**[Test build #138447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138447/testReport)** for PR 32496 at commit [`f86fba6`](https://github.com/apache/spark/commit/f86fba68bdfe51f8679f1044793c0dd15d3190e3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero
SparkQA removed a comment on pull request #32496: URL: https://github.com/apache/spark/pull/32496#issuecomment-839579496 **[Test build #138447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138447/testReport)** for PR 32496 at commit [`f86fba6`](https://github.com/apache/spark/commit/f86fba68bdfe51f8679f1044793c0dd15d3190e3).
[GitHub] [spark] SparkQA removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA removed a comment on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-839573760 **[Test build #138445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138445/testReport)** for PR 32498 at commit [`89457c8`](https://github.com/apache/spark/commit/89457c8fec3ef2ecae6e9b752b0c7322ee54b7e9).
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-839751330

**[Test build #138445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138445/testReport)** for PR 32498 at commit [`89457c8`](https://github.com/apache/spark/commit/89457c8fec3ef2ecae6e9b752b0c7322ee54b7e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
SparkQA removed a comment on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-839535644 **[Test build #138441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138441/testReport)** for PR 32478 at commit [`170aff2`](https://github.com/apache/spark/commit/170aff266382b60b9068c59738cf0d69563a050a).
[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
SparkQA commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-839742044

**[Test build #138441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138441/testReport)** for PR 32478 at commit [`170aff2`](https://github.com/apache/spark/commit/170aff266382b60b9068c59738cf0d69563a050a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] gengliangwang closed pull request #32439: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
gengliangwang closed pull request #32439: URL: https://github.com/apache/spark/pull/32439
[GitHub] [spark] gengliangwang commented on pull request #32439: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
gengliangwang commented on pull request #32439: URL: https://github.com/apache/spark/pull/32439#issuecomment-839739715 Thanks, merging to master
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32439: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
AmplabJenkins removed a comment on pull request #32439: URL: https://github.com/apache/spark/pull/32439#issuecomment-839733785 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138439/