[GitHub] [spark] SparkQA commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578629266 **[Test build #117434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117434/testReport)** for PR 27058 at commit [`7a74aae`](https://github.com/apache/spark/commit/7a74aae09f8f696102c5b92b850d572d64fd9cb1). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578625013 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578625016 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22193/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578625013 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578625016 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22193/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578621322 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117433/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578621317 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578621322 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117433/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578621317 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578621153
**[Test build #117433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117433/testReport)** for PR 27021 at commit [`6c87a41`](https://github.com/apache/spark/commit/6c87a41df7555085bd1271ef86414f5f0452314f).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class PartitionIterator[T](reader: PartitionReader[T]) extends Iterator[T]`
  * `class MetricsHandler extends Logging with Serializable`
[GitHub] [spark] SparkQA removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
SparkQA removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578619376 **[Test build #117433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117433/testReport)** for PR 27021 at commit [`6c87a41`](https://github.com/apache/spark/commit/6c87a41df7555085bd1271ef86414f5f0452314f).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578620873 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22191/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578620862 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578620910 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578620910 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578620914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22192/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578620862 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578620873 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22191/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578620914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22192/ Test PASSed.
[GitHub] [spark] xuzikun2003 commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
xuzikun2003 commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#discussion_r371088520

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ##

@@ -329,6 +328,39 @@ case class HashAggregateExec(
     }
   }

+  private def generateEvalCodeForAggFuncs(
+      ctx: CodegenContext,
+      input: Seq[ExprCode],
+      inputAttrs: Seq[Attribute],
+      boundUpdateExprs: Seq[Seq[Expression]],
+      aggNames: Seq[String],
+      aggCodeBlocks: Seq[Block],
+      subExprs: SubExprCodes): String = {
+    val aggCodes = if (conf.codegenSplitAggregateFunc &&
+        aggCodeBlocks.map(_.length).sum > conf.methodSplitThreshold) {
+      val maybeSplitCodes = splitAggregateExpressions(
+        ctx, aggNames, boundUpdateExprs, aggCodeBlocks, subExprs.states)
+
+      maybeSplitCodes.getOrElse(aggCodeBlocks.map(_.code))
+    } else {
+      aggCodeBlocks.map(_.code)
+    }
+
+    aggCodes.zip(aggregateExpressions.map(ae => (ae.mode, ae.filter))).map {
+      case (aggCode, (Partial | Complete, Some(condition))) =>
+        // Note: wrap in "do { } while(false);", so the generated checks can jump out
+        // with "continue;"
+        s"""
+           |do {
+           |  ${generatePredicateCode(ctx, condition, inputAttrs, input)}
+           |  $aggCode
+           |} while(false);

Review comment: Got it. Thanks for the explanation.
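The note in the diff above says the generated code is wrapped in "do { } while(false);" so that the predicate check can leave the block with "continue;". A minimal hand-written sketch of that Java idiom follows. It is not the code Spark generates: the class name `FilteredAggSketch`, the method `aggregate`, and the array-based input are hypothetical stand-ins, with the `keep` array playing the role of the filter condition.

```java
// Hypothetical illustration of the "do { } while(false);" + "continue;" idiom.
final class FilteredAggSketch {
    // Returns {filteredSum, rowCount}, roughly SUM(value) FILTER (WHERE keep) and COUNT(*).
    static long[] aggregate(long[] values, boolean[] keep) {
        long filteredSum = 0L;
        long rowCount = 0L;
        for (int i = 0; i < values.length; i++) {
            // Single-iteration block: "continue" jumps to the while(false) check,
            // which ends the block without skipping the rest of the row's work.
            do {
                if (!keep[i]) continue;   // filter rejected the row: skip only this update
                filteredSum += values[i]; // stands in for $aggCode in the diff
            } while (false);
            // Still reached for every row, so an unfiltered aggregate keeps updating.
            rowCount++;
        }
        return new long[] {filteredSum, rowCount};
    }

    public static void main(String[] args) {
        long[] result = aggregate(new long[] {1, 2, 3}, new boolean[] {true, false, true});
        System.out.println(result[0] + " " + result[1]); // prints "4 3"
    }
}
```

The point of the wrapper is that "continue" binds to the innermost loop, so the early exit is confined to one aggregate's update code and does not abandon the enclosing per-row loop.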
[GitHub] [spark] SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578619376 **[Test build #117433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117433/testReport)** for PR 27021 at commit [`6c87a41`](https://github.com/apache/spark/commit/6c87a41df7555085bd1271ef86414f5f0452314f).
[GitHub] [spark] gatorsmile commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"
gatorsmile commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table" URL: https://github.com/apache/spark/pull/24938#issuecomment-578618967 ping @viirya Do you think we can finish it before the code freeze?
[GitHub] [spark] SparkQA commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on issue #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#issuecomment-578618570 **[Test build #117432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117432/testReport)** for PR 27058 at commit [`71ba1f4`](https://github.com/apache/spark/commit/71ba1f46229cb9443658818b1f94b2973fbc37ce).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578617191 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117430/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578617183 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578617191 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117430/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578617183 Merged build finished. Test PASSed.
[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
beliefer commented on a change in pull request #27058: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT URL: https://github.com/apache/spark/pull/27058#discussion_r371085072

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ##

@@ -148,24 +207,106 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
     val distinctAggs = exprs.flatMap { _.collect {
       case ae: AggregateExpression if ae.isDistinct => ae
     }}
-    // We need at least two distinct aggregates for this rule because aggregation
-    // strategy can handle a single distinct group.
+    // This rule serves two purposes:
+    // One is to rewrite when there exists at least two distinct aggregates. We need at least
+    // two distinct aggregates for this rule because aggregation strategy can handle a single
+    // distinct group.
+    // Another is to expand distinct aggregates which exists filter clause so that we can
+    // evaluate the filter locally.
     // This check can produce false-positives, e.g., SUM(DISTINCT a) & COUNT(DISTINCT a).
-    distinctAggs.size > 1
+    distinctAggs.size >= 1 || distinctAggs.exists(_.filter.isDefined)
   }

   def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-    case a: Aggregate if mayNeedtoRewrite(a.aggregateExpressions) => rewrite(a)
+    case a: Aggregate if mayNeedtoRewrite(a.aggregateExpressions) =>
+      val expandAggregate = extractFiltersInDistinctAggregate(a)
+      rewriteDistinctAggregate(expandAggregate)
   }

-  def rewrite(a: Aggregate): Aggregate = {
+  private def extractFiltersInDistinctAggregate(a: Aggregate): Aggregate = {

Review comment: OK
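The diff above concerns aggregates that combine DISTINCT with a FILTER clause. As a rough illustration of the result such an aggregate is expected to produce (not of how RewriteDistinctAggregates rewrites the plan), the sketch below evaluates something like COUNT(DISTINCT a) FILTER (WHERE cond) by hand: rows failing the condition are dropped first, then the surviving values are de-duplicated and counted. The class and method names are hypothetical, and NULL handling is left out by using primitive arrays.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of COUNT(DISTINCT a) FILTER (WHERE cond) semantics.
final class DistinctFilterSketch {
    static int countDistinctWithFilter(int[] a, boolean[] cond) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < a.length; i++) {
            // The filter is applied per row, before de-duplication.
            if (cond[i]) {
                seen.add(a[i]);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 2, 3};
        boolean[] cond = {true, true, false, true};
        // Rows with a = 1, 1, 3 pass the filter; the distinct values are {1, 3}.
        System.out.println(countDistinctWithFilter(a, cond)); // prints "2"
    }
}
```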
[GitHub] [spark] SparkQA removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
SparkQA removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588187 **[Test build #117430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117430/testReport)** for PR 27365 at commit [`a5d975c`](https://github.com/apache/spark/commit/a5d975cf64b50458f716d235e754ccf9bd2b27c4).
[GitHub] [spark] SparkQA commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
SparkQA commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578616313
**[Test build #117430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117430/testReport)** for PR 27365 at commit [`a5d975c`](https://github.com/apache/spark/commit/a5d975cf64b50458f716d235e754ccf9bd2b27c4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615281 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117431/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615276 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615276 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615143
**[Test build #117431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117431/testReport)** for PR 27021 at commit [`1be7a33`](https://github.com/apache/spark/commit/1be7a334d8eaf34d62f0a2039461e841fe740bb2).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class PartitionIterator[T](reader: PartitionReader[T]) extends Iterator[T]`
  * `class MetricsHandler extends Logging with Serializable`
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615281 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117431/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
SparkQA removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578614124 **[Test build #117431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117431/testReport)** for PR 27021 at commit [`1be7a33`](https://github.com/apache/spark/commit/1be7a334d8eaf34d62f0a2039461e841fe740bb2).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615024 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22190/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615024 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22190/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins removed a comment on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615014 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
AmplabJenkins commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578615014 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
SparkQA commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578614124 **[Test build #117431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117431/testReport)** for PR 27021 at commit [`1be7a33`](https://github.com/apache/spark/commit/1be7a334d8eaf34d62f0a2039461e841fe740bb2).
[GitHub] [spark] sandeep-katta commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD
sandeep-katta commented on issue #27021: [SPARK-30362][Core] Update InputMetrics in DataSourceRDD URL: https://github.com/apache/spark/pull/27021#issuecomment-578611669 @rdblue please review this; I have tested the changes on my end.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
HeartSaVioR commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#discussion_r371073779
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ##
@@ -329,6 +328,39 @@ case class HashAggregateExec(
     }
   }
 
+  private def generateEvalCodeForAggFuncs(
+      ctx: CodegenContext,
+      input: Seq[ExprCode],
+      inputAttrs: Seq[Attribute],
+      boundUpdateExprs: Seq[Seq[Expression]],
+      aggNames: Seq[String],
+      aggCodeBlocks: Seq[Block],
+      subExprs: SubExprCodes): String = {
+    val aggCodes = if (conf.codegenSplitAggregateFunc &&
+        aggCodeBlocks.map(_.length).sum > conf.methodSplitThreshold) {
+      val maybeSplitCodes = splitAggregateExpressions(
+        ctx, aggNames, boundUpdateExprs, aggCodeBlocks, subExprs.states)
+
+      maybeSplitCodes.getOrElse(aggCodeBlocks.map(_.code))
+    } else {
+      aggCodeBlocks.map(_.code)
+    }
+
+    aggCodes.zip(aggregateExpressions.map(ae => (ae.mode, ae.filter))).map {
+      case (aggCode, (Partial | Complete, Some(condition))) =>
+        // Note: wrap in "do { } while(false);", so the generated checks can jump out
+        // with "continue;"
+        s"""
+           |do {
+           |  ${generatePredicateCode(ctx, condition, inputAttrs, input)}
+           |  $aggCode
+           |} while(false);

Review comment: The note in the code comment above should be enough to explain why, right? The block still executes only once, but the generated code is able to exit that specific block instead of exiting the function/method in the middle of the code.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
HeartSaVioR commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#discussion_r371073779
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ##
@@ -329,6 +328,39 @@ case class HashAggregateExec(
     }
   }
 
+  private def generateEvalCodeForAggFuncs(
+      ctx: CodegenContext,
+      input: Seq[ExprCode],
+      inputAttrs: Seq[Attribute],
+      boundUpdateExprs: Seq[Seq[Expression]],
+      aggNames: Seq[String],
+      aggCodeBlocks: Seq[Block],
+      subExprs: SubExprCodes): String = {
+    val aggCodes = if (conf.codegenSplitAggregateFunc &&
+        aggCodeBlocks.map(_.length).sum > conf.methodSplitThreshold) {
+      val maybeSplitCodes = splitAggregateExpressions(
+        ctx, aggNames, boundUpdateExprs, aggCodeBlocks, subExprs.states)
+
+      maybeSplitCodes.getOrElse(aggCodeBlocks.map(_.code))
+    } else {
+      aggCodeBlocks.map(_.code)
+    }
+
+    aggCodes.zip(aggregateExpressions.map(ae => (ae.mode, ae.filter))).map {
+      case (aggCode, (Partial | Complete, Some(condition))) =>
+        // Note: wrap in "do { } while(false);", so the generated checks can jump out
+        // with "continue;"
+        s"""
+           |do {
+           |  ${generatePredicateCode(ctx, condition, inputAttrs, input)}
+           |  $aggCode
+           |} while(false);

Review comment: The note in the code comment above should be enough to explain why, right? The block still executes only once, but the generated code is able to exit that specific block via `continue` instead of exiting the function/method in the middle of the code.
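To make the control flow in the comment above concrete, here is a minimal sketch of the same idea. It is not the code Spark generates: Python has no `do { } while (false)`, so a one-pass `for` loop stands in for it, and the names `update_aggregates`, `filtered_sum` and `row_count` are hypothetical, chosen only for illustration.

```python
def update_aggregates(rows, predicate):
    """Sum only the rows that pass `predicate`, but count every row."""
    filtered_sum = 0
    row_count = 0
    for row in rows:
        # One-pass loop standing in for Java's `do { ... } while (false);`.
        for _ in range(1):
            if not predicate(row):
                # Leaves only this block, not the enclosing function.
                continue
            filtered_sum += row
        # Still reached for rows that failed the filter.
        row_count += 1
    return filtered_sum, row_count


result = update_aggregates([1, 2, 3, 4], lambda value: value % 2 == 0)
print(result)
```

The point of the wrapper is that a failed filter skips one aggregate's update while the code after the block (here, the row count) still runs, which an early `return` would not allow.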
[GitHub] [spark] xuzikun2003 commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec
xuzikun2003 commented on a change in pull request #27019: [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec URL: https://github.com/apache/spark/pull/27019#discussion_r371071649
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ##
@@ -329,6 +328,39 @@ case class HashAggregateExec(
     }
   }
 
+  private def generateEvalCodeForAggFuncs(
+      ctx: CodegenContext,
+      input: Seq[ExprCode],
+      inputAttrs: Seq[Attribute],
+      boundUpdateExprs: Seq[Seq[Expression]],
+      aggNames: Seq[String],
+      aggCodeBlocks: Seq[Block],
+      subExprs: SubExprCodes): String = {
+    val aggCodes = if (conf.codegenSplitAggregateFunc &&
+        aggCodeBlocks.map(_.length).sum > conf.methodSplitThreshold) {
+      val maybeSplitCodes = splitAggregateExpressions(
+        ctx, aggNames, boundUpdateExprs, aggCodeBlocks, subExprs.states)
+
+      maybeSplitCodes.getOrElse(aggCodeBlocks.map(_.code))
+    } else {
+      aggCodeBlocks.map(_.code)
+    }
+
+    aggCodes.zip(aggregateExpressions.map(ae => (ae.mode, ae.filter))).map {
+      case (aggCode, (Partial | Complete, Some(condition))) =>
+        // Note: wrap in "do { } while(false);", so the generated checks can jump out
+        // with "continue;"
+        s"""
+           |do {
+           |  ${generatePredicateCode(ctx, condition, inputAttrs, input)}
+           |  $aggCode
+           |} while(false);

Review comment: I don't understand what effect "while(false)" has here. Could you explain why it is needed?
[GitHub] [spark] dongjoon-hyun commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
dongjoon-hyun commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578601545 @zero323 . It seems that you missed my point. I advised the following. > I'd like to recommend you to mention what you've done clearly. That's enough. Let me rephrase my words. "In the PR description, write that you didn't run the full test. In particular, the Arrow tests are skipped". That was my request.
[GitHub] [spark] HeartSaVioR edited a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
HeartSaVioR edited a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588303 > I think it's also fine to have git enforce it. Is there any downside to that? I don't think there's any outstanding downside; if it had considerable downsides, someone would have complained. Only 3 files used CR/LF and the others already used LF. (cmd/bat files are enforced to have CR/LF as EOL, as they're only used on Windows.)
[GitHub] [spark] AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588611 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588614 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22189/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588614 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22189/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588611 Merged build finished. Test PASSed.
[GitHub] [spark] HeartSaVioR commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
HeartSaVioR commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588303 > I think it's also fine to have git enforce it. Is there any downside to that? I don't think so; if it had considerable downsides, someone would have complained. Only 3 files used CR/LF and the others already used LF. (cmd/bat files are enforced to have CR/LF as EOL, as they're only used on Windows.)
[GitHub] [spark] SparkQA commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes
SparkQA commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files, and enforce the EOL for java/scala/xml/py/R files to LF in gitattributes URL: https://github.com/apache/spark/pull/27365#issuecomment-578588187 **[Test build #117430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117430/testReport)** for PR 27365 at commit [`a5d975c`](https://github.com/apache/spark/commit/a5d975cf64b50458f716d235e754ccf9bd2b27c4).
[GitHub] [spark] HeartSaVioR commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files
HeartSaVioR commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files URL: https://github.com/apache/spark/pull/27365#issuecomment-578586936 @dongjoon-hyun `^M` in the PR description is the carriage return (CR) of a CR/LF line ending, so you may want to type CTRL+V -> CTRL+M in a bash shell to enter it. I'll update the PR description.
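Because a literal `^M` is easy to mistype in a shell, the same check can be sketched without it. The following is a rough, hypothetical equivalent of the search described in the PR, not the command the PR uses: the function name `files_with_carriage_returns`, the suffix list and the skipped directory names are assumptions made for illustration.

```python
import os
import tempfile

# Assumed file types and build directories; adjust to the repository at hand.
SOURCE_SUFFIXES = (".java", ".scala", ".xml", ".py", ".R")
SKIPPED_DIRS = {"target", "build", "dist"}


def files_with_carriage_returns(root):
    """Return sorted relative paths of source files that contain a CR byte."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune build output directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                if b"\r" in handle.read():
                    matches.append(os.path.relpath(path, root))
    return sorted(matches)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "crlf.py"), "wb") as handle:
        handle.write(b"x = 1\r\n")
    with open(os.path.join(demo_root, "lf.py"), "wb") as handle:
        handle.write(b"x = 1\n")
    demo_result = files_with_carriage_returns(demo_root)
    print(demo_result)
```

Matching on the exact file suffix is stricter than an unanchored pattern such as `\.py` or `\.R`, which would also match names like `.pyc` or `.Rmd`, so the two approaches can report different file lists.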
[GitHub] [spark] zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578581851

> Please note that I'm supporting your effort on this PR. Otherwise, I'll not chime in here to add comments.

Thank you, I appreciate that.

In general, the full reproducible setup is defined by the Dockerfile shown at the beginning, but to put it here for reference:

```
FROM rocker/verse:3.4.3
RUN apt-get update \
  && apt-get install -y --no-install-recommends gpg openjdk-8-jdk-headless \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*
RUN wget -qO- https://keybase.io/zero323/pgp_keys.asc | gpg --import
RUN git clone --depth 1 --branch SPARK-23435 https://github.com/zero323/spark.git
WORKDIR spark
RUN git rev-parse HEAD
RUN git verify-commit -v HEAD
RUN build/mvn -DskipTests -Phive -Psparkr clean package
RUN R --version
RUN R -e "install.packages(c('e1071', 'praise'))"
RUN R -e "install.packages('testthat', repos='https://cloud.r-project.org/'); packageVersion('testthat'); sessionInfo()"
RUN R/create-rd.sh
RUN R/create-docs.sh
RUN R/check-cran.sh
RUN R/run-tests.sh
```

It can be re-run to confirm that it reflects the current state of things. As shown in the cast, builds are done directly from the head of this branch (the signature is verified) with no changes to the codebase beyond what is proposed in this PR (and we don't touch any Arrow-related components here at all).

As for skipping the Arrow tests: that's the default behavior defined in the respective test, for example here https://github.com/apache/spark/blob/43d9c7e7e57749ee611e0c97781a71a0645b5e9b/R/pkg/tests/fulltests/test_sparkSQL_arrow.R#L25 and following lines. So it is neither a failure nor the result of any source modification.

Can we make the Arrow tests run? Possibly, but:
- The R Arrow package is not present in the snapshot repositories used by rocker images. Installing testthat from https://cloud.r-project.org already pushed things a lot. Additionally, some transitive dependencies have hidden version bounds.
- C++ Arrow bindings would require external system repositories, which can break dependencies for R.
- Using other images (let's say the official R-base) is not an option, as we need TeX as well as OpenSSL and Curl dev libraries, and this will either break or require an update of R beyond 2.4 (at least it did for other build configurations I considered).

At this point Spark has no coverage for any intermediate R version (Jenkins runs 3.1 and then we have a gap of almost eight years' worth of releases to 3.6 on AppVeyor), not to mention version-OS combinations. That's troubling and, as work related to this PR has shown, can miss obvious errors, but it is not something that can really be addressed by running ad-hoc tests outside project infrastructure.

Anyway... If you have specific concerns about the process used here, and you suspect that the proposed changes can lead to problems in the future, I'll do my best to address these.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
dongjoon-hyun edited a comment on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578573228 We cannot say `We're good` when we know something is wrong. I'd like to recommend you to mention what you've done clearly. That's enough. Please note that I'm supporting your effort on this PR. Otherwise, I'll not chime in here to add comments. > So skipped Arrow tests are expected. Especially for the following. > I don't think that really affects the results though, as primary concern was CRAN tests and overall process, and Arrow related code hasn't been touched.
[GitHub] [spark] dongjoon-hyun commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
dongjoon-hyun commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578573228 We cannot say `We're good` when we know something is wrong. I'd like to recommend you to mention what you've done clearly. That's enough. Please note that I'm supporting your effort on this PR. Otherwise, I'll not chime in here to add comments. > So skipped Arrow tests are expected.
[GitHub] [spark] BryanCutler commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
BryanCutler commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion URL: https://github.com/apache/spark/pull/27358#discussion_r371049435
## File path: python/pyspark/sql/pandas/conversion.py ##
@@ -109,7 +109,11 @@ def toPandas(self):
                     # values, but we should use datetime.date to match the behavior with when
                     # Arrow optimization is disabled.
                     pdf = table.to_pandas(date_as_object=True)
-                    return _check_dataframe_localize_timestamps(pdf, timezone)
+                    for field in self.schema:
+                        if isinstance(field.dataType, TimestampType):
+                            pdf[field.name] = \

Review comment: Thanks @viirya !
[GitHub] [spark] zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578570778 > @zero323 . Thank you for the screencast. However, it skipped all arrow related tests. Please playback the screencast. Unfortunately, the R arrow package is not a standalone package (as the Python one is): it requires system packages with C++ bindings (installing the `arrow` package is not sufficient). That's dependency hell, as these R images (they're still more stable than the R-base ones) are not really designed for updates. So skipped Arrow tests are expected. I don't think that really affects the results though, as the primary concern was the CRAN tests and the overall process, and Arrow-related code hasn't been touched.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
dongjoon-hyun edited a comment on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578569365 @zero323 . Thank you for the screencast. However, it skipped all arrow related tests. Please playback the screencast.
[GitHub] [spark] dongjoon-hyun commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
dongjoon-hyun commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578569365 @zero323 . Thank you for the screencast. However, it skipped all arrow related tests.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
dongjoon-hyun commented on a change in pull request #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#discussion_r371046762
## File path: R/pkg/tests/run-all.R ##
@@ -60,11 +59,23 @@ if (identical(Sys.getenv("NOT_CRAN"), "true")) {
 if (identical(Sys.getenv("NOT_CRAN"), "true")) {
   # set random seed for predictable results. mostly for base's sample() in tree and classification
   set.seed(42)
-  # for testthat 1.0.2 later, change reporter from "summary" to default_reporter()
-  testthat:::run_tests("SparkR",
-                       file.path(sparkRDir, "pkg", "tests", "fulltests"),
-                       NULL,
-                       "summary")
+
+  # To be removed once testthat 1.x is removed from all builds

Review comment: +1. BTW, we had better file a new JIRA and turn this comment into an IDed TODO like the following.

```
# TODO(SPARK-X) To be removed once testthat 1.x is removed from all builds
```
[GitHub] [spark] zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578568461
> Please update the PR description. For example, the followings?

All done @dongjoon-hyun
[GitHub] [spark] dongjoon-hyun edited a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files
dongjoon-hyun edited a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files URL: https://github.com/apache/spark/pull/27365#issuecomment-578568350
@HeartSaVioR . When I follow the directions in the PR description on the master branch, the result is different. Did I miss something?
```
$ git log --oneline -n1
43d9c7e7e5 (HEAD -> master, apache/master, apache/HEAD) [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
$ grep -IUrl --color "^M" . | grep "\.java\|\.scala\|\.xml\|\.py\|\.R" | grep -v "/target/" | grep -v "/build/" | grep -v "/dist/" | grep -v "dependency-reduced-pom.xml" | grep -v ".pyc"
./python/pyspark/_globals.py
./python/pyspark/heapq3.py
./python/pyspark/mllib/linalg/__init__.py
./python/pyspark/shuffle.py
./python/pyspark/ml/linalg/__init__.py
./R/pkg/vignettes/sparkr-vignettes.Rmd
./examples/src/main/python/logistic_regression.py
./dev/github_jira_sync.py
```
[GitHub] [spark] dongjoon-hyun commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files
dongjoon-hyun commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files URL: https://github.com/apache/spark/pull/27365#issuecomment-578568350
@HeartSaVioR . When I follow the directions in the PR description, the result is different. Did I miss something?
```
$ git log --oneline -n1
43d9c7e7e5 (HEAD -> master, apache/master, apache/HEAD) [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
$ grep -IUrl --color "^M" . | grep "\.java\|\.scala\|\.xml\|\.py\|\.R" | grep -v "/target/" | grep -v "/build/" | grep -v "/dist/" | grep -v "dependency-reduced-pom.xml" | grep -v ".pyc"
./python/pyspark/_globals.py
./python/pyspark/heapq3.py
./python/pyspark/mllib/linalg/__init__.py
./python/pyspark/shuffle.py
./python/pyspark/ml/linalg/__init__.py
./R/pkg/vignettes/sparkr-vignettes.Rmd
./examples/src/main/python/logistic_regression.py
./dev/github_jira_sync.py
```
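For readers who want to repeat the check without grep, the following is a rough Python sketch of the same search, not the procedure from the PR description. It assumes the `^M` in the command above stands for a carriage return, and it looks for the CRLF pair specifically. The names `has_crlf`, `find_crlf_files`, `SOURCE_SUFFIXES` and `SKIPPED_DIRS` are made up for illustration. One known difference: the grep filter matches `\.R` anywhere in the path, so it also reports `.Rmd` files, while this sketch matches file suffixes only.

```python
import os

# Assumed equivalents of the grep filters quoted in the comment above.
SOURCE_SUFFIXES = (".java", ".scala", ".xml", ".py", ".R")
SKIPPED_DIRS = {"target", "build", "dist"}


def has_crlf(path):
    """Return True if the file at `path` contains at least one CRLF pair."""
    with open(path, "rb") as handle:
        return b"\r\n" in handle.read()


def find_crlf_files(root):
    """Walk `root` and return the sorted paths of source files containing CRLF."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune build-output directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if name == "dependency-reduced-pom.xml":
                continue
            if name.endswith(SOURCE_SUFFIXES):
                path = os.path.join(dirpath, name)
                if has_crlf(path):
                    matches.append(path)
    return sorted(matches)
```

Calling `find_crlf_files(".")` from the repository root returns a list of paths that can be compared against the grep output.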
[GitHub] [spark] zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
zero323 commented on issue #27359: [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 URL: https://github.com/apache/spark/pull/27359#issuecomment-578568240
@HyukjinKwon
> @zero323, do you mind if I ask to check R 3.4.x latest and testthat latest combination

I'd say we're good: [![asciicast](https://asciinema.org/a/xiIOy6OcntE6hxNXQwI7vcUl0.svg)](https://asciinema.org/a/xiIOy6OcntE6hxNXQwI7vcUl0)
In addition to local builds, this gives us:
- R 3.1.x, `testthat` 1.0.2 on Linux (Jenkins)
- R 3.4.3, `testthat` 2.3.1 on Linux (docker build)
- R 3.6.2, `testthat` 2.3.1 on Windows
[GitHub] [spark] viirya commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
viirya commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion URL: https://github.com/apache/spark/pull/27358#discussion_r371046394
## File path: python/pyspark/sql/pandas/conversion.py ##
@@ -109,7 +109,11 @@ def toPandas(self):
 # values, but we should use datetime.date to match the behavior with when
 # Arrow optimization is disabled.
 pdf = table.to_pandas(date_as_object=True)
-return _check_dataframe_localize_timestamps(pdf, timezone)
+for field in self.schema:
+    if isinstance(field.dataType, TimestampType):
+        pdf[field.name] = \
Review comment: ok. looks good then. thanks!
[GitHub] [spark] BryanCutler commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
BryanCutler commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion URL: https://github.com/apache/spark/pull/27358#discussion_r371042017
## File path: python/pyspark/sql/pandas/conversion.py ##
@@ -109,7 +109,11 @@ def toPandas(self):
 # values, but we should use datetime.date to match the behavior with when
 # Arrow optimization is disabled.
 pdf = table.to_pandas(date_as_object=True)
-return _check_dataframe_localize_timestamps(pdf, timezone)
+for field in self.schema:
+    if isinstance(field.dataType, TimestampType):
+        pdf[field.name] = \
Review comment: Yeah, for the case of timestamps, making a copy is unavoidable. This is just to prevent the copies that non-timestamp columns were also causing when assigned back to the DataFrame.
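The point of the review comment can be illustrated with a plain-Python analogy. This is not pandas or PySpark code: `assign_column` is a hypothetical stand-in that always copies on assignment, which is the behavior the comment attributes to assigning a column back to the DataFrame. `localize`, `convert_all_columns`, `convert_timestamp_columns` and the `(name, type_name)` schema tuples are likewise made up for the sketch.

```python
from datetime import datetime, timezone

copies_made = []  # names of columns that were copied on assignment


def assign_column(table, name, values):
    """Hypothetical stand-in for assigning a column back: always copies."""
    table[name] = list(values)
    copies_made.append(name)


def localize(values):
    """Attach UTC to each datetime (stand-in for timestamp localization)."""
    return [v.replace(tzinfo=timezone.utc) for v in values]


def convert_all_columns(table, schema):
    """Earlier pattern: every column is reassigned, so every column is copied."""
    for name, type_name in schema:
        values = table[name]
        if type_name == "timestamp":
            values = localize(values)
        assign_column(table, name, values)


def convert_timestamp_columns(table, schema):
    """Pattern in the diff: only timestamp columns are reassigned."""
    for name, type_name in schema:
        if type_name == "timestamp":
            assign_column(table, name, localize(table[name]))
```

With a schema such as `[("id", "long"), ("ts", "timestamp")]`, the second function records a copy for `ts` only, while the first records one for every column.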
[GitHub] [spark] siknezevic commented on issue #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on issue #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements URL: https://github.com/apache/spark/pull/27246#issuecomment-578559734 I fixed the issues in ExternalAppendOnlyUnsafeRowArray. Next, in the coming days, I will push a new PR for lazy spill reader initialization.
[GitHub] [spark] viirya commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
viirya commented on a change in pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion URL: https://github.com/apache/spark/pull/27358#discussion_r371040215
## File path: python/pyspark/sql/pandas/conversion.py ##
@@ -109,7 +109,11 @@ def toPandas(self):
 # values, but we should use datetime.date to match the behavior with when
 # Arrow optimization is disabled.
 pdf = table.to_pandas(date_as_object=True)
-return _check_dataframe_localize_timestamps(pdf, timezone)
+for field in self.schema:
+    if isinstance(field.dataType, TimestampType):
+        pdf[field.name] = \
Review comment: Is it different? Doesn't this also assign the series back to the DataFrame?
[GitHub] [spark] github-actions[bot] commented on issue #18898: [SPARK-21245][ML] Resolve code duplication for classification/regression summarizers
github-actions[bot] commented on issue #18898: [SPARK-21245][ML] Resolve code duplication for classification/regression summarizers URL: https://github.com/apache/spark/pull/18898#issuecomment-578557489 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] closed pull request #23327: [SPARK-26222][SQL] Track file listing time
github-actions[bot] closed pull request #23327: [SPARK-26222][SQL] Track file listing time URL: https://github.com/apache/spark/pull/23327
[GitHub] [spark] github-actions[bot] commented on issue #20690: [SPARK-23532][Standalone]Improve data locality when launching new executors for dynamic allocation
github-actions[bot] commented on issue #20690: [SPARK-23532][Standalone]Improve data locality when launching new executors for dynamic allocation URL: https://github.com/apache/spark/pull/20690#issuecomment-578557483 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] BryanCutler closed pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
BryanCutler closed pull request #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion URL: https://github.com/apache/spark/pull/27358
[GitHub] [spark] BryanCutler commented on issue #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion
BryanCutler commented on issue #27358: [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion URL: https://github.com/apache/spark/pull/27358#issuecomment-578553730 This is a pretty minor change, so I'm gonna go ahead and merge
[GitHub] [spark] AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578546109 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117429/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578546105 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578546109 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117429/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578546105 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
SparkQA commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578545881 **[Test build #117429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117429/testReport)** for PR 27354 at commit [`abe0be5`](https://github.com/apache/spark/commit/abe0be5e514eec1b014849300b5db12c12443a39).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
SparkQA removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578524901 **[Test build #117429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117429/testReport)** for PR 27354 at commit [`abe0be5`](https://github.com/apache/spark/commit/abe0be5e514eec1b014849300b5db12c12443a39).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function
AmplabJenkins removed a comment on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function URL: https://github.com/apache/spark/pull/27355#issuecomment-578543478 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117428/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function
AmplabJenkins removed a comment on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function URL: https://github.com/apache/spark/pull/27355#issuecomment-578543475 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function
AmplabJenkins commented on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function URL: https://github.com/apache/spark/pull/27355#issuecomment-578543478 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117428/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function
AmplabJenkins commented on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function URL: https://github.com/apache/spark/pull/27355#issuecomment-578543475 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function
SparkQA commented on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function URL: https://github.com/apache/spark/pull/27355#issuecomment-578543163 **[Test build #117428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117428/testReport)** for PR 27355 at commit [`39e4bd2`](https://github.com/apache/spark/commit/39e4bd264b26c7840d7f1815b926c28837a50889).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function
SparkQA removed a comment on issue #27355: [SPARK-30625][SQL] Support `escape` as third parameter of the `like` function URL: https://github.com/apache/spark/pull/27355#issuecomment-578522206 **[Test build #117428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117428/testReport)** for PR 27355 at commit [`39e4bd2`](https://github.com/apache/spark/commit/39e4bd264b26c7840d7f1815b926c28837a50889).
[GitHub] [spark] asfgit closed pull request #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation
asfgit closed pull request #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation URL: https://github.com/apache/spark/pull/26957
[GitHub] [spark] dilipbiswal commented on issue #27289: [SPARK-30581][DOC] Document SORT BY Clause of SELECT statement in SQLReference
dilipbiswal commented on issue #27289: [SPARK-30581][DOC] Document SORT BY Clause of SELECT statement in SQLReference URL: https://github.com/apache/spark/pull/27289#issuecomment-578528636 @maropu I had tried to document this in the main description section like this: `The SORT BY clause is used to return the result rows sorted within each partition in the user specified order. When there is more than one partition SORT BY may return result that is partially ordered. This is different than ORDER BY clause which guarantees a total order of the output.` What do you think?
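The distinction in the proposed wording can be sketched with a small plain-Python model. This is an illustration only, not Spark code: partitions are modeled as nested lists, and `sort_by` and `order_by` are hypothetical helpers named after the clauses they imitate.

```python
def sort_by(partitions, key=None):
    """Sort rows inside each partition independently (SORT BY-like)."""
    return [sorted(rows, key=key) for rows in partitions]


def order_by(partitions, key=None):
    """Sort all rows together, giving a total order (ORDER BY-like)."""
    all_rows = [row for rows in partitions for row in rows]
    return sorted(all_rows, key=key)


partitions = [[5, 1, 9], [4, 8, 2]]

within = sort_by(partitions)  # [[1, 5, 9], [2, 4, 8]]
concatenated = [row for rows in within for row in rows]  # [1, 5, 9, 2, 4, 8]
total = order_by(partitions)  # [1, 2, 4, 5, 8, 9]
```

Each partition in `within` is sorted, but reading the partitions one after another gives `1, 5, 9, 2, 4, 8`, which is only partially ordered. `total` is fully ordered, which is the guarantee the text ascribes to ORDER BY.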
[GitHub] [spark] SparkQA removed a comment on issue #27292: [SPARK-30582][WEBUI] Spark UI is not showing Aggregated Metrics by Executor in stage page
SparkQA removed a comment on issue #27292: [SPARK-30582][WEBUI] Spark UI is not showing Aggregated Metrics by Executor in stage page URL: https://github.com/apache/spark/pull/27292#issuecomment-578518568 **[Test build #4994 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4994/testReport)** for PR 27292 at commit [`1efc3f5`](https://github.com/apache/spark/commit/1efc3f55e55e40b0fb1527938317482f0fb78cfa).
[GitHub] [spark] SparkQA commented on issue #27292: [SPARK-30582][WEBUI] Spark UI is not showing Aggregated Metrics by Executor in stage page
SparkQA commented on issue #27292: [SPARK-30582][WEBUI] Spark UI is not showing Aggregated Metrics by Executor in stage page URL: https://github.com/apache/spark/pull/27292#issuecomment-578526618 **[Test build #4994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4994/testReport)** for PR 27292 at commit [`1efc3f5`](https://github.com/apache/spark/commit/1efc3f55e55e40b0fb1527938317482f0fb78cfa).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578525325 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578525328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22188/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins removed a comment on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578525328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22188/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
AmplabJenkins commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578525325 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files
AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files URL: https://github.com/apache/spark/pull/27365#issuecomment-578525218 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files
AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files URL: https://github.com/apache/spark/pull/27365#issuecomment-578525218 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files
AmplabJenkins removed a comment on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files URL: https://github.com/apache/spark/pull/27365#issuecomment-578525220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117425/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files
AmplabJenkins commented on issue #27365: [MINOR][SQL] Convert CRLF into LF in source files URL: https://github.com/apache/spark/pull/27365#issuecomment-578525220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117425/ Test PASSed.
[GitHub] [spark] patrickcording commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType
patrickcording commented on issue #27354: [SPARK-30633][SQL] Append L to seed when type is LongType URL: https://github.com/apache/spark/pull/27354#issuecomment-578525003 @srowen, @dongjoon-hyun, I extended the first test to also run with integer seeds and with a mix of integer and long seeds. I also extended `testHash` to explicitly use a long seed when hashing all sorts of inputs.