[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858328096 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44138/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
SparkQA commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858326389 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/
[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type
MaxGekk commented on a change in pull request #32849: URL: https://github.com/apache/spark/pull/32849#discussion_r648866947 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2513,7 +2514,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg } override def visitDayTimeIntervalDataType(ctx: DayTimeIntervalDataTypeContext): DataType = { -DayTimeIntervalType +// TODO(SPARK-X): Support day-time interval fields in SQL Review comment: @cloud-fan We need to modify the parser rules. I plan to do that separately and to focus here only on adding unit fields to the class `DayTimeIntervalType`.
[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type
MaxGekk commented on a change in pull request #32849: URL: https://github.com/apache/spark/pull/32849#discussion_r648866947 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2513,7 +2514,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg } override def visitDayTimeIntervalDataType(ctx: DayTimeIntervalDataTypeContext): DataType = { -DayTimeIntervalType +// TODO(SPARK-X): Support day-time interval fields in SQL Review comment: We need to modify the parser rules. I plan to do that separately and to focus here only on adding unit fields to the class `DayTimeIntervalType`.
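The review comment proposes to change `DayTimeIntervalType` from a single type into one that carries unit fields. The sketch below is a rough, hypothetical Java model of that idea, under stated assumptions. The `IntervalField` enum, the `DayTimeIntervalTypeSketch` class and the `typeName()` format are all invented for illustration. They are not Spark's actual classes, and the parser changes the comment defers are not modeled.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical unit fields of a day-time interval, ordered from widest to narrowest.
enum IntervalField { DAY, HOUR, MINUTE, SECOND }

// Illustrative model of a day-time interval type parameterized by start and end fields.
// Not Spark's DayTimeIntervalType; names and the typeName() format are assumptions.
final class DayTimeIntervalTypeSketch {
    final IntervalField startField;
    final IntervalField endField;

    DayTimeIntervalTypeSketch(IntervalField startField, IntervalField endField) {
        this.startField = Objects.requireNonNull(startField, "startField");
        this.endField = Objects.requireNonNull(endField, "endField");
        // The start field must not be narrower than the end field: HOUR TO DAY is rejected.
        if (startField.ordinal() > endField.ordinal()) {
            throw new IllegalArgumentException(
                "Invalid interval fields: " + startField + " TO " + endField);
        }
    }

    // Builds a SQL-style name such as "interval day to second" or "interval hour".
    String typeName() {
        String start = startField.name().toLowerCase(Locale.ROOT);
        if (startField == endField) {
            return "interval " + start;
        }
        return "interval " + start + " to " + endField.name().toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DayTimeIntervalTypeSketch)) {
            return false;
        }
        DayTimeIntervalTypeSketch other = (DayTimeIntervalTypeSketch) o;
        return startField == other.startField && endField == other.endField;
    }

    @Override
    public int hashCode() {
        return Objects.hash(startField, endField);
    }
}
```

With fields in the type, two interval types are equal only when both their start and end fields match. A `DAY TO SECOND` type is therefore distinct from an `HOUR` type, whereas a field-less singleton cannot make that distinction.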
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
AmplabJenkins removed a comment on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type
MaxGekk commented on a change in pull request #32849: URL: https://github.com/apache/spark/pull/32849#discussion_r648866464 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2358,7 +2358,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg } else { assert(calendarInterval.months == 0) val micros = IntervalUtils.getDuration(calendarInterval, TimeUnit.MICROSECONDS) -Literal(micros, DayTimeIntervalType) +// TODO(SPARK-X): Parse to tightest day-time interval type Review comment: @cloud-fan At this point, we have already lost the information about field units in CalendarInterval. I will open a separate JIRA to refactor/implement this.
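The point about lost field units can be made concrete. A literal is first folded into a months/days/microseconds value and then collapsed into a single duration. After that, `INTERVAL '1' DAY` and `INTERVAL '24' HOUR` can no longer be told apart. The class below is a simplified, hypothetical stand-in for that conversion. It assumes every day is exactly 24 hours, and `durationMicros` is not Spark's `IntervalUtils.getDuration`.

```java
// Simplified, hypothetical stand-in for collapsing an interval value into one duration.
// It assumes every day is exactly 24 hours and is not Spark's IntervalUtils.getDuration.
final class IntervalDurationSketch {
    static final long MICROS_PER_HOUR = 3_600_000_000L;
    static final long MICROS_PER_DAY = 24 * MICROS_PER_HOUR;

    private IntervalDurationSketch() {}

    // Collapses (months, days, micros) into a single microsecond count. Like the assert
    // in the diff above, it only accepts values without a month component.
    static long durationMicros(int months, int days, long micros) {
        if (months != 0) {
            throw new IllegalArgumentException("months must be 0 for a day-time interval");
        }
        return Math.addExact(Math.multiplyExact((long) days, MICROS_PER_DAY), micros);
    }
}
```

`durationMicros(0, 1, 0L)` and `durationMicros(0, 0, 24 * MICROS_PER_HOUR)` both return 86,400,000,000. The value alone therefore cannot say whether the literal was written in days or hours. Choosing the tightest type would need the original units from the parse tree, which is the refactoring the comment defers to a separate JIRA.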
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
AmplabJenkins commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
SparkQA commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858322328 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44144/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
AmplabJenkins removed a comment on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
SparkQA removed a comment on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750 **[Test build #139617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
AmplabJenkins commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
SparkQA commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858322051 **[Test build #139617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type
MaxGekk commented on a change in pull request #32849: URL: https://github.com/apache/spark/pull/32849#discussion_r648865489 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java ## @@ -21,10 +21,7 @@ import java.math.BigDecimal; import java.math.BigInteger; import java.nio.ByteBuffer; -import java.util.Arrays; -import java.util.Collections; -import java.util.HashSet; -import java.util.Set; +import java.util.*; Review comment: I will revert this.
[GitHub] [spark] MaxGekk commented on a change in pull request #32849: [WIP][SPARK-35704][SQL] Support fields by the day-time interval type
MaxGekk commented on a change in pull request #32849: URL: https://github.com/apache/spark/pull/32849#discussion_r648864959 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -1152,7 +1154,8 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit (c, evPrim, evNull) => { code"$evPrim = UTF8String.fromString($udtRef.deserialize($c).toString());" } - case i @ (YearMonthIntervalType | DayTimeIntervalType) => + // TODO(SPARK-X): Take into account day-time interval fields in cast Review comment: @cloud-fan I have marked the places that need follow-ups. I am going to create sub-tasks for them in JIRA.
[GitHub] [spark] SparkQA removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
SparkQA removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858252115 **[Test build #139608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139608/testReport)** for PR 32816 at commit [`a80bb5c`](https://github.com/apache/spark/commit/a80bb5c06d76f35a80afcad6e242158ef809875e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
AmplabJenkins removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858319082 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139608/
[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
AmplabJenkins commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858319082 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139608/
[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858318818 **[Test build #139608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139608/testReport)** for PR 32816 at commit [`a80bb5c`](https://github.com/apache/spark/commit/a80bb5c06d76f35a80afcad6e242158ef809875e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class CompositeReadLimit implements ReadLimit ` * `public final class ReadMinRows implements ReadLimit ` * `trait InvokeLike extends Expression with NonSQLExpression with ImplicitCastInputTypes ` * `case class LateralSubquery(` * `case class LateralJoin(` * `case class CommandResultExec(` * `class RocksDBFileManager(` * ` sealed trait SchemaReader ` * ` class SchemaV1Reader extends SchemaReader ` * ` class SchemaV2Reader extends SchemaReader ` * ` trait SchemaWriter ` * ` class SchemaV1Writer extends SchemaWriter ` * ` class SchemaV2Writer extends SchemaWriter ` * `case class CommandResult(`
[GitHub] [spark] ueshin closed pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.
ueshin closed pull request #32738: URL: https://github.com/apache/spark/pull/32738
[GitHub] [spark] ueshin commented on pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.
ueshin commented on pull request #32738: URL: https://github.com/apache/spark/pull/32738#issuecomment-858317725 Thanks! Merging to master.
[GitHub] [spark] tanelk commented on pull request #31677: [SPARK-34565][SQL] Collapse Window nodes with Project between them
tanelk commented on pull request #31677: URL: https://github.com/apache/spark/pull/31677#issuecomment-858317298 This and a very similar PR, #31980, have been approved for a while now. @maropu, could you take another look? Maybe we can merge this.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
AmplabJenkins removed a comment on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858316652 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139616/
[GitHub] [spark] SparkQA removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
SparkQA removed a comment on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858316242 **[Test build #139616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139616/testReport)** for PR 32821 at commit [`e3b1440`](https://github.com/apache/spark/commit/e3b1440e3a0523e8007b927c84851d6609496501).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
SparkQA commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750 **[Test build #139617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).
[GitHub] [spark] AmplabJenkins commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
AmplabJenkins commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858316652 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139616/
[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
SparkQA commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858316631 **[Test build #139616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139616/testReport)** for PR 32821 at commit [`e3b1440`](https://github.com/apache/spark/commit/e3b1440e3a0523e8007b927c84851d6609496501). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.
AmplabJenkins removed a comment on pull request #32841: URL: https://github.com/apache/spark/pull/32841#issuecomment-857578831 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.
SparkQA commented on pull request #32841: URL: https://github.com/apache/spark/pull/32841#issuecomment-858316254 **[Test build #139615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139615/testReport)** for PR 32841 at commit [`94d22b2`](https://github.com/apache/spark/commit/94d22b2c519f3f15af853a249218ed261640136e).
[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
SparkQA commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858316242 **[Test build #139616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139616/testReport)** for PR 32821 at commit [`e3b1440`](https://github.com/apache/spark/commit/e3b1440e3a0523e8007b927c84851d6609496501).
[GitHub] [spark] SparkQA commented on pull request #32853: [SPARK-35683][PYTHON] Fix Index.difference to avoid collect 'other' to driver side
SparkQA commented on pull request #32853: URL: https://github.com/apache/spark/pull/32853#issuecomment-858316127 **[Test build #139614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139614/testReport)** for PR 32853 at commit [`61a5c92`](https://github.com/apache/spark/commit/61a5c92ab303dd6bb6aa65e24abb644255e4fa15).
[GitHub] [spark] SparkQA commented on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
SparkQA commented on pull request #32776: URL: https://github.com/apache/spark/pull/32776#issuecomment-858315767 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44140/
[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
otterc commented on a change in pull request #32140: URL: https://github.com/apache/spark/pull/32140#discussion_r648845832 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -1124,4 +1392,298 @@ object ShuffleBlockFetcherIterator { */ private[storage] case class DeferFetchRequestResult(fetchRequest: FetchRequest) extends FetchResult + + /** + * Result of a fetch from a remote merged block unsuccessfully. + * Instead of treating this as a FailureFetchResult, we ignore this failure + * and fallback to fetch the original unmerged blocks. + * @param blockId block id + * @param address BlockManager that the merged block was attempted to be fetched from + * @param size size of the block, used to update bytesInFlight. + * @param isNetworkReqDone Is this the last network request for this host in this fetch + * request. Used to update reqsInFlight. + */ + private[storage] case class IgnoreFetchResult(blockId: BlockId, + address: BlockManagerId, + size: Long, + isNetworkReqDone: Boolean) extends FetchResult + + /** + * Result of a successful fetch of meta information for a merged block. + * + * @param shuffleId shuffle id. + * @param reduceId reduce id. + * @param blockSize size of each merged block. + * @param numChunks number of chunks in the merged block. + * @param bitmaps bitmaps for every chunk. + * @param address BlockManager that the merged status was fetched from. + */ + private[storage] case class MergedBlocksMetaFetchResult( + shuffleId: Int, + reduceId: Int, + blockSize: Long, + numChunks: Int, + bitmaps: Array[RoaringBitmap], + address: BlockManagerId, + blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult + + /** + * Result of a failure while fetching the meta information for a merged block. + * + * @param shuffleId shuffle id. + * @param reduceId reduce id. + * @param address BlockManager that the merged status was fetched from. + */ + private[storage] case class MergedBlocksMetaFailedFetchResult( + shuffleId: Int, + reduceId: Int, + address: BlockManagerId, + blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult +} + +/** + * Helper class that encapsulates all the push-based functionality to fetch merged block meta + * and merged shuffle block chunks. + */ +private class PushBasedFetchHelper( Review comment: Many methods in `PushBasedFetchHelper` also need access to the iterator instance. The helper has to work with the iterator to: 1. add results to the iterator's `result` queue when it receives the meta response; 2. update the number of blocks to fetch; 3. fetch fallback blocks when there is a fallback, which in turn removes some pending blocks from `fetchRequests`. It also needs access to the `shuffleClient`, `blockManager`, and `mapOutputTracker`, and most of the methods in this class access one or more of these instances. IMO, it seems better to create one instance of `PushBasedFetchHelper` per iterator instance. Otherwise, all the methods of `PushBasedFetchHelper` would need many more arguments. I find this class similar to the existing `BufferReleasingInputStream` in the iterator.
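The design argument in this comment is to create one helper instance per iterator and have it hold a back-reference, rather than passing the shared state into every method. That argument can be sketched roughly as follows. The sketch is hypothetical and heavily simplified. `FetchIteratorSketch`, `PushFetchHelperSketch`, the string block ids and the string result entries stand in for Spark's `ShuffleBlockFetcherIterator`, `PushBasedFetchHelper`, `BlockId` and `FetchResult`. No networking, `shuffleClient`, `blockManager` or `mapOutputTracker` is modeled. The removal of pending blocks from `fetchRequests` during fallback is also left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, heavily simplified stand-in for the fetch iterator. It owns the state
// that the helper has to mutate: the result queue, pending requests and a block counter.
final class FetchIteratorSketch {
    final Queue<String> results = new ArrayDeque<>();
    final List<String> pendingRequests = new ArrayList<>();
    int numBlocksToFetch = 0;
    final PushFetchHelperSketch helper;

    FetchIteratorSketch() {
        // One helper per iterator, holding a back-reference instead of taking
        // the queue, list and counter as arguments on every call.
        this.helper = new PushFetchHelperSketch(this);
    }
}

// Hypothetical helper; block ids and result entries are plain strings for brevity.
final class PushFetchHelperSketch {
    private final FetchIteratorSketch iterator;

    PushFetchHelperSketch(FetchIteratorSketch iterator) {
        this.iterator = iterator;
    }

    // Meta fetched: record the result and schedule one request per merged chunk.
    void onMergedMetaSuccess(String mergedBlockId, int numChunks) {
        iterator.results.add("meta:" + mergedBlockId);
        for (int i = 0; i < numChunks; i++) {
            iterator.pendingRequests.add(mergedBlockId + "_chunk" + i);
        }
        iterator.numBlocksToFetch += numChunks;
    }

    // Meta fetch failed: rather than failing, fall back to the original unmerged blocks.
    void onMergedMetaFailure(String mergedBlockId, List<String> unmergedBlockIds) {
        iterator.results.add("fallback:" + mergedBlockId);
        iterator.pendingRequests.addAll(unmergedBlockIds);
        iterator.numBlocksToFetch += unmergedBlockIds.size();
    }
}
```

A stateless helper would instead need the queue, the request list and the counter passed into each call. In the real class it would also need the three services the comment lists. That is the growth in argument count the comment wants to avoid.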
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
AmplabJenkins removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858314560 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44135/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
AmplabJenkins removed a comment on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858314556 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44136/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
AmplabJenkins removed a comment on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858314559 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44139/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
AmplabJenkins removed a comment on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858314558 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44134/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
AmplabJenkins commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858314556 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44136/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
AmplabJenkins commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858314558 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44134/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
AmplabJenkins commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858314559 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44139/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
AmplabJenkins commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858314560 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44135/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
Ngone51 commented on a change in pull request #32140: URL: https://github.com/apache/spark/pull/32140#discussion_r648857212 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -767,6 +908,43 @@ final class ShuffleBlockFetcherIterator( deferredFetchRequests.getOrElseUpdate(address, new Queue[FetchRequest]()) defReqQueue.enqueue(request) result = null + +case IgnoreFetchResult(blockId, address, size, isNetworkReqDone) => + if (pushBasedFetchHelper.isNotExecutorOrMergedLocal(address)) { +numBlocksInFlightPerAddress(address) = numBlocksInFlightPerAddress(address) - 1 +bytesInFlight -= size + } + if (isNetworkReqDone) { +reqsInFlight -= 1 +logDebug("Number of requests in flight " + reqsInFlight) + } + numBlocksProcessed += pushBasedFetchHelper.initiateFallbackBlockFetchForMergedBlock( +blockId, address) + // Set result to null to trigger another iteration of the while loop to get either + // a SuccessFetchResult or a FailureFetchResult. + result = null + +case MergedBlocksMetaFetchResult(shuffleId, reduceId, blockSize, numChunks, bitmaps, +address, _) => + // The original meta request is processed so we decrease numBlocksToFetch by 1. We will + // collect new chunks request and the count of this is added to numBlocksToFetch in + // collectFetchReqsFromMergedBlocks. + numBlocksToFetch -= 1 + val blocksToRequest = pushBasedFetchHelper.createChunkBlockInfosFromMetaResponse( +shuffleId, reduceId, blockSize, numChunks, bitmaps) + val additionalRemoteReqs = new ArrayBuffer[FetchRequest] + collectFetchRequests(address, blocksToRequest.toSeq, additionalRemoteReqs) + fetchRequests ++= additionalRemoteReqs + // Set result to null to force another iteration. + result = null Review comment: Oh, I see. I misread it.
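The counter bookkeeping in the `MergedBlocksMetaFetchResult` branch of the diff above (the processed meta request stops counting toward `numBlocksToFetch`, and the chunk requests it expands into are counted instead) can be modeled in a few lines. A minimal Python sketch; the function name, the chunk-id format, and counting each chunk as one block are assumptions for illustration, not Spark's implementation.

```python
def process_meta_response(num_blocks_to_fetch, shuffle_id, reduce_id, num_chunks):
    """Return the updated block count and the chunk ids to request (hypothetical model)."""
    # The original meta request has been handled, so it no longer counts.
    num_blocks_to_fetch -= 1
    # One new block per chunk of the merged block (assumed id format).
    chunk_ids = [f"shuffleChunk_{shuffle_id}_{reduce_id}_{i}" for i in range(num_chunks)]
    # Mirrors the chunk count being added back when the new requests are collected.
    num_blocks_to_fetch += len(chunk_ids)
    return num_blocks_to_fetch, chunk_ids
```

For example, with 5 blocks outstanding, a meta response for a merged block with 2 chunks leaves 6 blocks to fetch: the meta request is gone and two chunk requests take its place.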
[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858310485 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44138/
[GitHub] [spark] ulysses-you commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection
ulysses-you commented on pull request #32815: URL: https://github.com/apache/spark/pull/32815#issuecomment-858310082 Thanks for merging!
[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
SparkQA commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858309113 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/
[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
SparkQA commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858308504 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44139/
[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
otterc commented on a change in pull request #32140: URL: https://github.com/apache/spark/pull/32140#discussion_r648845832 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -1124,4 +1392,298 @@ object ShuffleBlockFetcherIterator { */ private[storage] case class DeferFetchRequestResult(fetchRequest: FetchRequest) extends FetchResult + + /** + * Result of a fetch from a remote merged block unsuccessfully. + * Instead of treating this as a FailureFetchResult, we ignore this failure + * and fallback to fetch the original unmerged blocks. + * @param blockId block id + * @param address BlockManager that the merged block was attempted to be fetched from + * @param size size of the block, used to update bytesInFlight. + * @param isNetworkReqDone Is this the last network request for this host in this fetch + * request. Used to update reqsInFlight. + */ + private[storage] case class IgnoreFetchResult(blockId: BlockId, + address: BlockManagerId, + size: Long, + isNetworkReqDone: Boolean) extends FetchResult + + /** + * Result of a successful fetch of meta information for a merged block. + * + * @param shuffleId shuffle id. + * @param reduceId reduce id. + * @param blockSize size of each merged block. + * @param numChunks number of chunks in the merged block. + * @param bitmaps bitmaps for every chunk. + * @param address BlockManager that the merged status was fetched from. + */ + private[storage] case class MergedBlocksMetaFetchResult( + shuffleId: Int, + reduceId: Int, + blockSize: Long, + numChunks: Int, + bitmaps: Array[RoaringBitmap], + address: BlockManagerId, + blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult + + /** + * Result of a failure while fetching the meta information for a merged block. + * + * @param shuffleId shuffle id. + * @param reduceId reduce id. + * @param address BlockManager that the merged status was fetched from.
+ */ + private[storage] case class MergedBlocksMetaFailedFetchResult( + shuffleId: Int, + reduceId: Int, + address: BlockManagerId, + blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult +} + +/** + * Helper class that encapsulates all the push-based functionality to fetch merged block meta + * and merged shuffle block chunks. + */ +private class PushBasedFetchHelper( Review comment: The problem is that `PushBasedFetchHelper` also needs access to the iterator instance. It needs to work with the iterator to be able to: 1. add results to the iterator's `results` queue. 2. update the number of blocks to fetch. 3. fetch fallback blocks when there is a fallback, which in turn removes some pending blocks from `fetchRequests`. It also needs access to the `shuffleClient`, `blockManager`, and `mapOutputTracker`. This is why it is a helper class, similar to the existing `BufferReleasingInputStream` and `ShuffleFetchCompletionListener`.
[GitHub] [spark] MaxGekk closed pull request #32839: [SPARK-35679][SQL] instantToMicros overflow
MaxGekk closed pull request #32839: URL: https://github.com/apache/spark/pull/32839
[GitHub] [spark] itholic opened a new pull request #32853: [SPARK-35683][PYTHON] Fix Index.difference to avoid collect 'other' to driver side
itholic opened a new pull request #32853: URL: https://github.com/apache/spark/pull/32853

### What changes were proposed in this pull request?

This PR fixes the incorrect behavior of `Index.difference` in the pandas API on Spark, based on the comments https://github.com/databricks/koalas/pull/1325#discussion_r647889901 and https://github.com/databricks/koalas/pull/1325#discussion_r647890007:

- It couldn't properly handle the case where `self` is an `Index` and `other` is a `MultiIndex`, or vice versa.

```python
>>> midx1 = ps.MultiIndex.from_tuples([('a', 'x', 1), ('b', 'z', 2), ('k', 'z', 3)])
>>> idx1 = ps.Index([1, 2, 3])
>>> midx1.difference(idx1)
pyspark.pandas.exceptions.PandasNotImplementedError: The method `pd.Index.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.
```

- It collected all the data to the driver when `other` was a list-like object, which is especially dangerous when `other` is a distributed object such as a Series.

Related test cases are also added.

### Why are the changes needed?

To fix behavior that is incompatible with pandas, and to prevent cases that could easily cause an OOM.

```python
>>> midx1 = ps.MultiIndex.from_tuples([('a', 'x', 1), ('b', 'z', 2), ('k', 'z', 3)])
>>> idx1 = ps.Index([1, 2, 3])
>>> midx1.difference(idx1)
MultiIndex([('a', 'x', 1),
            ('b', 'z', 2),
            ('k', 'z', 3)],
           )
```

Now it only uses a `for` loop when `other` is a `list`, `set`, or `dict`.

### Does this PR introduce _any_ user-facing change?

Yes, the bug is fixed as shown in the code examples above.

### How was this patch tested?

Manually tested locally with the linter and unit tests; it should also pass on CI.
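To make the intended semantics concrete, here is a pure-Python sketch (no Spark or pandas needed) of the dispatch the PR describes: iterate locally only when `other` is a `list`, `set`, or `dict`, and otherwise delegate to a distributed path instead of collecting to the driver. `index_difference` and the `distributed_difference` callback are hypothetical names; this is not the pandas-on-Spark implementation. It also shows why a three-level `MultiIndex` minus a flat `Index` keeps every entry: a tuple never equals a scalar.

```python
def index_difference(values, other, distributed_difference=None):
    """Sorted unique elements of `values` not present in `other` (hypothetical model)."""
    if isinstance(other, (list, set, dict)):
        # Small local collections: safe to iterate on the driver.
        # For a dict, `set(other)` uses its keys for membership.
        excluded = set(other)
        return sorted({v for v in values if v not in excluded})
    if distributed_difference is None:
        raise TypeError("a non-local `other` needs a distributed implementation")
    # Distributed objects (e.g. a Series) are never collected here.
    return distributed_difference(values, other)
```

For instance, `index_difference([('a', 'x', 1), ('b', 'z', 2)], [1, 2, 3])` returns both tuples unchanged, matching the pandas-compatible output in the PR description.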
[GitHub] [spark] MaxGekk commented on pull request #32839: [SPARK-35679][SQL] instantToMicros overflow
MaxGekk commented on pull request #32839: URL: https://github.com/apache/spark/pull/32839#issuecomment-858300793 @dgd-contributor Does the issue exist in other versions: 3.0, 3.1?
[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858299743 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44135/
[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
SparkQA commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858299573 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/
[GitHub] [spark] cloud-fan commented on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.
cloud-fan commented on pull request #32841: URL: https://github.com/apache/spark/pull/32841#issuecomment-858298057 ok to test
[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858297795 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44134/
[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
otterc commented on a change in pull request #32140: URL: https://github.com/apache/spark/pull/32140#discussion_r648847338 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -767,6 +908,43 @@ final class ShuffleBlockFetcherIterator( deferredFetchRequests.getOrElseUpdate(address, new Queue[FetchRequest]()) defReqQueue.enqueue(request) result = null + +case IgnoreFetchResult(blockId, address, size, isNetworkReqDone) => + if (pushBasedFetchHelper.isNotExecutorOrMergedLocal(address)) { +numBlocksInFlightPerAddress(address) = numBlocksInFlightPerAddress(address) - 1 +bytesInFlight -= size + } + if (isNetworkReqDone) { +reqsInFlight -= 1 +logDebug("Number of requests in flight " + reqsInFlight) + } + numBlocksProcessed += pushBasedFetchHelper.initiateFallbackBlockFetchForMergedBlock( +blockId, address) + // Set result to null to trigger another iteration of the while loop to get either + // a SuccessFetchResult or a FailureFetchResult. + result = null + +case MergedBlocksMetaFetchResult(shuffleId, reduceId, blockSize, numChunks, bitmaps, +address, _) => + // The original meta request is processed so we decrease numBlocksToFetch by 1. We will + // collect new chunks request and the count of this is added to numBlocksToFetch in + // collectFetchReqsFromMergedBlocks. + numBlocksToFetch -= 1 + val blocksToRequest = pushBasedFetchHelper.createChunkBlockInfosFromMetaResponse( +shuffleId, reduceId, blockSize, numChunks, bitmaps) + val additionalRemoteReqs = new ArrayBuffer[FetchRequest] + collectFetchRequests(address, blocksToRequest.toSeq, additionalRemoteReqs) + fetchRequests ++= additionalRemoteReqs + // Set result to null to force another iteration. + result = null Review comment: Actually, this is the existing code which I haven't modified. The while loop inside iterator.next() is as below, so `fetchUpToMaxBytes` is always called after a response is matched and processed. 
```scala
while (result == null) {
  val startFetchWait = System.nanoTime()
  result = results.take()
  val fetchWaitTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startFetchWait)
  shuffleMetrics.incFetchWaitTime(fetchWaitTime)

  result match {...}
  // Send fetch requests up to maxBytesInFlight
  fetchUpToMaxBytes()
}
```
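A Python model of the `while (result == null)` loop quoted in this comment shows why setting `result` to null for internal results (such as `IgnoreFetchResult` or a meta response) simply forces another `take()`, and why `fetchUpToMaxBytes()` still runs once per processed response. Results are modeled as `(kind, payload)` tuples and `fetch_up_to_max_bytes` is a caller-supplied callback; both are assumptions for illustration.

```python
import queue


def next_result(results: queue.Queue, fetch_up_to_max_bytes) -> tuple:
    """Block until a user-visible result arrives (simplified model of iterator.next())."""
    result = None
    while result is None:
        kind, payload = results.get()  # like results.take(): blocks until an item exists
        if kind in ("success", "failure"):
            result = (kind, payload)
        # Any other kind (e.g. "ignore", "meta") is handled internally and leaves
        # `result` as None, which forces another iteration of the loop.
        # Send more requests after every processed response, matched or not.
        fetch_up_to_max_bytes()
    return result
```

With a queue holding a meta result, an ignore result, and then a success, one call returns the success and invokes the callback three times, once per response taken from the queue.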
[GitHub] [spark] cloud-fan commented on a change in pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.
cloud-fan commented on a change in pull request #32841: URL: https://github.com/apache/spark/pull/32841#discussion_r648847167 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/hints.scala ## @@ -30,7 +30,8 @@ import org.apache.spark.sql.catalyst.trees.TreePattern.{TreePattern, UNRESOLVED_ case class UnresolvedHint(name: String, parameters: Seq[Any], child: LogicalPlan) extends UnaryNode { - override lazy val resolved: Boolean = false + override lazy val resolved: Boolean = child.resolved Review comment: Yeah, it's just a sanity check, to make sure no `UnresolvedHint` remains after analysis.
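One reading of this exchange: with `resolved = false`, any `UnresolvedHint` left in the plan makes the plan report itself unresolved, so a post-analysis check fails loudly; delegating to `child.resolved` removes that guard. The toy Python model below illustrates the difference. It is not Catalyst's API: `Relation`, the `delegate` flag, and `check_analyzed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """Toy leaf node that is always resolved."""
    name: str

    @property
    def resolved(self) -> bool:
        return True


@dataclass
class UnresolvedHint:
    """Toy hint node; `delegate` switches between the two definitions in the diff."""
    name: str
    child: object
    delegate: bool = False

    @property
    def resolved(self) -> bool:
        # delegate=False mirrors `resolved = false`; True mirrors `child.resolved`.
        return self.child.resolved if self.delegate else False


def check_analyzed(plan) -> None:
    """Fail if the plan is unresolved, as a post-analysis sanity check would."""
    if not plan.resolved:
        raise ValueError(f"unresolved plan: {plan!r}")
```

In this model, the strict definition makes `check_analyzed` reject a leftover hint, while the delegating definition lets it pass silently.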
[GitHub] [spark] AmplabJenkins commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection
AmplabJenkins commented on pull request #32815: URL: https://github.com/apache/spark/pull/32815#issuecomment-858296344 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44132/
[GitHub] [spark] SparkQA commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection
SparkQA commented on pull request #32815: URL: https://github.com/apache/spark/pull/32815#issuecomment-858296323 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44132/
[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
cloud-fan commented on a change in pull request #32776: URL: https://github.com/apache/spark/pull/32776#discussion_r648846235 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private( val desc = if (isLocalReader) { "local" } else if (hasCoalescedPartition && hasSkewedPartition) { - "coalesced and skewed" + s"$coalescedDetail and $skewedDetail" } else if (hasCoalescedPartition) { - "coalesced" + coalescedDetail } else if (hasSkewedPartition) { - "skewed" + skewedDetail } else { "" } Iterator(desc) } + private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1 + /** + * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]] + */ + private def coalesceRange(spec: ShufflePartitionSpec) = spec match { +case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex +case _ => 0 + } + + /* This is left as documentation + * Is it worth reporting this? For example, if we have + * MapOutputStatistics 0,0,0,72,0 + * MapOutputStatistics 0,0,0,138,138 + * with target partition size 10, we'll have + * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5) + * So pre-shuffle partitions 0,1,2 are dropped + * Another example, (target size 10) + * MapOutputStatistics 0,3,0,2,7 + * MapOutputStatistics 0,2,0,2,7 + * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5) + * So pre-shuffle partition 2 is included + * We could figure out dropped partitions but doesn't seem that useful. Review comment: I don't think it's useful to report these metrics.
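The two helpers in the diff above, `coalesceRange` and `isCoalesced`, translate directly into Python. A minimal sketch with hypothetical dataclasses standing in for `CoalescedPartitionSpec` and for any other spec type (`OtherPartitionSpec` is an illustrative placeholder, not a Spark class). Counting the specs for which `is_coalesced` holds is one plausible way, assumed here, to derive a "number of coalesced partitions" value.

```python
from dataclasses import dataclass


@dataclass
class CoalescedPartitionSpec:
    start_reducer_index: int
    end_reducer_index: int  # exclusive


@dataclass
class OtherPartitionSpec:
    """Hypothetical stand-in for any non-coalesced spec (e.g. a skew split)."""
    reducer_index: int


def coalesce_range(spec) -> int:
    """How many reducer partitions a spec covers; 0 if not a CoalescedPartitionSpec."""
    if isinstance(spec, CoalescedPartitionSpec):
        return spec.end_reducer_index - spec.start_reducer_index
    return 0


def is_coalesced(spec) -> bool:
    # A range of 1 is a single partition passed through, not a coalesced one.
    return coalesce_range(spec) > 1
```

Applied to the diff's second example, `CoalescedPartitionSpec(1, 4)` covers three reducers and counts as coalesced, while `CoalescedPartitionSpec(4, 5)` covers one and does not; in the first example neither `(3, 4)` nor `(4, 5)` is coalesced.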
[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
cloud-fan commented on a change in pull request #32776: URL: https://github.com/apache/spark/pull/32776#discussion_r648845848 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private( val desc = if (isLocalReader) { "local" } else if (hasCoalescedPartition && hasSkewedPartition) { - "coalesced and skewed" + s"$coalescedDetail and $skewedDetail" Review comment: AFAIK, plan node strings never contained metrics before this PR.
[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
otterc commented on a change in pull request #32140: URL: https://github.com/apache/spark/pull/32140#discussion_r648845832 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -1124,4 +1392,298 @@ object ShuffleBlockFetcherIterator { */ private[storage] case class DeferFetchRequestResult(fetchRequest: FetchRequest) extends FetchResult + + /** + * Result of a fetch from a remote merged block unsuccessfully. + * Instead of treating this as a FailureFetchResult, we ignore this failure + * and fallback to fetch the original unmerged blocks. + * @param blockId block id + * @param address BlockManager that the merged block was attempted to be fetched from + * @param size size of the block, used to update bytesInFlight. + * @param isNetworkReqDone Is this the last network request for this host in this fetch + * request. Used to update reqsInFlight. + */ + private[storage] case class IgnoreFetchResult(blockId: BlockId, + address: BlockManagerId, + size: Long, + isNetworkReqDone: Boolean) extends FetchResult + + /** + * Result of a successful fetch of meta information for a merged block. + * + * @param shuffleId shuffle id. + * @param reduceId reduce id. + * @param blockSize size of each merged block. + * @param numChunks number of chunks in the merged block. + * @param bitmaps bitmaps for every chunk. + * @param address BlockManager that the merged status was fetched from. + */ + private[storage] case class MergedBlocksMetaFetchResult( + shuffleId: Int, + reduceId: Int, + blockSize: Long, + numChunks: Int, + bitmaps: Array[RoaringBitmap], + address: BlockManagerId, + blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult + + /** + * Result of a failure while fetching the meta information for a merged block. + * + * @param shuffleId shuffle id. + * @param reduceId reduce id. + * @param address BlockManager that the merged status was fetched from.
+ */ + private[storage] case class MergedBlocksMetaFailedFetchResult( + shuffleId: Int, + reduceId: Int, + address: BlockManagerId, + blockId: BlockId = DUMMY_SHUFFLE_BLOCK_ID) extends FetchResult +} + +/** + * Helper class that encapsulates all the push-based functionality to fetch merged block meta + * and merged shuffle block chunks. + */ +private class PushBasedFetchHelper( Review comment: The problem is that `PushBasedFetchHelper` also needs access to the iterator instance. It needs to work with the iterator to be able to: 1. add results to the iterator's `results` queue. 2. update the number of blocks to fetch. 3. fetch fallback blocks when there is a fallback, which in turn removes some pending blocks from `fetchRequests`. This is why it is a helper class, similar to the existing `BufferReleasingInputStream` and `ShuffleFetchCompletionListener`.
[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
cloud-fan commented on a change in pull request #32776: URL: https://github.com/apache/spark/pull/32776#discussion_r648845537 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -171,6 +238,19 @@ case class CustomShuffleReaderExec private( } else { Map.empty } + } ++ { +if (isLocalReader) { + Map.empty +} else { + if (hasCoalescedPartition) { +Map("numCoalescedPartitions" -> + SQLMetrics.createMetric(sparkContext, "number of coalesced partitions"), + "numPartitionsToCoalesce" -> Review comment: There is always a shuffle node below `CustomShuffleReader`. I think it makes more sense to let the shuffle node report the metrics of the number of partitions/reducers.
[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
cloud-fan commented on a change in pull request #32776: URL: https://github.com/apache/spark/pull/32776#discussion_r648844909 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private( val desc = if (isLocalReader) { "local" } else if (hasCoalescedPartition && hasSkewedPartition) { - "coalesced and skewed" + s"$coalescedDetail and $skewedDetail" Review comment: It makes sense to add more metrics but it doesn't make sense to include metrics in the plan node string.
[GitHub] [spark] SparkQA commented on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
SparkQA commented on pull request #32776: URL: https://github.com/apache/spark/pull/32776#issuecomment-858294837 **[Test build #139613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139613/testReport)** for PR 32776 at commit [`aee1392`](https://github.com/apache/spark/commit/aee1392720815f332e8fb993b4672bb03fe4ccb1).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
AmplabJenkins removed a comment on pull request #32776: URL: https://github.com/apache/spark/pull/32776#issuecomment-854245118 Can one of the admins verify this patch?
[GitHub] [spark] cloud-fan commented on pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
cloud-fan commented on pull request #32776: URL: https://github.com/apache/spark/pull/32776#issuecomment-858294433 ok to test
[GitHub] [spark] cloud-fan closed pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions
cloud-fan closed pull request #32842: URL: https://github.com/apache/spark/pull/32842
[GitHub] [spark] cloud-fan commented on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions
cloud-fan commented on pull request #32842: URL: https://github.com/apache/spark/pull/32842#issuecomment-858292148 thanks, merging to master!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
AmplabJenkins removed a comment on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858290874 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139611/
[GitHub] [spark] SparkQA removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
SparkQA removed a comment on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858290656 **[Test build #139611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139611/testReport)** for PR 32821 at commit [`b3168ac`](https://github.com/apache/spark/commit/b3168ac30f2d99653ef29fe80e968836ca956fe0).
[GitHub] [spark] AmplabJenkins commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
AmplabJenkins commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858290874 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139611/
[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
SparkQA commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858290866 **[Test build #139611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139611/testReport)** for PR 32821 at commit [`b3168ac`](https://github.com/apache/spark/commit/b3168ac30f2d99653ef29fe80e968836ca956fe0). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
SparkQA commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858290812 **[Test build #139612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139612/testReport)** for PR 32470 at commit [`d08d8e4`](https://github.com/apache/spark/commit/d08d8e4cb6acfaff34e3d81cc8b41c65aa34f4f6).
[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps
SparkQA commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-858290656 **[Test build #139611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139611/testReport)** for PR 32821 at commit [`b3168ac`](https://github.com/apache/spark/commit/b3168ac30f2d99653ef29fe80e968836ca956fe0).
[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858290567 **[Test build #139610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139610/testReport)** for PR 32852 at commit [`716a50c`](https://github.com/apache/spark/commit/716a50cf9f48ac9fab05b8860c9c5e714c729610).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31024: [SPARK-33979][SQL] Reorder predicate
AmplabJenkins removed a comment on pull request #31024: URL: https://github.com/apache/spark/pull/31024#issuecomment-858289716 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139600/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
AmplabJenkins removed a comment on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-858289714 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139601/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions
AmplabJenkins removed a comment on pull request #32842: URL: https://github.com/apache/spark/pull/32842#issuecomment-858289713 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44133/
[GitHub] [spark] AmplabJenkins commented on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions
AmplabJenkins commented on pull request #32842: URL: https://github.com/apache/spark/pull/32842#issuecomment-858289713 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44133/
[GitHub] [spark] AmplabJenkins commented on pull request #31024: [SPARK-33979][SQL] Reorder predicate
AmplabJenkins commented on pull request #31024: URL: https://github.com/apache/spark/pull/31024#issuecomment-858289716 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139600/
[GitHub] [spark] AmplabJenkins commented on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
AmplabJenkins commented on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-858289714 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139601/
[GitHub] [spark] ekoifman commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
ekoifman commented on a change in pull request #32776: URL: https://github.com/apache/spark/pull/32776#discussion_r648838734 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private( val desc = if (isLocalReader) { "local" } else if (hasCoalescedPartition && hasSkewedPartition) { - "coalesced and skewed" + s"$coalescedDetail and $skewedDetail" } else if (hasCoalescedPartition) { - "coalesced" + coalescedDetail } else if (hasSkewedPartition) { - "skewed" + skewedDetail } else { "" } Iterator(desc) } + private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1 + /** + * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]] + */ + private def coalesceRange(spec: ShufflePartitionSpec) = spec match { +case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex +case _ => 0 + } + + /* This is left as documentation + * Is it worth reporting this? For example, if we have + * MapOutputStatistics 0,0,0,72,0 + * MapOutputStatistics 0,0,0,138,138 + * with target partition size 10, we'll have + * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5) + * So pre-shuffle partitions 0,1,2 are dropped + * Another example, (target size 10) + * MapOutputStatistics 0,3,0,2,7 + * MapOutputStatistics 0,2,0,2,7 + * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5) + * So pre-shuffle partition 2 is included + * We could figure out dropped partitions but doesn't seem that useful. 
+ */ + private def numDroppedPartitions = 0 + + private def numCoalescedPartitions = partitionSpecs.count(isCoalesced) + + /** + * partitions that will be combined with others (as opposed to taken as is, spilt, dropped) + */ + private def numPartitionsToCoalesce = partitionSpecs.filter(isCoalesced) +.foldLeft(0)((c, s) => c + coalesceRange(s)) + + /** + * total splits of all skewed partitions + */ + private def skewedPartitionSplits = partitionSpecs.collect { +case p: PartialReducerPartitionSpec => p + } - def hasCoalescedPartition: Boolean = -partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec]) Review comment: I don't understand this comment. This change is a critical part of this PR.
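The partition-spec arithmetic in this diff can be exercised outside Spark with stand-in types. The sketch below redefines `CoalescedPartitionSpec` and `PartialReducerPartitionSpec` as plain case classes (hypothetical, with simplified field lists) and mirrors `coalesceRange`, `isCoalesced`, `numCoalescedPartitions` and `numPartitionsToCoalesce` from the diff, using `map`/`sum` in place of `foldLeft`. `droppedPartitions` is a hypothetical addition that computes the "dropped" pre-shuffle partitions discussed in the comment block, under the assumption that a reducer is dropped when no spec reads it.

```scala
// Hypothetical stand-ins for the spec types named in the diff; field lists are simplified.
sealed trait ShufflePartitionSpec
final case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec
final case class PartialReducerPartitionSpec(reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
  extends ShufflePartitionSpec

object CoalesceMetrics {
  // Number of reducers a spec spans; 0 if it is not a CoalescedPartitionSpec (as in the diff).
  def coalesceRange(spec: ShufflePartitionSpec): Int = spec match {
    case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex
    case _                         => 0
  }

  // A spec only merges partitions when it spans more than one reducer.
  def isCoalesced(spec: ShufflePartitionSpec): Boolean = coalesceRange(spec) > 1

  def numCoalescedPartitions(specs: Seq[ShufflePartitionSpec]): Int = specs.count(isCoalesced)

  // Same result as the diff's filter + foldLeft, written with map/sum.
  def numPartitionsToCoalesce(specs: Seq[ShufflePartitionSpec]): Int =
    specs.filter(isCoalesced).map(coalesceRange).sum

  // Hypothetical: reducer indices that no spec reads at all ("dropped" partitions).
  def droppedPartitions(specs: Seq[ShufflePartitionSpec], numReducers: Int): Seq[Int] = {
    val covered: Set[Int] = specs.flatMap {
      case CoalescedPartitionSpec(start, end)   => start until end
      case PartialReducerPartitionSpec(r, _, _) => Seq(r)
    }.toSet
    (0 until numReducers).filterNot(covered)
  }
}
```

With the comment block's two examples over five reducers, `CoalescedPartitionSpec(3, 4)` and `(4, 5)` leave reducers 0, 1 and 2 unread and coalesce nothing, while `(1, 4)` and `(4, 5)` drop only reducer 0 and count three partitions to coalesce.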
[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
SparkQA commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858287872 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/
[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-858287594 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44135/
[GitHub] [spark] beliefer commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
beliefer commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858287344 retest this please
[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858286065 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44134/
[GitHub] [spark] ulysses-you commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE
ulysses-you commented on a change in pull request #32776: URL: https://github.com/apache/spark/pull/32776#discussion_r648834767 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private( val desc = if (isLocalReader) { "local" } else if (hasCoalescedPartition && hasSkewedPartition) { - "coalesced and skewed" + s"$coalescedDetail and $skewedDetail" } else if (hasCoalescedPartition) { - "coalesced" + coalescedDetail } else if (hasSkewedPartition) { - "skewed" + skewedDetail } else { "" } Iterator(desc) } + private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1 + /** + * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]] + */ + private def coalesceRange(spec: ShufflePartitionSpec) = spec match { +case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex +case _ => 0 + } + + /* This is left as documentation + * Is it worth reporting this? For example, if we have + * MapOutputStatistics 0,0,0,72,0 + * MapOutputStatistics 0,0,0,138,138 + * with target partition size 10, we'll have + * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5) + * So pre-shuffle partitions 0,1,2 are dropped + * Another example, (target size 10) + * MapOutputStatistics 0,3,0,2,7 + * MapOutputStatistics 0,2,0,2,7 + * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5) + * So pre-shuffle partition 2 is included + * We could figure out dropped partitions but doesn't seem that useful. 
+ */ + private def numDroppedPartitions = 0 + + private def numCoalescedPartitions = partitionSpecs.count(isCoalesced) + + /** + * partitions that will be combined with others (as opposed to taken as is, spilt, dropped) + */ + private def numPartitionsToCoalesce = partitionSpecs.filter(isCoalesced) +.foldLeft(0)((c, s) => c + coalesceRange(s)) + + /** + * total splits of all skewed partitions + */ + private def skewedPartitionSplits = partitionSpecs.collect { +case p: PartialReducerPartitionSpec => p + } - def hasCoalescedPartition: Boolean = -partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec]) Review comment: This change is worth a new PR; can you create a new one for it? Thanks.
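The diff removes the old `hasCoalescedPartition`, which returns true for any `CoalescedPartitionSpec`, including one that spans a single reducer and so combines nothing. The replacement definition is not visible in the quoted hunk; the sketch below assumes it follows the diff's `isCoalesced` helper (range greater than 1) and contrasts the two on stand-in types that are hypothetical, not Spark's definitions.

```scala
// Hypothetical stand-ins for the spec types; not Spark's definitions.
sealed trait ShufflePartitionSpec
final case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec
final case class PartialReducerPartitionSpec(reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
  extends ShufflePartitionSpec

object HasCoalescedCheck {
  // The definition removed in the diff: any CoalescedPartitionSpec counts,
  // even one that reads a single reducer partition as-is.
  def hasCoalescedPartitionOld(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists(_.isInstanceOf[CoalescedPartitionSpec])

  // Assumed replacement: only specs spanning more than one reducer count,
  // matching the isCoalesced helper added in the same diff.
  def hasCoalescedPartitionNew(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists {
      case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex > 1
      case _                         => false
    }
}
```

Under that assumption, the two definitions disagree exactly when every `CoalescedPartitionSpec` covers one reducer, which would change whether the node's description says "coalesced".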
[GitHub] [spark] SparkQA commented on pull request #32842: [MINOR][SQL] No need to normolize name for built-in functions
SparkQA commented on pull request #32842: URL: https://github.com/apache/spark/pull/32842#issuecomment-858280175 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44133/
[GitHub] [spark] venkata91 commented on pull request #32754: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes
venkata91 commented on pull request #32754: URL: https://github.com/apache/spark/pull/32754#issuecomment-858275251 Gentle reminder @HyukjinKwon @cloud-fan @mridulm
[GitHub] [spark] SparkQA removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
SparkQA removed a comment on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-858168705 **[Test build #139601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139601/testReport)** for PR 32787 at commit [`1704a9a`](https://github.com/apache/spark/commit/1704a9a8c54fab6b0b450c4fe32be444dc81df20).
[GitHub] [spark] SparkQA commented on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
SparkQA commented on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-858258406 **[Test build #139601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139601/testReport)** for PR 32787 at commit [`1704a9a`](https://github.com/apache/spark/commit/1704a9a8c54fab6b0b450c4fe32be444dc81df20). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31024: [SPARK-33979][SQL] Reorder predicate
SparkQA removed a comment on pull request #31024: URL: https://github.com/apache/spark/pull/31024#issuecomment-858167571 **[Test build #139600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139600/testReport)** for PR 31024 at commit [`dc4419a`](https://github.com/apache/spark/commit/dc4419a233ea154aaba6c1842f0417bcd29aed61).
[GitHub] [spark] SparkQA commented on pull request #31024: [SPARK-33979][SQL] Reorder predicate
SparkQA commented on pull request #31024: URL: https://github.com/apache/spark/pull/31024#issuecomment-858257375 **[Test build #139600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139600/testReport)** for PR 31024 at commit [`dc4419a`](https://github.com/apache/spark/commit/dc4419a233ea154aaba6c1842f0417bcd29aed61). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
AmplabJenkins removed a comment on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858254178 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139606/
[GitHub] [spark] SparkQA removed a comment on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA removed a comment on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858252068 **[Test build #139606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139606/testReport)** for PR 32852 at commit [`716a50c`](https://github.com/apache/spark/commit/716a50cf9f48ac9fab05b8860c9c5e714c729610).
[GitHub] [spark] AmplabJenkins commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
AmplabJenkins commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858254178 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139606/
[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-858254163 **[Test build #139606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139606/testReport)** for PR 32852 at commit [`716a50c`](https://github.com/apache/spark/commit/716a50cf9f48ac9fab05b8860c9c5e714c729610). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
AmplabJenkins removed a comment on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858252760 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139609/
[GitHub] [spark] SparkQA removed a comment on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
SparkQA removed a comment on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858252267 **[Test build #139609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139609/testReport)** for PR 32470 at commit [`e33c0a8`](https://github.com/apache/spark/commit/e33c0a87388d587f18f1e7e21b9b2170dab7b695).
[GitHub] [spark] AmplabJenkins commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
AmplabJenkins commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858252760 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139609/
[GitHub] [spark] SparkQA commented on pull request #32470: [WIP] Simplify ResolveAggregateFunctions
SparkQA commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-858252747 **[Test build #139609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139609/testReport)** for PR 32470 at commit [`e33c0a8`](https://github.com/apache/spark/commit/e33c0a87388d587f18f1e7e21b9b2170dab7b695). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.