[GitHub] [spark] SparkQA commented on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates
SparkQA commented on pull request #31983: URL: https://github.com/apache/spark/pull/31983#issuecomment-810821233 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41331/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30965: [SPARK-33935][SQL] Fix CBO cost function
cloud-fan commented on a change in pull request #30965: URL: https://github.com/apache/spark/pull/30965#discussion_r604638838 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt ## @@ -6,71 +6,71 @@ TakeOrderedAndProject [ext_price,brand,brand_id,i_manufact_id,i_manufact] WholeStageCodegen (12) HashAggregate [i_brand,i_brand_id,i_manufact_id,i_manufact,ss_ext_sales_price] [sum,sum] Project [ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -SortMergeJoin [ss_customer_sk,c_customer_sk,ca_zip,s_zip] - InputAdapter -WholeStageCodegen (5) - Sort [ss_customer_sk] -InputAdapter - Exchange [ss_customer_sk] #2 -WholeStageCodegen (4) - Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip] -BroadcastHashJoin [ss_store_sk,s_store_sk] - Project [ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_sold_date_sk,d_date_sk] - Project [ss_sold_date_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_item_sk,i_item_sk] - Filter [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk] -ColumnarToRow - InputAdapter -Scan parquet default.store_sales [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price] +BroadcastHashJoin [ss_item_sk,i_item_sk] Review comment: Thanks for looking into it!
[GitHub] [spark] SparkQA commented on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
SparkQA commented on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810820115 **[Test build #136755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136755/testReport)** for PR 32010 at commit [`89990af`](https://github.com/apache/spark/commit/89990af1104e533f7b1ad720475036d8ce0f1865).
[GitHub] [spark] tanelk commented on a change in pull request #30965: [SPARK-33935][SQL] Fix CBO cost function
tanelk commented on a change in pull request #30965: URL: https://github.com/apache/spark/pull/30965#discussion_r604636299 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt ## @@ -6,71 +6,71 @@ TakeOrderedAndProject [ext_price,brand,brand_id,i_manufact_id,i_manufact] WholeStageCodegen (12) HashAggregate [i_brand,i_brand_id,i_manufact_id,i_manufact,ss_ext_sales_price] [sum,sum] Project [ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -SortMergeJoin [ss_customer_sk,c_customer_sk,ca_zip,s_zip] - InputAdapter -WholeStageCodegen (5) - Sort [ss_customer_sk] -InputAdapter - Exchange [ss_customer_sk] #2 -WholeStageCodegen (4) - Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip] -BroadcastHashJoin [ss_store_sk,s_store_sk] - Project [ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_sold_date_sk,d_date_sk] - Project [ss_sold_date_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_item_sk,i_item_sk] - Filter [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk] -ColumnarToRow - InputAdapter -Scan parquet default.store_sales [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price] +BroadcastHashJoin [ss_item_sk,i_item_sk] Review comment: I'll experiment with it a bit, but it might take a while.
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-810817452 **[Test build #136754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136754/testReport)** for PR 30144 at commit [`7224e01`](https://github.com/apache/spark/commit/7224e01acfe2eed282369cb4a96dadb0e401b627).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP
AngersZhuuuu commented on a change in pull request #30144: URL: https://github.com/apache/spark/pull/30144#discussion_r604635105 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -212,3 +212,29 @@ object GroupingID { if (SQLConf.get.integerGroupingIdEnabled) IntegerType else LongType } } + + +object GroupByOperator { Review comment: > `GroupByOperator` -> `GroupingAnalytics`? Just changed.
[GitHub] [spark] maropu commented on a change in pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP
maropu commented on a change in pull request #30144: URL: https://github.com/apache/spark/pull/30144#discussion_r604634949 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -212,3 +212,29 @@ object GroupingID { if (SQLConf.get.integerGroupingIdEnabled) IntegerType else LongType } } + + +object GroupByOperator { Review comment: `GroupByOperator` -> `GroupingAnalytics`?
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-810815784 **[Test build #136753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136753/testReport)** for PR 30144 at commit [`005b697`](https://github.com/apache/spark/commit/005b6974d11ed37351f54de8dd43717f7b13aa71).
[GitHub] [spark] cloud-fan commented on a change in pull request #31932: [SPARK-34906] Refactor TreeNode's children handling methods into specialized traits
cloud-fan commented on a change in pull request #31932: URL: https://github.com/apache/spark/pull/31932#discussion_r604633980 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala ## @@ -27,9 +28,10 @@ import org.apache.spark.sql.types._ * When applied on empty data (i.e., count is zero), it returns NULL. */ abstract class Covariance(x: Expression, y: Expression, nullOnDivideByZero: Boolean) Review comment: we can simply do `val left: Expression, val right: Expression` here.
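One possible reading of the review suggestion on `Covariance` above, sketched with simplified stand-in types: if a trait requires `left` and `right` members, declaring the constructor parameters as `val left` and `val right` satisfies them directly, with no separate overrides. `Expr`, `Leaf`, `BinaryNode`, `CovarianceLike`, and `CovPopLike` are hypothetical names for illustration and are not Spark's actual classes.

```scala
// Simplified stand-ins for illustration; NOT Spark's real Expression/TreeNode API.
trait Expr {
  def children: Seq[Expr]
}

// A leaf node with no children.
final case class Leaf(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}

// Hypothetical trait: a node with exactly two children, exposed as left/right.
trait BinaryNode extends Expr {
  def left: Expr
  def right: Expr
  final lazy val children: Seq[Expr] = Seq(left, right)
}

// `val` constructor parameters implement the abstract `left` and `right`
// members, so the class body needs no extra definitions for them.
abstract class CovarianceLike(val left: Expr, val right: Expr) extends BinaryNode

// A concrete subclass just forwards its arguments.
final class CovPopLike(l: Expr, r: Expr) extends CovarianceLike(l, r)
```

With this shape, `new CovPopLike(Leaf("x"), Leaf("y")).children` is derived from `left` and `right` in one place, which may be what the suggestion is aiming for.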
[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-810815432 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41333/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32011: [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs that use them
AmplabJenkins removed a comment on pull request #32011: URL: https://github.com/apache/spark/pull/32011#issuecomment-81081 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41330/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
AmplabJenkins removed a comment on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810814434 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136747/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
AmplabJenkins removed a comment on pull request #32006: URL: https://github.com/apache/spark/pull/32006#issuecomment-810814432 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136740/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join
AmplabJenkins removed a comment on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-810814436 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136741/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query
AmplabJenkins removed a comment on pull request #31989: URL: https://github.com/apache/spark/pull/31989#issuecomment-810814443 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136742/
[GitHub] [spark] AmplabJenkins commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join
AmplabJenkins commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-810814436 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136741/
[GitHub] [spark] AmplabJenkins commented on pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
AmplabJenkins commented on pull request #32006: URL: https://github.com/apache/spark/pull/32006#issuecomment-810814432 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136740/
[GitHub] [spark] AmplabJenkins commented on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query
AmplabJenkins commented on pull request #31989: URL: https://github.com/apache/spark/pull/31989#issuecomment-810814443 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136742/
[GitHub] [spark] AmplabJenkins commented on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
AmplabJenkins commented on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810814434 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136747/
[GitHub] [spark] AmplabJenkins commented on pull request #32011: [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs that use them
AmplabJenkins commented on pull request #32011: URL: https://github.com/apache/spark/pull/32011#issuecomment-81081 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41330/
[GitHub] [spark] AngersZhuuuu commented on pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType
AngersZhuuuu commented on pull request #32001: URL: https://github.com/apache/spark/pull/32001#issuecomment-810814418

> As @cloud-fan said, we have special functions that convert numbers to timestamps. I quickly looked at Oracle; it has a similar function for intervals. For example, [NUMTODSINTERVAL](https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions117.htm#SQLRF00682) converts a `NUM` to a `DAY TO SECOND INTERVAL`:
>
> ```
> NUMTODSINTERVAL(100, 'day')
> ```
>
> @AngersZhuuuu Could you look at other DBMSs and see how they cast intervals from/to numbers?

Sure.
[GitHub] [spark] SparkQA commented on pull request #32011: [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs that use them
SparkQA commented on pull request #32011: URL: https://github.com/apache/spark/pull/32011#issuecomment-810814224
[GitHub] [spark] AngersZhuuuu commented on pull request #31010: [SPARK-33976][SQL] Spark script TRANSFORM related change doc
AngersZhuuuu commented on pull request #31010: URL: https://github.com/apache/spark/pull/31010#issuecomment-810814163

> `branch-3.0`/`3.1` does not have a doc for a TRANSFORM clause, so IMO it would be nice to write the common syntaxes of a TRANSFORM clause in this first PR and backport the doc into `branch-3.0`/`3.1`. Then, we can write the other improved syntaxes for master only in following PRs. WDYT?

Good suggestion. Will start this a little later.
[GitHub] [spark] cloud-fan commented on a change in pull request #30965: [SPARK-33935][SQL] Fix CBO cost function
cloud-fan commented on a change in pull request #30965: URL: https://github.com/apache/spark/pull/30965#discussion_r604631512 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt ## @@ -6,71 +6,71 @@ TakeOrderedAndProject [ext_price,brand,brand_id,i_manufact_id,i_manufact] WholeStageCodegen (12) HashAggregate [i_brand,i_brand_id,i_manufact_id,i_manufact,ss_ext_sales_price] [sum,sum] Project [ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -SortMergeJoin [ss_customer_sk,c_customer_sk,ca_zip,s_zip] - InputAdapter -WholeStageCodegen (5) - Sort [ss_customer_sk] -InputAdapter - Exchange [ss_customer_sk] #2 -WholeStageCodegen (4) - Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip] -BroadcastHashJoin [ss_store_sk,s_store_sk] - Project [ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_sold_date_sk,d_date_sk] - Project [ss_sold_date_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_item_sk,i_item_sk] - Filter [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk] -ColumnarToRow - InputAdapter -Scan parquet default.store_sales [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price] +BroadcastHashJoin [ss_item_sk,i_item_sk] Review comment: I think q19 exposes a problem. Previously, this `BroadcastHashJoin` ran before the `SortMergeJoin`, which reduced the shuffle input, because this `BroadcastHashJoin` has a filter on the right side that likely makes the join very selective. @tanelk, if the idea from @wzhfy doesn't look good to you, can you try some other ideas and see if we can fix this issue?
[GitHub] [spark] cloud-fan commented on a change in pull request #30965: [SPARK-33935][SQL] Fix CBO cost function
cloud-fan commented on a change in pull request #30965: URL: https://github.com/apache/spark/pull/30965#discussion_r604631512 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt ## @@ -6,71 +6,71 @@ TakeOrderedAndProject [ext_price,brand,brand_id,i_manufact_id,i_manufact] WholeStageCodegen (12) HashAggregate [i_brand,i_brand_id,i_manufact_id,i_manufact,ss_ext_sales_price] [sum,sum] Project [ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -SortMergeJoin [ss_customer_sk,c_customer_sk,ca_zip,s_zip] - InputAdapter -WholeStageCodegen (5) - Sort [ss_customer_sk] -InputAdapter - Exchange [ss_customer_sk] #2 -WholeStageCodegen (4) - Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip] -BroadcastHashJoin [ss_store_sk,s_store_sk] - Project [ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_sold_date_sk,d_date_sk] - Project [ss_sold_date_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact] -BroadcastHashJoin [ss_item_sk,i_item_sk] - Filter [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk] -ColumnarToRow - InputAdapter -Scan parquet default.store_sales [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price] +BroadcastHashJoin [ss_item_sk,i_item_sk] Review comment: I think q19 exposes a problem. Previously, this `BroadcastHashJoin` ran before the `SortMergeJoin`, which reduced the shuffle input, because this `BroadcastHashJoin` has a filter on the right side and can likely prune a lot of data. @tanelk, if the idea from @wzhfy doesn't look good to you, can you try some other ideas and see if we can fix this issue?
[GitHub] [spark] tanelk commented on pull request #30965: [SPARK-33935][SQL] Fix CBO cost function
tanelk commented on pull request #30965: URL: https://github.com/apache/spark/pull/30965#issuecomment-810810870 @wzhfy and @cloud-fan I'm not a fan of adding up the relative costs. A simple example, where the weight is 0.5: if this plan's size (bytes) is 2x larger, then no matter how many times more rows the other plan has, the other plan will always be considered better - `0.5*2 + 0.5*0.01 > 1`. This is basically the same situation, where one cost overwhelms the other. Perhaps this would be the best of both worlds: `(this.card / other.card) ^ cardWeight * (this.size / other.size) ^ (1 - cardWeight) < 1`. In short - multiply the relative costs instead of adding them.
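To make the comparison in the comment above concrete, here is a minimal sketch of the two ways of combining relative costs, using the numbers from the example (weight 0.5, 2x the size, 1/100 of the rows). `PlanCost`, `CostComparison`, and both method names are hypothetical and chosen for illustration; this is not presented as the code in the PR.

```scala
// Illustrative sketch only; these names are assumptions, not Spark's API.
final case class PlanCost(card: Double, size: Double)

object CostComparison {
  // Additive form: weighted sum of the relative cardinality and relative size.
  def additiveBetterThan(a: PlanCost, b: PlanCost, cardWeight: Double): Boolean =
    cardWeight * (a.card / b.card) + (1 - cardWeight) * (a.size / b.size) < 1

  // Multiplicative form from the comment: weighted geometric mean of the
  // relative costs, so neither term can dominate purely by addition.
  def multiplicativeBetterThan(a: PlanCost, b: PlanCost, cardWeight: Double): Boolean =
    math.pow(a.card / b.card, cardWeight) *
      math.pow(a.size / b.size, 1 - cardWeight) < 1
}

object CostComparisonDemo extends App {
  val thisPlan  = PlanCost(card = 1, size = 2)    // 2x the bytes, 1/100 the rows
  val otherPlan = PlanCost(card = 100, size = 1)

  // Additive: 0.5 * 0.01 + 0.5 * 2 is above 1, so thisPlan is rejected.
  println(CostComparison.additiveBetterThan(thisPlan, otherPlan, 0.5))
  // Multiplicative: sqrt(0.01) * sqrt(2) is below 1, so thisPlan is preferred.
  println(CostComparison.multiplicativeBetterThan(thisPlan, otherPlan, 0.5))
}
```

In the additive form the size term alone already reaches 1 once the plan is 2x larger, so the row-count ratio can never rescue it; in the multiplicative form a large enough reduction in rows can offset the larger size.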
[GitHub] [spark] cloud-fan closed pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join
cloud-fan closed pull request #31470: URL: https://github.com/apache/spark/pull/31470
[GitHub] [spark] cloud-fan commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join
cloud-fan commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-810808178 thanks, merging to master!
[GitHub] [spark] maropu commented on a change in pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates
maropu commented on a change in pull request #31983: URL: https://github.com/apache/spark/pull/31983#discussion_r604626741 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ## @@ -2834,6 +2835,29 @@ class DataFrameSuite extends QueryTest df10.select(zip_with(col("array1"), col("array2"), (b1, b2) => reverseThenConcat2(b1, b2))) checkAnswer(test10, Row(Array(Row("cbaihg"), Row("fedlkj"))) :: Nil) } + + test("SPARK-34882: Aggregate with multiple distinct null sensitive aggregators") { +spark.udf.register("countNulls", udaf(new Aggregator[JLong, JLong, JLong] { Review comment: Ah, okay. I misunderstood it.
[GitHub] [spark] SparkQA removed a comment on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
SparkQA removed a comment on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810739013 **[Test build #136747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136747/testReport)** for PR 32010 at commit [`7d367e3`](https://github.com/apache/spark/commit/7d367e38e625a1007e1922ac3fb17da9d17647d6).
[GitHub] [spark] cloud-fan commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
cloud-fan commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810807243 @viirya how about the history server? I'm a bit worried about the event log with v2 metrics.
[GitHub] [spark] SparkQA commented on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
SparkQA commented on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810806977 **[Test build #136747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136747/testReport)** for PR 32010 at commit [`7d367e3`](https://github.com/apache/spark/commit/7d367e38e625a1007e1922ac3fb17da9d17647d6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join
SparkQA removed a comment on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-810691959 **[Test build #136741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136741/testReport)** for PR 31470 at commit [`f0c7ce4`](https://github.com/apache/spark/commit/f0c7ce423009e9465ec614c9e4c64781229e1f19).
[GitHub] [spark] SparkQA commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join
SparkQA commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-810806264 **[Test build #136741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136741/testReport)** for PR 31470 at commit [`f0c7ce4`](https://github.com/apache/spark/commit/f0c7ce423009e9465ec614c9e4c64781229e1f19).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] gatorsmile commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
gatorsmile commented on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-810802704 This is awesome! We should have done it 5 years ago. :-)
[GitHub] [spark] maropu commented on a change in pull request #31982: [SPARK-34881][SQL] New SQL Function: TRY_CAST
maropu commented on a change in pull request #31982: URL: https://github.com/apache/spark/pull/31982#discussion_r604621607

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala ##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A special version of [[AnsiCast]]. It performs the same operation (i.e. converts a value of
+ * one data type into another data type), but returns a NULL value instead of raising an error
+ * when the conversion can not be performed.
+ *
+ * When cast from/to timezone related types, we need timeZoneId, which will be resolved with
+ * session local timezone by an analyzer [[ResolveTimeZone]].
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`. " +
+    "This expression is identical to CAST with configuration `spark.sql.ansi.enabled` as " +
+    "true, except it returns NULL instead of raising an error. Note that the behavior of this " +
+    "expression doesn't depend on configuration `spark.sql.ansi.enabled`.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_('10' as int);
+       10
+      > SELECT _FUNC_(1234567890123L as int);
+       null
+  """,
+  since = "3.2.0",
+  group = "conversion_funcs")

Review comment: Nice! Thanks for letting me know.
[GitHub] [spark] timarmstrong commented on pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
timarmstrong commented on pull request #32006: URL: https://github.com/apache/spark/pull/32006#issuecomment-810801024 Thanks for the reviews!
[GitHub] [spark] wangyum commented on pull request #31984: [SPARK-34884][SQL] Improve DPP evaluation to make filtering side must can broadcast by size or broadcast by hint
wangyum commented on pull request #31984: URL: https://github.com/apache/spark/pull/31984#issuecomment-810800714

Benchmark result (spark.sql.adaptive.enabled=false):

| SQL | Before (spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=true) | After (spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=false) |
| -- | -- | -- |
| 58 | 144 | 21 |
| 73 | 8 | 7 |
| 83 | 25 | 14 |
[GitHub] [spark] viirya commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
viirya commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810799188 @cloud-fan I captured a screenshot and attached it to the description. The DS v2 uses the same custom metrics as the ones I added in `SQLAppStatusListenerSuite`.
[GitHub] [spark] SparkQA removed a comment on pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
SparkQA removed a comment on pull request #32006: URL: https://github.com/apache/spark/pull/32006#issuecomment-810691742 **[Test build #136740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136740/testReport)** for PR 32006 at commit [`3e25454`](https://github.com/apache/spark/commit/3e254540a7e3f77c3b5db4bacb17f3b9332bf8de).
[GitHub] [spark] SparkQA commented on pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
SparkQA commented on pull request #32006: URL: https://github.com/apache/spark/pull/32006#issuecomment-810796518 **[Test build #136740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136740/testReport)** for PR 32006 at commit [`3e25454`](https://github.com/apache/spark/commit/3e254540a7e3f77c3b5db4bacb17f3b9332bf8de).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] viirya commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
viirya commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810792543 Okay. Let me set up a simple DS v2 test locally and capture some screenshots of the web UI.
[GitHub] [spark] SparkQA removed a comment on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query
SparkQA removed a comment on pull request #31989: URL: https://github.com/apache/spark/pull/31989#issuecomment-810695572 **[Test build #136742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136742/testReport)** for PR 31989 at commit [`25bbd47`](https://github.com/apache/spark/commit/25bbd4772a08b0c19d1cd305ef82d26b922a21e9).
[GitHub] [spark] SparkQA commented on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query
SparkQA commented on pull request #31989: URL: https://github.com/apache/spark/pull/31989#issuecomment-810790360 **[Test build #136742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136742/testReport)** for PR 31989 at commit [`25bbd47`](https://github.com/apache/spark/commit/25bbd4772a08b0c19d1cd305ef82d26b922a21e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan closed pull request #31680: [SPARK-34568][SQL] When SparkContext's conf not enable hive, we should respect `enableHiveSupport()` when build SparkSession too
cloud-fan closed pull request #31680: URL: https://github.com/apache/spark/pull/31680
[GitHub] [spark] cloud-fan commented on pull request #31680: [SPARK-34568][SQL] When SparkContext's conf not enable hive, we should respect `enableHiveSupport()` when build SparkSession too
cloud-fan commented on pull request #31680: URL: https://github.com/apache/spark/pull/31680#issuecomment-810789864 thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #31982: [SPARK-34881][SQL] New SQL Function: TRY_CAST
SparkQA commented on pull request #31982: URL: https://github.com/apache/spark/pull/31982#issuecomment-810789707 **[Test build #136752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136752/testReport)** for PR 31982 at commit [`9266934`](https://github.com/apache/spark/commit/92669341d5eb849c853104c2a8052ec81fb2a4e5).
[GitHub] [spark] cloud-fan commented on a change in pull request #31653: [SPARK-33832][SQL] v2. move OptimzieSkewedJoin to query stage preparation
cloud-fan commented on a change in pull request #31653: URL: https://github.com/apache/spark/pull/31653#discussion_r604614313

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala ##
@@ -251,48 +253,129 @@ object OptimizeSkewedJoin extends CustomShuffleReaderRule {
     }
   }
+  /**
+   * A potential stage is from Exchange down. Actual [[QueryStageExec]] nodes are created
+   * by [[AdaptiveSparkPlanExec.newQueryStage]] bounded by previously created [[QueryStageExec]]
+   * nodes below.
+   * Todo: need better way to identify which join the log msgs below refer to. Tags?
+   */
+  private def handlePotentialQueryStage(plan: SparkPlan): SparkPlan = {
+    val shuffleStages = collectShuffleStages(plan)
+    val s = ExplainUtils.getAQELogPrefix(shuffleStages)
+
+    if (shuffleStages.length != 2 && !conf.adaptiveForceIfShuffle) {
+      /* Consider Case II. Shuffle above SMJ1. We should see 3 SQSE nodes but
+      with adaptiveForceIfShuffle() we should be able to add a new shuffle
+      above SMJ2 to enable skew mitigation of SMJ2. W/o ability to add a new
+      shuffle skew mitigation is still possible in some cases - to be handled later.
+
+      Add a test for this.
+      See test("skew in deeply nested join - test ShuffleAddedException") and
+      add a similar test with just 2 joins */
+      logInfo(s"OptimizeSkewedJoin: rule is not applied since" +
+        s" shuffleStages.length=${shuffleStages.length} != 2 and " +
+        s"${SQLConf.ADAPTIVE_FORCE_IF_SHUFFLE.key}=false; $s")
+      return plan
+    }
+    val numShufflesBefore = plan.collect {
+      case e: ShuffleExchangeExec => e
+    }.length
+    val mitigatedPlan = optimizeSkewJoin(plan)
+    if (mitigatedPlan eq plan) {
+      return plan
+    }
+    val executedPlan = ensureRequirements.apply(mitigatedPlan)
+    val numNewShuffles = executedPlan.collect {
+      case e: ShuffleExchangeExec => e
+    }.length - numShufflesBefore
+    if(numNewShuffles > 0) {
+      if (conf.adaptiveForceIfShuffle) {
+        logInfo(s"OptimizeSkewedJoin: rule is applied. " +
+          s"$numNewShuffles additional shuffles will be introduced; $s")
+        executedPlan // make sure to return plan with new shuffles
+      } else {
+        logInfo(s"OptimizeSkewedJoin: rule is not applied due" +
+          s" to $numNewShuffles additional shuffles will be introduced; $s")
+        plan
+      }
+    } else {
+      executedPlan
+    }
+  }
+
+  def collectShuffleStages(plan: SparkPlan): Seq[ShuffleQueryStageExec] = plan match {
+    case stage: ShuffleQueryStageExec => Seq(stage)
+    case _ => plan.children.flatMap(collectShuffleStages)
+  }
+  /**
+   * Now this runs as part of queryStagePreparationRules() which means it runs over the whole plan
+   * which may have any number of ExchangeExec nodes, i.e. multiple "query stages"

Review comment: Maybe we don't have to be optimal in the first version. We can optimize all the leaf SMJs, and revert all of them if extra shuffles are introduced. The optimal solution is to find out which SMJ caused the extra shuffle and only revert it. We can do it later.
[GitHub] [spark] MaxGekk commented on pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType
MaxGekk commented on pull request #32001: URL: https://github.com/apache/spark/pull/32001#issuecomment-810787472 As @cloud-fan said, we have special functions that convert numbers to timestamps. I took a quick look at Oracle, and it has a similar function for intervals. For example, [NUMTODSINTERVAL](https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions117.htm#SQLRF00682) converts a `NUM` to a `DAY TO SECOND INTERVAL`:
```
NUMTODSINTERVAL(100, 'day')
```
@AngersZh Could you look at other DBMSs and see how they cast intervals from/to numbers?
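To make the shape of such a function concrete, here is a rough plain-Scala sketch modeled on the Oracle call quoted above. `numToDsInterval` is a hypothetical helper, not a Spark or Oracle API; it assumes whole-number inputs only and represents the day-time interval with `java.time.Duration`.

```scala
import java.time.Duration
import java.util.Locale

object NumToDsIntervalSketch {
  // Hypothetical helper: the caller names the unit explicitly, so the number
  // is never interpreted implicitly. Whole numbers only in this sketch.
  def numToDsInterval(n: Long, unit: String): Duration =
    unit.trim.toUpperCase(Locale.ROOT) match {
      case "DAY" => Duration.ofDays(n)
      case "HOUR" => Duration.ofHours(n)
      case "MINUTE" => Duration.ofMinutes(n)
      case "SECOND" => Duration.ofSeconds(n)
      case other => throw new IllegalArgumentException(s"Unsupported interval unit: $other")
    }
}
```

With this sketch, `NumToDsIntervalSketch.numToDsInterval(100, "day")` yields a 100-day duration, mirroring `NUMTODSINTERVAL(100, 'day')`.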
[GitHub] [spark] gengliangwang commented on a change in pull request #31982: [SPARK-34881][SQL] New SQL Function: TRY_CAST
gengliangwang commented on a change in pull request #31982: URL: https://github.com/apache/spark/pull/31982#discussion_r604612597

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala ##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A special version of [[AnsiCast]]. It performs the same operation (i.e. converts a value of
+ * one data type into another data type), but returns a NULL value instead of raising an error
+ * when the conversion can not be performed.
+ *
+ * When cast from/to timezone related types, we need timeZoneId, which will be resolved with
+ * session local timezone by an analyzer [[ResolveTimeZone]].
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`. " +
+    "This expression is identical to CAST with configuration `spark.sql.ansi.enabled` as " +
+    "true, except it returns NULL instead of raising an error. Note that the behavior of this " +
+    "expression doesn't depend on configuration `spark.sql.ansi.enabled`.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_('10' as int);
+       10
+      > SELECT _FUNC_(1234567890123L as int);
+       null
+  """,
+  since = "3.2.0",
+  group = "conversion_funcs")

Review comment: Actually I plan to create docs for both CAST and TRY_CAST, even with ANSI CAST. Grouping them into one section. I have created https://issues.apache.org/jira/browse/SPARK-34917 and https://issues.apache.org/jira/browse/SPARK-34918
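The contract described in the quoted `usage` string (same conversion as a strict cast, but NULL instead of an error) can be illustrated with a small plain-Scala analogy. This is not Spark's implementation: `ansiCastToInt` and `tryCastToInt` are hypothetical names, `Option` stands in for a nullable SQL value, and the string parsing rules are simply those of Scala's `toInt`, which may differ from Spark's.

```scala
import scala.util.Try

object TryCastSketch {
  // Strict conversions: throw when the value does not fit or does not parse.
  def ansiCastToInt(v: Long): Int = Math.toIntExact(v)
  def ansiCastToInt(s: String): Int = s.trim.toInt

  // TRY_CAST-like wrappers: same conversion, but None (standing in for NULL)
  // whenever the strict version would have raised an error.
  def tryCastToInt(v: Long): Option[Int] = Try(ansiCastToInt(v)).toOption
  def tryCastToInt(s: String): Option[Int] = Try(ansiCastToInt(s)).toOption
}
```

The two examples from the diff map onto this sketch as `tryCastToInt("10")`, which succeeds, and `tryCastToInt(1234567890123L)`, which overflows `Int` and therefore comes back empty.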
[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path
SparkQA commented on pull request #30057: URL: https://github.com/apache/spark/pull/30057#issuecomment-810783077 **[Test build #136751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136751/testReport)** for PR 30057 at commit [`81b1bd8`](https://github.com/apache/spark/commit/81b1bd817a30b5a31026d5841c2ba7189598e3b4).
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support GROUP BY use Separate columns and CUBE/ROLLUP
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-810783018 **[Test build #136750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136750/testReport)** for PR 30144 at commit [`f5763e8`](https://github.com/apache/spark/commit/f5763e8580ebb70a2c89679852e1e2301d58641d).
[GitHub] [spark] cloud-fan commented on pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType
cloud-fan commented on pull request #32001: URL: https://github.com/apache/spark/pull/32001#issuecomment-810782597

> this conversion could be safe

It's not about whether it is safe or not. It's about how to make the behavior easy to understand for end-users. CAST is a standard SQL operator, and I don't think it makes sense that casting an integral value to a day-time value should treat the input as microseconds, simply because in Spark the precision is microseconds. This behavior should not be vendor-specific.

Ideally the data source should have native support for the interval type to store such values. If it does not, we can provide some new functions to convert between int/long and year-month/day-time intervals, similar to these timestamp functions. We can also do some research and see how other databases handle this case.
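The point about a bare number being ambiguous, and about dedicated conversion functions instead of CAST, can be sketched as follows. The function names are hypothetical, chosen only by loose analogy with unit-named timestamp conversion functions; `java.time.Duration` and `java.time.Period` stand in for day-time and year-month intervals.

```scala
import java.time.{Duration, Period}
import java.time.temporal.ChronoUnit

object ExplicitIntervalFunctions {
  // Hypothetical unit-named conversions: the unit is part of the function
  // name, so the reader never has to guess how the number is interpreted.
  def dayTimeFromMicros(n: Long): Duration = Duration.of(n, ChronoUnit.MICROS)
  def dayTimeFromSeconds(n: Long): Duration = Duration.ofSeconds(n)
  def yearMonthFromMonths(n: Int): Period = Period.ofMonths(n).normalized()
}
```

The same input `100` gives very different intervals through `dayTimeFromMicros` and `dayTimeFromSeconds`, which is exactly the choice a plain `CAST(100 AS INTERVAL ...)` would have to make silently.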
[GitHub] [spark] SparkQA commented on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates
SparkQA commented on pull request #31983: URL: https://github.com/apache/spark/pull/31983#issuecomment-810782549 **[Test build #136749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136749/testReport)** for PR 31983 at commit [`2530e89`](https://github.com/apache/spark/commit/2530e89304c874bee785eefcb8b8c09648046d17).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
AmplabJenkins removed a comment on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810780681 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136746/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810780686 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136744/
[GitHub] [spark] AmplabJenkins commented on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
AmplabJenkins commented on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810780681 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136746/
[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810780686 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136744/
[GitHub] [spark] AngersZhuuuu commented on pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
AngersZh commented on pull request #32006: URL: https://github.com/apache/spark/pull/32006#issuecomment-810775087 Good catch!
[GitHub] [spark] SparkQA removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810715229 **[Test build #136744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136744/testReport)** for PR 31204 at commit [`13e3692`](https://github.com/apache/spark/commit/13e36921cf9898ab83da8b8bc802b8a3edb36a29).
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810773894 **[Test build #136744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136744/testReport)** for PR 31204 at commit [`13e3692`](https://github.com/apache/spark/commit/13e36921cf9898ab83da8b8bc802b8a3edb36a29).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] MaxGekk closed pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction
MaxGekk closed pull request #31996: URL: https://github.com/apache/spark/pull/31996
[GitHub] [spark] MaxGekk commented on pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction
MaxGekk commented on pull request #31996: URL: https://github.com/apache/spark/pull/31996#issuecomment-810769945 Thank you @cloud-fan @AngersZh for your review. Merging to master.
[GitHub] [spark] SparkQA removed a comment on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
SparkQA removed a comment on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810719159 **[Test build #136746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136746/testReport)** for PR 32009 at commit [`369e08b`](https://github.com/apache/spark/commit/369e08b2e39b09868238db00b14db4a0eb526ddc).
[GitHub] [spark] SparkQA commented on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
SparkQA commented on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810769301 **[Test build #136746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136746/testReport)** for PR 32009 at commit [`369e08b`](https://github.com/apache/spark/commit/369e08b2e39b09868238db00b14db4a0eb526ddc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] ulysses-you commented on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
ulysses-you commented on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810767033 thank you for taking a look @HyukjinKwon @yaooqinn
[GitHub] [spark] cloud-fan commented on pull request #31993: [SPARK-34897][SQL] Add workaround to error message when OrcUtils.requestedColumnIds fails
cloud-fan commented on pull request #31993: URL: https://github.com/apache/spark/pull/31993#issuecomment-810765696 Sorry, I may be missing something. Why is it only a problem in nested column pruning but not in column pruning?
[GitHub] [spark] tanelk commented on a change in pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates
tanelk commented on a change in pull request #31983: URL: https://github.com/apache/spark/pull/31983#discussion_r604597457 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ## @@ -2834,6 +2835,29 @@ class DataFrameSuite extends QueryTest df10.select(zip_with(col("array1"), col("array2"), (b1, b2) => reverseThenConcat2(b1, b2))) checkAnswer(test10, Row(Array(Row("cbaihg"), Row("fedlkj"))) :: Nil) } + + test("SPARK-34882: Aggregate with multiple distinct null sensitive aggregators") { +spark.udf.register("countNulls", udaf(new Aggregator[JLong, JLong, JLong] { Review comment: I added the `withUserDefinedFunction`, but I did not understand the first question. I added this UDAF because the built-in aggregates that are "null sensitive" (`First` and `Last`) gave unstable test results in the `SQLQueryTestSuite`.
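Editorial sketch, not part of the thread: the comment above calls some aggregates "null sensitive". One way to see why that matters for a rewrite from `if` to a filter clause is to compare a null-counting aggregate under both forms. The plain-Java model below is an illustration under assumptions: the class `NullSensitiveSketch`, both method names, and the `gid`/`value` columns are hypothetical, and it does not reproduce Spark's actual `RewriteDistinctAggregates` logic or the `countNulls` UDAF from the test.

```java
// Hypothetical model: each row has a group id (gid) and a nullable value.
final class NullSensitiveSketch {
    private NullSensitiveSketch() {}

    // Models countNulls(if(gid = wanted, value, null)):
    // rows outside the wanted group are turned into nulls and still reach the aggregate.
    static long countNullsWithIf(int[] gid, Integer[] value, int wanted) {
        long nulls = 0;
        for (int i = 0; i < gid.length; i++) {
            Integer masked = (gid[i] == wanted) ? value[i] : null;
            if (masked == null) {
                nulls++;
            }
        }
        return nulls;
    }

    // Models countNulls(value) FILTER (WHERE gid = wanted):
    // rows outside the wanted group never reach the aggregate.
    static long countNullsWithFilter(int[] gid, Integer[] value, int wanted) {
        long nulls = 0;
        for (int i = 0; i < gid.length; i++) {
            if (gid[i] != wanted) {
                continue;
            }
            if (value[i] == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        int[] gid = {1, 1, 2, 2};
        Integer[] value = {10, null, 20, 30};
        System.out.println(countNullsWithIf(gid, value, 1));     // prints 3
        System.out.println(countNullsWithFilter(gid, value, 1)); // prints 1
    }
}
```

In this model only the filter form reports the single genuine null in group 1; the `if` form also counts the two rows it masked out. An aggregate that ignores nulls would give the same answer under both forms, which is why a null-observing aggregate is the useful test case.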
[GitHub] [spark] SparkQA commented on pull request #32011: [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs that use them
SparkQA commented on pull request #32011: URL: https://github.com/apache/spark/pull/32011#issuecomment-810761488 **[Test build #136748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136748/testReport)** for PR 32011 at commit [`642d7c0`](https://github.com/apache/spark/commit/642d7c09f604c2912042e1b863a6750d86184170).
[GitHub] [spark] cloud-fan closed pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
cloud-fan closed pull request #32006: URL: https://github.com/apache/spark/pull/32006
[GitHub] [spark] cloud-fan commented on pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
cloud-fan commented on pull request #32006: URL: https://github.com/apache/spark/pull/32006#issuecomment-810761260 thanks, merging to master/3.1/3.0/2.4!
[GitHub] [spark] HyukjinKwon commented on pull request #32011: [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs that use them
HyukjinKwon commented on pull request #32011: URL: https://github.com/apache/spark/pull/32011#issuecomment-810761149 cc @dongjoon-hyun, @gengliangwang and @maropu FYI
[GitHub] [spark] HyukjinKwon opened a new pull request #32011: [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs that use them
HyukjinKwon opened a new pull request #32011: URL: https://github.com/apache/spark/pull/32011 ### What changes were proposed in this pull request? This PR proposes to cache Maven, SBT and Scala in all jobs that use them. For simplicity, we use the same key `build-` and just cache all SBT, Maven and Scala. The cache is not very large. ### Why are the changes needed? To speed up the build. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It will be tested in this PR's GA jobs.
[GitHub] [spark] cloud-fan commented on a change in pull request #32006: [SPARK-34909][SQL] Fix conversion of negative to unsigned in conv()
cloud-fan commented on a change in pull request #32006: URL: https://github.com/apache/spark/pull/32006#discussion_r604593841 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala ## @@ -52,7 +32,7 @@ object NumberConverter { java.util.Arrays.fill(value, 0.asInstanceOf[Byte]) var i = value.length - 1 while (tmpV != 0) { - val q = unsignedLongDiv(tmpV, radix) + val q = java.lang.Long.divideUnsigned(tmpV, radix) Review comment: Yea, we should use the standard Java API as much as we can.
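Editorial sketch, not part of the thread: the diff above swaps a hand-rolled `unsignedLongDiv` for `java.lang.Long.divideUnsigned`. The following is a minimal illustration of how the standard unsigned helpers can drive a digit-extraction loop of that general shape. The class `UnsignedRadixSketch` and method `toUnsignedString` are hypothetical names chosen for this example; this is not the code in Spark's `NumberConverter`.

```java
// Hypothetical helper: renders a long, read as an unsigned 64-bit value, in a given radix.
final class UnsignedRadixSketch {
    private UnsignedRadixSketch() {}

    static String toUnsignedString(long v, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("radix out of range: " + radix);
        }
        if (v == 0) {
            return "0";
        }
        StringBuilder digits = new StringBuilder();
        long tmp = v;
        while (tmp != 0) {
            // Both calls treat tmp as unsigned, so negative longs are handled
            // as values in [2^63, 2^64) rather than as negative numbers.
            long q = Long.divideUnsigned(tmp, radix);
            long r = Long.remainderUnsigned(tmp, radix);
            digits.append(Character.forDigit((int) r, radix));
            tmp = q;
        }
        // Digits were produced least significant first.
        return digits.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnsignedString(-1L, 16)); // prints ffffffffffffffff
        System.out.println(toUnsignedString(255L, 2));  // prints 11111111
    }
}
```

The JDK already offers `Long.toUnsignedString(long, int)` for this exact task; the loop is spelled out here only to show where `divideUnsigned` and `remainderUnsigned` fit.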
[GitHub] [spark] cloud-fan commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
cloud-fan commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810758921 @viirya Can we write a simple DS v2 with metrics and try it locally? Then we can get some screenshots of the web UI, and also verify the history server.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
AmplabJenkins removed a comment on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810755166 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136738/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
AmplabJenkins removed a comment on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810755134 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41329/
[GitHub] [spark] AmplabJenkins commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
AmplabJenkins commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810755166 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136738/
[GitHub] [spark] AmplabJenkins commented on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
AmplabJenkins commented on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810755134 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41329/
[GitHub] [spark] SparkQA commented on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
SparkQA commented on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810755051
[GitHub] [spark] HyukjinKwon closed pull request #32005: [SPARK-34907][TESTS] Add main class that detects and runs all benchmarks
HyukjinKwon closed pull request #32005: URL: https://github.com/apache/spark/pull/32005
[GitHub] [spark] HyukjinKwon commented on pull request #32005: [SPARK-34907][TESTS] Add main class that detects and runs all benchmarks
HyukjinKwon commented on pull request #32005: URL: https://github.com/apache/spark/pull/32005#issuecomment-810754861 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #32005: [SPARK-34907][TESTS] Add main class that detects and runs all benchmarks
HyukjinKwon commented on pull request #32005: URL: https://github.com/apache/spark/pull/32005#issuecomment-810754656 Thanks guys. Let me merge this in first and proceed (it won't break or affect anything in our CI anyway). I am working on SPARK-34821 now. Let's see how it goes!
[GitHub] [spark] SparkQA removed a comment on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
SparkQA removed a comment on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810647889 **[Test build #136738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136738/testReport)** for PR 31451 at commit [`d5d8678`](https://github.com/apache/spark/commit/d5d867880ebb57c49ac422251ba50bbabf1159d1).
[GitHub] [spark] SparkQA commented on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan
SparkQA commented on pull request #31451: URL: https://github.com/apache/spark/pull/31451#issuecomment-810745261 **[Test build #136738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136738/testReport)** for PR 31451 at commit [`d5d8678`](https://github.com/apache/spark/commit/d5d867880ebb57c49ac422251ba50bbabf1159d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] yaooqinn commented on pull request #31804: [SPARK-34710][SQL] Add tableType column for SHOW TABLES to distinguish view and tables
yaooqinn commented on pull request #31804: URL: https://github.com/apache/spark/pull/31804#issuecomment-810740574 cc @cloud-fan @HyukjinKwon PTAL, thanks
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
AmplabJenkins removed a comment on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810739994 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41328/
[GitHub] [spark] SparkQA commented on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
SparkQA commented on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810739984 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41328/
[GitHub] [spark] AmplabJenkins commented on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
AmplabJenkins commented on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810739994 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41328/
[GitHub] [spark] SparkQA commented on pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
SparkQA commented on pull request #32010: URL: https://github.com/apache/spark/pull/32010#issuecomment-810739013 **[Test build #136747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136747/testReport)** for PR 32010 at commit [`7d367e3`](https://github.com/apache/spark/commit/7d367e38e625a1007e1922ac3fb17da9d17647d6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810738653 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41326/
[GitHub] [spark] yaooqinn opened a new pull request #32010: [SPARK-34908][SQL] Add test cases for char and varchar with functions
yaooqinn opened a new pull request #32010: URL: https://github.com/apache/spark/pull/32010 ### What changes were proposed in this pull request? Using char and varchar with the string functions and some other expressions might be confusing and ambiguous. In this PR we add test cases for char and varchar with these operations to reveal these behaviors and see if we can come up with a general pattern for them. ### Why are the changes needed? Test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests.
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810738637 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41326/
[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810738653 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41326/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
AmplabJenkins removed a comment on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-810737603 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136739/
[GitHub] [spark] AmplabJenkins commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
AmplabJenkins commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-810737603 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136739/
[GitHub] [spark] SparkQA commented on pull request #32009: [SPARK-34914][CORE] Local scheduler backend support update token
SparkQA commented on pull request #32009: URL: https://github.com/apache/spark/pull/32009#issuecomment-810737551 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41328/
[GitHub] [spark] SparkQA removed a comment on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
SparkQA removed a comment on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-810648089 **[Test build #136739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136739/testReport)** for PR 29087 at commit [`1278705`](https://github.com/apache/spark/commit/12787053aec9d015506d5c59c58e91dd23d5bb82).
[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
SparkQA commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-810737064 **[Test build #136739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136739/testReport)** for PR 29087 at commit [`1278705`](https://github.com/apache/spark/commit/12787053aec9d015506d5c59c58e91dd23d5bb82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-810735545 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41326/