Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-04-08 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2043308510 Thanks @sunchao and all for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-04-05 Thread via GitHub
sunchao closed pull request #45267: [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal URL: https://github.com/apache/spark/pull/45267 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-04-05 Thread via GitHub
sunchao commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2040912874 Thanks, merged to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-04-03 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2035996293 @sunchao thanks can you take another look? I think the failed test is not related (looks like error in pyspark uploading test report). -- This is an automated message from the

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-04-02 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1538630610 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-25 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2019159327 @sunchao if you have time for another look, reverted to use specific argument type and method for bucket, and worry about other parameterized transforms later. -- This is an

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-25 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1538195196 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-25 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1538195196 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-22 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1536551512 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-22 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2016084578 @sunchao thanks ! addressed review comments. Lmk if supporting generic function parameter is too messy, and we want to switch back to just bucket for first cut. -- This is an

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-22 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1536299907 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-22 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1536300235 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-22 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1535145417 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -846,6 +879,20 @@ case class KeyGroupedShuffleSpec( } }

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-20 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1530840650 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-20 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2008733057 @sunchao can you take another look? I guess open question is how to support other potential parameterized transforms in the future, let me know your thoughts. -- This is an

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-20 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1530840650 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-19 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1530840650 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-19 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1530840650 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-18 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1529561359 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-18 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1529562057 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala: ## @@ -505,11 +506,28 @@ case class EnsureRequirements( }

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-18 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1529561359 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-18 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1529047571 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/Reducer.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-13 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1523710658 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -829,20 +846,59 @@ case class KeyGroupedShuffleSpec( }

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-13 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1523707057 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -829,20 +846,59 @@ case class KeyGroupedShuffleSpec( }

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-13 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1523706702 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-12 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1521800234 ## sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala: ## @@ -1310,6 +1314,312 @@ class KeyGroupedPartitioningSuite extends

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-12 Thread via GitHub
advancedxy commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1521751659 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-12 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1521729517 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-12 Thread via GitHub
advancedxy commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1521439073 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-11 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1520267117 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-11 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1520187304 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-28 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1506339765 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-28 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1506338476 ## sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala: ## @@ -1310,6 +1314,312 @@ class KeyGroupedPartitioningSuite extends

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-28 Thread via GitHub
advancedxy commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1505984397 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-28 Thread via GitHub
advancedxy commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1505905928 ## sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala: ## @@ -1310,6 +1314,312 @@ class KeyGroupedPartitioningSuite extends

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-27 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-1967323627 Thanks @himadripal I was able to modify one of my tests to also reproduce it, and it should be fixed now. -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-26 Thread via GitHub
sunchao commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-1965888375 Thanks @szehon-ho ! will take a look in a few days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-26 Thread via GitHub
himadripal commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-1965681861 was trying this test case, `val partition1 = Array(Expressions.years("ts"), bucket(2, "id")) val partition2 = Array(Expressions.days("ts"), bucket(4, "id"))`

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-26 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1503537598 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1537,6 +1537,18 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-02-26 Thread via GitHub
himadripal commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1503514183 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1537,6 +1537,18 @@ object SQLConf { .booleanConf