[spark] branch master updated: [SPARK-40772][SQL] Improve `spark.sql.adaptive.skewJoin.skewedPartitionFactor` to support `Double` values

dongjoon Wed, 12 Oct 2022 14:36:48 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 8251f876fd6 [SPARK-40772][SQL] Improve `spark.sql.adaptive.skewJoin.skewedPartitionFactor` to support `Double` values
8251f876fd6 is described below

commit 8251f876fd6a0300bbdac415091786dda3957532
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Wed Oct 12 14:36:12 2022 -0700

    [SPARK-40772][SQL] Improve `spark.sql.adaptive.skewJoin.skewedPartitionFactor` to support `Double` values
    
    ### What changes were proposed in this pull request?
    
    This PR aims to improve `spark.sql.adaptive.skewJoin.skewedPartitionFactor` to support fractional values by converting it from `intConf` to `doubleConf`.
    
    ### Why are the changes needed?
    
    Like `spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor`, this allows users to use the configuration more flexibly.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. The configuration now accepts `Double` values, and all previously valid Integer values are still accepted.
    
    ### How was this patch tested?
    
    Pass the CIs with the changed default value, `5.0`.
    
    Closes #38225 from dongjoon-hyun/SPARK-40772.
    
    Authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 docs/sql-performance-tuning.md                                        | 2 +-
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala        | 4 ++--
 .../org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 9acfa5f2db4..d736ff8f83f 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -329,7 +329,7 @@ Data skew can severely downgrade the performance of join queries. This feature d
      </tr>
      <tr>
        <td><code>spark.sql.adaptive.skewJoin.skewedPartitionFactor</code></td>
-       <td>5</td>
+       <td>5.0</td>
        <td>
         A partition is considered as skewed if its size is larger than this factor multiplying the median partition size and also larger than <code>spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes</code>.
        </td>
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 161c4092f60..2f96209222b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -723,9 +723,9 @@ object SQLConf {
         "multiplying the median partition size and also larger than " +
         "'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'")
       .version("3.0.0")
-      .intConf
+      .doubleConf
       .checkValue(_ >= 0, "The skew factor cannot be negative.")
-      .createWithDefault(5)
+      .createWithDefault(5.0)
 
   val SKEW_JOIN_SKEWED_PARTITION_THRESHOLD =
     buildConf("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes")
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index d4a173bb9cc..37cdea084d8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -64,7 +64,7 @@ case class OptimizeSkewedJoin(ensureRequirements: EnsureRequirements)
    */
   def getSkewThreshold(medianSize: Long): Long = {
     conf.getConf(SQLConf.SKEW_JOIN_SKEWED_PARTITION_THRESHOLD).max(
-      medianSize * conf.getConf(SQLConf.SKEW_JOIN_SKEWED_PARTITION_FACTOR))
+      (medianSize * conf.getConf(SQLConf.SKEW_JOIN_SKEWED_PARTITION_FACTOR)).toLong)
   }
 
   /**
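
To illustrate the effect of the change in `getSkewThreshold`, here is a minimal standalone sketch that does not depend on Spark. The object name `SkewThresholdSketch`, the parameters `factor` and `thresholdInBytes`, and the sample sizes are assumptions made for illustration; in the actual code both values are read through `conf.getConf`. The 256 MiB floor is only an example value for `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`.

```scala
object SkewThresholdSketch {
  // Hypothetical stand-in for OptimizeSkewedJoin.getSkewThreshold:
  // the two configuration values are passed in explicitly instead of
  // being read from SQLConf.
  def getSkewThreshold(medianSize: Long, factor: Double, thresholdInBytes: Long): Long = {
    // Mirrors the checkValue guard on the configuration entry.
    require(factor >= 0, "The skew factor cannot be negative.")
    // Long * Double yields a Double, so the product is truncated back
    // to Long before taking the max with the byte threshold.
    thresholdInBytes.max((medianSize * factor).toLong)
  }

  def main(args: Array[String]): Unit = {
    val median = 100L * 1024 * 1024   // assumed median partition size: 100 MiB
    val floor  = 256L * 1024 * 1024   // assumed byte threshold: 256 MiB

    println(getSkewThreshold(median, 5.0, floor)) // 524288000 (factor term wins)
    println(getSkewThreshold(median, 2.5, floor)) // 268435456 (byte threshold wins)
    println(getSkewThreshold(3L, 2.5, 0L))        // 7 (7.5 truncated by toLong)
  }
}
```

Note that `.toLong` truncates toward zero, so a fractional product such as 7.5 becomes 7. For realistic partition sizes in bytes this loss is negligible.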

