wzhfy edited a comment on pull request #30965:
URL: https://github.com/apache/spark/pull/30965#issuecomment-773864841


   @tanelk Hi, sorry to see this so late.  
   
   IIRC the reason to use a relative value for rowCount and size, is to 
normalize them to a similar scale while comparing cost. Otherwise, one (size) 
may overwhelm the other (rowCount), then the weight and cost function become 
meaningless.  
   
   To resolve the stability issue, can we have a `betterPlan()` function 
instead of current `betterThan()`? Inside that plan, we can fix the comparison 
order based on rowCount or size. And in caller side, we can use 
`!existingPlan.get.eq(betterPlan(existingPlan.get, newJoinPlan, conf))` to 
decide whether to update the best plan so far.
   ```
       def betterPlan(existing: JoinPlan, newPlan: JoinPlan, conf: SQLConf): 
JoinPlan = {
         // To fix the comparison order, set the one with smaller cardinality 
as the baseline.
         val (baseline, toCompare) = if (existing.planCost.card <= 
newPlan.planCost.card) {
           (existing, newPlan)
         } else {
           (newPlan, existing)
         }
   
         if (toCompare.planCost.card == 0 || toCompare.planCost.size == 0) {
           return existing
         }
   
         val relativeRows = BigDecimal(baseline.planCost.card) / 
BigDecimal(toCompare.planCost.card)
         val relativeSize = BigDecimal(baseline.planCost.size) / 
BigDecimal(toCompare.planCost.size)
         val relativeCost = relativeRows * conf.joinReorderCardWeight +
           relativeSize * (1 - conf.joinReorderCardWeight)
         if (relativeCost == 1) {
           // If they have same cost, we don't update the best plan and return 
the existing one.
           existing
         } else if (relativeCost < 1) {
           baseline
         } else {
           toCompare
         }
       }
   ```
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to