wzhfy edited a comment on pull request #30965: URL: https://github.com/apache/spark/pull/30965#issuecomment-773864841
@tanelk Hi, sorry to see this so late. IIRC, the reason to use relative values for rowCount and size is to normalize them to a similar scale while comparing costs. Otherwise, one dimension (size) may overwhelm the other (rowCount), and the weight and cost function become meaningless.

To resolve the stability problem, can we have a `betterPlan()` function instead of the current `betterThan()`? Inside that function, we can fix the comparison order based on rowCount (or size). On the caller side, we can use `!existingPlan.get.eq(betterPlan(existingPlan.get, newJoinPlan, conf))` to decide whether to update the best plan so far.

```scala
def betterPlan(existing: JoinPlan, newPlan: JoinPlan, conf: SQLConf): JoinPlan = {
  // To fix the comparison order, set the plan with the smaller cardinality as the baseline.
  val (baseline, toCompare) = if (existing.planCost.card <= newPlan.planCost.card) {
    (existing, newPlan)
  } else {
    (newPlan, existing)
  }
  // Guard against division by zero below.
  if (toCompare.planCost.card == 0 || toCompare.planCost.size == 0) {
    return existing
  }
  val relativeRows = BigDecimal(baseline.planCost.card) / BigDecimal(toCompare.planCost.card)
  val relativeSize = BigDecimal(baseline.planCost.size) / BigDecimal(toCompare.planCost.size)
  val relativeCost = relativeRows * conf.joinReorderCardWeight +
    relativeSize * (1 - conf.joinReorderCardWeight)
  if (relativeCost == 1) {
    // If the costs are equal, keep the existing plan rather than updating it.
    existing
  } else if (relativeCost < 1) {
    baseline
  } else {
    toCompare
  }
}
```

What do you think?
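To make the stability argument concrete, here is a minimal, self-contained sketch of the same idea. The `Cost` and `JoinPlan` case classes and the `cardWeight` constant are simplified stand-ins for Spark's internal types and `conf.joinReorderCardWeight`, not the real `CostBasedJoinReorder` API:

```scala
// Simplified stand-ins (hypothetical, not Spark's actual classes).
case class Cost(card: BigInt, size: BigInt)
case class JoinPlan(name: String, planCost: Cost)

object BetterPlanDemo {
  // Plays the role of conf.joinReorderCardWeight (Spark's default is 0.7).
  val cardWeight: Double = 0.7

  def betterPlan(existing: JoinPlan, newPlan: JoinPlan): JoinPlan = {
    // Fix the comparison order: the plan with the smaller cardinality is the baseline,
    // so the result no longer depends on which argument comes first.
    val (baseline, toCompare) =
      if (existing.planCost.card <= newPlan.planCost.card) (existing, newPlan)
      else (newPlan, existing)
    // Guard against division by zero.
    if (toCompare.planCost.card == 0 || toCompare.planCost.size == 0) return existing
    val relativeRows = BigDecimal(baseline.planCost.card) / BigDecimal(toCompare.planCost.card)
    val relativeSize = BigDecimal(baseline.planCost.size) / BigDecimal(toCompare.planCost.size)
    val relativeCost = relativeRows * cardWeight + relativeSize * (1 - cardWeight)
    if (relativeCost == 1) existing
    else if (relativeCost < 1) baseline
    else toCompare
  }

  def main(args: Array[String]): Unit = {
    // Plan a has fewer rows but a larger size; plan b the reverse.
    val a = JoinPlan("a", Cost(100, 2000))
    val b = JoinPlan("b", Cost(150, 1000))
    // With the baseline fixed by cardinality, both call orders pick the same winner,
    // so the join-reorder search becomes order-independent.
    println(betterPlan(a, b).name)
    println(betterPlan(b, a).name)
  }
}
```

With these numbers, `relativeCost = (100/150) * 0.7 + (2000/1000) * 0.3 > 1`, so the comparison favors the larger-cardinality plan `b` no matter which argument is passed first, which is exactly the stability property the current `betterThan()` lacks.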