dh-cloud opened a new pull request, #60488: URL: https://github.com/apache/doris/pull/60488
The COALESCE function did not generate proper statistical estimates, leading to inaccurate query planning. When COALESCE was used in a query, the optimizer could not accurately estimate:

- The number of NULL values
- The number of distinct values (NDV)
- Min/max value ranges
- Row count estimates

This could result in suboptimal execution plans and poor query performance.

### Solution

Added support for COALESCE statistics estimation in the Nereids optimizer's `ExpressionEstimation` class. The implementation:

1. NULL Value Estimation: Calculates the probability that all COALESCE arguments are NULL by multiplying the NULL probabilities of the individual arguments. The final NULL count is the row count multiplied by this probability.
2. NDV Estimation: Uses the maximum NDV across all COALESCE arguments as the estimated NDV, since COALESCE returns the first non-NULL value among its arguments.
3. Min/Max Value Range: Covers the full range of all COALESCE arguments by taking the minimum of all minimum values and the maximum of all maximum values.

### Changes

- Added the `Coalesce` import in `ExpressionEstimation.java`
- Implemented the `visitCoalesce()` method to provide statistics estimation for COALESCE expressions

### Impact

This change improves query planning accuracy for queries that use the COALESCE function, leading to better execution plans and improved query performance.
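The three estimation rules described under Solution can be sketched in standalone Java as follows. This is a minimal illustration, not the code from the pull request: `ColStats` and `estimateCoalesce` are hypothetical names, and the sketch does not use Doris's actual statistics classes or the real `visitCoalesce()` signature. Multiplying per-argument NULL fractions implicitly assumes that NULLs in different arguments are independent.

```java
import java.util.Arrays;
import java.util.List;

public class CoalesceStatsSketch {

    /** Hypothetical, simplified column statistics (not a Doris class). */
    static final class ColStats {
        final double ndv;       // estimated number of distinct values
        final double numNulls;  // estimated number of NULL rows
        final double min;       // minimum value
        final double max;       // maximum value

        ColStats(double ndv, double numNulls, double min, double max) {
            this.ndv = ndv;
            this.numNulls = numNulls;
            this.min = min;
            this.max = max;
        }
    }

    /** Combines argument statistics using the three rules from the PR description. */
    static ColStats estimateCoalesce(List<ColStats> args, double rowCount) {
        if (args.isEmpty()) {
            throw new IllegalArgumentException("COALESCE needs at least one argument");
        }
        double allNullProb = 1.0;
        double ndv = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (ColStats s : args) {
            // Per-argument NULL fraction, clamped to [0, 1]; 0 if there are no rows.
            double nullFraction = rowCount > 0
                    ? Math.max(0.0, Math.min(1.0, s.numNulls / rowCount))
                    : 0.0;
            // Rule 1: the result is NULL only if every argument is NULL.
            allNullProb *= nullFraction;
            // Rule 2: take the largest NDV among the arguments.
            ndv = Math.max(ndv, s.ndv);
            // Rule 3: widen the range to cover every argument.
            min = Math.min(min, s.min);
            max = Math.max(max, s.max);
        }
        return new ColStats(ndv, rowCount * allNullProb, min, max);
    }

    public static void main(String[] argv) {
        double rowCount = 1000;
        ColStats a = new ColStats(50, 200, 0, 100);  // 20% of rows NULL
        ColStats b = new ColStats(80, 500, -10, 60); // 50% of rows NULL
        ColStats r = estimateCoalesce(Arrays.asList(a, b), rowCount);
        System.out.printf("ndv=%.1f nulls=%.1f min=%.1f max=%.1f%n",
                r.ndv, r.numNulls, r.min, r.max);
    }
}
```

With the sample inputs, the all-NULL probability is 0.2 × 0.5 = 0.1, so about 100 of the 1000 rows are estimated to be NULL; the NDV estimate is 80 and the value range is [-10, 100].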
