konstantinb commented on PR #6308: URL: https://github.com/apache/hive/pull/6308#issuecomment-4051568941
> From my perspective the improvement should land in `PessimisticStatCombiner` since `COALESCE` and `IF` are also directly affected.
>
> In addition, I feel that `PessimisticStatCombiner` is not working as expected when it comes to NDV. Taking the `max(NDV(branch_i))` is not really a pessimistic estimate.
>
> A better formula for the estimation of NDV in CASE/Branch statements is:
>
> ```
> min(rows, Sum NDV(branch_i))
> ```
>
> This formula is used by some other DBMS systems and should also handle this use-case with constant branches. We could also opt for more complex variants of the formula above taking into account the selectivity of the `WHEN` condition, but we can leave that for future work.
>
> All in all, I don't think we need to add additional bookkeeping in the function itself but just modify the formula in `PessimisticStatCombiner`.

@zabetak thank you very much for your comment.

Changing `PessimisticStatCombiner` was my first idea too. However, the "truly pessimistic" estimate logic I used opened quite a large can of worms, broke many existing tests, and quickly ballooned into a rather big PR: https://github.com/apache/hive/pull/6244 (I admit, however, that it did not use your formula `min(rows, Sum NDV(branch_i))`).

While `COALESCE` and `IF` are, technically, affected in the same way, their "mis-estimations" are usually limited to 2x. The `CASE` with multiple constants was the most severe case, and the code change in this PR has dramatically improved the situation in a private Hive implementation.
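For readers following the thread, here is a minimal sketch of the two NDV formulas being compared. It is illustrative only and is not Hive's `PessimisticStatCombiner` code: the function names `ndv_max` and `ndv_capped_sum` and the sample numbers are assumptions made up for this example.

```python
def ndv_max(branch_ndvs):
    """Combine branch NDVs by taking the largest one: max(NDV(branch_i))."""
    return max(branch_ndvs)


def ndv_capped_sum(num_rows, branch_ndvs):
    """Combine branch NDVs as min(rows, Sum NDV(branch_i))."""
    return min(num_rows, sum(branch_ndvs))


# Hypothetical CASE expression with five constant branches (NDV 1 each)
# evaluated over 1000 rows.
constant_branches = [1, 1, 1, 1, 1]
print(ndv_max(constant_branches))               # 1
print(ndv_capped_sum(1000, constant_branches))  # 5

# Hypothetical two-branch IF: the sum can exceed the max by at most 2x.
two_branches = [40, 60]
print(ndv_max(two_branches))               # 60
print(ndv_capped_sum(1000, two_branches))  # 100
```

With only two branches the sum is never more than twice the larger NDV, which matches the 2x remark about `COALESCE` and `IF`; with many constant branches the gap between the two formulas grows with the number of branches.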
