> On Mar 3, 2016, at 11:27 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>
> On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> So yes, each estimator works great for exactly the opposite cases. But notice
> that typically, the results of the new formula is much higher than the old
> one, sometimes by two orders of magnitude (and it shouldn't be difficult to
> construct examples of much higher differences).
>
> The table also includes the 'average' estimator you propose, but it's rather
> obvious that the result is always much closer to the new value, simply because
>
>    (small number) + (huge number)
>    ------------------------------
>                   2
>
> is always much closer to the huge number. We're usually quite happy when the
> estimates are within the same order of magnitude, so whether it's K or K/2
> makes pretty much no difference.
>
> I believe that Mark means geometrical average, i.e. sqrt((small number) *
> (huge number)).

Yes, that is what I proposed upthread. I'm not wedded to that, though.
In general, I am with Tomas on this one, believing that his estimate
will be much better than the current estimate. But I believe the *best*
estimate will be somewhere between his and the current one, and I'm
fishing for any decent way of coming up with a weighted average that
is closer to his than to the current, but not simply equal to his proposal.

The reason I want the formula to be closer to Tomas's than to the
current is that I think that on average, across all tables, across all
databases, in practice it will be closer to the right estimate than the
current formula. That's just my intuition, and so I can't defend it.
But if my intuition is right, the best formula we can adopt would be one
that is moderated from his by a little bit, and in the direction of the
estimate that the current code generates.

I could easily lose this debate merely for lack of a principled basis
for saying how far toward the current estimate the new estimate should
be adjusted. The geometric average is one suggestion, but I don't have
a principled argument for it.

Like I said above, I'm fishing for a decent formula here.

Mark Dilger
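
For illustration only, here is a small Python sketch of the averages discussed in this message. The numbers and function names are hypothetical and nothing here is PostgreSQL code. The weighted variant is just one possible reading of "a weighted average that is closer to his than to the current": it interpolates in log space, and the weight w is an assumed knob, not a value anyone in the thread has proposed.

```python
import math


def arithmetic_mean(old, new):
    """Plain average; dominated by the larger of the two estimates."""
    return (old + new) / 2


def geometric_mean(old, new):
    """sqrt(old * new); the midpoint on a logarithmic scale."""
    return math.sqrt(old * new)


def weighted_geometric_mean(old, new, w):
    """Interpolate between the estimates in log space.

    w is the weight given to the new estimate (hypothetical parameter):
    w = 0 returns old, w = 1 returns new, w = 0.5 is the geometric mean.
    Both estimates must be positive.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    return math.exp((1 - w) * math.log(old) + w * math.log(new))


# Made-up estimates that differ by two orders of magnitude.
old_estimate = 100.0
new_estimate = 10_000.0

print(arithmetic_mean(old_estimate, new_estimate))
print(geometric_mean(old_estimate, new_estimate))
print(weighted_geometric_mean(old_estimate, new_estimate, 0.75))
```

With these inputs the arithmetic mean stays within a factor of two of the larger estimate, the geometric mean sits one order of magnitude from each, and a weight above 0.5 moves the result toward the new estimate without reaching it. Choosing the weight is exactly the part that still lacks a principled basis.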