Re: [HACKERS] improving GROUP BY estimation

Mark Dilger Thu, 03 Mar 2016 11:44:18 -0800

> On Mar 3, 2016, at 11:27 AM, Alexander Korotkov <a.korot...@postgrespro.ru> 
> wrote:
> 
> On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> 
> wrote:
> So yes, each estimator works great for exactly the opposite cases. But notice 
> that typically, the result of the new formula is much higher than the old 
> one, sometimes by two orders of magnitude (and it shouldn't be difficult to 
> construct examples of much higher differences).
> 
> The table also includes the 'average' estimator you propose, but it's rather 
> obvious that the result is always much closer to the new value, simply because
> 
>    (small number) + (huge number)
>    ------------------------------
>                   2
> 
> is always much closer to the huge number. We're usually quite happy when the 
> estimates are within the same order of magnitude, so whether it's K or K/2 
> makes pretty much no difference.
> 
> I believe that Mark means geometrical average, i.e. sqrt((small number) * 
> (huge number)).


Yes, that is what I proposed upthread.  I'm not wedded to that, though.
In general, I am with Tomas on this one, believing that his estimate
will be much better than the current estimate.  But I believe the *best*
estimate will be somewhere between his and the current one, and I'm
fishing for any decent way of coming up with a weighted average that
is closer to his than to the current, but not simply equal to his proposal.

The reason I want the formula to be closer to Tomas's than to the
current is that I think that on average, across all tables, across all
databases, in practice it will be closer to the right estimate than the
current formula.  That's just my intuition, and so I can't defend it.
But if my intuition is right, the best formula we can adopt would be one
that is moderated from his by a little bit, and in the direction of the
estimate that the current code generates.

I could easily lose this debate merely for lack of a principled basis
for saying how far toward the current estimate the new estimate should
be adjusted.  The geometric average is one suggestion, but I don't have
a principled argument for it.
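
To make the difference between the two averages concrete, here is a
minimal sketch. The inputs (100 and 10000) are made-up numbers two orders
of magnitude apart, not values from Tomas's table. The weighted variant
and its weight parameter w are hypothetical: they are just one possible
way to land closer to the new estimate than to the current one, not
something that has been proposed or justified in this thread.

```python
import math


def arithmetic_mean(old_est, new_est):
    """Plain average; dominated by the larger of the two estimates."""
    return (old_est + new_est) / 2.0


def geometric_mean(old_est, new_est):
    """sqrt(old * new); sits halfway between the two on a log scale."""
    return math.sqrt(old_est * new_est)


def weighted_geometric_mean(old_est, new_est, w):
    """Hypothetical blend: old^(1-w) * new^w, with w in [0, 1].

    w = 0.5 gives the ordinary geometric mean; w closer to 1 moves
    the result toward the new estimate.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    if old_est <= 0 or new_est <= 0:
        raise ValueError("estimates must be positive")
    return math.exp((1.0 - w) * math.log(old_est) + w * math.log(new_est))


if __name__ == "__main__":
    old_est, new_est = 100.0, 10000.0  # illustrative values only
    print(arithmetic_mean(old_est, new_est))                # 5050.0
    print(geometric_mean(old_est, new_est))                 # 1000.0
    print(weighted_geometric_mean(old_est, new_est, 0.75))  # about 3162.28
```

With these inputs the arithmetic mean stays within a factor of two of the
new estimate, while the geometric mean lands a full order of magnitude
below it. A weight above 0.5 moves the result back toward the new
estimate, but choosing that weight is exactly the part that lacks a
principled basis.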

Like I said above, I'm fishing for a decent formula here.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
