On Tue, Aug 23, 2016 at 5:28 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
> > On 08/22/2016 07:42 PM, Alvaro Herrera wrote:
> >> Also, if we patch it this way and somebody has a slow query because of a
> >> lot of duplicate values, it's easy to solve the problem by
> >> de-duplicating.  But with the current code, people who have the
> >> opposite problem have no way to work around it.
>
> > I certainly agree it's better when a smart user can fix his query plan
> > by deduplicating the values than when we end up generating a poor plan
> > due to assuming some users are somewhat dumb.
>
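(To make the workaround concrete, here is a toy version of it -- the
table and values below are made up for illustration, not the query from
my original report.  Assume a hypothetical table t(x int):)

    -- If the planner assumes the VALUES list is all-distinct, heavy
    -- duplication can skew the distinct-values estimate:
    SELECT * FROM t WHERE x IN (VALUES (1), (1), (1), (1), (2));

    -- The workaround Alvaro describes: de-duplicate the list by hand:
    SELECT * FROM t WHERE x IN (VALUES (1), (2));
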
> Well, that seems to be the majority opinion, so attached is a patch
> to do it.  Do we want to sneak this into 9.6, or wait?
>
> > I wonder how expensive it would be to actually count the number of
> > distinct values - there certainly are complex data types where the
> > comparisons are fairly expensive, but I would not expect those to be
> > used in explicit VALUES lists.
>
> You'd have to sort the values before de-duping, and deal with VALUES
> expressions that aren't simple literals.  And you'd have to do it a
> lot --- by my count, get_variable_numdistinct is invoked six times
> on the Var representing the VALUES output in Jeff's example query.
> Maybe you could cache the results somewhere, but that adds even
> more complication.  Given the lack of prior complaints about this,
> it's hard to believe it's worth the trouble.
>
>
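To illustrate the cost being described: without sorting first, counting
distinct entries in a single-column VALUES list is an O(n^2) pass of
equal() comparisons, and it only works at all when every row is a
simple literal.  A rough sketch -- hypothetical code, not part of the
attached patch:

    #include "postgres.h"
    #include "nodes/nodes.h"
    #include "nodes/pg_list.h"

    /*
     * Count distinct Const values in a one-column VALUES list by
     * pairwise equal() comparisons.  Returns -1 if any entry is not a
     * simple literal.  O(n^2) in the list length -- hence the point
     * that you'd want to sort first, and cache the result across the
     * several get_variable_numdistinct() calls.
     */
    static int
    count_distinct_consts(List *exprs)
    {
        int         ndistinct = 0;
        ListCell   *lc;

        foreach(lc, exprs)
        {
            Node       *expr = (Node *) lfirst(lc);
            ListCell   *prev;
            bool        seen = false;

            if (!IsA(expr, Const))
                return -1;      /* give up on non-literal expressions */

            foreach(prev, exprs)
            {
                if (prev == lc)
                    break;      /* compare only against earlier entries */
                if (equal(lfirst(prev), expr))
                {
                    seen = true;
                    break;
                }
            }
            if (!seen)
                ndistinct++;
        }
        return ndistinct;
    }

Even this simple version multiplies the planning cost of large VALUES
lists, which supports the conclusion that it isn't worth the trouble.
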
This patch still applies, and I think the argument for it is still valid,
so I'm going to make a commitfest entry for it.  Is there additional
evidence we should gather?

Cheers,

Jeff
