On Tue, Aug 23, 2016 at 5:28 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
> > On 08/22/2016 07:42 PM, Alvaro Herrera wrote:
> >> Also, if we patch it this way and somebody has a slow query because of a
> >> lot of duplicate values, it's easy to solve the problem by
> >> de-duplicating. But with the current code, people that have the
> >> opposite problem have no way to work around it.
>
> > I certainly agree it's better when a smart user can fix his query plan
> > by deduplicating the values than when we end up generating a poor plan
> > due to assuming some users are somewhat dumb.
>
> Well, that seems to be the majority opinion, so attached is a patch
> to do it. Do we want to sneak this into 9.6, or wait?
>
> > I wonder how expensive it would be to actually count the number of
> > distinct values - there certainly are complex data types where the
> > comparisons are fairly expensive, but I would not expect those to be
> > used in explicit VALUES lists.
>
> You'd have to sort the values before de-duping, and deal with VALUES
> expressions that aren't simple literals. And you'd have to do it a
> lot --- by my count, get_variable_numdistinct is invoked six times
> on the Var representing the VALUES output in Jeff's example query.
> Maybe you could cache the results somewhere, but that adds even
> more complication. Given the lack of prior complaints about this,
> it's hard to believe it's worth the trouble.

This patch still applies, and I think the argument for it is still valid.
So I'm going to make a commit-fest entry for it.

Is there additional evidence we should gather?

Cheers,

Jeff
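[Editorial note: the de-duplication workaround discussed above can be sketched with a hypothetical query. The table `t` and the VALUES list below are made up for illustration; the thread does not show Jeff's actual query. The idea is that if the planner assumes a VALUES list is all-distinct, a list with many duplicates inflates its ndistinct estimate, and the user can compensate by de-duplicating the list explicitly:]

```sql
-- VALUES list with duplicates: under the patched behavior the planner
-- treats the list as all-distinct, so the duplicates skew the estimate.
SELECT t.*
FROM t
JOIN (VALUES (1), (1), (1), (2)) AS v(id) ON t.id = v.id;

-- Workaround: de-duplicate explicitly so the planner only ever sees
-- the distinct values.
SELECT t.*
FROM t
JOIN (SELECT DISTINCT id
      FROM (VALUES (1), (1), (1), (2)) AS v(id)) AS s
  ON t.id = s.id;
```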