On 16.2.2015 03:38, Andrew Gierth wrote: >>>>>> "Tomas" == Tomas Vondra <tomas.von...@2ndquadrant.com> >>>>>> writes: > > Tomas> Improving the estimates is always good, but it's not going > to Tomas> fix the case of non-NULL values (it shouldn't be all > that Tomas> difficult to create such examples with a value whose > hash starts Tomas> with a bunch of zeroes). > > Right now, I can't get it to plan such an example, because (a) if > there are no stats to work from then the planner makes fairly > pessimistic assumptions about hash bucket filling, and (b) if > there _are_ stats to work from, then a frequently-occurring > non-null value shows up as an MCV and the planner takes that into > account to calculate bucketsize. > > The problem could only be demonstrated for NULLs because the > planner was ignoring NULL for the purposes of estimating > bucketsize, which is correct for all join types except RIGHT and > FULL (which, iirc, are more recent additions to the hashjoin > repertoire).
Oh, right, the estimate fix is probably sufficient then. > > If you want to try testing it, you may find this useful: > > select i, hashint8(i) from unnest(array[1474049294, -1779024306, -1329041947]) u(i); > i | hashint8 -------------+---------- 1474049294 | 0 > -1779024306 | 0 -1329041947 | 0 (3 rows) > > (those are the only three int4 values that hash to exactly 0) > > It's probably possible to construct pathological cases by finding > a lot of different values with zeros in the high bits of the hash, > but that's something that wouldn't be likely to happen by chance. Yeah, it's probably possible, but it's admittedly considerably harder than I initially thought. For example it could be possible to create the table with no MCV values but sorted so that all the initial values have hashvalue=0, triggering (growEnabled=false). But that's rather unlikely to happen in practice I guess. A more likely failure scenario is a hash join higher up the plan, processing results of other joins etc. In that case the estimates will be tricky, although the planner chooses quite pessimistic defaults in those cases. -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers