On Wed, Jun 17, 2015 at 10:52 AM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> Hi,
>
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped
> over a pretty bad n_distinct underestimate, causing OOM in HashAgg (which
> somehow illustrates the importance of the memory-bounded hashagg patch
> Jeff Davis is working on).
>
> The problem is Q18, particularly this simple subquery:
>
>     select l_orderkey
>     from lineitem
>     group by l_orderkey
>     having sum(l_quantity) > 313;
>
> which is planned like this:
>
>                                    QUERY PLAN
> ---------------------------------------------------------------------------------
>  HashAggregate  (cost=598510163.92..598515393.93 rows=418401 width=12)
>    Group Key: l_orderkey
>    Filter: (sum(l_quantity) > '313'::double precision)
>    ->  Seq Scan on lineitem  (cost=0.00..508509923.28 rows=18000048128 width=12)
> (4 rows)
>
> but sadly, in reality the l_orderkey cardinality looks like this:
>
>     tpch=# select count(distinct l_orderkey) from lineitem;
>        count
>     ------------
>      4500000000
>     (1 row)
>
> That's a helluva difference - not the usual one or two orders of
> magnitude, but 10000x underestimate.
>
> The usual thing to do in this case is increasing statistics target, and
> while this improves the estimate, the improvement is rather small:
>
>     statistics target      estimate     difference
>     --------------------------------------------------
>     100                      429491          10000
>     1000                    4240418           1000
>     10000                  42913759            100
>
> I find the pattern rather strange - every time the statistics target
> increases 10x, the difference decreases 10x - maybe that's natural, but
> the perfect proportionality is suspicious IMHO.
>
> Also, this is a quite large dataset - the table has ~18 billion rows, and
> even with target=10000 we're sampling only 3M rows, which is ~0.02%.
> That's a tiny sample, so inaccuracy is naturally expected, but OTOH the
> TPC-H dataset is damn uniform - there's pretty much no skew in the
> distributions AFAIK. So I'd expect a slightly better result.
>
> With target=10000 the plan switches to GroupAggregate, because the
> estimate gets sufficient to exceed work_mem (2GB). But it's still way
> off, and it's mostly just a lucky coincidence.
>
> So I'm wondering if there's some bug because of the dataset size (an
> integer overflow or something like), so I added a bunch of logging into
> the estimator, logging all the parameters computed:
>
> target=100 (samplerows=30000)
> -----------------------------
> WARNING: attnum=1 attname=l_orderkey f1=27976 ndistinct=28977 nmultiple=1001 toowide_cnt=0 d=28977 numer=869310000.000000 denom=2024.046627 stadistinct=429491.094029
> WARNING: ndistinct estimate attnum=1 attname=l_orderkey current=429491.09 adaptive=443730.00
>
> target=1000 (samplerows=300000)
> -------------------------------
> WARNING: attnum=1 attname=l_orderkey f1=279513 ndistinct=289644 nmultiple=10131 toowide_cnt=0 d=289644 numer=86893200000.000000 denom=20491.658538 stadistinct=4240418.111618
> WARNING: ndistinct estimate attnum=1 attname=l_orderkey current=4240418.11 adaptive=4375171.00
>
> target=10000 (samplerows=3000000)
> ---------------------------------
> WARNING: attnum=1 attname=l_orderkey f1=2797888 ndistinct=2897799 nmultiple=99911 toowide_cnt=0 d=2897799 numer=8693397000000.000000 denom=202578.313396 stadistinct=42913759.396282
> WARNING: ndistinct estimate attnum=1 attname=l_orderkey current=42913759.40 adaptive=44449882.00
>
> It's totalrows=18000049031 in all cases.
> The logs also show the estimate produced by the adaptive estimator
> (discussed in a separate thread), but apparently that does not change the
> estimates much :-(
>
> Any ideas?
>
> --
> Tomas Vondra                      http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

While better samples/stats are important for choosing a good plan, in this query hash aggregation really is the right plan. If a sort-based aggregate is chosen instead, the performance will be very bad. The patch Jeff is working on is critical for a decent TPC-H number (unless you have an unlimited amount of memory).

Thanks,
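
PS: If I'm reading the quoted logs right, the numer/denom values match the Haas/Stokes "Duj1" formula used for stadistinct in analyze.c, estimate = n*d / ((n - f1) + f1*n/N), where n is the number of sampled rows, d the distinct values in the sample, f1 the values seen exactly once, and N the total row count. A minimal sketch (a hypothetical standalone Python script, not PostgreSQL code) that plugs in the logged numbers:

    # Sketch of the Haas/Stokes "Duj1" n_distinct estimator, assuming it is
    # the formula behind the logged numer/denom/stadistinct values:
    #   estimate = n * d / ((n - f1) + f1 * n / N)
    # n  = sampled rows, d = distinct values in the sample,
    # f1 = values occurring exactly once in the sample, N = total rows.

    def duj1(n, d, f1, N):
        numer = n * d
        denom = (n - f1) + f1 * n / N
        return numer / denom

    N = 18000049031  # totalrows from the quoted logs

    # (samplerows, d, f1) for statistics targets 100 / 1000 / 10000
    cases = [(30000, 28977, 27976),
             (300000, 289644, 279513),
             (3000000, 2897799, 2797888)]

    for n, d, f1 in cases:
        # reproduces the logged estimates: ~429491, ~4240418, ~42913759
        print(f"n={n:>8}  dupes(n-f1)={n - f1:>7}  estimate={duj1(n, d, f1, N):,.0f}")

In all three samples d/n and (n - f1)/n stay roughly constant (about 0.97 and 0.07) and the f1*n/N term is negligible, so the formula effectively collapses to n*d/(n - f1), which grows linearly with the sample size. That would explain the perfectly proportional "10x larger target, 10x smaller error" pattern in the quoted table.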