On 6 December 2013 09:21, Andres Freund <and...@2ndquadrant.com> wrote: > On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote: >> Has anyone ever thought about opportunistic ANALYZE piggy-backing on >> other full-table scans? That doesn't really help Greg, because his >> complaint is mostly that a fresh ANALYZE is too expensive, but it >> could be an interesting, albeit risky approach. > > What I've been thinking of is > > a) making it piggy back on scans vacuum is doing instead of doing > separate ones all the time (if possible, analyze needs to be more > frequent). Currently with quite some likelihood the cache will be gone > again when revisiting.
> b) make analyze incremental. In lots of bigger tables most of the table > is static - and we actually *do* know that, thanks to the vm. So keep a > rawer form of what ends in the catalogs around somewhere, chunked by the > region of the table the statistic is from. Everytime a part of the table > changes, re-sample only that part. Then recompute the aggregate. Piggy-backing sounds like a bad thing. If I run a query, I don't want to be given some extra task thanks! Especially if we might need to run data type code I'm not authroised to run, or to sample data I may not be authorised to see. The only way that could work is to kick off an autovacuum worker to run the ANALYZE as a separate process and then use synchronous scan behaviour to derive benefit indirectly. However, these things presume that we need to continue scanning most of the blocks of the table, which I don't think needs to be the case. There is a better way. Back in 2005/6, I advocated a block sampling method, as described by Chaudri et al (ref?) That has two advantages * We don't need to visit all of the blocks, reducing I/O * It would also give better analysis of clustered data (its all in that paper...) The recent patch on TABLESAMPLE contained a rewrite of the guts of ANALYZE into a generic sample scan, which would then be used as the basis for a TABLESAMPLE BERNOULLI. A TABLESAMPLE SYSTEM could use a block sample, as it does in some other DBMS (DB2, SQLServer). While I was unimpressed with the TABLESAMPLE patch, all it needs is some committer-love, so Greg... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers