[ redirecting to -hackers ]

Tomasz Ostrowski <tometzky...@ato.waw.pl> writes:
> I'm routinely bulk inserting data to a PostgreSQL table and then
> analyzing a single column of the table, because it contains data which
> significantly changes histogram of this column values - for example
> something like adding rows with "todo=true" column, when all rows before
> bulk insert have "todo=false".
> But I've noticed that a standard automatic analyze, which should work in
> background, never runs. I've noticed that this fast analyze of one
> column resets pg_stat_user_tables(n_mod_since_analyze) counter.

> I suppose this is a bug - an analyze, which does not analyze all
> columns, should not reset pg_stat_user_tables(n_mod_since_analyze). What
> do you think?

I'm inclined to think that this is a reasonable complaint.  A usage
pattern like that probably hasn't come up before; but now that it has,
it's clear it shouldn't block auto-analyze from happening.

A cheap-and-dirty solution would be to not send a PgStat_MsgAnalyze
message at all, but I think that's probably undesirable: typically it
would be a good thing to accept the new n_live_tuples and n_dead_tuples
estimates.  What we could do instead is add a bool flag to
PgStat_MsgAnalyze saying whether or not to reset changes_since_analyze.
It would be safe enough to back-patch such a change, because the stats
collector messages are private to the backend (in fact, really private
to pgstat.c).

One question here is whether there is any connection between
changes_since_analyze and the tuple-count estimates that would make it
improper to update the latter and not the former.  I can't really see
one; in fact, changes_since_analyze is pretty squishy anyway, because
changes made by an ANALYZE's own transaction are already accounted for
in the new pg_statistic entries but will be added to
changes_since_analyze at commit despite that.  So I'd go ahead and
update all the fields other than changes_since_analyze in this case.

Another interesting consideration is that if the table's columns have
different stats targets, a selective-column ANALYZE might possibly
sample fewer rows than an all-columns ANALYZE would.  This might mean
that our new tuple-count estimates are less accurate than autoanalyze
would get.  So you could possibly argue that we shouldn't update the
tuple-count estimates after all.
But I don't particularly believe that, because it disregards the fact
that the new estimates are, well, new.  Even if they have more
statistical risk than autoanalyze would have, they could well be better
just by virtue of having seen whatever bulk updates might have happened
since the last autoanalyze.  So my inclination is to disregard this fine
point.  (Though it's interesting to ask whether Tomasz's use case
includes a lower-than-default stats target for his very volatile
column ...)

Also, I'd be a bit inclined to disable the counter reset whenever a
column list is specified, disregarding the corner case where a list is
given but it includes all the table's analyzable columns.  It doesn't
really seem worth the effort to account for that case specially
(especially after you consider that index expressions should count as
analyzable columns).

Thoughts?

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers