Tom Lane <[EMAIL PROTECTED]> wrote: > ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > > In my workload, ANALYZE takes long time (1min at statistics_target = 10, > > and 5min at 100), but the updated table needs to be vacuumed every 30 > > seconds > > because seqscans run on the table repeatedly. > > There is something *seriously* wrong with that. If vacuum can complete > in under 30 seconds, how can analyze take a minute? (I'm also wondering > whether you'll still need such frantic vacuuming with HOT...)
There are two tables here: [S] A small table, that is frequently updated and seqscan-ed [L] A large table, that takes a long time to be analyzed The table [S] should be vacuumed every 30 seconds, because dead tuples affects the performance of seqscan seriously. HOT and autovacuum are very useful here *unless* long transactions begins. Analyzing [L] takes 1-5 minutes and both HOT and vacuums don't work during it. I want to use statistics_target = 100 at heart for more accurate statistics, but I'm using 10 instead because of avoiding long transactions by analyze. Basically, the above is based on avoiding needless long transactions. Aside from ANALYZE, pg_start_backup() is also a long transactional command. It takes checkpoint_timeout * checkpoint_completion_target (2.5- min.) at worst. Users could avoid their own long transactions, but have no choice but to use those provided maintenance commands. > > So, I'm targeting only ANALZYE for now by changing ANALYZE to be ignored > > by VACUUM. > > I think we need to understand what the real problem is with your test > case. This proposal seems very messy/ugly to me, and I'm unconvinced > that it solves anything. I think there are some direct or indirect solutions: 1. VACUUM removes recently dead tuples under some circumstances. For example, tuples updated twice after a long transaction begins. The oldest tuple can be seen by the old long transaction and the newest can be seen new transactions. However, the intermediate tuple is invisible all transactions. 2. ANALYZE don't disturb vacuuming of other tables. (my first proposal) We know ANALYZE don't touch other tables during sampling phases. We can treat analyzing transactions as same as PROC_IN_VACUUM xacts. The same can be said for pg_start_backup; non-transactinal starting backup command might be better. 3. Recover density of tuples; i.e, auto-CLUSTER. If the performance recovers after long transactions, the problem will not be so serious. It would be better that autovacuum invokes CLUSTER if required and we could run CLUSTER concurrently. 4. ANALYZE finishes in a short time. It is ok that VACUUM takes a long time because it is not a transaction, but ANALYZE should not. It requres cleverer statistics algorithm. Sampling factor 10 is not enough for pg_stats.n_distinct. We seems to estimate n_distinct too low for clustered (ordered) tables. There might be a matter of research in calculation of n_distinct. Also, this cannot resolve the problem in pg_start_backup. 1 or 3 might be more universal approaches, but I think 2 will be an independent improvement from them. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster