Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Mon, 23 Apr 2018 20:09:21 -0500 Justin Pryzby wrote: > Just want to add for the archive that I happened to run across what appears to > be a 7-year old report of (I think) both of these vacuum/analyze bugs: > > Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. > > > Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means > > > > https://www.postgresql.org/message-id/flat/6373488.29.1323717222303.JavaMail.root%40store1.zcs.ext.wpsrv.net#6373488.29.1323717222303.javamail.r...@store1.zcs.ext.wpsrv.net Nice find. It does look like both, although, since the info is not in the thread it is not certain that relpages was large enough to hit the analyze issue. Unless that was with the old default statistics target in which case it would be pretty easy to hit. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Tue, 13 Mar 2018 11:29:03 -0400 Tom Lane wrote: > David Gould writes: > > I have attached the patch we are currently using. It applies to 9.6.8. I > > have versions for older releases in 9.4, 9.5, 9.6. I fails to apply to 10, > > and presumably head but I can update it if there is any interest. > > > The patch has three main features: > > - Impose an ordering on the autovacuum workers worklist to avoid > > the need for rechecking statistics to skip already vacuumed tables. > > - Reduce the frequency of statistics refreshes > > - Remove the AutovacuumScheduleLock > > As per the earlier thread, the first aspect of that needs more work to > not get stuck when the worklist has long tasks near the end. I don't > think you're going to get away with ignoring that concern. I agree that the concern needs to be addressed. The other concern is that sorting by oid is fairly arbitrary, the essential part is that there is some determinate order. > Perhaps we could sort the worklist by decreasing table size? That's not > an infallible guide to the amount of time that a worker will need to > spend, but it's sure safer than sorting by OID. That is better. I'll modify it to also prioritize tables that have relpages and reltuples = 0 which usually means the table has no stats at all. Maybe use oid to break ties. > Alternatively, if we decrease the frequency of stats refreshes, how > much do we even need to worry about reordering the worklist? The stats refresh in the current scheme is needed to make sure that two different workers don't vacuum the same table in close succession. It doesn't actually work, and it costs the earth. The patch imposes an ordering to prevent workers trying to claim recently vacuumed tables. This removes the need for the stats refresh. > In any case, I doubt anyone will have any appetite for back-patching > such a change. I'd recommend that you clean up your patch and rebase > to HEAD, then submit it into the September commitfest (either on a > new thread or a continuation of the old #13750 thread, not this one). I had in mind to make a more comprehensive patch to try to make utovacuum more responsive when there are lots of tables, but was a bit shy of the reception. I'll try again with this one (in a new thread) based on the suggestions. Thanks! -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
theory and am confident that there really is no way to trick it. Basically if there are enough pages that are different to affect the overall density, say 10% empty or so, there is no way a random sample larger than a few hundred probes can miss them no matter how big the table is. If there are few enough pages to "hide" from the sample, then they are so few they don't matter anyway. After all this my vote is for back patching too. I don't see any case where the patched analyze is or could be worse than what we are doing. I'm happy to provide my test cases if anyone is interested. Thanks -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Mon, 12 Mar 2018 10:43:36 -0400 Tom Lane wrote: > Re-reading that thread, it seems like we should have applied Jeff's > initial trivial patch[1] (to not hold across > table_recheck_autovac) rather than waiting around for a super duper > improvement to get agreed on. I'm a bit tempted to go do that; > if nothing else, it seems simple enough to back-patch, unlike most > of the rest of what was discussed. This will help. In my testing it reduced the lock contention considerably. I think a lot of users with lots of tables will benefit from it. However it does nothing about the bigger issue which is that autovacuum flaps the stats temp files. I have attached the patch we are currently using. It applies to 9.6.8. I have versions for older releases in 9.4, 9.5, 9.6. I fails to apply to 10, and presumably head but I can update it if there is any interest. The patch has three main features: - Impose an ordering on the autovacuum workers worklist to avoid the need for rechecking statistics to skip already vacuumed tables. - Reduce the frequency of statistics refreshes - Remove the AutovacuumScheduleLock The patch is aware of the shared relations fix. It is subject to the problem Alvero noted in the original discussion: if the last table to be autovacuumed is large new workers will exit instead of choosing an earlier table. Not sure this is really a big problem in practice, but I agree it is a corner case. My patch does not do what I believe really needs doing: Schedule autovacuum more intelligently. Currently autovacuum collects all the tables in the physical order of pg_class and visits them one by one. With many tables it can take too long to respond to urgent vacuum needs, eg heavily updated tables or wraparound. There is a patch in the current commit fest that allows prioritizing tables manually. I don't like that idea much, but it does recognize that the current scheme is not adequate for databases with large numbers of tables. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects. diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 6720248675..3bb06a2337 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -196,6 +196,26 @@ typedef struct autovac_table char *at_datname; } autovac_table; +/* + * This is used to collect the oids of tables that may need vacuuming. + * Once they are all collected it is sorted by oid and then stepped through. + * By establishing an order we reduce contention between workers for tables. + */ +typedef struct aw_workitem +{ + Oid relid; + bool sharedrel; +} aw_workitem; + +/* Extendable array of items. */ +typedef struct aw_worklist +{ + aw_workitem *items; + int maxitems, +numitems; +} aw_worklist; + + /*- * This struct holds information about a single worker's whereabouts. We keep * an array of these in shared memory, sized according to @@ -1069,6 +1089,76 @@ db_comparator(const void *a, const void *b) return (((const avl_dbase *) a)->adl_score < ((const avl_dbase *) b)->adl_score) ? 1 : -1; } + +/* + * Utilities to manage worklist of tables to vacuum. + */ + +static void aw_worklist_init(aw_worklist *w) +{ + w->maxitems = 32; /* not really a magic number, we have to start somewhere */ + w->numitems = 0; + w->items = palloc(w->maxitems * sizeof(aw_workitem)); +} + +static void aw_worklist_clear(aw_worklist *w) +{ + w->maxitems = 0; + w->numitems = 0; + pfree(w->items); + w->items = NULL; +} + +/* Append an item to the worklist */ +static aw_workitem *aw_worklist_add(aw_worklist *w, Oid oid, bool isshared) +{ + aw_workitem *newitem; + + if (w->numitems >= w->maxitems) + { + /* grow the array */ + w->maxitems = w->maxitems * 1.5; + w->items = (aw_workitem *) repalloc(w->items, + w->maxitems * sizeof(aw_workitem)); + } + newitem = w->items + w->numitems++; + newitem->relid = oid, + newitem->sharedrel = isshared; + return newitem; +} + +/* Release extra memory that might have been allocated during growth */ +static void aw_worklist_trim(aw_worklist *w) +{ + if (w->maxitems >= w->numitems) + { + w->maxitems = w->numitems; + w->items = (aw_workitem *) repalloc(w->items, + w->numitems * sizeof(aw_workitem)); + } +} + +/* qsort comparator for aw_workitem items */ +static int +aw_workitem_cmp(const void *p1, const void *p2) +{ + const aw_workitem *v1 = ((const aw_workitem *) p1); + const aw_workitem *v2 = ((const aw_workitem *) p2); + + if (v1->relid < v2->relid) + return -1; + if (v1->relid > v2->relid) + return 1; + return 0; +} + +static void aw_worklist_sort(aw_worklist *w) +{ + if (w->numitems > 1) + qsort(w->items, w->numitems, sizeof(aw_workitem), aw_workitem_cmp); +} + + /
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Wed, 7 Mar 2018 21:39:08 -0800 Jeff Janes wrote: > As for preventing it in the first place, based on your description of your > hardware and operations, I was going to say you need to increase the max > number of autovac workers, but then I remembered you from "Autovacuum slows > down with large numbers of tables. More workers makes it slower" ( > https://www.postgresql.org/message-id/20151030133252.3033.4249%40wrigleys.postgresql.org). > So you are probably still suffering from that? Your patch from then seemed > to be pretty invasive and so controversial. We have been building from source using that patch for the worker contention since then. It's very effective, there is no way we could have continued to rely on autovacuum without it. It's sort of a nuisance to keep updating it for each point release that touches autovacuum, but here we are. The current patch is motivated by the fact that even with effective workers we still regularly find tables with inflated reltuples. I have some theories about why, but not really proof. Mainly variants on "all the vacuum workers were busy making their way through a list of 100,000 tables and did not get back to the problem table before it became a problem." I do have a design in mind for a larger more principled patch that fixes the same issue and some others too, but given the reaction to the earlier one I hesitate to spend a lot of time on it. I'd be happy to discuss a way to try to move forward though if any one is interested. Your patch helped, but mainly was targeted at the lock contention part of the problem. The other part of the problem was that autovacuum workers will force a rewrite of the stats file every time they try to choose a new table to work on. With large numbers of tables and many autovacuum workers this is a significant extra workload. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Tue, 06 Mar 2018 11:16:04 -0500 Tom Lane wrote: > so that we can decide whether this bug is bad enough to justify > back-patching a behavioral change. I remain concerned that the proposed > fix is too simplistic and will have some unforeseen side-effects, so > I'd really rather just put it in HEAD and let it go through a beta test > cycle before it gets loosed on the world. It happens to us fairly regularly and causes lots of problems. However, I'm agreeable to putting it in head for now, my client can build from source to pick up this patch until that ships, but doesn't want to maintain their own fork forever. That said, if it does get though beta I'd hope we could back-patch at that time. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Sun, 4 Mar 2018 07:49:46 -0800 Jeff Janes wrote: > I don't see how it could have caused the problem in the first place. In > your demonstration case, you had to turn off autovac in order to get it to > happen, and then when autovac is turned back on, it is all primed for an > autovac to launch, go through, touch almost all of the pages, and fix it > for you. How did your original table get into a state where this wouldn't > happen? One more way for this to happen, vacuum was including the dead tuples in the estimate in addition to the live tuples. This is a separate bug that tends to aggravate the one I'm trying to fix. See the thread re BUG #15005 at: https://www.postgresql.org/message-id/16db4468-edfa-830a-f921-39a50498e77e%402ndquadrant.com > It seems to me that VACUUM and ANALYZE somewhat disagree on what exactly > reltuples means. VACUUM seems to be thinking that > > reltuples = live + dead > > while ANALYZE apparently believes that > > reltuples = live There is a patch for this one from Tomas Vondra/Tom Lane that I hope it will land in the next set of releases. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Sun, 4 Mar 2018 07:49:46 -0800 Jeff Janes wrote: > In any event, I agree with your analysis that ANALYZE should set the number > of tuples from scratch. After all, it sets the other estimates, such as > MCV, from scratch, and those are much more fragile to sampling than just > the raw number of tuples are. But if the default target is set to 1, that > would scan only 300 pages. I think that that is a little low of a sample > size to base an estimate on, but it isn't clear to that using 300 pages > plus whacking them around with an exponential averaging is really going to > be much better. And if you set your default target to 1, that is > more-or-less what you signed up for. > > It is little weird to have VACUUM incrementally update and then ANALYZE > compute from scratch and discard the previous value, but no weirder than > what we currently do of having ANALYZE incrementally update despite that it > is specifically designed to representatively sample the entire table. So I > don't think we need to decide what to do about VACUUM before we can do > something about ANALYZE. Thanks. I was going to add the point about trusting ANALYZE for the statistics but not for reltuples, but you beat me to it. 300 samples would be on the small side, as you say that's asking for it. Even the old default target of 10 gives 3000 samples which is probably plenty. I think the method VACUUM uses is appropriate and probably correct for VACUUM. But not for ANALYZE. Which is actually hinted at in the original comments but not in the code. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Sun, 4 Mar 2018 07:49:46 -0800 Jeff Janes wrote: > On Wed, Jan 17, 2018 at 4:49 PM, David Gould wrote: > > # analyze verbose pg_attribute; > > INFO: "pg_attribute": scanned 3 of 24519424 pages, containing 6475 > > live rows and 83 dead rows; 6475 rows in sample, 800983035 estimated total > > rows. > > I can see how this issue would prevent ANALYZE from fixing the problem, but > I don't see how it could have caused the problem in the first place. In > your demonstration case, you had to turn off autovac in order to get it to > happen, and then when autovac is turned back on, it is all primed for an > autovac to launch, go through, touch almost all of the pages, and fix it > for you. How did your original table get into a state where this wouldn't > happen? > > Maybe a well-timed crash caused n_dead_tup to get reset to zero and that is > why autovac is not kicking in? What are the pg_stat_user_table number and > the state of the visibility map for your massively bloated table, if you > still have them? We see this sort of thing pretty routinely on more than just catalogs, but catalogs are where it really hurts. These systems are 40 cores/80 threads, 1 TB memory, Fusion IO. Databases are 5 to 10 TB with 100,000 to 200,000 tables. Tables are updated in batches every few minutes 100 threads at a time. There are also some long running queries that don't help. Due to the large number of tables and high rate of mutation it can take a long time between visits from autovacuum, especially since autovacuum builds a list of pending work and then processes it to completion so new tables in need of vacuum can't even be seen until all the old work is done. For what it is worth, streaming replication doesn't work either as the single threaded recovery can't keep up with the 80 thread mutator. We tried relying on external scripts to address the most bloated tables, but those also depended on reltuples to detect bloat so they missed out a lot. For a long time we simply had recurring crisis. Once I figured out that ANALYZE could not set reltuples effectively we worked around it by running ANALYZE VERBOSE on all the large tables and parsing the notices to calculate the rowcount the same way as in the patch. This works, but is a nuisance. The main pain points are that when reltuples gets inflated there is no way to fix it, auto vacuum stops looking at the table and hand run ANALYZE can't reset the reltuples. The only cure is VACUUM FULL, but that is not really practical without unacceptable amounts of downtime. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Fri, 02 Mar 2018 17:17:29 -0500 Tom Lane wrote: > But by the same token, analyze only looked at 0.0006 of the pages. It's > nice that for you, that's enough to get a robust estimate of the density > everywhere; but I have a nasty feeling that that won't hold good for > everybody. My grasp of statistics is somewhat weak, so please inform me if I've got this wrong, but every time I've looked into it I've found that one can get pretty good accuracy and confidence with fairly small samples. Typically 1000 samples will serve no matter the population size if the desired margin of error is 5%. Even with 99% confidence and a 1% margin of error it takes less than 20,000 samples. See the table at: http://www.research-advisors.com/tools/SampleSize.htm Since we have by default 3 sample pages and since ANALYZE takes some trouble to get a random sample I think we really can rely on the results of extrapolating reltuples from analyze. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Thu, 01 Mar 2018 18:49:20 -0500 Tom Lane wrote: > The sticking point in my mind right now is, if we do that, what to do with > VACUUM's estimates. If you believe the argument in the PDF that we'll > necessarily overshoot reltuples in the face of declining true density, > then it seems like that argument applies to VACUUM as well. However, > VACUUM has the issue that we should *not* believe that it looked at a > random sample of pages. Maybe the fact that it looks exactly at the > changed pages causes it to see something less than the overall density, > cancelling out the problem, but that seems kinda optimistic. For what it's worth, I think the current estimate formula for VACUUM is pretty reasonable. Consider a table T with N rows and P pages clustered on serial key k. Assume reltuples is initially correct. Then after: delete from T where k < 0.2 * (select max k from T); vacuum T; Vacuum will touch the first 20% of the pages due to visibility map, the sample will have 0 live rows, scanned pages will be 0.2 * P. Then according to the current code: old_density = old_rel_tuples / old_rel_pages; new_density = scanned_tuples / scanned_pages; multiplier = (double) scanned_pages / (double) total_pages; updated_density = old_density + (new_density - old_density) * multiplier; return floor(updated_density * total_pages + 0.5); the new density will be: N/P + (0/0.2*P - N/P) * 0.2 = N/P - N/P * 0.2 = 0.8 * N/P New reltuples estimate will be 0.8 * old_reltuples. Which is what we wanted. If we evenly distribute the deletes across the table: delete from T where rand() < 0.2; Then vacuum will scan all the pages, the sample will have 0.8 * N live rows, scanned pages will be 1.0 * P. The new density will be N/P + (0.8 * N/1.0*P - N/P) * 1.0 = N/P + (0.8 N/P - N/P) = N/P - 0.2 * N/P = 0.8 * N/P Which again gives new reltuples as 0.8 * old_reltuples and is again correct. I believe that given a good initial estimate of reltuples and relpages and assuming that the pages vacuum does not scan do not change density then the vacuum calculation does the right thing. However, for ANALYZE the case is different. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Fri, 2 Mar 2018 18:47:44 +0300 Alexander Kuzmenkov wrote: > The calculation I made for the first step applies to the next steps too, > with minor differences. So, the estimate increases at each step. Just > out of interest, I plotted the reltuples for 60 steps, and it doesn't > look like it's going to converge anytime soon (see attached). > Looking at the formula, this overshoot term is created when we multiply > the old density by the new number of pages. I'm not sure how to fix > this. I think we could average the number of tuples, not the densities. > The attached patch demonstrates what I mean. I'm confused at this point, I provided a patch that addresses this and a test case. We seem to be discussing everything as if we first noticed the issue. Have you reviewed the patch and and attached analysis and tested it? Please commment on that? Thanks. Also, here is a datapoint that I found just this morning on a clients production system: INFO: "staging_xyz": scanned 3 of pages, containing 63592 live rows and 964346 dead rows; 3 rows in sample, 1959918155 estimated total rows # select (5953.0/3*63592)::int as nrows; nrows --- 105988686 This tables reltuples is 18 times the actual row count. It will never converge because with 5953 pages analyze can only adjust reltuples by 0.0006 each time. It will also almost never get vacuumed because the autovacuum threshold of 0.2 * 1959918155 = 391983631 about 3.7 times larger than the actual row count. The submitted patch is makes analyze effective in setting reltuples to within a few percent of the count(*) value. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Thu, 1 Mar 2018 17:25:09 +0300 Alexander Kuzmenkov wrote: > Well, that sounds reasonable. But the problem with the moving average > calculation remains. Suppose you run vacuum and not analyze. If the > updates are random enough, vacuum won't be able to reclaim all the > pages, so the number of pages will grow. Again, we'll have the same > thing where the number of pages grows, the real number of live tuples > stays constant, and the estimated reltuples grows after each vacuum run. I agree VACUUM's moving average may be imperfect, but the rationale makes sense and I don't have a plan to improve it now. This patch only intends to improve the behavior of ANALYZE by using the estimated row density time relpages to get reltuples. It does not change VACUUM. The problem with the moving average for ANALYZE is that it prevents ANALYZE from changing the reltuples estimate enough for large tables. Consider this based on the test setup from the patch: create table big as select id*p, ten, hun, thou, tenk, lahk, meg, padding from reltuples_test, generate_series(0,9) g(p); -- SELECT 1 alter table big set (autovacuum_enabled=false); select count(*) from big; --count -- 1 select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 0 |0 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 3 of 1538462 pages, containing 195 live rows and 0 dead rows; --3 rows in sample, 10030 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 10032 | 1538462 delete from big where ten > 1; -- DELETE 8000 select count(*) from big; -- count -- 2000 select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 10032 | 1538462 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 3 of 1538462 pages, containing 388775 live rows and 1561225 dead rows; --3 rows in sample, 98438807 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; reltuples | relpages 98438808 | 1538462 select count(*) from big; -- count -- 2000 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 3 of 1538462 pages, containing 390885 live rows and 1559115 dead rows; -- 3 rows in sample, 96910137 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; reltuples | relpages 96910136 | 1538462 Table big has 1.5 million pages. ANALYZE samples 30 thousand. No matter how many rows we change in T, ANALYZE can only change the reltuples estimate by old_estimate + new_estimate * (3/1538462), ie about 1.9 percent. With the patch on this same table we get: select count(*) from big; -- count -- 2000 select reltuples::int, relpages from pg_class where relname = 'big'; reltuples | relpages 96910136 | 1538462 analyze verbose big; -- INFO: analyzing "public.big" -- INFO: "big": scanned 3 of 1538462 pages, containing 390745 live rows and 1559255 dead rows; --3 rows in sample, 20038211 estimated total rows select reltuples::int, relpages from pg_class where relname = 'big'; -- reltuples | relpages -- 20038212 | 1538462 -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Wed, 28 Feb 2018 15:55:19 +0300 Alexander Kuzmenkov wrote: > Hi David, > > I was able to reproduce the problem using your script. > analyze_counts.awk is missing, though. Attached now I hope. I think I also added it to the commitfest page. > The idea of using the result of ANALYZE as-is, without additional > averaging, was discussed when vac_estimate_reltuples() was introduced > originally. Ultimately, it was decided not to do so. You can find the > discussion in this thread: > https://www.postgresql.org/message-id/flat/BANLkTinL6QuAm_Xf8teRZboG2Mdy3dR_vw%40mail.gmail.com#banlktinl6quam_xf8terzbog2mdy3dr...@mail.gmail.com Well that was a long discussion. I'm not sure I would agree that there was a firm conclusion on what to do about ANALYZE results. There was some recognition that the case of ANALYZE is different than VACUUM and that is reflected in the original code comments too. However the actual code ended up being the same for both ANALYZE and VACUUM. This patch is about that. See messages: https://www.postgresql.org/message-id/BANLkTimVhdO_bKQagRsH0OLp7MxgJZDryg%40mail.gmail.com https://www.postgresql.org/message-id/BANLkTimaDj950K-298JW09RrmG0eJ_C%3DqQ%40mail.gmail.com https://www.postgresql.org/message-id/28116.1306609295%40sss.pgh.pa.us > The core problem here seems to be that this calculation of moving > average does not converge in your scenario. It can be shown that when > the number of live tuples is constant and the number of pages grows, the > estimated number of tuples will increase at each step. Do you think we > can use some other formula that would converge in this scenario, but > still filter the noise in ANALYZE results? I couldn't think of one yet. Besides the test data generated with the script I have parsed the analyze verbose output for several large production systems running complex applications and have found that for tables larger than the statistics sample size (300*default_statistics_target) the row count you can caculate from (pages/sample_pages) * live_rows is pretty accurate, within a few percent of the value from count(*). In theory the sample pages analyze uses should represent the whole table fairly well. We rely on this to generate pg_statistic and it is a key input to the planner. Why should we not believe in it as much only for reltuples? If the analyze sampling does not work, the fix would be to improve that, not to disregard it piecemeal. My motivation is that I have seen large systems fighting mysterious run-away bloat for years no matter how aggressively autovacuum is tuned. The fact that an inflated reltuples can cause autovacuum to simply ignore tables forever seems worth fixing. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects. analyze_counts.awk Description: application/awk
[patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Please add the attached patch and this discussion to the open commit fest. The original bugs thread is here: 2018011254.1408.8...@wrigleys.postgresql.org. Bug reference: 15005 Logged by: David Gould Email address: da...@sonic.net PostgreSQL version: 10.1 and earlier Operating system: Linux Description: ANALYZE can make pg_class.reltuples wildly inaccurate compared to the actual row counts for tables that are larger than the default_statistics_target. Example from one of a clients production instances: # analyze verbose pg_attribute; INFO: analyzing "pg_catalog.pg_attribute" INFO: "pg_attribute": scanned 3 of 24519424 pages, containing 6475 live rows and 83 dead rows; 6475 rows in sample, 800983035 estimated total rows. This is a large complex database - pg_attribute actually has about five million rows and needs about one hundred thouand pages. However it has become extremely bloated and is taking 25 million pages (192GB!), about 250 times too much. This happened despite aggressive autovacuum settings and a periodic bloat monitoring script. Since pg_class.reltuples was 800 million, the bloat monitoring script did not detect that this table was bloated and autovacuum did not think it needed vacuuming. When reltuples is very large compared to the actual row count it causes a number of problems: - Bad input to the query planner. - Prevents autovacuum from processing large bloated tables because autovacuum_scale_factor * reltuples is large enough the threshold is rarely reached. - Decieves bloat checking tools that rely on the relationship of relpages to reltuples*average_row_size. I've tracked down how this happens and created a reproduction script and a patch. Attached: - analyze_reltuples_bug-v1.patch Patch against master - README.txt Instructions for testing - analyze_reltuples_bug.sqlReproduction script - analyze_counts.awk Helper for viewing results of test - test_standard.txtTest output for unpatched postgresql 10.1 - test_patched.txt Test output with patch The patch applies cleanly, with some offsets, to 9.4.15, 9.5.10, 9.6.6 and 10.1. Note that this is not the same as the reltuples calculation bug discussed in the thread at 16db4468-edfa-830a-f921-39a50498e...@2ndquadrant.com. That one is mainly concerned with vacuum, this with analyze. The two bugs do amplify each other though. Analysis: - Analyze and vacuum calculate the new value for pg_class.reltuples in vacuum.c:vac_estimate_reltuples(): old_density = old_rel_tuples / old_rel_pages; new_density = scanned_tuples / scanned_pages; multiplier = (double) scanned_pages / (double) total_pages; updated_density = old_density + (new_density - old_density) * multiplier; return floor(updated_density * total_pages + 0.5); The comments talk about the difference between VACUUM and ANALYZE and explain that VACUUM probably only scanned changed pages so the density of the scanned pages is not representative of the rest of the unchanged table. Hence the new overall density of the table should be adjusted proportionaly to the scanned pages vs total pages. Which makes sense. However despite the comment noteing that ANALYZE and VACUUM are different, the code actually does the same calculation for both. The problem is that it dilutes the impact of ANALYZE on reltuples for large tables: - For a table of 300 pages an analyze can only change the reltuples value by 1%. - When combined with changes in relpages due to bloat the new computed reltuples can end up far from reality. Reproducing the reltuples analyze estimate bug. --- The script "reltuples_analyze_bug.sql" creates a table that is large compared to the analyze sample size and then repeatedly updates about 10% of it followed by an analyze each iteration. The bug is that the calculation analyze uses to update pg_class.reltuples will tend to increase each time even though the actual rowcount does not change. To run: Given a postgresql 10.x server with >= 1GB of shared buffers: createdb test psql --no-psqlrc -f analyze_reltuples_bug.sql test > test_standard.out 2>&1 awk -f analyze_counts.awk test_standard.out To verify the fix, restart postgres with a patched binary and repeat the above. Here are the results with an unpatched server: After 10 interations of: update 10% of rows; analyze reltuples has almost doubled. / estimated rows // pages / /sampled rows/ relname current proposedtotal scannedlivedead reltuples_test 1001 1055 153847 3000 195000 0 reltuples_test 10981367 9951346 169231 3000 176410 18590 reltuples_test 11948112 10039979 184615 3000 163150 31850 reltuples_test 12900718 10070666 20 3000 151060 43940 reltuples_t
Re: PL/Python SD dict wiped?
On Mon, 8 Jan 2018 14:24:42 -0500 Peter Eisentraut wrote: > On 1/8/18 11:59, Ken Huffman wrote: > > I'm fine with per-session initializing of prepared statements, but is > > there PL/Python mechanism that spans sessions to minimize ConfigParser > > calls? > > Nothing built-in. You would have to use a separate external storage > mechanism of some kind. It could be a file, separately managed shared > memory, or perhaps something like memcached. Or, given that you have a postgres connection, you could use a table. -dg -- David Gouldda...@sonic.net If simplicity worked, the world would be overrun with insects.