Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
On 03/22/2018 08:51 PM, Tom Lane wrote: > I wrote: >> Tomas Vondra writes: >>> The 0002 part is the main part, unifying the definition of reltuples on >>> three main places: > >> On to this part ... > > I've pushed 0002 with several corrections: it did not seem to me that > you'd correctly propagated what ANALYZE is doing into CREATE INDEX or > pgstattuple. Also, since one of the things VACUUM does with num_tuples > is to report it as "the total number of non-removable tuples", I thought > we'd better leave that calculation alone. I made the added code count > live tuples in a new variable live_tuples, instead. > OK, makes sense. Thanks. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
I wrote: > Tomas Vondra writes: >> The 0002 part is the main part, unifying the definition of reltuples on >> three main places: > On to this part ... I've pushed 0002 with several corrections: it did not seem to me that you'd correctly propagated what ANALYZE is doing into CREATE INDEX or pgstattuple. Also, since one of the things VACUUM does with num_tuples is to report it as "the total number of non-removable tuples", I thought we'd better leave that calculation alone. I made the added code count live tuples in a new variable live_tuples, instead. regards, tom lane
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Tomas Vondra writes: > 0001 fixes this by tracking the number of actually indexed rows in the > build states, just like in the other index AMs. > A VACUUM or ANALYZE will fix the estimate, of course, but for tables > that are not changing very much it may take quite a while. So I think > this is something we definitely need to back-patch. Agreed, and done. I noticed a second bug in contrib/bloom, too: it'd forget to write out the last index page if that contained only one tuple, because the new-page code path in bloomBuildCallback failed to increment the "count" field after clearing it. > The 0002 part is the main part, unifying the definition of reltuples on > three main places: On to this part ... regards, tom lane
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Hi, So here is an updated version of the patch/fix, addressing the remaining issues in v3 posted by Tom in November. The 0001 part is actually a bugfix in bloom and spgist index AM, which did something like this: reltuples = IndexBuildHeapScan(...) result->heap_tuples = result->index_tuples = reltuples; That is, these two AMs simply used the number of heap rows for the index. That does not work for partial indexes, of course, where the correct index reltuples value is likely much lower. 0001 fixes this by tracking the number of actually indexed rows in the build states, just like in the other index AMs. A VACUUM or ANALYZE will fix the estimate, of course, but for tables that are not changing very much it may take quite a while. So I think this is something we definitely need to back-patch. The 0002 part is the main part, unifying the definition of reltuples on three main places: a) acquire_sample_rows (ANALYZE) b) lazy_scan_heap (VACUUM) c) IndexBuildHeapRangeScan (CREATE INDEX) As the ANALYZE case seems the most constrained, the other two places were updated to use the same criteria for which rows to include in the reltuples estimate: * HEAPTUPLE_LIVE * HEAPTUPLE_INSERT_IN_PROGRESS (same transaction) * HEAPTUPLE_DELETE_IN_PROGRESS (not the same trasaction) This resolves the issue with oscillating reltuples estimates, produced by VACUUM and ANALYZE (with many non-removable dead tuples). I've checked all IndexBuildHeapRangeScan callers, and none of them is using the reltuples estimate for anything except for passing it to index_update_stats. Aside from the bug fixed in 0001, of course. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 0001-Compute-index_tuples-correctly-in-bloom-and-spgist.patch.gz Description: application/gzip 0002-Unify-the-definition-of-reltuples-in-VACUUM-ANALYZE-.patch.gz Description: application/gzip
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
On 03/05/2018 04:12 PM, David Steele wrote: > Hi Tomas, > > On 1/8/18 3:28 PM, Tomas Vondra wrote: >> >> >> On 01/08/2018 08:39 PM, Tom Lane wrote: >>> Tomas Vondra writes: As I already mentioned, Tom's updated patch is better than what I posted initially, and I agree with his approach to the remaining issues he pointed out. But I somehow assumed that he's already looking into that. Tom, do you plan to look into this patch soon, or should I? >>> >>> No, I thought you were going to run with those ideas. I have a lot >>> of other stuff on my plate ... >>> >> >> OK, will do. > > What the status of this patch? It's been waiting on author since last > November, though I can see there was some confusion over who was working > on it. > > Given that it's a bug fix it would be good to see a patch for this CF, > or very soon after. > Yeah, it's the next thing on my TODO. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Hi Tomas, On 1/8/18 3:28 PM, Tomas Vondra wrote: > > > On 01/08/2018 08:39 PM, Tom Lane wrote: >> Tomas Vondra writes: >>> As I already mentioned, Tom's updated patch is better than what I >>> posted initially, and I agree with his approach to the remaining >>> issues he pointed out. But I somehow assumed that he's already >>> looking into that. Tom, do you plan to look into this patch soon, >>> or should I? >> >> No, I thought you were going to run with those ideas. I have a lot >> of other stuff on my plate ... >> > > OK, will do. What the status of this patch? It's been waiting on author since last November, though I can see there was some confusion over who was working on it. Given that it's a bug fix it would be good to see a patch for this CF, or very soon after. Thanks, -- -David da...@pgmasters.net
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
On 01/08/2018 08:39 PM, Tom Lane wrote: > Tomas Vondra writes: >> As I already mentioned, Tom's updated patch is better than what I >> posted initially, and I agree with his approach to the remaining >> issues he pointed out. But I somehow assumed that he's already >> looking into that. Tom, do you plan to look into this patch soon, >> or should I? > > No, I thought you were going to run with those ideas. I have a lot > of other stuff on my plate ... > OK, will do. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Tomas Vondra writes: > As I already mentioned, Tom's updated patch is better than what I posted > initially, and I agree with his approach to the remaining issues he > pointed out. But I somehow assumed that he's already looking into that. > Tom, do you plan to look into this patch soon, or should I? No, I thought you were going to run with those ideas. I have a lot of other stuff on my plate ... regards, tom lane
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
On 01/05/2018 05:25 AM, Stephen Frost wrote: > Tomas, > > * Michael Paquier (michael.paqu...@gmail.com) wrote: >> On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra >> wrote: >>> Thanks for looking into this. I agree your patch is more consistent and >>> generally cleaner. >> >> This is classified as a bug fix, and is marked as waiting on author. I >> am moving it to next CF as work continues. > > This is still in waiting-on-author state; it'd be great to get your > thoughts on where this patch is and what the next steps are to move > it forward. If you need additional feedback or there needs to be > more discussion (perhaps with Tom) then maybe this should be in > needs-review instead, but it'd be great if you could review the > discussion and patch again and at least provide an update on it (last > update from you was in November). > Hmmm, I'm not sure, TBH. As I already mentioned, Tom's updated patch is better than what I posted initially, and I agree with his approach to the remaining issues he pointed out. But I somehow assumed that he's already looking into that. Tom, do you plan to look into this patch soon, or should I? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Tomas, * Michael Paquier (michael.paqu...@gmail.com) wrote: > On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra > wrote: > > Thanks for looking into this. I agree your patch is more consistent and > > generally cleaner. > > This is classified as a bug fix, and is marked as waiting on author. I > am moving it to next CF as work continues. This is still in waiting-on-author state; it'd be great to get your thoughts on where this patch is and what the next steps are to move it forward. If you need additional feedback or there needs to be more discussion (perhaps with Tom) then maybe this should be in needs-review instead, but it'd be great if you could review the discussion and patch again and at least provide an update on it (last update from you was in November). Thanks! Stephen signature.asc Description: PGP signature
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra wrote: > Thanks for looking into this. I agree your patch is more consistent and > generally cleaner. This is classified as a bug fix, and is marked as waiting on author. I am moving it to next CF as work continues. -- Michael
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
BTW see bug #14863 which is related to this: https://postgr.es/m/CAEBTBzu5j_E1K1jb9OKwTZj98MPeM7V81-vadp5adRm=nhj...@mail.gmail.com -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Tomas Vondra writes: > On 11/02/2017 08:15 PM, Tom Lane wrote: >> However, I'm not sure we're there yet, because there remains a fairly >> nasty discrepancy even once we've gotten everyone onto the same page >> about reltuples counting just live tuples: VACUUM and ANALYZE have >> different definitions of what's "live". In particular they do not treat >> INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same. Should we >> try to do something about that? If so, what? It looks like ANALYZE's >> behavior is pretty tightly constrained, judging by the comments in >> acquire_sample_rows. > ISTM we need to unify those definitions, probably so that VACUUM adopts > what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats > will be updated at the end of transaction, why shouldn't VACUUM do the > same thing? That was the way I was leaning. I haven't thought very hard about the implications, but as long as the change in VACUUM's behavior extends only to the live-tuple count it reports, it seems like adjusting it couldn't break anything too badly. >> Another problem is that it looks like CREATE INDEX will set reltuples >> to the total number of heap entries it chose to index, because that >> is what IndexBuildHeapScan counts. Maybe we should adjust that? > You mean by only counting live tuples in IndexBuildHeapRangeScan, > following whatever definition we end up using in VACUUM/ANALYZE? Right. One issue is that, as I mentioned, the index AMs probably want to think about total-tuples-indexed not live-tuples; so for their purposes, what IndexBuildHeapScan currently counts is the right thing. We need to look and see if any AMs are actually using that value rather than just silently passing it back. If they are, we might need to go to the trouble of computing/returning two values. regards, tom lane
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Hi, On 11/02/2017 08:15 PM, Tom Lane wrote: > Haribabu Kommi writes: >> The changes are fine and now it reports proper live tuples in both >> pg_class and stats. The other issue of continuous vacuum operation >> leading to decrease of number of live tuples is not related to this >> patch and that can be handled separately. > > I did not like this patch too much, because it did nothing to fix > the underlying problem of confusion between "live tuples" and "total > tuples"; in fact, it made that worse, because now the comment on > LVRelStats.new_rel_tuples is a lie. And there's at least one usage > of that field value where I think we do want total tuples not live > tuples: where we pass it to index AM cleanup functions. Indexes > don't really care whether heap entries are live or not, only that > they're there. Plus the VACUUM VERBOSE log message that uses the > field is supposed to be reporting total tuples not live tuples. > > I hacked up the patch trying to make that better, finding along the > way that contrib/pgstattuple shared in the confusion about what > it was trying to count. Results attached. > Thanks for looking into this. I agree your patch is more consistent and generally cleaner. > However, I'm not sure we're there yet, because there remains a fairly > nasty discrepancy even once we've gotten everyone onto the same page > about reltuples counting just live tuples: VACUUM and ANALYZE have > different definitions of what's "live". In particular they do not treat > INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same. Should we > try to do something about that? If so, what? It looks like ANALYZE's > behavior is pretty tightly constrained, judging by the comments in > acquire_sample_rows. > Yeah :-( For the record (and people following this thread who are too lazy to open the analyze.c and search for the comments), the problem is that acquire_sample_rows does not count HEAPTUPLE_INSERT_IN_PROGRESS as live (and HEAPTUPLE_DELETE_IN_PROGRESS as dead) as it assumes the transaction will send the stats at the end. Which is a correct assumption, but it means that when there is a long-running transaction that inserted/deleted many rows, the reltuples value will oscillate during VACUUM / ANALYZE runs. ISTM we need to unify those definitions, probably so that VACUUM adopts what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats will be updated at the end of transaction, why shouldn't VACUUM do the same thing? The one reason I can think of is that VACUUM is generally expected to run longer than ANALYZE, so the assumption that large transactions complete later is not quite reliable. Or is there some other reason why VACUUM can't treat the in-progress tuples the same way? > Another problem is that it looks like CREATE INDEX will set reltuples > to the total number of heap entries it chose to index, because that > is what IndexBuildHeapScan counts. Maybe we should adjust that? > You mean by only counting live tuples in IndexBuildHeapRangeScan, following whatever definition we end up using in VACUUM/ANALYZE? Seems like a good idea I guess, although CREATE INDEX not as frequent as the other commands. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services