Re: [HACKERS] visibilitymap_count() at the end of vacuum
On Tuesday, December 04, 2012 5:14 AM Pavan Deolasee wrote: On Tue, Dec 4, 2012 at 3:16 AM, Robert Haas robertmh...@gmail.com wrote: Also, if someone does have a 100GB relation, rereading 2MB of visibility map pages at the end probably isn't a significant part of the total cost. Even if 99.9% of the relation is all-visible and we skip reading those parts, the visibility map reads will still be only about 2% of the total read activity, and most of the time you won't be that lucky. Hmm. I fully agree its a very small percentage of the total cost. But I still don't see why it should not be optimised, if possible. Of course, if not recounting at the end will generate bad query plans most of the time for most of the workloads or even a few workloads, then the minuscule cost will pay of. But nobody convincingly argued about that. Even if the current way is the right way, we should probably just add a comment there. I also noticed that we call vacuum_delay_point() after testing every visibility map bit in lazy_scan_heap() which looks overkill, but visibilitymap_count() runs to the end without even a single call to vacuum_delay_point(). Is that intended ? Or should we at least check for interrupts there ? I think calling vacuum_delay_point(), after every visibility map bit test in lazy_scan_heap() might not be necessary. Shouldn't both places be consistent and call vacuum_delay_point() after one vm page processing? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] visibilitymap_count() at the end of vacuum
I wonder if we really need to make another pass over the entire visibility map to count the number of all-visible pages at the end of the vacuum. The code that I'm looking at is in src/backend/commands/vacuumlazy.c: 247 new_rel_allvisible = visibilitymap_count(onerel); 248 if (new_rel_allvisible new_rel_pages) 249 new_rel_allvisible = new_rel_pages; We would have just scanned every bit of the visibility map and can remember information about the number of all-visible pages in vacrelstats, just like many other statistical information that we track and update the end of the vacuum. Sure, there might be some more updates to the VM, especially a few bits may get cleared while we are vacuuming the table, but that can happen even while we are recounting at the end. AFAICS we can deal with that much staleness of the data. If we agree that this is worth improving, I can write a patch to do so. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] visibilitymap_count() at the end of vacuum
Hi, On 2012-12-03 23:44:36 +0530, Pavan Deolasee wrote: I wonder if we really need to make another pass over the entire visibility map to count the number of all-visible pages at the end of the vacuum. The code that I'm looking at is in src/backend/commands/vacuumlazy.c: 247 new_rel_allvisible = visibilitymap_count(onerel); 248 if (new_rel_allvisible new_rel_pages) 249 new_rel_allvisible = new_rel_pages; We would have just scanned every bit of the visibility map and can remember information about the number of all-visible pages in vacrelstats, just like many other statistical information that we track and update the end of the vacuum. Sure, there might be some more updates to the VM, especially a few bits may get cleared while we are vacuuming the table, but that can happen even while we are recounting at the end. AFAICS we can deal with that much staleness of the data. A full-table vacuum can take a *long* (as in days) time, so I think recounting makes sense. And normally the cost is pretty small, so I don't see a problem in this. Why change it? Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] visibilitymap_count() at the end of vacuum
On Mon, Dec 3, 2012 at 11:50 PM, Andres Freund and...@2ndquadrant.comwrote: A full-table vacuum can take a *long* (as in days) time, so I think recounting makes sense. And normally the cost is pretty small, so I don't see a problem in this. Well, may be the cost is low. But it can still run into several hundred or thousand pages that are read into the buffer pool again. If there is indeed too much churn happening, an ANALYZE will kick in which will count the bits anyway or the next VACUUM will correct it (though it may become out dated again) Why change it? Why not ? As I said, we would have just counted the bits and will be doing it again which looks overkill unless someone really wants to argue that the staleness of the data is going to cause the planner to really start producing way too bad plans. But note that we have lived with the staleness of reltuples and relpages for so long. So I don't see why relallvisible needs any special treatment, just because its relatively easy to recount them. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] visibilitymap_count() at the end of vacuum
On 3 December 2012 18:20, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2012-12-03 23:44:36 +0530, Pavan Deolasee wrote: I wonder if we really need to make another pass over the entire visibility map to count the number of all-visible pages at the end of the vacuum. The code that I'm looking at is in src/backend/commands/vacuumlazy.c: 247 new_rel_allvisible = visibilitymap_count(onerel); 248 if (new_rel_allvisible new_rel_pages) 249 new_rel_allvisible = new_rel_pages; We would have just scanned every bit of the visibility map and can remember information about the number of all-visible pages in vacrelstats, just like many other statistical information that we track and update the end of the vacuum. Sure, there might be some more updates to the VM, especially a few bits may get cleared while we are vacuuming the table, but that can happen even while we are recounting at the end. AFAICS we can deal with that much staleness of the data. A full-table vacuum can take a *long* (as in days) time, so I think recounting makes sense. And normally the cost is pretty small, so I don't see a problem in this. Why change it? There's another reason for doing it this way: if VACUUM sets everything as all visible, but during the VACUUM that state is quickly reset by others, it would be a mistake not to allow for that. We want a realistic value not a best possible case. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] visibilitymap_count() at the end of vacuum
On Mon, Dec 3, 2012 at 1:36 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: Well, may be the cost is low. But it can still run into several hundred or thousand pages that are read into the buffer pool again. If there is indeed too much churn happening, an ANALYZE will kick in which will count the bits anyway or the next VACUUM will correct it (though it may become out dated again) Several hundred pages? Each visibility map page covers 512 MB of heap pages. If you read several hundred of them, you're talking about a relation that is over 100GB in size. If you read several thousand, you're over a terabyte. There are probably a few people who have PostgreSQL relations that large, but not many. Also, if someone does have a 100GB relation, rereading 2MB of visibility map pages at the end probably isn't a significant part of the total cost. Even if 99.9% of the relation is all-visible and we skip reading those parts, the visibility map reads will still be only about 2% of the total read activity, and most of the time you won't be that lucky. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] visibilitymap_count() at the end of vacuum
On Tue, Dec 4, 2012 at 3:16 AM, Robert Haas robertmh...@gmail.com wrote: Also, if someone does have a 100GB relation, rereading 2MB of visibility map pages at the end probably isn't a significant part of the total cost. Even if 99.9% of the relation is all-visible and we skip reading those parts, the visibility map reads will still be only about 2% of the total read activity, and most of the time you won't be that lucky. Hmm. I fully agree its a very small percentage of the total cost. But I still don't see why it should not be optimised, if possible. Of course, if not recounting at the end will generate bad query plans most of the time for most of the workloads or even a few workloads, then the minuscule cost will pay of. But nobody convincingly argued about that. Even if the current way is the right way, we should probably just add a comment there. I also noticed that we call vacuum_delay_point() after testing every visibility map bit in lazy_scan_heap() which looks overkill, but visibilitymap_count() runs to the end without even a single call to vacuum_delay_point(). Is that intended ? Or should we at least check for interrupts there ? Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee