Re: [HACKERS] visibilitymap_count() at the end of vacuum

2012-12-04 Thread Amit Kapila

On Tuesday, December 04, 2012 5:14 AM Pavan Deolasee wrote:
On Tue, Dec 4, 2012 at 3:16 AM, Robert Haas robertmh...@gmail.com wrote:

Also, if someone does have a 100GB relation, rereading 2MB of
visibility map pages at the end probably isn't a significant part of
the total cost.  Even if 99.9% of the relation is all-visible and we
skip reading those parts, the visibility map reads will still be only
about 2% of the total read activity, and most of the time you won't be
that lucky.

Hmm. I fully agree its a very small percentage of the total cost. But I
still don't see why it should not be optimised, if possible. Of course, if
not recounting at the end will generate bad query plans most of the time for
most of the workloads or even a few workloads, then the minuscule cost will
pay of. But nobody convincingly argued about that.

Even if the current way is the right way, we should probably just add a
comment there. I also noticed that we call vacuum_delay_point() after
testing every visibility map bit in lazy_scan_heap() which looks overkill,
but visibilitymap_count() runs to the end without even a single call to
vacuum_delay_point(). Is that intended ? Or should we at least check for
interrupts there ?

I think calling vacuum_delay_point(), after every visibility map bit test in
lazy_scan_heap() might not be necessary.
Shouldn't both places be consistent and call vacuum_delay_point() after one
vm page processing?

With Regards,
Amit Kapila.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] visibilitymap_count() at the end of vacuum

2012-12-03 Thread Pavan Deolasee
I wonder if we really need to make another pass over the entire visibility
map to count the number of all-visible pages at the end of the vacuum. The
code that I'm looking at is in src/backend/commands/vacuumlazy.c:

 247 new_rel_allvisible = visibilitymap_count(onerel);
 248 if (new_rel_allvisible  new_rel_pages)
 249 new_rel_allvisible = new_rel_pages;

We would have just scanned every bit of the visibility map and can remember
information about the number of all-visible pages in vacrelstats, just like
many other statistical information that we track and update the end of the
vacuum. Sure, there might be some more updates to the VM, especially a few
bits may get cleared while we are vacuuming the table, but that can happen
even while we are recounting at the end. AFAICS we can deal with that much
staleness of the data.

If we agree that this is worth improving, I can write a patch to do so.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] visibilitymap_count() at the end of vacuum

2012-12-03 Thread Andres Freund
Hi,

On 2012-12-03 23:44:36 +0530, Pavan Deolasee wrote:
 I wonder if we really need to make another pass over the entire visibility
 map to count the number of all-visible pages at the end of the vacuum. The
 code that I'm looking at is in src/backend/commands/vacuumlazy.c:

  247 new_rel_allvisible = visibilitymap_count(onerel);
  248 if (new_rel_allvisible  new_rel_pages)
  249 new_rel_allvisible = new_rel_pages;

 We would have just scanned every bit of the visibility map and can remember
 information about the number of all-visible pages in vacrelstats, just like
 many other statistical information that we track and update the end of the
 vacuum. Sure, there might be some more updates to the VM, especially a few
 bits may get cleared while we are vacuuming the table, but that can happen
 even while we are recounting at the end. AFAICS we can deal with that much
 staleness of the data.

A full-table vacuum can take a *long* (as in days) time, so I think
recounting makes sense. And normally the cost is pretty small, so I
don't see a problem in this.

Why change it?

Andres
--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibilitymap_count() at the end of vacuum

2012-12-03 Thread Pavan Deolasee
On Mon, Dec 3, 2012 at 11:50 PM, Andres Freund and...@2ndquadrant.comwrote:



 A full-table vacuum can take a *long* (as in days) time, so I think
 recounting makes sense. And normally the cost is pretty small, so I
 don't see a problem in this.


Well, may be the cost is low. But it can still run into several hundred or
thousand pages that are read into the buffer pool again. If there is indeed
too much churn happening, an ANALYZE will kick in which will count the bits
anyway or the next VACUUM will correct it (though it may become out dated
again)


 Why change it?


Why not ? As I said, we would have just counted the bits and will be doing
it again which looks overkill unless someone really wants to argue that the
staleness of the data is going to cause the planner to really start
producing way too bad plans. But note that we have lived with the staleness
of reltuples and relpages for so long. So I don't see why relallvisible
needs any special treatment, just because its relatively easy to recount
them.

Thanks,
Pavan


-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] visibilitymap_count() at the end of vacuum

2012-12-03 Thread Simon Riggs
On 3 December 2012 18:20, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2012-12-03 23:44:36 +0530, Pavan Deolasee wrote:
 I wonder if we really need to make another pass over the entire visibility
 map to count the number of all-visible pages at the end of the vacuum. The
 code that I'm looking at is in src/backend/commands/vacuumlazy.c:

  247 new_rel_allvisible = visibilitymap_count(onerel);
  248 if (new_rel_allvisible  new_rel_pages)
  249 new_rel_allvisible = new_rel_pages;

 We would have just scanned every bit of the visibility map and can remember
 information about the number of all-visible pages in vacrelstats, just like
 many other statistical information that we track and update the end of the
 vacuum. Sure, there might be some more updates to the VM, especially a few
 bits may get cleared while we are vacuuming the table, but that can happen
 even while we are recounting at the end. AFAICS we can deal with that much
 staleness of the data.

 A full-table vacuum can take a *long* (as in days) time, so I think
 recounting makes sense. And normally the cost is pretty small, so I
 don't see a problem in this.

 Why change it?

There's another reason for doing it this way: if VACUUM sets
everything as all visible, but during the VACUUM that state is quickly
reset by others, it would be a mistake not to allow for that. We want
a realistic value not a best possible case.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibilitymap_count() at the end of vacuum

2012-12-03 Thread Robert Haas
On Mon, Dec 3, 2012 at 1:36 PM, Pavan Deolasee pavan.deola...@gmail.com wrote:
 Well, may be the cost is low. But it can still run into several hundred or
 thousand pages that are read into the buffer pool again. If there is indeed
 too much churn happening, an ANALYZE will kick in which will count the bits
 anyway or the next VACUUM will correct it (though it may become out dated
 again)

Several hundred pages?  Each visibility map page covers 512 MB of heap
pages.  If you read several hundred of them, you're talking about a
relation that is over 100GB in size.   If you read several thousand,
you're over a terabyte.  There are probably a few people who have
PostgreSQL relations that large, but not many.

Also, if someone does have a 100GB relation, rereading 2MB of
visibility map pages at the end probably isn't a significant part of
the total cost.  Even if 99.9% of the relation is all-visible and we
skip reading those parts, the visibility map reads will still be only
about 2% of the total read activity, and most of the time you won't be
that lucky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibilitymap_count() at the end of vacuum

2012-12-03 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 3:16 AM, Robert Haas robertmh...@gmail.com wrote:


 Also, if someone does have a 100GB relation, rereading 2MB of
 visibility map pages at the end probably isn't a significant part of
 the total cost.  Even if 99.9% of the relation is all-visible and we
 skip reading those parts, the visibility map reads will still be only
 about 2% of the total read activity, and most of the time you won't be
 that lucky.


Hmm. I fully agree its a very small percentage of the total cost. But I
still don't see why it should not be optimised, if possible. Of course, if
not recounting at the end will generate bad query plans most of the time
for most of the workloads or even a few workloads, then the minuscule cost
will pay of. But nobody convincingly argued about that.

Even if the current way is the right way, we should probably just add a
comment there. I also noticed that we call vacuum_delay_point() after
testing every visibility map bit in lazy_scan_heap() which looks overkill,
but visibilitymap_count() runs to the end without even a single call to
vacuum_delay_point(). Is that intended ? Or should we at least check for
interrupts there ?

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee