On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
Simon Riggs wrote:
On 15 April 2015 at 09:10, Andres Freund <and...@anarazel.de> wrote:

I don't really see the downside to this suggestion.

The suggestion makes things better than they are now but is still less
than I have proposed.

If what you both mean is "IMHO this is an acceptable compromise", I
can accept it also, at this point in the CF.

Let me see if I understand things.

What we have now is: when reading a page, we also HOT-clean it.  This
runs HOT-cleanup a large number of times, and causes many pages to
become dirty.

Your patch is "when reading a page, HOT-clean it, but only 5 times in
each scan".  This runs HOT-cleanup at most 5 times, and causes at most 5
pages to become dirty.

Robert's proposal is "when reading a page, if dirty, HOT-clean it; if not
dirty, also HOT-clean it but only 5 times in each scan".  This runs
HOT-cleanup some number of times (once per dirty page, plus up to 5 more),
and causes at most 5 pages to become dirty.
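To make the three-way comparison above concrete, here is a toy model. It is not PostgreSQL code. The names `scan` and `Result`, the policy labels and the per-scan limit of 5 are illustrative assumptions. The model also assumes that every page has prunable HOT chains, and it counts a page as "newly dirtied" only when pruning touches a page that was clean when read.

```python
from dataclasses import dataclass

# Hypothetical per-scan limit (the "5" in the thread).
LIMIT = 5


@dataclass
class Result:
    cleanups: int = 0       # times HOT cleanup ran during the scan
    newly_dirtied: int = 0  # clean pages that pruning made dirty


def scan(pages, policy, limit=LIMIT):
    """Simulate one scan. `pages` is a list of booleans, True meaning the
    buffer is already dirty when read. Every page is assumed prunable."""
    res = Result()
    for already_dirty in pages:
        if policy == "always":            # current behaviour
            prune = True
        elif policy == "capped":          # Simon's patch: at most `limit` cleanups
            prune = res.cleanups < limit
        elif policy == "dirty_or_capped": # Robert's proposal
            prune = already_dirty or res.newly_dirtied < limit
        else:
            raise ValueError(f"unknown policy: {policy}")
        if prune:
            res.cleanups += 1
            if not already_dirty:
                res.newly_dirtied += 1
    return res


if __name__ == "__main__":
    pages = [True, False] * 10  # 20 pages, alternating dirty and clean
    for policy in ("always", "capped", "dirty_or_capped"):
        print(policy, scan(pages, policy))
    # always          -> cleanups=20, newly_dirtied=10
    # capped          -> cleanups=5,  newly_dirtied=2
    # dirty_or_capped -> cleanups=15, newly_dirtied=5
```

In this model, Robert's variant prunes every page that is already dirty, which costs no extra write-back. It still bounds the number of clean pages it dirties, just as the capped patch does.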


Am I right in thinking that HOT-cleaning a dirty page is something that
runs completely within CPU cache?  If so, it would be damn fast and
would have benefits for future readers, for very little cost.

If there are many tuples on the page, it takes some CPU effort to scan all the HOT chains and move tuples around. Also, it creates a WAL record, which isn't free.

Another question is whether the patch can reliably detect whether it's doing a "read-only" scan or not. I haven't tested, but I suspect it would not do pruning when you do something like "INSERT INTO foo SELECT * FROM foo WHERE blah", i.e. when the target relation is referenced twice in the same statement: once as the target, and a second time as a source. Maybe that's OK, though.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)