Re: [HACKERS] Reduce pinning in btree indexes

Heikki Linnakangas Mon, 23 Feb 2015 11:50:02 -0800

On 02/15/2015 02:19 AM, Kevin Grittner wrote:

Interestingly, the btree README points out that using the old TID
with a new tuple poses no hazard for a scan using an MVCC snapshot,
because the new tuple would not be visible to a snapshot created
that long ago.

The first question is: Do we really need that interlock for the non-MVCCsnapshots either?

If we do: For non-MVCC snapshots, we need to ensure that all index scansthat started before the VACUUM did complete before the VACUUM does. Iwonder if we could find a different mechanism to enforce that. Using thepin-interlock for that has always seemed a bit silly to me. Perhaps graba new heavy-weight lock on the table whenever a non-MVCC index scan onthe table begins, and have VACUUM wait on it.

I found that the LP_DEAD hinting
would be a problem with an old TID, but I figured we could work
around that by storing the page LSN into the scan position structure
when the page contents were read, and only doing hinting if that
matched when we were ready to do the hinting.  That wouldn't work
for an index which was not WAL-logged, so the patch still holds
pins for those.


Or you could use GetFakeLSNForUnloggedRel().

Robert pointed out that the visibility information
for an index-only scan wasn't checked while the index page READ
lock was held, so those scans also still hold the pins.


Why does an index-only scan need to hold the pin?

Finally, there was an "optimization" for marking buffer position
for possible restore that was incompatible with releasing the pin.
I use quotes because the optimization adds overhead to every move
to the next page in order set some variables in a structure when a
mark is requested instead of running two fairly small memcpy()
statements.  The two-day benchmark of the customer showed no
performance hit, and looking at the code I would be amazed if the
optimization yielded a measurable benefit.  In general, optimization
by adding overhead to moving through a scan to save time in a mark
operation seems dubious.

Hmm. Did your test case actually exercise mark/restore? The memcpy()sare not that small. Especially if it's an index-only scan, you'recopying a large portion of the page. Some scans call markpos on everytuple, so that could add up.

At some point we could consider building on this patch to recheck
index conditions for heap access when a non-MVCC snapshot is used,
check the visibility map for referenced heap pages when the TIDs
are read for an index-only scan, and/or skip LP_DEAD hinting for
non-WAL-logged indexes.  But all those are speculative future work;
this is a conservative implementation that just didn't modify
pinning where there were any confounding factors.

Understood. Still, I'd like to see if we can easily get rid of thepinning altogether.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reduce pinning in btree indexes

Reply via email to