While talking to various people during pgCon, I was reminded that the
nbtree README does a poor job of explaining the actual practical
advantages of L&Y from a high level. The "move right" trick is
initially mentioned only as an adjunct to a discussion of the
special-case handling of root page splits, even though it's of huge
significance. We also say this within nbtinsert.c:

/*
 * If the page was split between the time that we surrendered our read
 * lock and acquired our write lock, then this page may no longer be the
 * right place for the key we want to insert.  In this case, we need to
 * move right in the tree.  See Lehman and Yao for an excruciatingly
 * precise description.
 */

(Why we need to say something about _bt_moveright() within
_bt_doinsert() that is equally true of both _bt_moveright() callers
isn't clear).

I think that this isn't enough. Attached patch proposes to add a small
paragraph at the top of the nbtree README, to clarify the advantages
of L&Y from a high level. I don't think it's appropriate to state the
advantages of an algorithm in a README file generally, but in this
instance it makes it easier to understand the algorithm.

-- 
Peter Geoghegan
*** a/src/backend/access/nbtree/README
--- b/src/backend/access/nbtree/README
*************** use a simplified version of the deletion
*** 11,16 ****
--- 11,36 ----
  Shasha (V. Lanin and D. Shasha, A Symmetric Concurrent B-Tree Algorithm,
  Proceedings of 1986 Fall Joint Computer Conference, pp 380-389).
  
+ Lehman and Yao don't require read locks, but assume that in-memory
+ copies of tree pages are unshared.  Postgres shares in-memory buffers
+ among backends.  As a result, we do page-level read locking on btree
+ pages in order to guarantee that no record is modified while we are
+ examining it.  This reduces concurrency but guarantees correct
+ behavior.  An advantage is that when trading in a read lock for a
+ write lock, we need not re-read the page after getting the write lock.
+ Since we're also holding a pin on the shared buffer containing the
+ page, we know that buffer still contains the page and is up-to-date.
+ 
+ Even with these read locks, Lehman and Yao's approach obviates the
+ need of earlier schemes to hold multiple read locks concurrently when
+ descending the tree as part of servicing index scans (pessimistic lock
+ coupling).  The addition of right-links at all levels, as well as the
+ addition of a page "high key" allows detection of, and dynamic
+ recovery from concurrent page splits (that is, splits between
+ unlocking an internal page, and subsequently locking its child page
+ during a descent).  L&Y Trees are sometimes referred to as "B-Link
+ trees" in the literature.
+ 
  The Lehman and Yao Algorithm and Insertions
  -------------------------------------------
  
*************** to be inserted has a choice whether or n
*** 42,57 ****
  key could go on either page.  (Currently, we try to find a page where
  there is room for the new key without a split.)
  
- Lehman and Yao don't require read locks, but assume that in-memory
- copies of tree pages are unshared.  Postgres shares in-memory buffers
- among backends.  As a result, we do page-level read locking on btree
- pages in order to guarantee that no record is modified while we are
- examining it.  This reduces concurrency but guarantees correct
- behavior.  An advantage is that when trading in a read lock for a
- write lock, we need not re-read the page after getting the write lock.
- Since we're also holding a pin on the shared buffer containing the
- page, we know that buffer still contains the page and is up-to-date.
- 
  We support the notion of an ordered "scan" of an index as well as
  insertions, deletions, and simple lookups.  A scan in the forward
  direction is no problem, we just use the right-sibling pointers that
--- 62,67 ----
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to