On 05/27/2014 09:47 PM, Peter Geoghegan wrote:
On Tue, May 27, 2014 at 4:54 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
Also note that _bt_moveright() also finishes any incomplete splits it
encounters (when called for an insertion). So _bt_findinsertloc() never gets
called on a page with the incomplete-split flag set. It might encounter one
when it moves right, but never the first page.

Fair enough, but I don't think that affects correctness either way (I
don't think that you meant to imply that this was a necessary
precaution that you'd taken - right?).

Right.

If I understood correctly, the tree looks like this before the insertion:

Parent page:
+-------------+
|             |
| 9 -> A      |
+-------------+

Leaf A
+-------------+
| HI-key: 9   |
|             |
| 7 8 9       |
+-------------+

And after insertion and incomplete split:

Parent page
+-------------+
|             |
| 9 -> A      |
+-------------+

Leaf A              Leaf B
+--------------+     +-------------+
| HI-key: 8    |     | HI-key: 9   |
| (INCOMPLETE_ |     |             |
| SPLIT)       | <-> |             |
|              |     |             |
|  7   7*   8  |     |   9         |
+--------------+     +-------------+

After the split is finished, the tree looks like this:

Parent page
+-------------+
| 8 -> A      |
| 9 -> B      |
+-------------+

Leaf A              Leaf B
+-------------+     +-------------+
| HI-key: 8   |     | HI-key: 9   |
|             | <-> |             |
|  7   7*  8  |     |   9         |
+-------------+     +-------------+

How did the parent page change between before and after the final
atomic operation (page split completion)? What happened to "9 -> A"?

Ah, sorry, I got that wrong. The downlinks store the *low* key of the child page, not the high key as I depicted. Let me try again:

On 05/27/2014 09:17 AM, Peter Geoghegan wrote:
Anyway, the concern is that there may be problems when we call
_bt_finish_split() with that left sibling locked thoughout (i.e.
finish a split where the right sibling is BTP_INCOMPLETE_SPLIT, and
itself has a right sibling from the incomplete split (which is usually
the value lock page's right-right sibling)). I'm not concerned about
performance, since as the comment says it ought to be an infrequent
occurrence. I also believe that there are no deadlock hazards. But
consider this scenario:

* We insert the value 7 into an int4 unique index. We need to split
the leaf page. We run out of memory or something, and ours is an
incomplete page split. Our transaction is aborted. For the sake of
argument, suppose that there are also already a bunch of dead tuples
within the index with values of 7, 8 and 9.

So, initially the tree looks like this:

Parent page:
+-------------+
|             |
| 7 -> A      |
+-------------+

Leaf A
+-------------+
| HI-key: 9   |
|             |
| 7 8 9       |
+-------------+

And after insertion and incomplete split:

Parent page
+-------------+
|             |
| 7 -> A      |
+-------------+

Leaf A              Leaf B
+--------------+     +-------------+
| HI-key: 8    |     | HI-key: 9   |
| (INCOMPLETE_ |     |             |
| SPLIT)       | <-> |             |
|              |     |             |
|  7   7*  8   |     |    9        |
+--------------+     +-------------+

where 7* is the newly inserted key, with value 7.

(you didn't mention at what point the split happens, but in the next
paragraph you said the new downlink has value 8, which implies the above split)

* Another inserter of the value 7 comes along. It follows exactly the
same downlink as the first, now aborted inserter (suppose the
downlink's value is 9). It also locks the same leaf page to establish
a "value lock" in precisely the same manner. Finding no room on the
first page, it looks further right, maintaining its original "value
lock" throughout. It finishes the first inserter's split of the
non-value-lock page - a new downlink is inserted into the parent page,
with the value 8. It then releases all buffer locks except the first
one/original "value lock". A physical insertion has yet to occur.

The downlink of the original page cannot contain 9. Because, as I now remember ;-), the downlinks contain low-keys, not high keys.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to