On 13/07/18 16:41, Andrey Borodin wrote:
12 июля 2018 г., в 21:07, Andrey Borodin <x4...@yandex-team.ru <mailto:x4...@yandex-team.ru>> написал(а): 12 июля 2018 г., в 20:40, Heikki Linnakangas <hlinn...@iki.fi <mailto:hlinn...@iki.fi>> написал(а):
Actually, now that I think about it more, I'm not happy with leaving orphaned pages like that behind. Let's WAL-log the removal of the downlink, and marking the leaf pages as deleted, in one WAL record, to avoid that.

OK, will do this. But this will complicate WAL replay seriously, and I do not know a proper way to test that (BTW there is GiST amcheck in progress, but I decided to leave it for a while).
Done. Now WAL record for deleted page also removes downlink from internal page.
I had to use IndexPageTupleDelete() instead of IndexPageTupleMultiDelete(), but
I do not think it will have any impact on performance.

Yeah, I think that's fine, this isn't that performance critical

But the situation in gistdoinsert(), where you encounter a deleted leaf page, could happen during normal operation, if vacuum runs concurrently with an insert. Insertion locks only one page at a time, as it descends the tree, so after it has released the lock on the parent, but before it has locked the child, vacuum might have deleted the page. In the latest patch, you're checking for that just before swapping the shared lock for an exclusive one, but I think that's wrong; you need to check for that after swapping the lock, because otherwise vacuum might delete the page while you're not holding the lock.
Looks like a valid concern, I'll move that code again.
Done.

Ok, the comment now says:

+                       /*
+                        * Leaf pages can be left deleted but still referenced 
in case of
+                        * crash during VACUUM's gistbulkdelete()
+                        */

But that's not accurate, right? You should never see deleted pages after a crash, because the parent is updated in the same WAL record as the child page, right?

I'm still a bit scared about using pd_prune_xid to store the XID that prevents recycling the page too early. Can we use some field in GISTPageOpaqueData for that, similar to how the B-tree stores it in BTPageOpaqueData?

- Heikki

Reply via email to