Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-11-08 Thread Asim Praveen
Hi Michael

On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier wrote:

>
> Did you really test WAL replay? This still ignores that PageGetLSN is
> also called in some code paths, like recovery, where actions on the
> page are guaranteed to be serialized, so this patch would cause the
> system to blow up. Note that pageinspect, amcheck and
> wal_consistency_checking also operate on page copies, so the
> assertion in 0002 would fail in those cases as well.
>

Indeed, the assertion tripped during WAL replay on the standby.  This was
caught by the TAP tests under src/test/recovery.  The assertion has now been
changed so that WAL replay is exempt from the check.  Please find the new
patch attached.  The tests pass with this fix.  I also manually verified
that recovery works with "wal_consistency_checking=all".
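Because the patch itself is attached as binary data, here is a rough, self-contained C model of the kind of check described above. It is a sketch under stated assumptions, not the code in 0002: every name in it (lsn_read_allowed, shared_pool, in_recovery, sketch_page_get_lsn) is hypothetical, recovery is modelled as a single flag, and a page counts as a private copy (as with pageinspect, amcheck or wal_consistency_checking) whenever its address lies outside the simulated shared buffer pool. The lock rule (exclusive content lock, or share lock plus the buffer header spinlock) is our assumption, based on how BufferGetLSNAtomic() reads the LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model; none of these names are PostgreSQL APIs. */
#define SKETCH_NBUFFERS 4
#define SKETCH_BLCKSZ 8192

typedef enum { CONTENT_UNLOCKED, CONTENT_SHARE, CONTENT_EXCLUSIVE } ContentLock;

typedef struct {
    ContentLock content_lock;  /* mode in which this backend holds the content lock */
    bool header_locked;        /* does this backend hold the buffer header spinlock? */
} BufferLockState;

static char shared_pool[SKETCH_NBUFFERS][SKETCH_BLCKSZ];
static BufferLockState lock_state[SKETCH_NBUFFERS];
static bool in_recovery = false;   /* stands in for "this is WAL replay" */

/* Buffer index of a page inside the simulated pool, or -1 for a private copy. */
static int page_to_buffer(const char *page)
{
    uintptr_t base = (uintptr_t) shared_pool;
    uintptr_t p = (uintptr_t) page;

    if (p < base || p >= base + sizeof(shared_pool))
        return -1;
    return (int) ((p - base) / SKETCH_BLCKSZ);
}

/* True if reading the page LSN is allowed under the assumed rules. */
static bool lsn_read_allowed(const char *page)
{
    int buf = page_to_buffer(page);

    if (buf < 0)
        return true;   /* backend-local copy: no other backend can modify it */
    if (in_recovery)
        return true;   /* WAL replay is exempt, as in the fix described above */
    if (lock_state[buf].content_lock == CONTENT_EXCLUSIVE)
        return true;
    return lock_state[buf].content_lock == CONTENT_SHARE &&
           lock_state[buf].header_locked;
}

/* A PageGetLSN-like accessor carrying the assertion. */
static uint64_t sketch_page_get_lsn(const char *page)
{
    uint64_t lsn;

    assert(lsn_read_allowed(page));
    memcpy(&lsn, page, sizeof(lsn));   /* pretend the LSN is the first 8 bytes */
    return lsn;
}
```

In this model a share lock alone is not enough, and the recovery exemption is deliberately coarse; the real patch may draw these lines differently.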

Asim


0002-PageGetLSN-assert-that-locks-are-properly-held.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-11-06 Thread Asim Praveen
Hi Michael
On Mon, Oct 2, 2017 at 6:48 PM, Michael Paquier wrote:
>
> Jacob, here are some ideas to move this thread forward. I would
> suggest producing a set of patches that do things incrementally:
> 1) One patch that changes the calls of PageGetLSN to
> BufferGetLSNAtomic where PageGetLSN is not appropriate. You have spotted
> some of them in the btree and gist code, but based on my first
> look, not all of them. There is still one in gistFindCorrectParent and
> one in btree's _bt_split. The other calls (in sequence.c and
> vacuumlazy.c) looked safe. There is another one in XLogRecordAssemble
> that should be fixed, I think.

Thank you for your suggestions.  Please find the first patch attached as
"0001-...".  We checked both gistFindCorrectParent and _bt_split: all
calls to PageGetLSN there are made while holding an exclusive lock on the
buffer contents.
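To illustrate why some call sites need BufferGetLSNAtomic rather than PageGetLSN, here is a simplified C model. As far as we understand, when hint bits are WAL-logged (data checksums or wal_log_hints=on), MarkBufferDirtyHint can set the page LSN while holding only a share content lock plus the buffer header spinlock, so a reader holding only a share lock must take the header spinlock as well. The struct and function names are hypothetical, and a C11 atomic_flag stands in for the real buffer header spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a shared buffer; names are illustrative only. */
typedef uint64_t XLogRecPtr;

typedef struct {
    atomic_flag hdr_lock;   /* stands in for the buffer header spinlock */
    XLogRecPtr  page_lsn;   /* stands in for the LSN in the page header */
} SharedBuffer;

/* Models "data checksums or wal_log_hints=on". */
static bool hint_bits_wal_logged = true;

static void hdr_lock_acquire(SharedBuffer *buf)
{
    while (atomic_flag_test_and_set_explicit(&buf->hdr_lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void hdr_lock_release(SharedBuffer *buf)
{
    atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
}

/* Writer side: a hint-bit setter holding only a share content lock
 * updates the LSN under the header spinlock. */
static void set_lsn_under_share_lock(SharedBuffer *buf, XLogRecPtr lsn)
{
    hdr_lock_acquire(buf);
    buf->page_lsn = lsn;
    hdr_lock_release(buf);
}

/* Reader side, analogous to BufferGetLSNAtomic() for a caller holding
 * at least a share content lock. */
static XLogRecPtr get_lsn_atomic(SharedBuffer *buf)
{
    XLogRecPtr lsn;

    if (!hint_bits_wal_logged)
        return buf->page_lsn;   /* LSN then only changes under an exclusive lock */

    hdr_lock_acquire(buf);      /* exclude a concurrent set_lsn_under_share_lock() */
    lsn = buf->page_lsn;
    hdr_lock_release(buf);
    return lsn;
}
```

With hint-bit logging off, the fast path skips the spinlock, mirroring the early return we believe BufferGetLSNAtomic() takes in that case.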

> 2) A second patch that strengthens the checks around
> BufferGetLSNAtomic a bit. One idea would be to use LWLockHeldByMe, as you
> originally suggested.
> A comment could also be added in bufpage.h for PageGetLSN to let
> users know that it should be used carefully, in the vein of what is
> mentioned in src/backend/access/transam/README.


The second patch, "0002-...", does the above.  As suggested, we have added a
comment to AssertPageIsLockedForLSN.
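As far as we know, LWLockHeldByMe works from per-backend bookkeeping of the locks that backend currently holds, scanned linearly. The following hypothetical C model sketches that idea and shows how a "held in mode" check could express the assertion's rule. None of these names are PostgreSQL's; the fixed-size array, the mode enum and may_read_lsn() are simplifications chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of per-backend lock bookkeeping. */
typedef enum { MODE_SHARED, MODE_EXCLUSIVE } LockMode;
typedef struct { int id; } LightLock;   /* stands in for an LWLock */

typedef struct {
    LightLock *lock;
    LockMode   mode;
} HeldLock;

#define MAX_HELD 16

static HeldLock held_locks[MAX_HELD];
static int num_held = 0;

/* Called after acquiring a lock. */
static void remember_lock(LightLock *lock, LockMode mode)
{
    assert(num_held < MAX_HELD);   /* too many locks held at once */
    held_locks[num_held].lock = lock;
    held_locks[num_held].mode = mode;
    num_held++;
}

/* Called when releasing a lock: drop its entry and close the gap. */
static void forget_lock(LightLock *lock)
{
    for (int i = num_held - 1; i >= 0; i--)
    {
        if (held_locks[i].lock == lock)
        {
            for (int j = i; j < num_held - 1; j++)
                held_locks[j] = held_locks[j + 1];
            num_held--;
            return;
        }
    }
}

/* Does this backend hold the lock in exactly this mode? */
static bool held_by_me_in_mode(const LightLock *lock, LockMode mode)
{
    for (int i = 0; i < num_held; i++)
        if (held_locks[i].lock == lock && held_locks[i].mode == mode)
            return true;
    return false;
}

/* Assumed rule: exclusive content lock, or share lock plus header spinlock. */
static bool may_read_lsn(const LightLock *content_lock, bool header_locked)
{
    return held_by_me_in_mode(content_lock, MODE_EXCLUSIVE) ||
           (held_by_me_in_mode(content_lock, MODE_SHARED) && header_locked);
}
```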

The new assertion caught at least one code path where TestForOldSnapshot
calls PageGetLSN without holding any lock: the snapshot_too_old test in
"check-world" tripped the assertion.  This still needs to be fixed; see the
open question in the opening mail of this thread.

Asim and Jacob


0001-Change-incorrect-calls-to-PageGetLSN-to-BufferGetLSN.patch
Description: Binary data


0002-PageGetLSN-assert-that-locks-are-properly-held.patch
Description: Binary data
