> I'm suggesting we take the approach that if there is a problem we can
> recreate it as a way of exploring what conditions are required and
> therefore work out the impact. Nikhil Sontakke appears to have
> re-created something, but not quite what I had expected. I think he
> will post here tomorrow with an update for us to discuss.
>
>

So, I reverted commit 0874d4f3e183757ba15a4b3f3bf563e0393dd9c2 to go back
to the earlier bad swapped arguments to SubTransSetParent resulting in
incorrect parent linkages and used the attached TAP test patch.

The test prepares a 2PC with more than 64 subtransactions. It then stops
the master and promotes the standby.

A SELECT query on the newly promoted master on any of the tables involved
in the 2PC hangs. The hang is due to a loop in
SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a
circular reference in parentxid <-> subxid inducing the infinite loop.

Any further DML on these objects which will need to check visibility of
these tuples hangs as well. All unrelated objects and new transactions are
ok AFAICS.

I do not see any data loss, which is good. However tables involved in the
2PC are inaccessible till after a hard restart.

The attached TAP test patch can be considered for commit to test handling
2PC with large subtransactions on promoted standby instances.

Regards,
Nikhils
-- 
 Nikhil Sontakke                   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachment: subxid_bug_with_test_case_v1.0.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to