> I'm suggesting we take the approach that if there is a problem we can
> recreate it as a way of exploring what conditions are required and
> therefore work out the impact. Nikhil Sontakke appears to have
> re-created something, but not quite what I had expected. I think he
> will post here tomorrow with an update for us to discuss.
So, I reverted commit 0874d4f3e183757ba15a4b3f3bf563e0393dd9c2 to go back to the earlier bad code, where the swapped arguments to SubTransSetParent result in incorrect parent linkages, and used the attached TAP test patch.

The test prepares a 2PC transaction with more than 64 subtransactions. It then stops the master and promotes the standby.

A SELECT query on the newly promoted master against any of the tables involved in the 2PC hangs. The hang is due to a loop in SubTransGetTopmostTransaction(). Because of the incorrect linkages, we get a circular reference in parentxid <-> subxid, which induces the infinite loop.

Any further DML on these objects that needs to check the visibility of these tuples hangs as well. All unrelated objects and new transactions are ok AFAICS.

I do not see any data loss, which is good. However, the tables involved in the 2PC remain inaccessible until after a hard restart.

The attached TAP test patch can be considered for commit, to test the handling of 2PC with a large number of subtransactions on promoted standby instances.

Regards,
Nikhils
--
 Nikhil Sontakke                   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
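To illustrate how a parentxid <-> subxid cycle can make a topmost-transaction lookup spin, here is a toy model in Python. It is only a sketch and not the PostgreSQL code: the dictionary standing in for pg_subtrans, the names set_parent, topmost and build, the xid values, and the step bound are all hypothetical. It also assumes that correct subxid -> parent links were already recorded before the swapped calls ran; that ordering is an assumption of this sketch, not something established above.

```python
INVALID_XID = 0  # stand-in for an invalid transaction id (assumption of this sketch)

def set_parent(parents, xid, parent):
    """Record `parent` as the parent of `xid` (toy analogue of SubTransSetParent)."""
    parents[xid] = parent

def topmost(parents, xid, max_steps=1000):
    """Follow parent links upward from `xid`.

    Returns the topmost xid, or None if no top is reached within `max_steps`.
    The bound exists only so the sketch terminates; an unbounded walk over a
    cycle would never return.
    """
    current = xid
    for _ in range(max_steps):
        parent = parents.get(current, INVALID_XID)
        if parent == INVALID_XID:
            return current  # no parent recorded: this is the top
        current = parent
    return None  # never reached a top: the links form a cycle

TOP_XID = 100                    # hypothetical top-level xid of the prepared transaction
SUBXIDS = list(range(101, 166))  # 65 hypothetical subtransaction xids (more than 64)

def build(swapped):
    """Build the link table, with the re-linking done correctly or with swapped arguments."""
    parents = {}
    # Assumption: subxid -> top links were already recorded correctly earlier.
    for sub in SUBXIDS:
        set_parent(parents, sub, TOP_XID)
    # Re-link the subtransactions of the prepared transaction.
    for sub in SUBXIDS:
        if swapped:
            set_parent(parents, TOP_XID, sub)  # swapped: the top now points at a subxid
        else:
            set_parent(parents, sub, TOP_XID)  # intended: the subxid points at the top
    return parents

good = build(swapped=False)
bad = build(swapped=True)
print(topmost(good, 101))  # walk ends at the top-level xid
print(topmost(bad, 101))   # walk bounces between the top and a subxid until the bound
```

In the swapped case the top-level xid ends up pointing at a subxid that itself points back at the top, so the walk never reaches an entry without a parent.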
subxid_bug_with_test_case_v1.0.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers