On 25/07/17 01:33, Alvaro Herrera wrote: > Alvaro Herrera wrote: >> I'm back at looking into this again, after a rather exhausting week. I >> think I have a better grasp of what is going on in this code now, > > Here's an updated patch, which I intend to push tomorrow. >
I think the ConditionVariablePrepareToSleep() call in ReplicationSlotAcquire() needs to be done inside the mutex lock otherwise there is tiny race condition where the process which has slot acquired might release the slot between the mutex unlock and the ConditionVariablePrepareToSleep() call which means we'll never get signaled and ConditionVariableSleep() will wait forever. Otherwise the patch looks good to me. As a side note, the ConditionVariablePrepareToSleep()'s comment could be improved because currently it says the only advantage is that we skip double-test in the beginning of ConditionVariableSleep(). But that's not true, it's essential for preventing race conditions like the one above because it puts the current process into waiting list so we can be sure it will be signaled on broadcast once ConditionVariablePrepareToSleep() has been called. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers