On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmh...@gmail.com> wrote: >> [condition-variable-v1.patch] > > Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?
I poked at this a bit... OK, a lot... and have some feedback: 1. As above, we need to clear cvSleeping before setting the latch. 2. The following schedule corrupts the waitlist by trying to delete something from it that isn't in it: P1: ConditionVariablePrepareToSleep: push self onto waitlist P2: ConditionVariableSignal: pop P1 from waitlist P1: <check user condition, decide to break without ever sleeping> P3: ConditionVariablePrepareToSleep: push self onto waitlist P1: ConditionVariableCancelSleep: delete self from waitlist (!) Without P3 coming along you probably wouldn't notice because the waitlist will be happily cleared and look valid, but P3's entry gets lost and then it hangs forever. One solution is to teach ConditionVariableCancelSleep to check if we're still actually there first. That can be done in O(1) time by clearing nodes' next and prev pointers when deleting, so that we can rely on that in a new proclist_contains() function. See attached. 3. The following schedule corrupts the waitlist by trying to insert something into it that is already in it: P1: ConditionVariablePrepareToSleep: push self onto waitlist P1: <check user condition, decide to sleep> P1: ConditionVariableSleep P1: ConditionVariablePrepareToSleep: push self onto waitlist (!) Nodes before and after self's pre-existing position can be forgotten when self's node is pushed to the front of the list. That can be fixed by making ConditionVariablePrepareToSleep also check if we're already in the list. 4. The waitlist is handled LIFO-ly. Would it be better for the first guy to start waiting to be woken up first, like we do for locks? The Pthreads equivalent says that it depends on "scheduling policy". I don't know if it matters much, just an observation. 5. The new proclist function you added is the first to work in terms of PGPROC* rather than procno. Maybe the whole interface should work with either PGPROC pointers or procnos? No strong view. Please find attached a -v2 of your patch which includes suggestions 1-3 above. Like the -v1, it applies on top of lwlocks-in-dsm-v3.patch. Also, I have attached a v2->v3 diff to show just my proposed changes. -- Thomas Munro http://www.enterprisedb.com
diff.patch
Description: Binary data
condition-variable-v2.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers