Hi,
Recently I inherited some code in production that was hitting an error case 
that I didn't think should be possible to hit. I reduced it down to this 
test case. To be clear, there are several different ways to improve the 
code here, but I'd like to understand why it's behaving the way it does 
first. 

You should be able to just do "go test ." here to reproduce the error: 
https://github.com/kevinburke/sync-cond-experiment

What the code is doing:

   - Multiple goroutines each acquire the sync.Cond's lock (sync.Cond.L) 
   and then append data to a shared buffer.
   - A "flush" goroutine calls sync.Cond.Wait() to wait for an incoming 
   signal that data has been appended.
   - Each goroutine that appends to the buffer calls Signal() after the 
   write, to try to wake up the "flush" goroutine.

I *expect* that the flush goroutine will wake up after each call to 
Signal(), check whether the batch is ready to be flushed, and if not go 
back to sleep. 

What I see instead is that many other goroutines acquire the lock 
before the flush goroutine can get to it, and as a result we're dropping 
data.

I didn't expect that to happen based on my reading of the docs for 
sync.Cond, which (to me) indicate that Signal() will prioritize a goroutine 
that calls Wait() (instead of any other goroutines that are waiting on 
sync.Cond.L). Instead it looks like any goroutine waiting on the lock can 
acquire it next? Maybe this is because the goroutine that is calling 
Signal() holds the lock?

Thanks for your help,
Kevin
