Hi,

Recently I inherited some production code that was hitting an error case I didn't think was possible to hit. I reduced it to this test case. To be clear, there are several ways to improve the code here, but first I'd like to understand why it behaves the way it does.
You should be able to just do "go test ." here to reproduce the error: https://github.com/kevinburke/sync-cond-experiment

What the code is doing:

- Multiple different goroutines are taking a sync.Cond lock and then appending data to a shared buffer.
- A "flush" goroutine calls sync.Cond.Wait() to wait for an incoming signal that data has been appended.
- Each goroutine that appends to the buffer calls Signal() after the write, to try to wake up the "flush" goroutine.

I *expect* that the flush goroutine will wake up after each call to Signal(), check whether the batch is ready to be flushed, and if not go back to sleep.

What I see instead is that lots of other goroutines are taking out the lock before the flush goroutine can get to it, and as a result we're dropping data.

I didn't expect that to happen based on my reading of the docs for sync.Cond, which (to me) indicate that Signal() will prioritize a goroutine that calls Wait() (instead of any other goroutines that are waiting on sync.Cond.L). Instead it looks like it's just unlocking any goroutine? Maybe this is because the thread that is calling Signal() holds the lock?

Thanks for your help,
Kevin
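
P.S. For anyone who doesn't want to open the repository, the pattern described above looks roughly like the sketch below. This is not the code from the repository: the names batcher, add, flushLoop, closeAndSignal and run are made up for illustration, and the producer counts are arbitrary. The one deliberate difference is that the flusher here drains whatever has accumulated each time Wait() returns, instead of assuming one wakeup per Signal(). That follows the sync.Cond docs, which say the caller typically cannot assume the condition is true when Wait returns and should call Wait in a loop.

```go
package main

import (
	"fmt"
	"sync"
)

// batcher is a hypothetical stand-in for the shared buffer in the post.
type batcher struct {
	mu      sync.Mutex
	cond    *sync.Cond
	buf     []int
	closed  bool
	flushed int // total number of items flushed so far
	wakeups int // number of times Wait() returned in the flusher
}

func newBatcher() *batcher {
	b := &batcher{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// add appends one item and signals while still holding the lock,
// as the appending goroutines in the post do.
func (b *batcher) add(v int) {
	b.mu.Lock()
	b.buf = append(b.buf, v)
	b.cond.Signal()
	b.mu.Unlock()
}

// closeAndSignal marks the batcher closed and wakes the flusher one last time.
func (b *batcher) closeAndSignal() {
	b.mu.Lock()
	b.closed = true
	b.cond.Signal()
	b.mu.Unlock()
}

// flushLoop re-checks the condition after every wakeup and drains the
// whole buffer, so items appended between Signal() and the wakeup are kept.
func (b *batcher) flushLoop() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for {
		for len(b.buf) == 0 && !b.closed {
			b.cond.Wait() // releases b.mu while waiting, re-locks before returning
			b.wakeups++
		}
		b.flushed += len(b.buf)
		b.buf = b.buf[:0]
		if b.closed {
			return
		}
	}
}

// run starts one flusher and several producers, then reports how many
// items were flushed and how many times the flusher woke up.
func run(producers, perProducer int) (flushed, wakeups int) {
	b := newBatcher()
	done := make(chan struct{})
	go func() {
		b.flushLoop()
		close(done)
	}()

	var wg sync.WaitGroup
	for i := 0; i < producers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perProducer; j++ {
				b.add(j)
			}
		}()
	}
	wg.Wait()
	b.closeAndSignal()
	<-done
	return b.flushed, b.wakeups
}

func main() {
	flushed, wakeups := run(8, 1000)
	// 8*1000 calls to add plus one call to closeAndSignal.
	fmt.Printf("signals=%d wakeups=%d flushed=%d\n", 8*1000+1, wakeups, flushed)
}
```

Running it prints the number of Signal() calls next to the number of times Wait() returned. The second number can be much smaller than the first and varies between runs, but the flushed count should always match the number of items appended.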