That's a very clear explanation; it's obvious what the problem is now. Thank you!
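
A minimal sketch of the second suggestion in the quoted reply, inlining the check+flush logic into writeEvent so the check runs under the same lock as the append. Only writeEvent is a name from the thread; the batcher type, its fields, flushLocked, and run are hypothetical, and the "flush" here just records the batch in memory instead of sending it anywhere.

```go
package main

import (
	"fmt"
	"sync"
)

// batcher is a hypothetical stand-in for the shared buffer in the thread.
type batcher struct {
	mu      sync.Mutex
	buf     []string
	limit   int        // flush once the buffer holds this many events
	flushed [][]string // every flushed batch, kept only for illustration
}

// writeEvent appends ev and, while still holding the lock, flushes if the
// batch is full. No other writer can run between the check and the flush.
func (b *batcher) writeEvent(ev string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.buf = append(b.buf, ev)
	if len(b.buf) >= b.limit {
		b.flushLocked()
	}
}

// flushLocked records the current batch and resets the buffer.
// The caller must hold b.mu.
func (b *batcher) flushLocked() {
	if len(b.buf) == 0 {
		return
	}
	batch := append([]string(nil), b.buf...) // copy before reusing buf
	b.flushed = append(b.flushed, batch)
	b.buf = b.buf[:0]
}

// run starts the given number of writer goroutines and reports how many
// events were flushed in total and the size of the largest batch.
func run(workers, perWorker, limit int) (total, maxBatch int) {
	b := &batcher{limit: limit}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				b.writeEvent(fmt.Sprintf("w%d-e%d", w, i))
			}
		}(w)
	}
	wg.Wait()

	b.mu.Lock()
	defer b.mu.Unlock()
	b.flushLocked() // flush any final partial batch
	for _, batch := range b.flushed {
		total += len(batch)
		if len(batch) > maxBatch {
			maxBatch = len(batch)
		}
	}
	return total, maxBatch
}

func main() {
	total, maxBatch := run(8, 100, 10)
	fmt.Printf("flushed %d events, largest batch %d\n", total, maxBatch)
}
```

Because the size check happens before the lock is released, a batch can never grow past the limit, regardless of which goroutine the scheduler runs next. The trade-off is that the writer that fills the batch pays for the flush.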
On Saturday, June 11, 2022 at 9:36:59 AM UTC-7 se...@liao.dev wrote:

> sync.Cond does not affect goroutine scheduling priority, and Signal only
> makes the waiting goroutine available to be scheduled, but not force it to
> be.
> After a Signal() (and Unlock()), every other waiting worker and the
> flusher then contends (fairly) for the lock.
> What you want appears a better fit for either channels (send everything to
> the flusher) or just inlining the check+flush logic into writeEvent,
> essentially a proper serialization of events
>
> See also: https://github.com/golang/go/issues/21165
>
> > On top of that, condition variables are fiendishly difficult to use:
> > they are prone to either missed or spurious signals [citation needed]
>
> - sean
>
>
> On Sat, Jun 11, 2022 at 3:27 PM Kevin Burke <ke...@burke.dev> wrote:
>
>> Hi,
>> Recently I inherited some code in production that was hitting an error
>> case that I didn't think should be possible to hit. I reduced it down to
>> this test case. To be clear, there are several different ways to improve
>> the code here, but I'd like to understand why it's behaving the way it does
>> first.
>>
>> You should be able to just do "go test ." here to reproduce the error:
>> https://github.com/kevinburke/sync-cond-experiment
>>
>> What the code is doing:
>>
>> - Multiple different goroutines are taking a sync.Cond lock and then
>> appending data to a shared buffer.
>> - A "flush" goroutine calls sync.Cond.Wait() to wait for an incoming
>> signal that data has been appended
>> - Each goroutine that appends to the buffer calls Signal() after the
>> write, to try to wake up the "flush" goroutine
>>
>> I *expect* that the flush goroutine will wake up after each call to
>> Signal(), check whether the batch is ready to be flushed, and if not go
>> back to sleep.
>>
>> What I see instead is that lots of other goroutines are taking out the
>> lock before the flush goroutine can get to it, and as a result we're
>> dropping data.
>>
>> I didn't expect that to happen based on my reading of the docs for
>> sync.Cond, which (to me) indicate that Signal() will prioritize a goroutine
>> that calls Wait() (instead of any other goroutines that are waiting on
>> sync.Cond.L). Instead it looks like it's just unlocking any goroutine?
>> Maybe this is because the thread that is calling Signal() holds the lock?
>>
>> Thanks for your help,
>> Kevin
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to golang-nuts...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/golang-nuts/d7e17b2f-159c-4124-a023-eb2cdb8ba423n%40googlegroups.com
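
The channel-based alternative from the reply (send everything to the flusher) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked repository: flusher, runChannel, and the in-memory batches are hypothetical names, and a real flusher would write each batch out instead of collecting it.

```go
package main

import (
	"fmt"
	"sync"
)

// flusher is the only goroutine that touches the batch, so no lock or
// condition variable is needed. It returns once events is closed.
func flusher(events <-chan string, limit int) [][]string {
	var batches [][]string
	batch := make([]string, 0, limit)
	for ev := range events {
		batch = append(batch, ev)
		if len(batch) == limit {
			batches = append(batches, batch)
			batch = make([]string, 0, limit)
		}
	}
	if len(batch) > 0 {
		batches = append(batches, batch) // final partial batch
	}
	return batches
}

// runChannel starts the writers and one flusher, then reports the total
// number of flushed events and the size of the largest batch.
func runChannel(workers, perWorker, limit int) (total, maxBatch int) {
	events := make(chan string)
	done := make(chan [][]string, 1)
	go func() { done <- flusher(events, limit) }()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				events <- fmt.Sprintf("w%d-e%d", w, i)
			}
		}(w)
	}
	wg.Wait()
	close(events) // no more writers; lets the flusher drain and return

	for _, b := range <-done {
		total += len(b)
		if len(b) > maxBatch {
			maxBatch = len(b)
		}
	}
	return total, maxBatch
}

func main() {
	total, maxBatch := runChannel(4, 25, 7)
	fmt.Printf("flushed %d events, largest batch %d\n", total, maxBatch)
}
```

With an unbuffered channel, each writer blocks until the flusher has received its event, which serializes the events without relying on which goroutine wins the lock after Signal().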