On Tue, Mar 05, 2002 at 05:03:16PM -0500, Jeff Trawick wrote:
> > > axe the entry on graceful restart problems with worker
> > >
> > > I was too stupid to read the code to determine that the accept mutex
> > > failure log messages were harmless and not indicative of a real problem.
> > >
> > > I'll try to understand the conditions where I'm seeing connections
> > > dropped.
> >
> > I suspect these are one and the same problem.
>
> What do you mean by "these"?

I was referring to the dropped connections and the errors on mutex
acquire.

> > When we get a failure
> > while trying to acquire a mutex it probably means that the mutex was
> > already destroyed. Is it possible that we also destroyed the fdqueue
> > while there were connections waiting to be picked up by a worker?
>
> worker threads exit as soon as workers_may_exit is set... I don't see
> any logic to make sure we don't lose any accepted connections (stuff
> in the queue)
>
> so yes, it looks normal to destroy worker_queue without looking to see
> if any accepted connections are in the queue
>
> Once the listener thread realizes that we're terminating and it will
> no longer call accept, it needs some way to trigger an error on the
> queue so that once the last connection is dequeued by a worker thread
> subsequent pop operations fail in a way that worker threads know they
> should exit. And instead of exiting as soon as workers_may_exit is
> set, worker threads should exit once they get the magic queue-is-dead
> error from a pop operation.

I would agree that this is a limitation of the current fdqueue logic
and that it is a bug. I had some code that did something similar to
what you were talking about, but it was related to a different worker
implementation variant. If I get some time I'll try to take a look.

-aaron
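
For illustration only, here is a minimal sketch of the "queue-is-dead" idea described in the quoted text: the listener marks the queue as terminated once it stops accepting, and a pop fails only after the last queued connection has been handed out. This is not the fdqueue code from the worker MPM. Every name here (fdq_t, fdq_push, fdq_pop, fdq_term, FDQ_EOF, run_demo) is hypothetical, it uses plain pthreads rather than APR, and plain integers stand in for accepted sockets.

```c
#include <assert.h>
#include <pthread.h>

#define FDQ_CAP 16
#define FDQ_OK  0
#define FDQ_EOF 1   /* hypothetical "queue is dead" result */

typedef struct {
    int fds[FDQ_CAP];
    int head, count;
    int terminated;               /* set once by the listener */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} fdq_t;

static void fdq_init(fdq_t *q) {
    q->head = q->count = q->terminated = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Listener side: enqueue an accepted connection, blocking while full. */
static void fdq_push(fdq_t *q, int fd) {
    pthread_mutex_lock(&q->lock);
    assert(!q->terminated);       /* no pushes after termination */
    while (q->count == FDQ_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->fds[(q->head + q->count) % FDQ_CAP] = fd;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Listener side: no more accepts will happen; wake every idle worker. */
static void fdq_term(fdq_t *q) {
    pthread_mutex_lock(&q->lock);
    q->terminated = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker side: returns FDQ_EOF only once terminated AND fully drained. */
static int fdq_pop(fdq_t *q, int *fd) {
    int rv = FDQ_OK;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->terminated)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        rv = FDQ_EOF;
    } else {
        *fd = q->fds[q->head];
        q->head = (q->head + 1) % FDQ_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return rv;
}

static fdq_t queue;
static int handled;               /* protected by handled_lock */
static pthread_mutex_t handled_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker loop: exit on the dead-queue result, not on a global flag. */
static void *worker(void *arg) {
    int fd;
    (void)arg;
    while (fdq_pop(&queue, &fd) == FDQ_OK) {
        pthread_mutex_lock(&handled_lock);
        handled++;                /* stand-in for processing the connection */
        pthread_mutex_unlock(&handled_lock);
    }
    return NULL;
}

/* Push nconn fake descriptors, terminate, and report how many were handled. */
static int run_demo(int nworkers, int nconn) {
    pthread_t tids[8];
    int i;
    assert(nworkers >= 1 && nworkers <= 8);
    handled = 0;
    fdq_init(&queue);
    for (i = 0; i < nworkers; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (i = 0; i < nconn; i++)
        fdq_push(&queue, i);
    fdq_term(&queue);
    for (i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    return handled;
}
```

Because the workers leave their loop only on FDQ_EOF, a call such as run_demo(4, 100) handles all 100 queued entries before the threads exit. Checking a flag like workers_may_exit at the top of the loop instead would let a worker quit while entries were still queued, which is the loss discussed above.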