I'll see if I can attach gdb to it. I haven't been able to reproduce the problem in my development environment, but it happens often in production. I've written a script that detects it and kills mongrel2. I'll see if I can modify that script on one machine so it also takes the server out of the load balancer, which would let me get a stack trace.
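The detection script isn't included in the thread. The sketch below covers only the detection half and makes several assumptions. It assumes the script looks at a window of recent error-log lines and flags the server once enough of them carry the signature of the looping error, `Attempt to wait on a dead socket/fd`, taken from the log line quoted later in the thread. The function names and the threshold are hypothetical. Killing mongrel2 and draining it from the load balancer are not shown.

```c
#include <stddef.h>
#include <string.h>

/* Signature of the runaway error, copied from the log line in this thread. */
static const char DEAD_FD_SIGNATURE[] = "Attempt to wait on a dead socket/fd";

/* Hypothetical helper: count the log lines that contain the dead-fd signature. */
size_t count_dead_fd_errors(const char *const lines[], size_t n_lines)
{
    size_t hits = 0;
    for (size_t i = 0; i < n_lines; i++) {
        if (lines[i] != NULL && strstr(lines[i], DEAD_FD_SIGNATURE) != NULL)
            hits++;
    }
    return hits;
}

/*
 * Hypothetical helper: decide whether the server looks stuck in the loop.
 * `threshold` is an assumed tuning knob. A single stray error is not
 * alarming, but the runaway case writes this line continuously.
 */
int looks_stuck(const char *const lines[], size_t n_lines, size_t threshold)
{
    return count_dead_fd_errors(lines, n_lines) >= threshold;
}
```

For example, if two of three sampled lines carry the signature, `looks_stuck(lines, 3, 2)` returns 1 and `looks_stuck(lines, 3, 3)` returns 0. A count-based threshold avoids restarting the server because of one isolated error, while still catching the endless loop early.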
This issue was occurring when I was running from the tip of the develop branch. I've since moved over to a fork that includes my pull request allowing filters to handle the CLOSE event. Running that build, I've noticed that the CLOSE event occurs more often than the HANDLER event for the same connection. My filter was also occasionally getting called for the HANDLER event with the same connection object before CLOSE was called. I didn't expect either of these to happen, and I'm not sure whether either is related to the spinning issue. Anyway, I'll try to get a stack trace to you.

-Rob

On Wed, Jun 27, 2012 at 5:43 PM, Jason Miller wrote:
> Hi Rob,
>
> I've not been able to reproduce this (though I found and fixed an
> unrelated bug in the process of trying to). Any chance you could use
> gdb to attach to mongrel2 next time this happens, set a breakpoint on the
> line where the error message happens and print a backtrace?
>
> If you're not familiar with gdb from the shell:
>
>     sudo gdb -p <mongrel2 pid>
> This attaches gdb to mongrel2 (attaching also pauses the process).
>     b src/task/fd.c:217
> This sets the breakpoint on the line that logs the error.
>     continue
> This resumes mongrel2. Because it is looping on that error, the
> breakpoint ought to be hit immediately.
>     backtrace
> This prints the backtrace.
>
> -Jason
>
> On 20:36 Fri 22 Jun, Rob LaRubbio wrote:
> > I'm not sure if this is relevant to the issue, but I figured I'd throw it
> > out there in case it is.
> >
> > I added a new server to our production environment running a build of
> > mongrel2 with my fix for filters getting the CLOSE transition. My filter
> > increments a counter on the HANDLER event and decrements it on the CLOSE
> > event, then sends that count to statsd. Looking at my stats, I can see
> > that CLOSE is happening much more frequently than HANDLER, so it seems
> > the same connection is getting closed multiple times.
> >
> > -Rob
> >
> > On 6/22/12 1:40 PM, Rob LaRubbio wrote:
> > > Thanks for looking into this.
> > >
> > > We aren't using websockets or proxies, just mongrel2 and Tir. We have
> > > 4 mongrel2 servers behind a load balancer, and each has 300 handlers.
> > > The handlers are not shared across servers. (I have a pull request into
> > > Tir to make it easier to run Tir on a server other than mongrel2.)
> > >
> > > ==== mongrel2.conf =====
> > > houston = Filter(
> > >     name="/opt/mongrel2-1.8-dev/lib/mongrel2/filters/houston.so",
> > >     settings = {
> > >         <removed>
> > >     }
> > > )
> > >
> > > apollo = Handler(send_spec='tcp://127.0.0.1:9999',
> > >                  send_ident='38f857b8-cbaa-4b58-9271-0d36c27813c4',
> > >                  recv_spec='tcp://127.0.0.1:9998', recv_ident='',
> > >                  protocol='tnetstring')
> > >
> > > static = Dir(base='static/',
> > >              index_file='index.html',
> > >              default_ctype='text/plain')
> > >
> > > main = Server(
> > >     uuid="505417b8-1de4-454f-98b6-07eb9225cca1",
> > >     access_log="/logs/access.log",
> > >     error_log="/logs/error.log",
> > >     chroot="/opt/mongrel2-1.8-dev",
> > >     default_host="(.+)",
> > >     name="main",
> > >     pid_file="/run/mongrel2.pid",
> > >     port=6767,
> > >     hosts = [
> > >         Host(name="(.+)",
> > >              routes={ '/(.*/.*)': apollo,
> > >                       '/([^/]*)$': static })
> > >     ],
> > >     filters = [
> > >         houston
> > >     ]
> > > )
> > >
> > > settings = {
> > >     "limits.content_length": 20480000
> > > }
> > >
> > > On 6/22/12 1:11 PM, Tordek wrote:
> > >> On 22/06/12 13:12, Rob LaRubbio wrote:
> > >>> Is the dev branch ready for a release? We're running it in production,
> > >>> and at least three times a week it starts spinning and writing this
> > >>> to the logs in an endless loop:
> > >>>
> > >>>     Fri, 22 Jun 2012 16:04:50 GMT [ERROR] (src/task/fd.c:217: errno: None) Attempt to wait on a dead socket/fd: (nil) or -1
> > >>>
> > >>> The server fills up a 500 GB disk in about 11 hours, and we need to
> > >>> kill the server to get it handling requests again.
> > >> Jason and I are looking into this; could you show us your
> > >> mongrel2.conf? Are you using websockets or proxies?
> > >>
> > >>> -Rob
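The statsd gauge described in the June 22 message increments on HANDLER and decrements on CLOSE. That gauge drifts negative as soon as a connection sees more CLOSE events than HANDLER events. The sketch below contrasts that naive gauge with a guarded one that counts each connection at most once between its first HANDLER and its first CLOSE. It is an illustration under stated assumptions, not the houston filter's code. The event enum, the `conn_track` struct, and all function names are hypothetical. The sketch also assumes the filter has somewhere to keep one flag per connection; the real mongrel2 filter callback and any per-connection storage are not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two filter transitions discussed in the thread. */
typedef enum { EV_HANDLER, EV_CLOSE } conn_event;

/* Hypothetical per-connection flag; where a real filter would store it is an assumption. */
typedef struct {
    bool open; /* set on the first HANDLER, cleared on the first CLOSE */
} conn_track;

/* Naive gauge as described: +1 on HANDLER, -1 on CLOSE, no deduplication. */
void naive_update(long *gauge, conn_event ev)
{
    *gauge += (ev == EV_HANDLER) ? 1 : -1;
}

/*
 * Guarded gauge: a connection adds at most 1 until it is closed, and a CLOSE
 * only subtracts if that connection was counted. Repeated HANDLER events and
 * unmatched CLOSE events for the same connection are ignored.
 */
void guarded_update(long *gauge, conn_track *c, conn_event ev)
{
    if (ev == EV_HANDLER && !c->open) {
        c->open = true;
        *gauge += 1;
    } else if (ev == EV_CLOSE && c->open) {
        c->open = false;
        *gauge -= 1;
    }
}

/* Replay one connection's events through both gauges, starting from zero. */
void replay(const conn_event *evs, size_t n, long *naive, long *guarded)
{
    conn_track c = { false };
    *naive = 0;
    *guarded = 0;
    for (size_t i = 0; i < n; i++) {
        naive_update(naive, evs[i]);
        guarded_update(guarded, &c, evs[i]);
    }
}
```

Consider a connection that sees a repeated HANDLER and a repeated CLOSE, for example HANDLER, HANDLER, CLOSE, CLOSE, CLOSE. Replaying that sequence leaves the naive gauge at -1 and the guarded gauge at 0. The guard keeps the gauge meaningful but also hides the anomaly. The two ignored branches are therefore a natural place to bump a separate "duplicate event" statistic, so the behaviour stays visible in statsd while the underlying bug is tracked down.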
