Hi Rob,
I've not been able to reproduce this (though I found and fixed an
unrelated bug in the process of trying to). Any chance you could use
gdb to attach to mongrel2 next time this happens, set a breakpoint on the
line where the error message is logged, and print a backtrace?
If you're not familiar with gdb, from the shell:

  sudo gdb -p <mongrel2 pid>

That will attach gdb to the running mongrel2 process (attaching stops
it). Then, at the (gdb) prompt:

  b src/task/fd.c:217

That will set the breakpoint, and

  c

will resume mongrel2; the breakpoint ought to be hit almost immediately
once the error loop starts. When it does:

  backtrace

That will print the backtrace. Paste the output into a reply.
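If an interactive session on the production box is awkward, the same
steps can be scripted with gdb's batch mode (a sketch, not tested
against your build; substitute the real pid, and the source path must
match how mongrel2 was compiled):

```shell
# Hypothetical one-shot capture: attach, break on the error site,
# resume, dump a backtrace when the breakpoint fires, then detach.
sudo gdb -p <mongrel2 pid> -batch \
    -ex 'break src/task/fd.c:217' \
    -ex 'continue' \
    -ex 'backtrace' \
    -ex 'detach'
```

Detaching at the end leaves the process running, so this can be run
repeatedly without killing the server.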
-Jason
On Fri, 22 Jun 2012 20:36, Rob LaRubbio wrote:
> I'm not sure if this is relevant to the issue but I figure I'd throw it
> out there in case it is.
>
> I added a new server to our production env. running a build of mongrel
> with my fix for filters getting the CLOSE transition. My filter
> increments a counter on the HANDLER event and decrements it on the CLOSE
> event. I then send that count to statsd. Looking at my stats I can see
> that CLOSE is happening much more frequently than HANDLER so it seems
> the same connection is getting closed multiple times.
>
> -Rob
>
> On 6/22/12 1:40 PM, Rob LaRubbio wrote:
> > Thanks for looking into this.
> >
> > We aren't using websockets or proxies, just mongrel2 and Tir. We have
> > 4 mongrel2 servers behind a load balancer, each with 300 handlers. The
> > handlers are not shared across servers. (I have a pull request into Tir
> > to make it easier to run Tir on a server other than the one running
> > mongrel2.)
> >
> > ==== mongrel2.conf =====
> > houston = Filter(
> >     name="/opt/mongrel2-1.8-dev/lib/mongrel2/filters/houston.so",
> >     settings = {
> >         <removed>
> >     }
> > )
> >
> > apollo = Handler(send_spec='tcp://127.0.0.1:9999',
> >                  send_ident='38f857b8-cbaa-4b58-9271-0d36c27813c4',
> >                  recv_spec='tcp://127.0.0.1:9998', recv_ident='',
> >                  protocol='tnetstring')
> >
> > static = Dir(base='static/',
> >              index_file='index.html',
> >              default_ctype='text/plain')
> >
> > main = Server(
> >     uuid="505417b8-1de4-454f-98b6-07eb9225cca1",
> >     access_log="/logs/access.log",
> >     error_log="/logs/error.log",
> >     chroot="/opt/mongrel2-1.8-dev",
> >     default_host="(.+)",
> >     name="main",
> >     pid_file="/run/mongrel2.pid",
> >     port=6767,
> >     hosts = [
> >         Host(name="(.+)",
> >              routes={ '/(.*/.*)': apollo,
> >                       '/([^/]*)$': static })
> >     ],
> >     filters = [
> >         houston
> >     ]
> > )
> >
> > settings = {
> >     "limits.content_length": 20480000
> > }
> >
> > On 6/22/12 1:11 PM, Tordek wrote:
> >> On 22/06/12 13:12, Rob LaRubbio wrote:
> >>> Is the dev branch ready for a release? We're running it in
> >>> production, and at least three times a week it starts spinning and
> >>> writing this to the logs in an endless loop:
> >>>
> >>> Fri, 22 Jun 2012 16:04:50 GMT [ERROR] (src/task/fd.c:217: errno:
> >>> None) Attempt to wait on a dead socket/fd: (nil) or -1
> >>>
> >>> The server fills up a 500G disk in about 11 hours and we need to
> >>> kill the server to get it handling requests again.
> >> Jason and I are looking into this; could you show us your
> >> mongrel2.conf? Are you using websockets or proxies?
> >>
> >>> -Rob
> >
> >