I don't know if this contributes much to the conversation, but I wonder if some of these occasional deadlocks are related to this:
http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/11756?11593-11836

Corey

On Jul 31, 2007, at 5:15 PM, Zed A. Shaw wrote:

> On Sun, 29 Jul 2007 22:57:29 +0100
> "Olly Lylo" <[EMAIL PROTECTED]> wrote:
>
>> Hi
>> I posted this to the Ruby on Rails Talk group but I thought I'd
>> post it here too as it's probably a more appropriate group. Hope
>> this is ok.
>
> Alright, I've got a few minutes to write this down since it seems
> to be biting a few people.
>
> Here are the rules for resolving a "RAILS in Mongrel dies" problem
> (notice the RAILS part?):
>
> 1) 1000's of sites run mongrel without this problem so look to what
>    you have installed first.
> 2) Make sure that every single bit of software you are running is
>    the most recent version and everything is installed and used
>    correctly. Common culprits are:
>    a) MySQL -- install the gem manually and make sure that the most
>       recent is the only one. DO NOT USE THE RAILS DEFAULT.
>    b) Memcached -- Use the very latest from Eric Hodel's project and
>       do NOT put any keys in that have spaces or null (\0) chars.
>       Yes, your keys cannot have spaces. YES YOUR KEYS CANNOT HAVE
>       SPACES.
>    c) Net::HTTP to some web site. This can cycle forever.
> 3) Next, once you've made sure all the above is isolated then you
>    can proceed to stage 2.
>
> STAGE 2
>
> 1) You must let your application run in the best configuration and
>    be ready to pounce on it the second a mongrel dies.
> 2) Log in to your server and find out the PID file of the mongrel
>    that isn't responding. You do this by hitting the process on its
>    actual port (like 8000, 8001, 8002, NOT apache/nginx's port of 80).
> 3) Once you know what port it is, then use:
>    sudo lsof -i -P | grep PORT
>    to find what process PID is on that port.
> 4) Next, attach to this process with:
>    sudo strace -p PID
>    What you'll see in a healthy mongrel is lots of variation. What
>    you'll see in a dead mongrel is probably either a bunch of calls
>    to select/poll for the exact same file descriptors, or nothing.
> 5) If you see it doing a select on the file descriptors, then you
>    need to find out what is on that FD that is causing it to wait.
>    Again, use:
>    lsof | grep FILEDESCR
>    a) This will potentially tell you where it's connected, etc.
>       Once you do this you know what is being read/written and can
>       go find out what in your rails app is using it, and apply even
>       more debugging tools.
>    b) You can also just force it closed externally. There's a few
>       mentions of this but I don't remember the exact procedure.
>    c) The reason this typically happens is that you have a socket
>       that ruby has written to and not read from yet, but that socket
>       is closed. Another cause is that you are simply waiting for
>       data (which is what happens with memcached and putting a space
>       in your keys).
> 6) If your mongrel is completely dead then move on to stage 3 and
>    also you're kind of fucked.
>
> STAGE 3
>
> 1) Get your good shoes on because you're now in GDB land and C.
> 2) http://eigenclass.org/hiki.rb?ruby+live+process+introspection
>    Explains attaching to your mongrel process (you've identified
>    above) using GDB, loading some GDB scripts, and then stopping it,
>    inspecting it, and forcing an exception.
> 3) Do those things. Attach. Pause it. Inspect variables. Get
>    stack traces. Force an exception. See where it's stopped. Look
>    for where OUTSIDE of mongrel it's coming from.
> 4) That's all I can say right now. Let's hope other people can
>    expand on this.
>
> --
> Zed A. Shaw
> - Hate: http://savingtheinternetwithhate.com/
> - Good: http://www.zedshaw.com/
> - Evil: http://yearofevil.com/
> _______________________________________________
> Mongrel-users mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/mongrel-users
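
On rule 2b (no spaces or null chars in memcached keys): one way to catch bad keys before they reach the client is a small guard like the one below. This is only a sketch. The helper name safe_memcached_key? is made up for illustration and is not part of any memcached client library; the 250-byte limit and the ban on whitespace and control characters are taken from memcached's text protocol, so check them against the server version you actually run.

```ruby
# Hypothetical helper, not part of any memcached client library.
# Assumption: limits follow memcached's text protocol (keys of at most
# 250 bytes, with no whitespace or control characters).
MAX_KEY_BYTES = 250

def safe_memcached_key?(key)
  key = key.to_s
  return false if key.empty? || key.bytesize > MAX_KEY_BYTES
  # Reject spaces, tabs, newlines, null bytes and other control chars.
  key !~ /[\s\x00-\x1f\x7f]/
end
```

You would call it right before a cache read or write, for example raising or logging when safe_memcached_key?("user 42") comes back false, rather than letting the mongrel sit waiting on the socket.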
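
On rule 2c (Net::HTTP calls that can cycle forever): a possible mitigation is to put explicit timeouts on every outbound call so a slow remote site cannot pin a mongrel indefinitely. A minimal sketch follows. The helper names http_with_timeouts and fetch_or_nil and the timeout values are my own assumptions, not anything from Zed's message, and the default timeouts differ between Ruby versions, so set them explicitly.

```ruby
require 'net/http'

# Hypothetical helper; the name and default values are illustrative only.
def http_with_timeouts(host, port = 80, open_timeout: 5, read_timeout: 10)
  http = Net::HTTP.new(host, port)  # builds the object, does not connect yet
  http.open_timeout = open_timeout  # seconds allowed for the TCP connect
  http.read_timeout = read_timeout  # seconds allowed for each read
  http
end

# Hypothetical wrapper: returns the response body, or nil when the remote
# side times out or the connection fails, instead of blocking the request.
def fetch_or_nil(http, path)
  http.get(path).body
rescue Timeout::Error, SystemCallError, SocketError => e
  warn "outbound call failed: #{e.class}"
  nil
end
```

Whether returning nil is the right fallback depends on the app; the point is only that the call gives up after a bounded time.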
