Are you running two copies of the BackgrounDRb server on the same machine? I see two server instances in your top output.
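Since top was showing threads, the two "script/backgroundrb start" rows could also be two threads of a single server. One way to check might be to count processes instead of threads. Below is a minimal sketch, assuming Linux procps `ps` (where the `nlwp` column is the thread count of a process); the helper name `matching_processes` is made up for illustration and is not part of BackgrounDRb.

```ruby
# Sketch only: separate real server processes from threads of one process.
# Assumption: input is the output of `ps -eo pid,ppid,nlwp,args` on Linux.
# `matching_processes` is a hypothetical helper, not a BackgrounDRb API.

# Returns one hash per process whose command line contains `pattern`.
def matching_processes(ps_output, pattern = "script/backgroundrb start")
  rows = ps_output.lines.drop(1)          # drop(1) skips the ps header row
  rows.map do |line|
    # Split into at most 4 fields so the command line keeps its spaces.
    pid, ppid, nlwp, args = line.strip.split(/\s+/, 4)
    next if args.nil? || !args.include?(pattern)
    { :pid => pid.to_i, :ppid => ppid.to_i, :threads => nlwp.to_i, :args => args }
  end.compact
end

# Example use on a live machine (not run here):
#   servers = matching_processes(`ps -eo pid,ppid,nlwp,args`)
#   servers.each { |s| puts "pid=#{s[:pid]} threads=#{s[:threads]} #{s[:args]}" }
```

If this returns one entry with a thread count of 2, there is a single server with two threads; two entries with different PIDs would mean two separate server instances.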

On Fri, 2008-10-10 at 14:12 +0200, Jack Nutting wrote:
> Hi all,
>
> I've been having trouble for a long time with backgroundrb processes
> that suddenly vanish without a trace. What happens is that at some
> point I discover that all the backgroundrb processes are suddenly
> gone. Nothing special is seen in any of the log files. This has
> happened intermittently for a long time, and I was hoping that
> upgrading to 1.0.4 would somehow help me out, but I seem to encounter
> the same problem.
>
> It happens infrequently, sometimes two-three times a week, sometimes
> not at all for several weeks. Yesterday it actually happened twice in
> ten minutes during a period when the server was heavily loaded, but
> that's unusual. Usually when it happens the server is not under a
> heavy load.
>
> Yesterday when it happened, I had the fortune of having a "top" log
> running in a terminal window, so I'm able to present some more data.
> top was displaying all threads, so most of the processes show up twice
> or more.
>
> I have 5 background workers running, each apparently has 2 threads,
> plus log_worker with 1 thread and script/backgroundrb with 2 threads.
> My architecture is set up so that only "master" is started
> automatically when backgroundrb starts up, and it in turn starts the
> rest.
>
> I'm pasting in data for all the backgroundrb processes, sorry for the
> terrible formatting but I can't really think of a better way to
> present this all.
>
> Here's what it normally looks like while everything is up and running.
> This is the last "normal" state I found before it started going
> haywire:
>
> top - 15:11:13 up 5 days, 5:05, 3 users, load average: 3.10, 3.09, 3.02
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 17508 deploy 15 0 49300 35m 2688 S 11.8 1.7 7:54.65 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 17504 deploy 15 0 49648 35m 2688 S 8.2 1.7 8:01.64 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 14141 deploy 15 0 20796 17m 1612 S 0.3 0.8 2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14147 deploy 15 0 48232 34m 2556 S 0.3 1.7 5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17523 deploy 17 0 132m 115m 3316 R 0.3 5.6 6:43.89 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
> 14102 deploy 17 0 48320 31m 1364 R 0.0 1.5 3:08.97 ruby /home/deploy/mbargo/script/backgroundrb start
> 14144 deploy 15 0 48320 31m 1364 S 0.0 1.5 0:45.35 ruby /home/deploy/mbargo/script/backgroundrb start
> 17446 deploy 15 0 48232 34m 2556 S 0.0 1.7 0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17486 deploy 15 0 59500 41m 3500 S 0.0 2.0 11:45.15 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 22300 deploy 15 0 59500 41m 3500 S 0.0 2.0 0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 23636 deploy 15 0 49648 35m 2688 S 0.0 1.7 0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 24042 deploy 15 0 49300 35m 2688 S 0.0 1.7 0:43.58 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 24053 deploy 15 0 132m 115m 3316 S 0.0 5.6 0:43.70 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
>
> Next snapshot, 3 seconds later. script/backgroundrb is gone, and each
> of my workers (except for master) is down to 1 thread.
>
> top - 15:11:16 up 5 days, 5:05, 3 users, load average: 3.10, 3.09, 3.02
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 17504 deploy 15 0 49648 35m 2688 S 12.6 1.7 8:02.02 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 17486 deploy 17 0 59500 41m 3500 R 0.3 2.0 11:45.16 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14141 deploy 15 0 20796 17m 1612 S 0.0 0.8 2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14147 deploy 15 0 48232 34m 2556 S 0.0 1.7 5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17446 deploy 15 0 48232 34m 2556 S 0.0 1.7 0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 22300 deploy 15 0 59500 41m 3500 S 0.0 2.0 0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 23636 deploy 15 0 49648 35m 2688 S 0.0 1.7 0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
>
> Next, 3 seconds after that, all I have left is master (still 2
> threads) and log_worker:
>
> top - 15:11:19 up 5 days, 5:05, 3 users, load average: 2.85, 3.03, 3.01
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 14141 deploy 15 0 20796 17m 1612 S 0.0 0.8 2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14147 deploy 15 0 48232 34m 2556 S 0.0 1.7 5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17446 deploy 15 0 48232 34m 2556 S 0.0 1.7 0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
>
> At the next snapshot, all backgroundrb processes are gone.
>
> This is running on Ubuntu 7.10, backgroundrb 1.0.4. I'm nowhere near
> maxing out system memory, and there are no memory or other limits set
> on user processes as far as I can tell. If anyone has any ideas about
> what might cause this, or how to dig deeper, please let me know! I'm
> nearly at my wits' end.
> _______________________________________________
Backgroundrb-devel mailing list
http://rubyforge.org/mailman/listinfo/backgroundrb-devel
