Re: Restart TG apps for high mem-usage
Luke Macken wrote:
> On Sun, Nov 25, 2007 at 01:00:53PM -0800, Toshio Kuratomi wrote:
>> Here's a short script to test our TG apps run via supervisor for excessive
>> memory usage and restart them if necessary. We could run this via cron in
>> alternate hours on each app server. Does this seem like a good or bad idea
>> to people?
>
> Probably not a bad idea; I think koji does something similar with apache.
>
> However, I don't think we need this for bodhi, at least for the moment. The
> only time bodhi's memory usage jumps is when it's pushing updates -- so if
> we were to use this script for bodhi, it would have to check if it is
> currently running mash. But for now, I'm not sure that it is necessary
> seeing as how most of the time puppetd eats more memory than bodhi.

Sounds good. I've taken both Bodhi and transifex out of the script for now
as neither one is load balanced (the idea being that the apps which are load
balanced should continue to serve requests off the other instance while one
is restarting).

I've also changed it to take a different memory limit for each app. It
currently has some generous guesses as to what the memory limit should be.
I'm running a cron that logs the rss of the apps on app3&4 and will refine
the limits after we have more data.

-Toshio

___
Fedora-infrastructure-list mailing list
Fedora-infrastructure-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list
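As a rough illustration of the per-app limit check described above, the sketch below flags apps whose RSS exceeds their own limit. It is not the attached restart-memhogs.sh: the function name, the limit values, and the unit (kB) are assumptions made for the example, and the RSS figures are borrowed from the app3 sample posted elsewhere in this thread.

```python
# Hypothetical per-app RSS limits in kB; illustrative guesses only,
# not the values used in restart-memhogs.sh.
LIMITS_KB = {
    "mirrormanager": 900_000,
    "smolt": 500_000,
    "packagedb": 300_000,
}


def apps_over_limit(rss_kb, limits_kb):
    """Return the sorted names of apps whose RSS exceeds their own limit.

    Apps without a configured limit (e.g. bodhi and transifex, which were
    taken out because they are not load balanced) are never flagged.
    """
    return sorted(
        app for app, rss in rss_kb.items()
        if app in limits_kb and rss > limits_kb[app]
    )


# RSS figures from the 11/26 sample on app3, assumed to be in kB.
sample = {"mirrormanager": 962268, "smolt": 299556, "transifex": 42768}

for app in apps_over_limit(sample, LIMITS_KB):
    # A real script would ask supervisor to restart the app here,
    # e.g. via "supervisorctl restart <name>"; this sketch only reports.
    print(f"{app} is over its limit and would be restarted")
```

Keeping the limits in one mapping makes it easy to refine them per app once the logged RSS data is in.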
Re: Restart TG apps for high mem-usage
On Mon, Nov 26, 2007 at 09:59:44AM -0800, Toshio Kuratomi wrote:
> Matt Domsch wrote:
>> On Sun, Nov 25, 2007 at 09:08:30PM -0800, Toshio Kuratomi wrote:
>>>> +1, but does it make sure all transactions are finished? I know smolt
>>>> does not have good transaction protection. If a transaction fails
>>>> halfway through, we might have a mess.
>>>
>>> Not if the app doesn't. From a brief test, TG apps do not do this.
>>
>> MirrorManager doesn't use transactions, I never figured out how to get
>> them to work right. Advice welcome.
>>
> By not being able to get transactions working, do you mean explicit
> transactions or implicit transactions? I see that mirrormanager, bodhi,
> and noc (not running currently) are using a dburi that disables implicit
> transactions::
>
>   mirrormanager-prod.cfg.erb:
>   sqlobject.dburi="notrans_postgres://mirroradmin:<%= mirrorPassword %>@db2.fedora.phx.redhat.com/mirrormanager"
>
> If that was changed to::
>
>   sqlobject.dburi="postgres://mirroradmin:[...]
>
> TurboGears would at least attempt to use an implicit transaction per http
> request, which should protect the database if the application is shut
> down in the middle of processing a multi-table update. I don't know if
> that's the problem you're referring to, though.

Removing the notrans_postgres:// from bodhi's sqlobject.dburi causes
problems. Modifications don't seem to go through; I'm not sure if they hit
the DB or not. I remember encountering this issue early on in bodhi
development, and it was mitigated by calling hub.sync() all over the place.
I have since removed those calls, and use notrans_postgres, which has been
working fine since day 1 of our production instance.

I'm not a db guru, so I'm not sure which is better or worse. I'll have to
investigate this further.

luke
Re: Restart TG apps for high mem-usage
On Sun, Nov 25, 2007 at 01:00:53PM -0800, Toshio Kuratomi wrote:
> Here's a short script to test our TG apps run via supervisor for excessive
> memory usage and restart them if necessary. We could run this via cron in
> alternate hours on each app server. Does this seem like a good or bad idea
> to people?

Probably not a bad idea; I think koji does something similar with apache.

However, I don't think we need this for bodhi, at least for the moment. The
only time bodhi's memory usage jumps is when it's pushing updates -- so if
we were to use this script for bodhi, it would have to check if it is
currently running mash. But for now, I'm not sure that it is necessary
seeing as how most of the time puppetd eats more memory than bodhi.

luke
Re: Restart TG apps for high mem-usage
Mike McGrath wrote:
> Toshio Kuratomi wrote:
>> Mike McGrath wrote:
>>> Bill Nottingham wrote:
>>>> Toshio Kuratomi ([EMAIL PROTECTED]) said:
>>>>> Here's a short script to test our TG apps run via supervisor for
>>>>> excessive memory usage and restart them if necessary. We could run
>>>>> this via cron in alternate hours on each app server. Does this seem
>>>>> like a good or bad idea to people?
>>>>
>>>> It's a good idea if it's needed, but it's a bad idea that it is needed.
>>>> What's wrong with TG that it leads to this situation?
>>>
>>> I was wondering this myself, I know smolt recently had some major
>>> changes to keep memory usage down. Which TG apps are having this issue
>>> and how often? I know MM uses a lot of memory but, AFAIK, it was
>>> determined that there's not much of a leak if there is one and that all
>>> of that memory is actually used.
>>
>> Looks like smolt was upgraded just before Thanksgiving so it could be
>> that we've plugged the leaks we had to deal with that inspired me to
>> write this.
>>
>> Would it be a good idea to have this in place anyways? With it
>> periodically checking, we would find out that we had problems when cron
>> emails us a notice that the script had to restart a process. Without it,
>> we'll be notified when nagios or a user tells us they're getting
>> timeouts.
>
> I think it's a good idea, if for no other reason than that it allows us
> to more actively monitor this stuff; we'll get notified when the app
> restarts. +1 from me with the intention that, over time, we get fewer and
> fewer restarts.

Cool. I'll check it in and set up a cron job.

One further piece of information since I have output from testing this on
app3 yesterday:

AppName        Uptime  RSS 11/25  RSS 11/26
mirrormanager  2d4h    714336     962268
packagedb      --- restarted 13h ago --
smolt          5d3h    299556     299556
transifex      5d3h    42744      42768

-Toshio
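To make the day-over-day change in the table above explicit, here is a small sketch that computes RSS growth per app from the two samples. The dictionary layout and function name are made up for illustration; packagedb is left out because it was restarted between the samples and has no figures.

```python
# (RSS on 11/25, RSS on 11/26) per app, copied from the table above.
SAMPLES = {
    "mirrormanager": (714336, 962268),
    "smolt": (299556, 299556),
    "transifex": (42744, 42768),
}


def rss_growth(samples):
    """Map each app to (absolute growth, growth in percent of first sample)."""
    growth = {}
    for app, (before, after) in samples.items():
        delta = after - before
        growth[app] = (delta, round(100.0 * delta / before, 1))
    return growth


for app, (delta, percent) in rss_growth(SAMPLES).items():
    print(f"{app:15} {delta:+8d}  {percent:+.1f}%")
```

On these figures only mirrormanager grew noticeably between the two samples; smolt did not change at all.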
Re: Restart TG apps for high mem-usage
Toshio Kuratomi wrote:
> Mike McGrath wrote:
>> Bill Nottingham wrote:
>>> Toshio Kuratomi ([EMAIL PROTECTED]) said:
>>>> Here's a short script to test our TG apps run via supervisor for
>>>> excessive memory usage and restart them if necessary. We could run
>>>> this via cron in alternate hours on each app server. Does this seem
>>>> like a good or bad idea to people?
>>>
>>> It's a good idea if it's needed, but it's a bad idea that it is needed.
>>> What's wrong with TG that it leads to this situation?
>>
>> I was wondering this myself, I know smolt recently had some major
>> changes to keep memory usage down. Which TG apps are having this issue
>> and how often? I know MM uses a lot of memory but, AFAIK, it was
>> determined that there's not much of a leak if there is one and that all
>> of that memory is actually used.
>
> Looks like smolt was upgraded just before Thanksgiving so it could be that
> we've plugged the leaks we had to deal with that inspired me to write
> this.
>
> Would it be a good idea to have this in place anyways? With it
> periodically checking, we would find out that we had problems when cron
> emails us a notice that the script had to restart a process. Without it,
> we'll be notified when nagios or a user tells us they're getting timeouts.

I think it's a good idea, if for no other reason than that it allows us to
more actively monitor this stuff; we'll get notified when the app restarts.
+1 from me with the intention that, over time, we get fewer and fewer
restarts.

-Mike
Re: Restart TG apps for high mem-usage
Mike McGrath wrote:
> Bill Nottingham wrote:
>> Toshio Kuratomi ([EMAIL PROTECTED]) said:
>>> Here's a short script to test our TG apps run via supervisor for
>>> excessive memory usage and restart them if necessary. We could run this
>>> via cron in alternate hours on each app server. Does this seem like a
>>> good or bad idea to people?
>>
>> It's a good idea if it's needed, but it's a bad idea that it is needed.
>> What's wrong with TG that it leads to this situation?
>
> I was wondering this myself, I know smolt recently had some major changes
> to keep memory usage down. Which TG apps are having this issue and how
> often? I know MM uses a lot of memory but, AFAIK, it was determined that
> there's not much of a leak if there is one and that all of that memory is
> actually used.

Looks like smolt was upgraded just before Thanksgiving so it could be that
we've plugged the leaks we had to deal with that inspired me to write this.

Would it be a good idea to have this in place anyways? With it periodically
checking, we would find out that we had problems when cron emails us a
notice that the script had to restart a process. Without it, we'll be
notified when nagios or a user tells us they're getting timeouts.

I noticed that mirrormanager is currently at 761MB of RSS. If that's
steady-state for mm we'd want to bump the value the script checks for a bit
higher before deploying it or set different values per app.

-Toshio
Re: Restart TG apps for high mem-usage
Bill Nottingham wrote:
> Toshio Kuratomi ([EMAIL PROTECTED]) said:
>> Here's a short script to test our TG apps run via supervisor for
>> excessive memory usage and restart them if necessary. We could run this
>> via cron in alternate hours on each app server. Does this seem like a
>> good or bad idea to people?
>
> It's a good idea if it's needed, but it's a bad idea that it is needed.
> What's wrong with TG that it leads to this situation?

We have small memory leaks with all our TG apps. This has only been a
problem for mirrormanager in the past and smolt this past month.

At first I thought that the leaks were directly related to how many
requests were served (mirrormanager had troubles when it was serving every
request for a mirror and smolt had trouble when it was serving the updating
clients at the beginning of this month). However, the packagedb has been
serving a large number of requests lately and hasn't climbed at nearly the
same rate. So something in the design of smolt and mirrormanager is leaking
beyond the baseline that we're seeing with all the TG apps.

-Toshio
Re: Restart TG apps for high mem-usage
Bill Nottingham wrote:
> Toshio Kuratomi ([EMAIL PROTECTED]) said:
>> Here's a short script to test our TG apps run via supervisor for
>> excessive memory usage and restart them if necessary. We could run this
>> via cron in alternate hours on each app server. Does this seem like a
>> good or bad idea to people?
>
> It's a good idea if it's needed, but it's a bad idea that it is needed.
> What's wrong with TG that it leads to this situation?

I was wondering this myself, I know smolt recently had some major changes
to keep memory usage down. Which TG apps are having this issue and how
often? I know MM uses a lot of memory but, AFAIK, it was determined that
there's not much of a leak if there is one and that all of that memory is
actually used.

-Mike
Re: Restart TG apps for high mem-usage
Toshio Kuratomi ([EMAIL PROTECTED]) said:
> Here's a short script to test our TG apps run via supervisor for excessive
> memory usage and restart them if necessary. We could run this via cron in
> alternate hours on each app server. Does this seem like a good or bad idea
> to people?

It's a good idea if it's needed, but it's a bad idea that it is needed.
What's wrong with TG that it leads to this situation?

Bill
Re: Restart TG apps for high mem-usage
Matt Domsch wrote:
> On Sun, Nov 25, 2007 at 09:08:30PM -0800, Toshio Kuratomi wrote:
>>> +1, but does it make sure all transactions are finished? I know smolt
>>> does not have good transaction protection. If a transaction fails
>>> halfway through, we might have a mess.
>>
>> Not if the app doesn't. From a brief test, TG apps do not do this.
>
> MirrorManager doesn't use transactions, I never figured out how to get
> them to work right. Advice welcome.

By not being able to get transactions working, do you mean explicit
transactions or implicit transactions? I see that mirrormanager, bodhi, and
noc (not running currently) are using a dburi that disables implicit
transactions::

  mirrormanager-prod.cfg.erb:
  sqlobject.dburi="notrans_postgres://mirroradmin:<%= mirrorPassword %>@db2.fedora.phx.redhat.com/mirrormanager"

If that was changed to::

  sqlobject.dburi="postgres://mirroradmin:[...]

TurboGears would at least attempt to use an implicit transaction per http
request, which should protect the database if the application is shut down
in the middle of processing a multi-table update. I don't know if that's
the problem you're referring to, though.

-Toshio
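The protection an implicit per-request transaction gives can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in; it is not SQLObject, TurboGears, or PostgreSQL, and the table and function names are invented for the example. The point is only that when a multi-table update is interrupted inside a transaction, the first half is rolled back rather than left behind.

```python
import sqlite3


class SimulatedShutdown(Exception):
    """Stands in for the app being stopped in the middle of a request."""


def handle_request(conn, host, device, interrupt=False):
    """Update two tables as one unit of work, like one http request."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute("INSERT INTO host (name) VALUES (?)", (host,))
            if interrupt:
                raise SimulatedShutdown()
            conn.execute(
                "INSERT INTO device (host, name) VALUES (?, ?)",
                (host, device),
            )
    except SimulatedShutdown:
        pass  # the half-finished update has already been rolled back


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host (name TEXT)")
conn.execute("CREATE TABLE device (host TEXT, name TEXT)")

handle_request(conn, "good-host", "eth0")
handle_request(conn, "half-done-host", "eth0", interrupt=True)

hosts = [row[0] for row in conn.execute("SELECT name FROM host")]
devices = conn.execute("SELECT COUNT(*) FROM device").fetchone()[0]
print(hosts, devices)
```

Without the surrounding transaction, the interrupted request would leave a host row with no matching device row, which is the kind of mess raised earlier in the thread.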
Re: Restart TG apps for high mem-usage
On Sun, Nov 25, 2007 at 09:08:30PM -0800, Toshio Kuratomi wrote:
> >+1, but does it make sure all transactions are finished? I know smolt
> >does not have good transaction protection. If a transaction fails
> >halfway through, we might have a mess.
> >
> Not if the app doesn't. From a brief test, TG apps do not do this.

MirrorManager doesn't use transactions, I never figured out how to get
them to work right. Advice welcome.

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
Re: Restart TG apps for high mem-usage
On Nov 26, 2007 12:08 AM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Yaakov Nemoy wrote:
> > On Nov 25, 2007 4:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Not if the app doesn't. From a brief test, TG apps do not do this.
>
> The script is asking supervisor to shut down the application. supervisor
> sends a TERM to the TG app (we can configure it to send something other
> than TERM if we want but I don't see any documentation that leads me to
> believe it will be different with a HUP or QUIT). At that point it
> looks like a TG app will immediately shut down and roll back any current
> transactions.

It's got my vote then

> smolt is on shaky ground if it's not using transactions correctly... At
> the beginning of the month when smolt was getting hit hard we did pretty
> much this same thing except manually instead of via a script when we
> noticed that smolt was giving timeouts and taking up 1G+ of RAM. I
> think the current smolt code is using SQLAlchemy, correct? It's pretty
> easy to use transactions so that you don't leave the db in an
> inconsistent state with that configuration. Using the session's
> implicit transaction flushed just before the return should do the safe
> thing. You can look through the code later and find additional places
> where you can safely flush the transaction if there's a need.

We do use transactions where we can, but since most of the code is not
tested at all, let alone stress tested, I can't vouch for it doing The
Right Thing. (Winter break is only a few weeks away)

-Yaakov
Re: Restart TG apps for high mem-usage
Yaakov Nemoy wrote:
> On Nov 25, 2007 4:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
>> Here's a short script to test our TG apps run via supervisor for
>> excessive memory usage and restart them if necessary. We could run this
>> via cron in alternate hours on each app server. Does this seem like a
>> good or bad idea to people?
>>
>> -Toshio
>
> +1, but does it make sure all transactions are finished? I know smolt
> does not have good transaction protection. If a transaction fails
> halfway through, we might have a mess.

Not if the app doesn't. From a brief test, TG apps do not do this.

The script is asking supervisor to shut down the application. supervisor
sends a TERM to the TG app (we can configure it to send something other
than TERM if we want but I don't see any documentation that leads me to
believe it will be different with a HUP or QUIT). At that point it looks
like a TG app will immediately shut down and roll back any current
transactions.

smolt is on shaky ground if it's not using transactions correctly... At the
beginning of the month when smolt was getting hit hard we did pretty much
this same thing except manually instead of via a script when we noticed
that smolt was giving timeouts and taking up 1G+ of RAM. I think the
current smolt code is using SQLAlchemy, correct? It's pretty easy to use
transactions so that you don't leave the db in an inconsistent state with
that configuration. Using the session's implicit transaction flushed just
before the return should do the safe thing. You can look through the code
later and find additional places where you can safely flush the transaction
if there's a need.

-Toshio
Re: Restart TG apps for high mem-usage
On Nov 25, 2007 4:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Here's a short script to test our TG apps run via supervisor for
> excessive memory usage and restart them if necessary. We could run this
> via cron in alternate hours on each app server. Does this seem like a
> good or bad idea to people?
>
> -Toshio

+1, but does it make sure all transactions are finished? I know smolt
does not have good transaction protection. If a transaction fails
halfway through, we might have a mess.

-Yaakov
Re: Restart TG apps for high mem-usage
sounds sane to me

On Nov 25, 2007 9:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Here's a short script to test our TG apps run via supervisor for
> excessive memory usage and restart them if necessary. We could run this
> via cron in alternate hours on each app server. Does this seem like a
> good or bad idea to people?
>
> -Toshio
Restart TG apps for high mem-usage
Here's a short script to test our TG apps run via supervisor for excessive
memory usage and restart them if necessary. We could run this via cron in
alternate hours on each app server. Does this seem like a good or bad idea
to people?

-Toshio

restart-memhogs.sh
Description: application/shellscript