Re: Restart TG apps for high mem-usage

2007-11-28 Thread Toshio Kuratomi

Luke Macken wrote:

On Sun, Nov 25, 2007 at 01:00:53PM -0800, Toshio Kuratomi wrote:
Here's a short script to test our TG apps run via supervisor for excessive 
memory usage and restart them if necessary.  We could run this via cron in 
alternate hours on each app server.  Does this seem like a good or bad idea 
to people?


Probably not a bad idea; I think koji does something similar with
apache.  However, I don't think we need this for bodhi, at least for the
moment.  The only time bodhi's memory usage jumps is when it's pushing
updates -- so if we were to use this script for bodhi, it would have to check
if it is currently running mash.  But for now, I'm not sure that it is necessary
seeing as how most of the time puppetd eats more memory than bodhi.

Sounds good.  I've taken both Bodhi and transifex out of the script for 
now as neither one is load balanced (the idea being that the apps which 
are load balanced should continue to serve requests off the other 
instance while one is restarting).


I've also changed it to take a different memory limit for each app.  It 
currently has some generous guesses as to what the memory limit should 
be.  I'm running a cron that logs the rss of the apps on app3&4 and will 
refine the limits after we have more data.
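
A minimal sketch of the per-app check, written in Python rather than the shell of the attached script. The limit values and the `over_limit` helper are hypothetical placeholders, not the numbers in the script, and RSS is assumed to be in kB, as `ps -o rss=` reports it.

```python
import subprocess

# Hypothetical per-app RSS limits in kB; placeholders, not the script's values.
LIMITS_KB = {
    "mirrormanager": 1200000,
    "packagedb": 500000,
    "smolt": 500000,
}


def rss_kb(pid):
    """Return the resident set size of pid in kB, as reported by ps."""
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)])
    return int(out.decode().strip())


def over_limit(usage_kb, limits_kb=LIMITS_KB):
    """Return the sorted names of apps whose RSS exceeds their limit.

    Apps without a configured limit (bodhi and transifex here) are
    never flagged.
    """
    return sorted(
        name for name, rss in usage_kb.items()
        if name in limits_kb and rss > limits_kb[name]
    )
```

Keeping the limits in one table means refining them later only requires editing the numbers once the logged RSS data is in.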


-Toshio

___
Fedora-infrastructure-list mailing list
Fedora-infrastructure-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list


Re: Restart TG apps for high mem-usage

2007-11-28 Thread Luke Macken
On Mon, Nov 26, 2007 at 09:59:44AM -0800, Toshio Kuratomi wrote:
> Matt Domsch wrote:
>> On Sun, Nov 25, 2007 at 09:08:30PM -0800, Toshio Kuratomi wrote:
>>>> +1, but does it make sure all transactions are finished?  I know smolt
>>>> does not have good transaction protection.  If a transaction fails
>>>> halfway through, we might have a mess.

>>> Not if the app doesn't.  From a brief test, TG apps do not do this.
>>
>> MirrorManager doesn't use transactions, I never figured out how to get
>> them to work right.  Advice welcome.
>>
> By not being able to get transactions working, do you mean explicit 
> transactions or implicit transactions?  I see that mirrormanager, bodhi,  
> and noc (not running currently) are using a dburi that disables implicit 
> transactions::
>   mirrormanager-prod.cfg.erb:
>     sqlobject.dburi="notrans_postgres://mirroradmin:<%= mirrorPassword %>@db2.fedora.phx.redhat.com/mirrormanager"
>
> If that was changed to::
>   sqlobject.dburi="postgres://mirroradmin:[...]
>
> TurboGears would at least attempt to use an implicit transaction per http 
> request which should protect the database from shutting down the 
> application in the middle of processing a multi-table update.  I don't know 
> if that's the problem you're referring to, though.

Removing the notrans_postgres:// from bodhi's sqlobject.dburi causes
problems.  Modifications don't seem to go through; I'm not sure if they
hit the DB or not.  I remember encountering this issue early on in bodhi
development, and it was mitigated by calling hub.sync() all over the
place.  I have since removed them, and use notrans_postgres, which has
been working fine since day 1 of our production instance.  I'm not a
db guru, so I'm not sure which is better or worse.  I'll have to 
investigate this further.

luke



Re: Restart TG apps for high mem-usage

2007-11-28 Thread Luke Macken
On Sun, Nov 25, 2007 at 01:00:53PM -0800, Toshio Kuratomi wrote:
> Here's a short script to test our TG apps run via supervisor for excessive 
> memory usage and restart them if necessary.  We could run this via cron in 
> alternate hours on each app server.  Does this seem like a good or bad idea 
> to people?

Probably not a bad idea; I think koji does something similar with
apache.  However, I don't think we need this for bodhi, at least for the
moment.  The only time bodhi's memory usage jumps is when it's pushing
updates -- so if we were to use this script for bodhi, it would have to check
if it is currently running mash.  But for now, I'm not sure that it is necessary
seeing as how most of the time puppetd eats more memory than bodhi.

luke



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Toshio Kuratomi

Mike McGrath wrote:

Toshio Kuratomi wrote:

Mike McGrath wrote:

Bill Nottingham wrote:

Toshio Kuratomi ([EMAIL PROTECTED]) said:
Here's a short script to test our TG apps run via supervisor for 
excessive memory usage and restart them if necessary.  We could run 
this via cron in alternate hours on each app server.  Does this 
seem like a good or bad idea to people?



It's a good idea if it's needed, but it's a bad idea that it is needed. What's
wrong with TG that it leads to this situation?
  


I was wondering this myself, I know smolt recently had some major 
changes to keep memory usage down.  Which TG apps are having this 
issue and how often?  I know MM uses a lot of memory but, AFAIK, it 
was determined that there's not much of a leak if there is one and 
that all of that memory is actually used.


Looks like smolt was upgraded just before Thanksgiving so it could be 
that we've plugged the leaks we had to deal with that inspired me to 
write this.  Would it be a good idea to have this in place anyways? 
With it periodically checking, we would find out that we had problems 
when cron emails us a notice that the script had to restart a process. 
Without it, we'll be notified when nagios or a user tells us they're 
getting timeouts.


I think it's a good idea, if for no other reason than that it allows us to 
monitor this stuff more actively; we'll get notified when the app restarts.  
+1 from me with the intention that, over time, we get fewer and fewer 
restarts.



Cool.  I'll check it in and set up a cron job.

One further piece of information since I have output from testing this 
on app3 yesterday:


AppName        Uptime  RSS 11/25  RSS 11/26
mirrormanager  2d4h    714336     962268
packagedb      --- restarted 13h ago --
smolt          5d3h    299556     299556
transifex      5d3h    42744      42768
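
The day-over-day change in the table above can be worked out directly. This is only a sketch: the figures are copied from the table, and the unit is assumed to be kB (the usual unit of `ps` RSS output), which the table does not state.

```python
# RSS samples copied from the table: (11/25, 11/26). Unit assumed to be kB.
SAMPLES = {
    "mirrormanager": (714336, 962268),
    "smolt": (299556, 299556),
    "transifex": (42744, 42768),
}


def growth(samples):
    """Return the RSS change between the two samples for each app."""
    return {name: after - before for name, (before, after) in samples.items()}


# Print one signed delta per app, sorted by name.
for name, delta in sorted(growth(SAMPLES).items()):
    print(f"{name:<14} {delta:+d}")
```

mirrormanager's delta works out to +247932, while smolt's is unchanged over the same day.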

-Toshio



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Mike McGrath

Toshio Kuratomi wrote:

Mike McGrath wrote:

Bill Nottingham wrote:
Toshio Kuratomi ([EMAIL PROTECTED]) said: 
Here's a short script to test our TG apps run via supervisor for 
excessive memory usage and restart them if necessary.  We could run 
this via cron in alternate hours on each app server.  Does this 
seem like a good or bad idea to people?



It's a good idea if it's needed, but it's a bad idea that it is needed. What's
wrong with TG that it leads to this situation?
  


I was wondering this myself, I know smolt recently had some major 
changes to keep memory usage down.  Which TG apps are having this 
issue and how often?  I know MM uses a lot of memory but, AFAIK, it 
was determined that there's not much of a leak if there is one and 
that all of that memory is actually used.


Looks like smolt was upgraded just before Thanksgiving so it could be 
that we've plugged the leaks we had to deal with that inspired me to 
write this.  Would it be a good idea to have this in place anyways? 
With it periodically checking, we would find out that we had problems 
when cron emails us a notice that the script had to restart a process. 
Without it, we'll be notified when nagios or a user tells us they're 
getting timeouts.


I think it's a good idea, if for no other reason than that it allows us to 
monitor this stuff more actively; we'll get notified when the app restarts.  
+1 from me with the intention that, over time, we get fewer and fewer 
restarts.


   -Mike



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Toshio Kuratomi

Mike McGrath wrote:

Bill Nottingham wrote:
Toshio Kuratomi ([EMAIL PROTECTED]) said:  
Here's a short script to test our TG apps run via supervisor for 
excessive memory usage and restart them if necessary.  We could run 
this via cron in alternate hours on each app server.  Does this seem 
like a good or bad idea to people?



It's a good idea if it's needed, but it's a bad idea that it is needed. What's
wrong with TG that it leads to this situation?
  


I was wondering this myself, I know smolt recently had some major 
changes to keep memory usage down.  Which TG apps are having this issue 
and how often?  I know MM uses a lot of memory but, AFAIK, it was 
determined that there's not much of a leak if there is one and that all 
of that memory is actually used.


Looks like smolt was upgraded just before Thanksgiving so it could be 
that we've plugged the leaks we had to deal with that inspired me to 
write this.  Would it be a good idea to have this in place anyways? 
With it periodically checking, we would find out that we had problems 
when cron emails us a notice that the script had to restart a process. 
Without it, we'll be notified when nagios or a user tells us they're 
getting timeouts.


I noticed that mirrormanager is currently at 761MB of RSS.  If that's 
steady-state for mm we'd want to bump the value the script checks for a 
bit higher before deploying it or set different values per app.


-Toshio



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Toshio Kuratomi

Bill Nottingham wrote:
Toshio Kuratomi ([EMAIL PROTECTED]) said: 
Here's a short script to test our TG apps run via supervisor for excessive 
memory usage and restart them if necessary.  We could run this via cron in 
alternate hours on each app server.  Does this seem like a good or bad idea 
to people?


It's a good idea if it's needed, but it's a bad idea that it is needed. What's
wrong with TG that it leads to this situation?

We have small memory leaks with all our TG apps.  This has only been a 
problem for mirrormanager in the past and smolt this past month.  At 
first I thought that the leaks were directly related to how many 
requests were served (mirrormanager had trouble when it was serving 
every request for a mirror, and smolt had trouble when it was serving the 
updating clients at the beginning of this month).


However, the packagedb has been serving a large number of requests 
lately and its memory usage hasn't climbed at nearly the same rate.  So 
something in the design of smolt and mirrormanager is leaking beyond the 
baseline that we're seeing with all the TG apps.


-Toshio



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Mike McGrath

Bill Nottingham wrote:
Toshio Kuratomi ([EMAIL PROTECTED]) said: 
  
Here's a short script to test our TG apps run via supervisor for excessive 
memory usage and restart them if necessary.  We could run this via cron in 
alternate hours on each app server.  Does this seem like a good or bad idea 
to people?



It's a good idea if it's needed, but it's a bad idea that it is needed. What's
wrong with TG that it leads to this situation?
  


I was wondering this myself, I know smolt recently had some major 
changes to keep memory usage down.  Which TG apps are having this issue 
and how often?  I know MM uses a lot of memory but, AFAIK, it was 
determined that there's not much of a leak if there is one and that all 
of that memory is actually used.


   -Mike



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Bill Nottingham
Toshio Kuratomi ([EMAIL PROTECTED]) said: 
> Here's a short script to test our TG apps run via supervisor for excessive 
> memory usage and restart them if necessary.  We could run this via cron in 
> alternate hours on each app server.  Does this seem like a good or bad idea 
> to people?

It's a good idea if it's needed, but it's a bad idea that it is needed. What's
wrong with TG that it leads to this situation?

Bill



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Toshio Kuratomi

Matt Domsch wrote:

On Sun, Nov 25, 2007 at 09:08:30PM -0800, Toshio Kuratomi wrote:

+1, but does it make sure all transactions are finished?  I know smolt
does not have good transaction protection.  If a transaction fails
halfway through, we might have a mess.


Not if the app doesn't.  From a brief test, TG apps do not do this.


MirrorManager doesn't use transactions, I never figured out how to get
them to work right.  Advice welcome.

By not being able to get transactions working, do you mean explicit 
transactions or implicit transactions?  I see that mirrormanager, bodhi, 
 and noc (not running currently) are using a dburi that disables 
implicit transactions::

  mirrormanager-prod.cfg.erb:
    sqlobject.dburi="notrans_postgres://mirroradmin:<%= mirrorPassword %>@db2.fedora.phx.redhat.com/mirrormanager"

If that was changed to::
  sqlobject.dburi="postgres://mirroradmin:[...]

TurboGears would at least attempt to use an implicit transaction per 
http request which should protect the database from shutting down the 
application in the middle of processing a multi-table update.  I don't 
know if that's the problem you're referring to, though.
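
To illustrate what a per-request transaction buys, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL and SQLObject. The table names and the simulated failure are hypothetical; the point is only that a multi-table update interrupted partway is rolled back as a unit.

```python
import sqlite3


def multi_table_update(conn, fail_midway):
    """Update two tables inside one transaction; roll back on any error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO hosts (name) VALUES ('mirror1')")
            if fail_midway:
                # Stands in for the app being shut down mid-request.
                raise RuntimeError("interrupted")
            conn.execute(
                "INSERT INTO urls (host, url) "
                "VALUES ('mirror1', 'http://example.invalid/')"
            )
    except RuntimeError:
        pass


def row_counts(conn):
    """Return (hosts, urls) row counts."""
    hosts = conn.execute("SELECT COUNT(*) FROM hosts").fetchone()[0]
    urls = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
    return hosts, urls


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (name TEXT)")
conn.execute("CREATE TABLE urls (host TEXT, url TEXT)")
conn.commit()

multi_table_update(conn, fail_midway=True)
interrupted = row_counts(conn)  # nothing from the failed request remains
multi_table_update(conn, fail_midway=False)
completed = row_counts(conn)    # both rows are present
```

Without the enclosing transaction, the first insert of the interrupted request would stay in the database with no matching row in the second table.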


-Toshio



Re: Restart TG apps for high mem-usage

2007-11-26 Thread Matt Domsch
On Sun, Nov 25, 2007 at 09:08:30PM -0800, Toshio Kuratomi wrote:
> >+1, but does it make sure all transactions are finished?  I know smolt
> >does not have good transaction protection.  If a transaction fails
> >halfway through, we might have a mess.
> >
> Not if the app doesn't.  From a brief test, TG apps do not do this.

MirrorManager doesn't use transactions, I never figured out how to get
them to work right.  Advice welcome.

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux



Re: Restart TG apps for high mem-usage

2007-11-25 Thread Yaakov Nemoy
On Nov 26, 2007 12:08 AM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Yaakov Nemoy wrote:
> > On Nov 25, 2007 4:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Not if the app doesn't.  From a brief test, TG apps do not do this.
>
> The script is asking supervisor to shut down the application.  supervisor
> sends a TERM to the TG app (we can configure it to send something other
> than TERM if we want but I don't see any documentation that leads me to
> believe it will be different with a HUP or QUIT).  At that point it
> looks like a TG app will immediately shut down and roll back any current
> transactions.

It's got my vote then

> smolt is on shaky ground if it's not using transactions correctly... At
> the beginning of the month when smolt was getting hit hard we did pretty
> much this same thing except manually instead of via a script when we
> noticed that smolt was giving timeouts and taking up 1G+ of RAM.  I
> think the current smolt code is using SQLAlchemy, correct?  It's pretty
> easy to use transactions so that you don't leave the db in an
> inconsistent state with that configuration.  Using the session's
> implicit transaction flushed just before the return should do the safe
> thing.  You can look through the code later and find additional places
> where you can safely flush the transaction if there's a need.

We do use transactions where we can, but since most of the code is not
tested at all, let alone stress tested, I can't vouch for it doing The
Right Thing.

(Winter break is only a few weeks away)

-Yaakov



Re: Restart TG apps for high mem-usage

2007-11-25 Thread Toshio Kuratomi

Yaakov Nemoy wrote:

On Nov 25, 2007 4:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:

Here's a short script to test our TG apps run via supervisor for
excessive memory usage and restart them if necessary.  We could run this
via cron in alternate hours on each app server.  Does this seem like a
good or bad idea to people?

-Toshio


+1, but does it make sure all transactions are finished?  I know smolt
does not have good transaction protection.  If a transaction fails
halfway through, we might have a mess.


Not if the app doesn't.  From a brief test, TG apps do not do this.

The script is asking supervisor to shut down the application.  supervisor 
sends a TERM to the TG app (we can configure it to send something other 
than TERM if we want but I don't see any documentation that leads me to 
believe it will be different with a HUP or QUIT).  At that point it 
looks like a TG app will immediately shut down and roll back any current 
transactions.
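
As a rough sketch of the restart step, in Python rather than the shell of the attached script: the helper names are hypothetical, and the program name passed in is assumed to match the name the app is registered under in supervisor. `supervisorctl restart <name>` stops the program with its configured stop signal (TERM unless `stopsignal` is set otherwise) and starts it again.

```python
import subprocess


def restart_command(app):
    """Build the supervisorctl invocation that restarts one program."""
    return ["supervisorctl", "restart", app]


def restart(app, dry_run=True):
    """Print the restart command, and run it unless dry_run is set.

    The printed line is what cron would mail out, so a notice is
    produced only when a restart is requested.
    """
    cmd = restart_command(app)
    print("restarting %s: %s" % (app, " ".join(cmd)))
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd
```

Calling `restart("smolt")` with the default `dry_run=True` only prints the command, which is a safe way to check the wiring before letting cron run it for real.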


smolt is on shaky ground if it's not using transactions correctly... At 
the beginning of the month when smolt was getting hit hard we did pretty 
much this same thing except manually instead of via a script when we 
noticed that smolt was giving timeouts and taking up 1G+ of RAM.  I 
think the current smolt code is using SQLAlchemy, correct?  It's pretty 
easy to use transactions so that you don't leave the db in an 
inconsistent state with that configuration.  Using the session's 
implicit transaction flushed just before the return should do the safe 
thing.  You can look through the code later and find additional places 
where you can safely flush the transaction if there's a need.


-Toshio



Re: Restart TG apps for high mem-usage

2007-11-25 Thread Yaakov Nemoy
On Nov 25, 2007 4:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Here's a short script to test our TG apps run via supervisor for
> excessive memory usage and restart them if necessary.  We could run this
> via cron in alternate hours on each app server.  Does this seem like a
> good or bad idea to people?
>
> -Toshio

+1, but does it make sure all transactions are finished?  I know smolt
does not have good transaction protection.  If a transaction fails
halfway through, we might have a mess.

-Yaakov



Re: Restart TG apps for high mem-usage

2007-11-25 Thread Paulo Santos
sounds sane to me

On Nov 25, 2007 9:00 PM, Toshio Kuratomi <[EMAIL PROTECTED]> wrote:
> Here's a short script to test our TG apps run via supervisor for
> excessive memory usage and restart them if necessary.  We could run this
> via cron in alternate hours on each app server.  Does this seem like a
> good or bad idea to people?
>
> -Toshio
>



Restart TG apps for high mem-usage

2007-11-25 Thread Toshio Kuratomi
Here's a short script to test our TG apps run via supervisor for 
excessive memory usage and restart them if necessary.  We could run this 
via cron in alternate hours on each app server.  Does this seem like a 
good or bad idea to people?


-Toshio


restart-memhogs.sh
Description: application/shellscript