Re: [openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-09 Thread Chmouel Boudjnah
On Tue, Sep 9, 2014 at 12:24 AM, Joe Gordon joe.gord...@gmail.com wrote:

 1) Should we explicitly set the number of workers that services use in
 devstack? Why have so many workers in a small all-in-one environment? What
 is the right balance here?



This is what we do for Swift; without setting this up, it would kill
devstack even before the Tempest run starts.

Chmouel


Re: [openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-09 Thread Mike Bayer
Yes.  Guppy seems to have some nicer string formatting for this dump as well,
but I was unable to figure out how to get that string format to write to a
file; it seems like the tool is very geared towards interactive console use.
We should pick a nice memory formatter we like, there's a bunch of them, and
then add it to our standard toolset.
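
For what it's worth, a rough, untested guess at getting that nicer output into
a file: the interactive-style table appears to be just the str() of the heap
snapshot, e.g.:

from guppy import hpy

# assumption: str() on the heap snapshot gives the same table that guppy
# prints at the interactive console
snapshot = hpy().heap()
with open("/tmp/memory_formatted.txt", "w") as handle:
    handle.write(str(snapshot))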


On Sep 9, 2014, at 10:35 AM, Doug Hellmann d...@doughellmann.com wrote:

 
 On Sep 8, 2014, at 8:12 PM, Mike Bayer mba...@redhat.com wrote:
 
 Hi All - 
 
 Joe had me do some quick memory profiling on Nova. Just an FYI if anyone
 wants to play with this technique: I placed a little bit of memory profiling
 code using Guppy into nova/api/__init__.py, or anywhere in your favorite app
 that will definitely get imported when the thing first runs:
 
 from guppy import hpy
 import signal
 import datetime
 
 def handler(signum, frame):
     # dump a guppy heap profile to /tmp whenever SIGUSR2 is received
     print "guppy memory dump"
 
     fname = "/tmp/memory_%s.txt" % (
         datetime.datetime.now().strftime("%Y%m%d_%H%M%S"))
     prof = hpy().heap()
     with open(fname, 'w') as handle:
         prof.dump(handle)
     del prof
 
 signal.signal(signal.SIGUSR2, handler)
 
 This looks like something we could build into our standard service startup 
 code. Maybe in 
 http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/service.py
  for example?
 
 Doug
 
 
 
 
Re: [openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-08 Thread Mike Bayer
Hi All - 

Joe had me do some quick memory profiling on Nova. Just an FYI if anyone wants
to play with this technique: I placed a little bit of memory profiling code
using Guppy into nova/api/__init__.py, or anywhere in your favorite app that
will definitely get imported when the thing first runs:

from guppy import hpy
import signal
import datetime

def handler(signum, frame):
    # dump a guppy heap profile to /tmp whenever SIGUSR2 is received
    print "guppy memory dump"

    fname = "/tmp/memory_%s.txt" % (
        datetime.datetime.now().strftime("%Y%m%d_%H%M%S"))
    prof = hpy().heap()
    with open(fname, 'w') as handle:
        prof.dump(handle)
    del prof

signal.signal(signal.SIGUSR2, handler)



Then run nova-api, make some API calls, hit the nova-api process with a
SIGUSR2 signal, and it will dump a profile into /tmp/ like this:

http://paste.openstack.org/show/108536/

Now obviously everyone is like, oh boy, memory, let's go beat up SQLAlchemy
again... which is fine, I can take it.  In that particular profile, there's a
bunch of SQLAlchemy stuff, but that is all structural to the classes that are
mapped in the Nova API, e.g. 52 classes with a total of 656 attributes mapped.
That stuff sets up once and doesn't change.  If Nova used less ORM, e.g.
didn't map everything, that would be less.  But in that profile there's no
“data” lying around.

But even if you don’t have that many objects resident, your Python process
might still be using up a ton of memory.  The reason for this is that the
CPython interpreter has a model where it will grab all the memory it needs to
do something, a time-consuming process by the way, but then it really doesn’t
ever release it (see
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
for the “classic” answer on this; things may have improved/modernized in 2.7,
but I think this is still the general idea).
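
A rough, Linux-only illustration of that behavior (numbers will vary; the
point is that after deleting a large structure, the process RSS usually stays
well above where it started):

def vm_rss_kb():
    # read this process's resident set size from /proc (Linux only)
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

print "start      :", vm_rss_kb(), "kB"
data = [str(i) for i in xrange(5 * 10**6)]
print "after alloc:", vm_rss_kb(), "kB"
del data
print "after del  :", vm_rss_kb(), "kB"   # usually nowhere near the start value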

So in terms of SQLAlchemy, a good way to suck up a ton of memory all at once
that probably won’t get released is to:

1. fetch a full ORM object with all of its data

2. fetch lots of them all at once
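
In code, that anti-pattern looks roughly like this (MyObject stands in for any
fully mapped model, as in the examples below):

# hydrate every mapped column of every matching object in one shot
all_objects = session.query(MyObject).all()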


So to avoid doing that, the answer isn’t necessarily that simple.  The quick
win for loading full objects is to... not load the whole thing!  E.g. assuming
we can get OpenStack onto SQLAlchemy 0.9 in requirements.txt, we can start
using load_only():

session.query(MyObject).options(load_only("id", "name", "ip"))

or with any version, just load those columns - we should be using this as much 
as possible for any query that is row/time intensive and doesn’t need full ORM 
behaviors (like relationships, persistence):

session.query(MyObject.id, MyObject.name, MyObject.ip)

Another quick win, if we *really* need an ORM object, not a row, and we have to 
fetch a ton of them in one big result, is to fetch them using yield_per():

for obj in session.query(MyObject).yield_per(100):
    # work with obj and then make sure to lose all references to it

yield_per() will dish out objects drawing from batches of the number you give
it.  But it has two huge caveats: one is that it isn’t compatible with most
forms of eager loading, except for many-to-one joined loads.  The other is that
the DBAPI, e.g. the MySQL driver, does *not* stream the rows; virtually all
DBAPIs by default load a result set fully before you ever see the first row.
psycopg2 is one of the only DBAPIs that even offers a special mode to work
around this (server-side cursors).
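
A sketch of what that mode looks like from the SQLAlchemy side (hedged:
whether yield_per() turns on stream_results for you depends on the version,
so this sets the execution option explicitly; process() is just a placeholder
for per-row work):

# ask the driver for a server-side cursor so rows stream from the database
# instead of being buffered client-side first; drivers without support
# generally just ignore the option
query = (session.query(MyObject)
         .execution_options(stream_results=True)
         .yield_per(100))
for obj in query:
    process(obj)   # placeholder; drop references to obj when done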

Which means it's even *better* to paginate result sets, so that you only ask
the database for a chunk at a time, storing at most a subset of objects in
memory at once.  Pagination itself is tricky: if you are using a naive
LIMIT/OFFSET approach, it takes a while when you are working with a large
OFFSET.  It’s better to SELECT into windows of data, where you can specify a
start and end criteria (against an indexed column) for each window, like a
timestamp.
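
A rough sketch of that windowing idea (MyObject.created_at, the column list,
and the one-hour window size are all illustrative):

import datetime

def fetch_in_windows(session, start, stop,
                     window=datetime.timedelta(hours=1)):
    # walk [start, stop) in fixed windows over an indexed timestamp column,
    # so each query is cheap and only one window of rows is in memory at once
    lower = start
    while lower < stop:
        upper = min(lower + window, stop)
        rows = (session.query(MyObject.id, MyObject.name)
                .filter(MyObject.created_at >= lower,
                        MyObject.created_at < upper)
                .order_by(MyObject.created_at)
                .all())
        for row in rows:
            yield row
        lower = upper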

Then of course, using Core only is another level of fastness/low memory.
Though querying for individual columns with the ORM is not far off, and I’ve
also made some major improvements to that in 1.0, so that query(*cols) is
pretty competitive with straight Core (and Core is... well, I’d say becoming
visible in the raw DBAPI’s rear view mirror, at least...).
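
For reference, a hedged sketch of the same column-only fetch done with Core
alone (the 'instances' table, the column names, and the connection URL are
placeholders):

from sqlalchemy import MetaData, Table, create_engine, select

engine = create_engine("mysql://user:pass@localhost/nova")   # placeholder URL
metadata = MetaData()
# reflect the table definition straight from the database; no ORM mapping
instances = Table("instances", metadata, autoload=True, autoload_with=engine)

with engine.connect() as conn:
    stmt = select([instances.c.id, instances.c.hostname]).where(
        instances.c.deleted == 0)
    for row in conn.execute(stmt):
        print row.id, row.hostname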

What I’d suggest here is that we start to be mindful of memory/performance
patterns and start to work naive ORM use into more savvy patterns: being
aware of what columns are needed, what rows, how many SQL queries we really
need to emit, and what the “worst case” number of rows will be for sections
that really need to scale.  By far the hardest part is recognizing and
reimplementing when something might have to deal with an arbitrarily large
number of rows, which means organizing that code around a “streaming”
pattern where you never have all the rows in memory at once - on other projects
I’ve had tasks that would normally take about a day but, in order to organize
them to “scale”, took weeks - such as being able

Re: [openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-08 Thread Clint Byrum
Excerpts from Joe Gordon's message of 2014-09-08 15:24:29 -0700:
 Hi All,
 
 We have recently started seeing assorted memory issues in the gate
 including the oom-killer [0] and libvirt throwing memory errors [1].
 Luckily we run ps and dstat on every devstack run so we have some insight
 into why we are running out of memory. Based on the output from a job taken
 at random [2][3], a typical run consists of:
 
 * 68 openstack api processes alone
 * the following services are running 8 processes (number of CPUs on test
 nodes)
   * nova-api (we actually run 24 of these, 8 compute, 8 EC2, 8 metadata)
   * nova-conductor
   * cinder-api
   * glance-api
   * trove-api
   * glance-registry
   * trove-conductor
 * together, nova-api, nova-conductor, and cinder-api alone take over 45 %MEM
 (note: some of that memory usage is counted multiple times, as RSS
 includes shared libraries)
 * based on dstat numbers, it looks like we don't use that much memory
 before tempest runs, and after tempest runs we use a lot of memory.
 
 Based on this information I have two categories of questions:
 
 1) Should we explicitly set the number of workers that services use in
 devstack? Why have so many workers in a small all-in-one environment? What
 is the right balance here?

I'm kind of wondering why we aren't pushing everything to go in the same
direction keystone did with apache. I may be crazy, but apache gives us
all kinds of tools to tune around process forking that we'll have to
reinvent in our own daemon bits (like MaxRequestsPerChild to prevent
leaky or slow GC from eating all our memory over time).

Meanwhile, the idea behind running API processes with ncpu is that we don't
want to block an API request if there is a CPU available to it. Of
course, if we have enough cinder, nova, keystone, trove, etc. requests
all at one time that we do need to block, we defer to the CPU scheduler
of the box to do it, rather than queue things up at the event level.
This can lead to quite ugly CPU starvation issues, and that is a lot
easier to tune for if you have one tuning knob for apache + mod_wsgi
instead of n services.

In production systems I'd hope that memory would be quite a bit more
available than on the bazillions of cloud instances that run tests. So,
while process-per-cpu-per-service is a large percentage of 8G, it is
a very small percentage of 24G+, which is a pretty normal amount of
memory to have on an all-in-one type of server that one might choose
as a baremetal controller. For VMs that are handling production loads,
it's a pretty easy trade-off to give them a little more RAM so they can
take advantage of all the CPUs as needed.

All this to say, since devstack is always expected to be run in a dev
context, and not production, I think it would make sense to dial it
back to 4 from ncpu.

 
 2) Should we be worried that some OpenStack services such as nova-api,
 nova-conductor and cinder-api take up so much memory? Does their memory
 usage keep growing over time? Does anyone have any numbers to answer this?
 Why do these processes take up so much memory?

Yes, I do think we should be worried that they grow quite a bit. I've
experienced this problem a few times in a few scripting languages, and
almost every time it turned out to be too much data being read from
the database or MQ. Moving to tighter messages, and tighter database
interaction, nearly always results in less wasted RAM.

I like the other suggestion to start graphing this. Since we have all
that dstat data, I wonder if we can just process that directly into
graphite.
