[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
I hadn't described the details because there is not much to the workaround and it is pretty application specific. The summary is that I moved some application-level filtering that was being done in Python code on the results into the query itself, so fewer results are returned. This saves a great deal of memory in my case and speeds up the system dramatically. Once I understood that there wasn't really a memory leak, I just optimized what was already there to use less memory.

-Allen

On Wed, Feb 25, 2009 at 6:59 PM, Peter Hansen wrote:
>
> Allen Bierbaum wrote:
>> On Tue, Feb 24, 2009 at 4:44 AM, Chris Miles wrote:
>>> On Feb 22, 6:08 am, Allen Bierbaum wrote:
>>> Python 2.5 and later will free up garbage collected memory, handing it
>>> back to the system. Previous versions of Python would never free up
>>> memory (hence never shrink in size).
>>>
>>> Are you using Python 2.4?
>>
>> I am using Python 2.5. But now that I understand the issue better I
>> have come up with a workaround. The biggest issue was that I didn't
>> understand what I should be seeing as far as memory usage.
>
> Although your workaround may not be generally useful, it would still be
> nice for posterity (i.e. those searching through this thread in future)
> if you could summarize how you've actually addressed this issue to your
> satisfaction, however crude or unusual that might be. Thanks. :)
>
> --
> Peter Hansen
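For anyone finding this thread later, the shape of that workaround is roughly the following sketch (the table name and the area filter are made-up examples, not Allen's actual schema). Instead of fetching every row and filtering in Python, push the filter into the WHERE clause so the unwanted rows are never buffered client-side:

import sqlalchemy as sa

engine = sa.create_engine('postgres://user:pass@localhost/mydb')  # placeholder URI
metadata = sa.MetaData()
metadata.bind = engine
features = sa.Table('features', metadata, autoload=True)  # hypothetical table

# Before: every row is fetched (and buffered by psycopg2), then filtered in Python.
rows = features.select().execute().fetchall()
wanted = [r for r in rows if r.area > 1000.0]

# After: the filter runs in the database; only matching rows cross the wire.
wanted = features.select(features.c.area > 1000.0).execute().fetchall()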
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
Allen Bierbaum wrote:
> On Tue, Feb 24, 2009 at 4:44 AM, Chris Miles wrote:
>> On Feb 22, 6:08 am, Allen Bierbaum wrote:
>> Python 2.5 and later will free up garbage collected memory, handing it
>> back to the system. Previous versions of Python would never free up
>> memory (hence never shrink in size).
>>
>> Are you using Python 2.4?
>
> I am using Python 2.5. But now that I understand the issue better I
> have come up with a workaround. The biggest issue was that I didn't
> understand what I should be seeing as far as memory usage.

Although your workaround may not be generally useful, it would still be
nice for posterity (i.e. those searching through this thread in future)
if you could summarize how you've actually addressed this issue to your
satisfaction, however crude or unusual that might be. Thanks. :)

--
Peter Hansen
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
On Tue, Feb 24, 2009 at 4:44 AM, Chris Miles wrote:
> On Feb 22, 6:08 am, Allen Bierbaum wrote:
>> The python process. The number of objects seems to remain fairly
>> controlled. But the amount of resident memory used by the python
>> process does not decrease. I had expected that by calling
>> gc.collect(2) python would reclaim any objects that could be freed
>> and free all memory associated with them, thus decreasing the
>> consumed memory. Maybe this is an invalid assumption. Do you know
>> any way to ask python to shrink its process size (i.e. clear unused
>> memory that has been freed but evidently not given back to the OS)?
>
> Python 2.5 and later will free up garbage collected memory, handing it
> back to the system. Previous versions of Python would never free up
> memory (hence never shrink in size).
>
> Are you using Python 2.4?

I am using Python 2.5. But now that I understand the issue better I
have come up with a workaround. The biggest issue was that I didn't
understand what I should be seeing as far as memory usage.

Thanks to everyone for the help.

-Allen
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
Allen Bierbaum schrieb:
> The python process. The number of objects seems to remain fairly
> controlled. But the amount of resident memory used by the python
> process does not decrease. I had expected that by calling
> gc.collect(2) python would reclaim any objects that could be freed
> and free all memory associated with them, thus decreasing the
> consumed memory. Maybe this is an invalid assumption. Do you know
> any way to ask python to shrink its process size (i.e. clear unused
> memory that has been freed but evidently not given back to the OS)?

That's an invalid assumption for most systems with pooling allocators.
This is usually called a high-watermark allocator, and it assumes your
process has a steady state of memory usage. It's usually not a problem
unless you're running short of virtual address space or swap space,
because any decent OS will just page out the unused memory blocks. But
you may still be out of luck if the layout of the pool prevents that:
e.g. if one object in every memory page is active, the system cannot
swap out anything, and you would need a harder-working (compacting) GC
that readjusts all references to really get completely free blocks. In
addition you need the cooperation of the OS's free() call; not all
systems really give memory back even if you free() it, some just keep
it in a pool for the process.

One typical pattern to get around this is to fork a worker for the
memory-intensive stuff and let it quit when done, so the memory gets
returned to the system.

Michael

--
Michael Schlenker
Software Engineer, CONTACT Software GmbH
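The fork-a-worker pattern Michael mentions looks roughly like this in Python (a minimal sketch, not tested against this thread's setup; the query and URI are placeholders, and multiprocessing needs Python 2.6+ or the "processing" backport on 2.5):

import multiprocessing

def run_big_query(db_uri):
    # Everything allocated here belongs to the child process and is
    # handed back to the OS when the child exits.
    import sqlalchemy as sa
    engine = sa.create_engine(db_uri)
    result = engine.execute("SELECT * FROM big_table")  # placeholder query
    for row in result:
        pass  # memory-hungry processing goes here
    result.close()

if __name__ == '__main__':
    worker = multiprocessing.Process(target=run_big_query,
                                     args=('postgres://localhost/mydb',))
    worker.start()
    worker.join()  # parent's footprint never grows past its own needs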
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
On Feb 22, 6:08 am, Allen Bierbaum wrote:
> The python process. The number of objects seems to remain fairly
> controlled. But the amount of resident memory used by the python
> process does not decrease. I had expected that by calling
> gc.collect(2) python would reclaim any objects that could be freed
> and free all memory associated with them, thus decreasing the
> consumed memory. Maybe this is an invalid assumption. Do you know
> any way to ask python to shrink its process size (i.e. clear unused
> memory that has been freed but evidently not given back to the OS)?

Python 2.5 and later will free up garbage collected memory, handing it
back to the system. Previous versions of Python would never free up
memory (hence never shrink in size).

Are you using Python 2.4?

Cheers,
Chris Miles
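A quick standalone way to watch this behaviour (Linux only; it reads VmRSS from /proc the same way the test script later in this thread does, and whether the number actually drops all the way back depends on the allocator details Michael Schlenker describes above):

import os

def resident_kb():
    # Pull the 'VmRSS:' line out of /proc/<pid>/status.
    for line in open('/proc/%d/status' % os.getpid()):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])
    return 0

print "before:   ", resident_kb(), "kB"
data = [object() for i in xrange(2000000)]   # allocate a few tens of MB
print "allocated:", resident_kb(), "kB"
del data
print "freed:    ", resident_kb(), "kB"   # shrinks on 2.5+, stays flat on 2.4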
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
I don't. You may want to look into using the server_side_cursors=True
option we provide on the postgres dialect, which prevents psycopg2 from
fully pre-fetching result sets.

On Feb 21, 2009, at 2:08 PM, Allen Bierbaum wrote:

> The python process. The number of objects seems to remain fairly
> controlled. But the amount of resident memory used by the python
> process does not decrease. I had expected that by calling
> gc.collect(2) python would reclaim any objects that could be freed
> and free all memory associated with them, thus decreasing the
> consumed memory. Maybe this is an invalid assumption. Do you know
> any way to ask python to shrink its process size (i.e. clear unused
> memory that has been freed but evidently not given back to the OS)?
>
> -Allen
>
> On Sat, Feb 21, 2009 at 12:15 PM, Michael Bayer wrote:
>>
>> Are you referring to memory of the python process or number of
>> unclaimed objects still present? psycopg2 will buffer the full
>> results from the database in memory before fetchone() is ever
>> called, which will force the python process to grow to accommodate
>> it. A python process generally stays large once it grows.
>>
>> [earlier messages and quoted test script snipped; they appear in
>> full further down the thread]
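Concretely, the option goes on create_engine (a minimal sketch; the URI and table are placeholders). With server_side_cursors=True the postgres dialect has psycopg2 declare a named (server-side) cursor, so rows stream from the server as you iterate instead of being fully buffered on execute():

import sqlalchemy as sa

engine = sa.create_engine('postgres://user:pass@localhost/mydb',  # placeholder URI
                          server_side_cursors=True)

result = engine.execute("SELECT * FROM big_table")  # hypothetical table
for row in result:   # rows arrive incrementally, not all at once
    pass             # process each row, then let it go
result.close()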
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
The python process. The number of objects seems to remain fairly
controlled. But the amount of resident memory used by the python
process does not decrease. I had expected that by calling gc.collect(2)
python would reclaim any objects that could be freed and free all
memory associated with them, thus decreasing the consumed memory.
Maybe this is an invalid assumption. Do you know any way to ask python
to shrink its process size (i.e. clear unused memory that has been
freed but evidently not given back to the OS)?

-Allen

On Sat, Feb 21, 2009 at 12:15 PM, Michael Bayer wrote:
>
> Are you referring to memory of the python process or number of
> unclaimed objects still present? psycopg2 will buffer the full
> results from the database in memory before fetchone() is ever called,
> which will force the python process to grow to accommodate it. A
> python process generally stays large once it grows.
>
> [earlier messages and quoted test script snipped; they appear in full
> further down the thread]
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
Are you referring to memory of the python process or number of
unclaimed objects still present? psycopg2 will buffer the full results
from the database in memory before fetchone() is ever called, which
will force the python process to grow to accommodate it. A python
process generally stays large once it grows.

On Feb 21, 2009, at 11:16 AM, Allen Bierbaum wrote:

> I spent some more time and came up with a completely standalone test
> application (attached; requires SA 0.5.2 and some database backend).
> I have tested this with postgres and with sqlite and both of them
> exhibit the behavior I was talking about. Namely, the memory keeps
> increasing and stays topped out even though all handles to the
> results have been cleared in python.
>
> I am sure that I must be doing something very wrong, but I can't
> figure it out. Can anyone point out my error?
>
> -Allen
>
> [original post and quoted test script snipped; they appear in full
> further down the thread]
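The buffering described here is psycopg2's default client-side cursor behaviour; for comparison, psycopg2's named cursors are the server-side alternative that server_side_cursors=True switches on (a sketch; the DSN and table are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN

# Default cursor: execute() downloads the ENTIRE result set into
# process memory before fetchone() is ever called.
cur = conn.cursor()
cur.execute("SELECT * FROM big_table")

# Named cursor: declared server-side; rows are fetched in batches of
# cursor.itersize (default 2000) as you iterate.
scur = conn.cursor(name='stream_me')
scur.execute("SELECT * FROM big_table")
for row in scur:
    pass  # peak memory is roughly one batch, not the whole result set

scur.close()
cur.close()
conn.close()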
[sqlalchemy] Re: app memory leak - execute() with SA 0.5.2, PG 8.3, Psycopg2
I spent some more time and came up with a completely standalone test
application (attached; requires SA 0.5.2 and some database backend).
I have tested this with postgres and with sqlite and both of them
exhibit the behavior I was talking about. Namely, the memory keeps
increasing and stays topped out even though all handles to the results
have been cleared in python.

I am sure that I must be doing something very wrong, but I can't
figure it out. Can anyone point out my error?

-Allen

On Sat, Feb 21, 2009 at 7:52 AM, Allen Bierbaum wrote:
> Hello all:
>
> We are running into a very strange memory leak issue. Here are the
> details:
>
> - The code executes select statements directly on a table using SA
>   0.5.2, Postgres 8.3, Psycopg2, all running on Linux (Ubuntu 8.04)
> - Querying a table with approximately 7 records
> - When we query the table, convert the results to a list, and then
>   free both the list and the results proxy, memory seems to stay used
> - If we query the table with ever increasing size limits (limit
>   clause), the amount of memory consumed increases
> - If we query with a limit of say 5 and then perform smaller
>   queries, the smaller queries do not increase the amount of memory
>   consumed overall
> - In other words, the amount of memory caps out in a way that looks
>   like it is linear with the maximum number of table rows ever
>   returned
>
> It looks like something in SA, Psycopg2, or PG is holding references
> to the returned rows but we have not been able to track it down. I am
> working on a completely standalone application to demonstrate the
> issue. Before I went much further though I wanted to post this
> question and code sample (not standalone, but shows the commands we
> are using) in case anyone sees something fishy.
>
> Any ideas?
> Thanks,
> Allen
>
> --
> import os, sys, gc
> import psycopg2
> import sqlalchemy as sa
> import sqlalchemy.exc
>
> class DataProvider(object):
>     def __init__(self, dbUri, tableName):
>         self._dbUri = dbUri
>         self._dbEngine = None
>         self._metadata = None
>         self._table = None
>         self._tableName = tableName
>
>     def initialize(self):
>         self._dbEngine = sa.create_engine(self._dbUri)
>         self._metadata = sa.MetaData()
>         self._metadata.bind = self._dbEngine
>
>         # Build up the table metadata reflectively
>         name_parts = self._tableName.split('.')
>
>         if len(name_parts) > 1:
>             self._table = sa.Table(name_parts[1], self._metadata,
>                                    autoload = True,
>                                    schema = name_parts[0])
>         else:
>             self._table = sa.Table(name_parts[0], self._metadata,
>                                    autoload = True)
>
>     def query(self, limit = None):
>         s = self._table.select()
>
>         if limit is not None:
>             s = s.limit(limit)
>
>         print "objs before: ", len(gc.get_objects())
>         print "garbage: ", len(gc.garbage)
>         result_proxy = s.execute()
>         result_objects = [x for x in result_proxy]
>         #print "num results: ", len(result_objects)
>         result_proxy.close()
>         result_objects = []
>         results = []
>         gc.collect(2)
>         print "objs after: ", len(gc.get_objects())
>         print "garbage: ", len(gc.garbage)
>         return results
>
> # argv[0] is the script name; the URI and table name come after it
> db_uri = sys.argv[1]
> collection_name = sys.argv[2]
>
> prov = DataProvider(db_uri, collection_name)
> prov.initialize()
>
> limits = [1, 2, 3, 4, 5, 6,
>           1, 2, 3, 4, 5, 6]
> # Query
> for x in range(len(limits)):
>     limit = limits[x]
>     print "[ %s ] limit: %s " % (x, limit)
>     gc.collect(2)
>     printMemoryInformation()
>     features = prov.query(limit = limit)
>     features = None
>     gc.collect(2)
>     printMemoryInformation()
>
> --

[attached standalone script; the memory-measurement helpers:]

import os, sys, gc
import psycopg2

import sqlalchemy as sa
import sqlalchemy.exc

def getMemoryUsage(VmKey):
    proc_status = '/proc/%d/status' % os.getpid()
    scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
             'KB': 1024.0, 'MB': 1024.0*1024.0}
    v = open(proc_status).read()

    # get VmKey line, e.g. 'VmRSS: ... kB\n'
    i = v.index(VmKey)
    v = v[i:].split(None, 3)    # split on whitespace
    if len(v) < 3:
        return 0.0              # invalid format?
    # convert Vm value to bytes
    return float(v[1]) * scale[v[2]]

def memory():
    return getMemoryUsage('VmSize:') / (1024*1024)

def resident():
    return getMemoryUsage('VmRSS:') / (1024*1024)