Thanks Antonio...

Relative to cpyext... I am not sure. We are not using it directly, but who
knows what is being used by the modules we pull in (pymongo, etc.). I am
trying to create a stripped-down test case...

I filed a bug here with some updates:
https://bitbucket.org/pypy/pypy/issues/2982/continual-increase-in-memory-utilization

It reproduces in PyPy, and in PyPy without the JIT, but it *does not*
reproduce in CPython, so that is some indication. Once I have a stripped-down
test case that is not tied to our entire codebase, I will publish it in the
bug if I can, and try other PyPy versions as well.
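In case it helps in the meantime, this is roughly the shape of the loop I am
trying to reduce it to (just a sketch, not our real code: the Mongo URI,
database, and collection names are placeholders, and it uses psutil's
memory_info(), which older psutil spells get_memory_info()):

    import gc
    import os
    import time

    import psutil
    import pymongo


    def rss_bytes():
        # Resident set size of this process, in bytes.
        return psutil.Process(os.getpid()).memory_info().rss


    def main():
        client = pymongo.MongoClient('mongodb://localhost:27017')  # placeholder URI
        coll = client['testdb']['testcoll']                        # placeholder names

        while True:
            # Query a collection and do some throwaway, non-saved work,
            # mimicking what the real worker does each iteration.
            docs = list(coll.find().limit(1000))
            checksum = sum(len(str(d)) for d in docs)

            gc.collect()
            print('RSS: %d (docs=%d, checksum=%d)' % (rss_bytes(), len(docs), checksum))
            time.sleep(30)


    if __name__ == '__main__':
        main()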
Rob

On Thu, Mar 28, 2019 at 10:56 AM Antonio Cuni <anto.c...@gmail.com> wrote:

> Hi Robert,
> are you using any package which relies on cpyext? I.e., modules written in
> C and/or with Cython (cffi is fine).
> IIRC, at the moment PyPy doesn't detect GC cycles which involve cpyext
> objects. So if you have a cycle such as
>
>     Py_foo -> C_bar -> Py_foo
>
> (where Py_foo is a pure-Python object and C_bar a cpyext object), it will
> never be collected unless you break the cycle manually.
>
> Other than that: have you tried running it with PyPy 7.0 and/or 7.1?
>
>
> On Thu, Mar 28, 2019 at 8:35 AM Robert Whitcher <
> robert.whitc...@rubrik.com> wrote:
>
>> So I have a process that uses PyPy and pymongo in a loop.
>> It does basically the same thing every iteration: query a table via
>> pymongo, do a few calculations that are not saved anywhere, then wait
>> and loop again.
>>
>> The RSS of the process continually increases (PYPY_GC_MAX is set
>> pretty high), so I hooked in the GC stats output per
>> http://doc.pypy.org/en/latest/gc_info.html
>> I also make sure that gc.collect() is called at least every 3 minutes.
>>
>> What I see is that the memory, while high, stays fairly constant for a
>> long time:
>>
>> 2019-03-27 00:04:10.033-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144244736
>> ...
>> 2019-03-27 01:01:46.841-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144420864
>> 2019-03-27 01:02:36.943-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144269312
>>
>> Then it decides (and the exact per-loop behavior is the same each time)
>> to chew up much more memory:
>>
>> 2019-03-27 01:04:17.184-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 145469440
>> 2019-03-27 01:05:07.305-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 158175232
>> 2019-03-27 01:05:57.401-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 173191168
>> 2019-03-27 01:06:47.490-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 196943872
>> 2019-03-27 01:07:37.575-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 205406208
>> 2019-03-27 01:08:27.659-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 254562304
>> 2019-03-27 01:09:17.770-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 256020480
>> 2019-03-27 01:10:07.866-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 289779712
>>
>> That's ~140 MB of growth... Where is all that memory going?
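(For reference, the stats quoted below are dumped roughly like this after each
forced collection. This is a sketch: on PyPy, gc.get_stats() from the gc_info
page linked above returns an object whose repr is the "Total memory consumed /
Total memory allocated" table; the logger setup here is just a placeholder.)

    import gc
    import logging

    log = logging.getLogger(__name__)


    def log_gc_stats():
        # Force a major collection, then dump PyPy's GC statistics.
        gc.collect()
        # gc.get_stats() is PyPy-only; its repr should render as the table
        # quoted below.
        log.info('DBG0: %r', gc.get_stats())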
>> What's more, the PyPy GC stats do not show anything different.
>>
>> Here are the GC stats from GC-Complete when we were at *144MB*:
>>
>> 2019-03-26 23:55:49.127-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 140632064
>> 2019-03-26 23:55:49.133-0600 [-] main_thread(29621)log (async_worker_process.py:308): DBG0: Total memory consumed:
>>     GC used:            56.8MB (peak: 69.6MB)
>>        in arenas:            39.3MB
>>        rawmalloced:          14.5MB
>>        nursery:              3.0MB
>>     raw assembler used: 521.6kB
>>     -----------------------------
>>     Total:              57.4MB
>>
>>     Total memory allocated:
>>     GC allocated:            63.0MB (peak: 71.2MB)
>>        in arenas:            43.9MB
>>        rawmalloced:          22.7MB
>>        nursery:              3.0MB
>>     raw assembler allocated: 1.0MB
>>     -----------------------------
>>     Total:                   64.0MB
>>
>> Here are the GC stats from GC-Complete when we are at *285MB*:
>>
>> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 285147136
>> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log (async_worker_process.py:308): DBG0: Total memory consumed:
>>     GC used:            57.5MB (peak: 69.6MB)
>>        in arenas:            39.9MB
>>        rawmalloced:          14.6MB
>>        nursery:              3.0MB
>>     raw assembler used: 1.5MB
>>     -----------------------------
>>     Total:              58.9MB
>>
>>     Total memory allocated:
>>     GC allocated:            63.1MB (peak: 71.2MB)
>>        in arenas:            43.9MB
>>        rawmalloced:          22.7MB
>>        nursery:              3.0MB
>>     raw assembler allocated: 2.0MB
>>     -----------------------------
>>     Total:                   65.1MB
>>
>> How is this possible?
>>
>> I am measuring RSS with:
>>
>> def get_rss_mem_usage():
>>     '''
>>     Get the RSS memory usage in bytes.
>>     @return: memory size in bytes; -1 if an error occurs
>>     '''
>>     try:
>>         process = psutil.Process(os.getpid())
>>         return process.get_memory_info().rss
>>     except Exception:
>>         return -1
>>
>> and cross-referencing with "ps -orss -p <pid>"; the RSS values reported
>> are correct.
>>
>> I cannot figure out where to go from here, as it appears that PyPy is
>> leaking this memory somehow, and I have no idea how to proceed.
>> I end up having memory problems and getting Memory Warnings for a
>> process that just loops and queries via pymongo.
>> pymongo version is 3.7.1.
>>
>> This is:
>>
>> Python 2.7.13 (ab0b9caf307db6592905a80b8faffd69b39005b8, Apr 30 2018, 08:21:35)
>> [PyPy 6.0.0 with GCC 7.2.0]
>
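P.S. To make sure I understand Antonio's point, the cycle he describes would
look something like this (a pure-Python sketch only; CExtObject is a stand-in
for a hypothetical cpyext-backed object, since I don't have a reduced C
extension handy), and the manual fix is to break the reference before dropping
the objects:

    class PyFoo(object):
        """Pure-Python object (Antonio's Py_foo)."""
        def __init__(self):
            self.ext = None


    class CExtObject(object):
        """Stand-in for a cpyext-backed object (Antonio's C_bar).

        In the real case this would come from a C extension module; it is
        plain Python here only to show the shape of the cycle.
        """
        def __init__(self, owner):
            self.owner = owner


    foo = PyFoo()
    foo.ext = CExtObject(foo)   # Py_foo -> C_bar -> Py_foo: a reference cycle

    # If CExtObject were a real cpyext object, PyPy's GC reportedly would
    # never collect this cycle, so break it by hand before dropping the
    # last reference:
    foo.ext.owner = None        # or: foo.ext = None
    foo = None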
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev