New submission from Yonatan Goldschmidt <yon.goldschm...@gmail.com>:
We have an application which creates 10,000s of logging.Logger & logging.Handler objects. I encountered a major slowdown during GCs which happen to collect dead Handler objects, calling logging._removeHandlerRef() along the way. That function looks like: def _removeHandlerRef(wr): acquire, release, handlers = _acquireLock, _releaseLock, _handlerList if acquire and release and handlers: acquire() try: if wr in handlers: handlers.remove(wr) finally: release() and is called by a weakref, attached to each Handler created in _addHandlerRef(). The slowdown occurs in the "in" operator, and the remove() call. These operations on very long lists are expensive. My suggestion is that, without modifying the observed behavior of this function, it can be changed to perform only one lookup in the list, by simply calling remove() and ignoring the ValueError if it gets raised. Attached is a small script which demonstrates the issue. It creates N1 Handler objects, then deletes the strong reference to the last N2 handler objects, and triggers a GC, measuring the time it takes. Additionally, we can measure its entire execution (since all remaining Handler objects are cleaned up on exit, and eventually _removeHandlerRef() is called for each, and it drastically slows down the interpreter shutdown). Running the demo with N1=30k and N2=2k: root@8aafe9d1308b:/# python -V Python 3.9.2 root@8aafe9d1308b:/# time python demo.py 30000 2000 len(logging._handlerList)=30001 len(logging._handlerList)=28001 gc.collect() took 1.552838139992673s real 0m12.626s user 0m12.618s sys 0m0.008s then applying the improving patch... root@8aafe9d1308b:/# (cd /usr/local/lib/python3.9/ && patch -p2 < /patch ) patching file logging/__init__.py Hunk #1 succeeded at 826 (offset -19 lines). and trying again: root@8aafe9d1308b:/# time python demo.py 30000 2000 len(logging._handlerList)=30001 len(logging._handlerList)=28001 gc.collect() took 0.8691686679958366s real 0m7.134s user 0m7.130s sys 0m0.004s root@8aafe9d1308b:/# Almost a 2x speedup. I also tried removing the acquire/release locks (now that the remove operation is atomic...). I'm not sure it maintains the semantics, and it didn't make too much of a positive effect on the run-time anyway, so I didn't continue looking into it. ---------- components: Library (Lib) files: demo.py messages: 394229 nosy: Yonatan Goldschmidt priority: normal severity: normal status: open title: Improving _removeHandlerRef for a very long _handlerList type: performance versions: Python 3.11 Added file: https://bugs.python.org/file50060/demo.py _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue44222> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com