AlexanderKaraberov edited a comment on issue #2232: Elusive refc binary memory leaks in index_updater and replicator_worker processes
URL: https://github.com/apache/couchdb/issues/2232#issuecomment-539110424

Hi @davisp,

Thank you for stepping in. Indeed, the fix for `couch_replicator_worker` seems to be a neat and quick one, especially considering that those processes ordinarily hold the most `binary` memory. I will definitely try it.

Regarding the other, smaller leaks, I was not very precise: there are three types of processes which individually consume less space but, due to the sheer number of them (amplified by sharding), contribute a lot to the total RAM consumed by `beam.smp`. These are notably (sorted by number of binaries freed):

1. `couch_db_updater`
2. `couch_index`
3. `couch_index_updater`

I've repeated my tests several times on various production nodes to exclude variance, but the distribution is almost always the same. When I inspect the binaries of the mentioned processes I see a lot of repeated `BinaryId`s:

```erlang
{binary,[{139849336815280,30512,304},
         {139849336815280,30512,304},
         {139849336815280,30512,304},
         {139849336815280,30512,304},
         {139849336815280,30512,304},
```

The typical number of binaries when I sort processes by `length(process_info(binary))` is around 2300-3000. Unfortunately, at the moment we are not running a build of the BEAM VM with debug symbols, so I can use neither `gdb` to print the raw content pointed to by those `BinaryId`s nor `etp-commands` to look deeper and actually understand what those binaries are. But perhaps there are some NIFs which I can leverage? Otherwise I might try to repeat my tests on a debuggable VM.

> For the couch_index_updater, is that the gen_server process, or the anonymous worker process actually performing the updates?

Hm, it looks like it's a worker process [according to the code here](https://github.com/apache/couchdb/blob/2.3.1/src/couch_index/src/couch_index_updater.erl#L71).
But when I call info for the process which reclaimed the most binaries, I see this:

```
{'$initial_call',{couch_index_updater,init,1}}]},
```

Same for `couch_db_updater`.

> because that process already invokes garbage collection after every document processed.

Yes, you [are right](https://github.com/apache/couchdb/blob/2.3.1/src/couch_index/src/couch_index_updater.erl#L182), but perhaps there are more places which require this?
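
For reference, here is a minimal sketch of how the per-process numbers above could be collected from a remote shell. The module name `bin_inspect` and its function names are hypothetical (they are not part of CouchDB); the code relies only on `erlang:process_info/2` and `erlang:garbage_collect/1`. The `{Id, Size, RefCount}` tuple shape is an assumption taken from the output shown above, and the `binary` info item is a debugging aid whose format may change between OTP releases.

```erlang
-module(bin_inspect).
-export([top_by_binaries/1, summarize/1, freed_by_gc/1]).

%% Hypothetical helper module for illustration only.

%% Return the N processes holding the most refc binary references, as
%% {Pid, RefCount, UniqueBinaryCount, InitialCall} tuples, largest first.
top_by_binaries(N) ->
    Rows = [Row || Pid <- erlang:processes(),
                   {ok, Row} <- [row(Pid)]],
    lists:sublist(lists:reverse(lists:keysort(2, Rows)), N).

row(Pid) ->
    case erlang:process_info(Pid, [binary, dictionary]) of
        undefined ->
            %% The process exited between processes/0 and process_info/2.
            gone;
        [{binary, Bins}, {dictionary, Dict}] ->
            {Refs, Unique} = summarize(Bins),
            %% proc_lib stores '$initial_call' in the process dictionary.
            Call = proplists:get_value('$initial_call', Dict, unknown),
            {ok, {Pid, Refs, Unique, Call}}
    end.

%% Given the list from process_info(Pid, binary), return
%% {TotalReferences, DistinctBinaryIds}. A large gap between the two
%% means many references point at the same underlying binaries.
summarize(Bins) ->
    Ids = [Id || {Id, _Size, _RefCount} <- Bins],
    {length(Ids), length(lists:usort(Ids))}.

%% Number of binary references a process drops when it is forced to
%% garbage collect. Returns 0 if the process is no longer alive.
freed_by_gc(Pid) ->
    case erlang:process_info(Pid, binary) of
        undefined ->
            0;
        {binary, Before} ->
            erlang:garbage_collect(Pid),
            case erlang:process_info(Pid, binary) of
                undefined -> 0;
                {binary, After} -> length(Before) - length(After)
            end
    end.
```

For example, `bin_inspect:top_by_binaries(10)` lists the ten processes with the most binary references, and calling `bin_inspect:freed_by_gc(Pid)` on each of them gives a rough per-process "binaries freed" figure. Forcing a collection on busy production processes has a cost, so this is best run sparingly.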