AlexanderKaraberov edited a comment on issue #2232: Elusive refc binary memory leaks in index_updater and replicator_worker processes
URL: https://github.com/apache/couchdb/issues/2232#issuecomment-539110424
 
 
   Hi @davisp ,
   Thank you for stepping in. Indeed, the fix for `couch_replicator_worker` seems 
neat and quick, especially considering that those processes ordinarily hold the 
most `binary` memory. I will definitely try it.
   
   Regarding the other, smaller leaks, I was not very precise: there are three 
types of processes that each consume less memory, but because of their sheer 
number, amplified by sharding, they contribute a lot to the total RAM consumed by 
`beam.smp`. These are (sorted by number of binaries freed):
   
   1. `couch_db_updater`
   2. `couch_index`
   3. `couch_index_updater`
   
   I've repeated my tests several times on various production nodes to rule out 
variance, and the distribution is almost always the same. When I inspect the 
binaries of these processes, I see many repeated `BinaryId`s:
   ```erlang
   {binary,[{139849336815280,30512,304},
            {139849336815280,30512,304},
            {139849336815280,30512,304},
            {139849336815280,30512,304},
            {139849336815280,30512,304},
            ...]}
   ```
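According to the `erlang:process_info/2` documentation, each tuple in this list is `{BinaryId, BinarySize, BinaryRefcCount}`, and OTP says that format may change. Read that way, the excerpt above is one 30512-byte binary with a reference count of 304, listed over and over. The following is a minimal sketch that collapses such a list into one row per distinct `BinaryId`, so you can see whether a few large binaries or many small ones dominate. The module name `bin_summary` is hypothetical and is not part of CouchDB or OTP:

```erlang
-module(bin_summary).
-export([summarize/1, summarize_pid/1]).

%% Collapse a BinInfo list ({BinaryId, Size, RefCount} tuples, as returned
%% by erlang:process_info(Pid, binary)) into one row per distinct BinaryId.
%% Result: [{BinaryId, Size, Occurrences}], most frequent first.
summarize(BinInfo) ->
    Counts = lists:foldl(
               fun({Id, Size, _RefCount}, Acc) ->
                       maps:update_with(Id, fun({S, N}) -> {S, N + 1} end,
                                        {Size, 1}, Acc)
               end, #{}, BinInfo),
    Rows = [{Id, Size, N} || {Id, {Size, N}} <- maps:to_list(Counts)],
    lists:reverse(lists:keysort(3, Rows)).

%% Same, for a live process; returns [] if the process has already exited.
summarize_pid(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, BinInfo} -> summarize(BinInfo);
        undefined -> []
    end.
```

Run from a remote shell on the node, `bin_summary:summarize_pid(Pid)` would return rows of the form `{BinaryId, Size, N}`. Here `N` counts how many times that binary appears in the process's list.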
   The typical number of binaries, when I sort processes by the length of the 
`process_info(Pid, binary)` list, is around 2300-3000. Unfortunately, we are not 
currently running a BEAM VM build with debug symbols, so I can use neither `gdb` 
to print the raw content pointed to by those `BinaryId`s nor `etp-commands` to 
look deeper and understand what those binaries actually are. But perhaps there 
are some NIFs I can leverage? Otherwise, I might repeat my tests on a debuggable 
VM.
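You can reproduce the ranking by length of the `process_info(Pid, binary)` list from a remote shell with a sketch like the one below. The module name `bin_top` is hypothetical. Processes that exit between `erlang:processes/0` and `erlang:process_info/2` are skipped:

```erlang
-module(bin_top).
-export([top/1]).

%% Return the N processes holding the most refc binary references, as
%% [{Pid, Count}] sorted by Count, highest first. Count is the length of
%% the BinInfo list from erlang:process_info(Pid, binary).
top(N) ->
    Counts = lists:filtermap(
               fun(Pid) ->
                       case erlang:process_info(Pid, binary) of
                           {binary, Bins} -> {true, {Pid, length(Bins)}};
                           undefined -> false  % process exited meanwhile
                       end
               end, erlang:processes()),
    lists:sublist(lists:reverse(lists:keysort(2, Counts)), N).
```

The same `BinaryId` can appear many times in one process's list. This count therefore measures the references a process holds, not the number of distinct binaries.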
   
   > For the couch_index_updater, is that the gen_server process, or the 
anonymous worker process actually performing the updates?
   
   Hm, according to [the code 
here](https://github.com/apache/couchdb/blob/2.3.1/src/couch_index/src/couch_index_updater.erl#L71), 
it looks like a worker process. But when I inspect the process that reclaimed 
the most binaries, its info contains this:
   ```erlang
   {'$initial_call',{couch_index_updater,init,1}}]},
   ```
   Same for `couch_db_updater`.
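One way to tell the `gen_server` apart from a worker is to compare two values. The first is the raw initial call reported by the VM. The second is the `'$initial_call'` entry that `proc_lib`, which `gen_server` uses to start its process, stores in the process dictionary. A process started with plain `erlang:spawn_link` has no such entry. The following is a minimal sketch that relies on that behavior. The module name `proc_origin` is hypothetical:

```erlang
-module(proc_origin).
-export([describe/1]).

%% Return the raw initial call reported by the VM together with the
%% '$initial_call' entry that proc_lib stores in the process dictionary
%% (none for processes started with plain erlang:spawn/spawn_link).
describe(Pid) ->
    case erlang:process_info(Pid, [initial_call, dictionary]) of
        undefined ->
            undefined;  % process has exited
        [{initial_call, Raw}, {dictionary, Dict}] ->
            ProcLib = proplists:get_value('$initial_call', Dict, none),
            #{raw => Raw, proc_lib => ProcLib}
    end.
```

For a `gen_server`, `proc_lib` records the callback module's `init/1` in this entry. So `{couch_index_updater,init,1}` is what the `couch_index_updater` `gen_server` itself would typically report. A worker started with plain `erlang:spawn_link` would show `proc_lib => none`.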
   
   > because that process already invokes garbage collection after every 
document processed.
   
   Yes, you [are 
right](https://github.com/apache/couchdb/blob/2.3.1/src/couch_index/src/couch_index_updater.erl#L182), 
but perhaps there are more places that require this?
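You can probe from the outside whether other processes would benefit from explicit collection. Force a GC on a process and count how many binary references it drops. This is similar in spirit to what `recon:bin_leak/1` reports. The following is a minimal sketch. The module name `bin_freed` is hypothetical. A process that allocates between the two measurements can make the difference negative:

```erlang
-module(bin_freed).
-export([freed/1, top_freed/1]).

%% Force a garbage collection of Pid and report how many refc binary
%% references it dropped: {ok, Before - After}, or {error, dead}.
freed(Pid) ->
    Before = bin_count(Pid),
    _ = erlang:garbage_collect(Pid),
    After = bin_count(Pid),
    case {Before, After} of
        {B, A} when is_integer(B), is_integer(A) -> {ok, B - A};
        _ -> {error, dead}
    end.

%% GC every local process and return the N that dropped the most
%% references. This touches every process on the node; use sparingly.
top_freed(N) ->
    Rows = [{Pid, Freed} || Pid <- erlang:processes(),
                            {ok, Freed} <- [freed(Pid)]],
    lists:sublist(lists:reverse(lists:keysort(2, Rows)), N).

%% Number of entries in the process's BinInfo list, or undefined if dead.
bin_count(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, Bins} -> length(Bins);
        undefined -> undefined
    end.
```

If a process type consistently shows large drops here, that process may be holding on to garbage binaries between its own collections.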
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
