oisheeaa commented on issue #5879:
URL: https://github.com/apache/couchdb/issues/5879#issuecomment-3885452495

   > Thank you [@oisheeaa](https://github.com/oisheeaa) for your patience and 
for the additional info. I think what's causing the increased memory usage is 
the `max_dbs_open = 60000` value. That's entirely too high for an 8GB instance. 
Even on 128GB instances we keep it at our default of 5000, and only on some 
clusters do we raise it to 15000 or 20000.
   > 
   > In 3.4.2 there was an additional `close_on_idle` mechanism to periodically 
close idle db handles (see 
https://github.com/apache/couchdb/blob/3.4.2/src/couch/src/couch_db_updater.erl#L235 
for details); however, that mechanism had race conditions and a few other 
issues, so it was removed.
   > 
   > The way the db handle cache works is that once a handle is opened, it 
stays open, waiting to be used again. If at some point there is no more room 
in the cache, the oldest unused handle is closed and replaced with the new 
one. So size `max_dbs_open` according to how much memory you can allocate to 
it. For an 8GB instance, maybe start with 1000-1500 and see if you get any 
`all_dbs_active` errors. (Try on a staging cluster first, if you can generate 
load similar to the production cluster's.) There is a balance between how 
much memory should be used for the db handle cache vs. having more for the 
page cache.
   
   Thanks for the detailed explanation; it helps a lot.
   We reverted our preprod cluster from 3.5.1 back to 3.4.2 (same infra, same 
`max_dbs_open = 60000`) and observed that the sustained plateau of 60k open 
DB handles dropped immediately. That seems consistent with what you described 
regarding the removal of the `close_on_idle` mechanism in 3.5 and the change 
in DB handle lifecycle behaviour.
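   In case the datapoint is useful: the plateau we're referring to is the 
per-node `couchdb.open_databases` stat, which we sample roughly like this (a 
sketch assuming the default port, the `_local` node alias, and admin 
credentials; adjust for your setup):
   
   ```sh
   # Current number of open database handles on this node
   curl -s http://admin:password@localhost:5984/_node/_local/_stats/couchdb/open_databases
   ```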
   
   Given that:
   - Is the removal of `close_on_idle` considered permanent going forward, or 
is there any plan to reintroduce an idle handle eviction mechanism in future 
releases?
   - Is there an officially recommended upper bound for `max_dbs_open` for 8GB 
instances under 3.5? We understand it must now be sized according to memory, 
but are there general guidelines or tested ranges the project considers safe?
   - In your experience, when clusters legitimately require very high distinct 
DB counts (tens of thousands), is the expectation that users:
      - Scale memory proportionally, or
      - Rely strictly on lower `max_dbs_open` values and tolerate 
`all_dbs_active` errors under pressure?
   
   We're trying to determine whether our architecture should assume a lower 
`max_dbs_open` permanently in 3.5+, or whether future versions may reintroduce 
improved idle handle management.
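   For reference, this is the interim sizing we plan to test on preprod, per 
the suggestion above. A sketch only: 1500 is simply the low end of the 
suggested 1000-1500 range, not a value we're presenting as correct, and the 
curl form assumes the default port, the `_local` node alias, and admin 
credentials (the same value can of course be set in `local.ini` under 
`[couchdb]`):
   
   ```sh
   # Apply the interim cap via the config API; the value is a JSON string.
   curl -s -X PUT \
     http://admin:password@localhost:5984/_node/_local/_config/couchdb/max_dbs_open \
     -d '"1500"'
   ```
   
   We'll watch for `all_dbs_active` errors and the open handle count while 
this soaks.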
   
   Appreciate any clarification on the long-term direction here.

