Re: [PR] BTree engine term cache [couchdb]

via GitHub Tue, 19 Aug 2025 16:15:13 -0700


nickva commented on PR #5625:
URL: https://github.com/apache/couchdb/pull/5625#issuecomment-3202798994


   I prettified a quick btree stats reporter I had, since we were wondering 
what the tree looked like from above. This is q=8 with 100k docs, showing just 
one shard copy:
   
   ```json
      "sizes": {
           "active": 5812423,
           "external": 4040930,
           "file": 7188696,
           "id_tree": {
               "1": {
                   "kp_node": {
                       "cnt": 1,
                       "max": 5,
                       "min": 5
                   }
               },
               "2": {
                   "kp_node": {
                       "cnt": 5,
                       "max": 21,
                       "min": 15
                   }
               },
               "3": {
                   "kp_node": {
                       "cnt": 91,
                       "max": 23,
                       "min": 11
                   }
               },
               "4": {
                   "kv_node": {
                       "cnt": 1450,
                       "max": 15,
                       "min": 1
                   }
               }
           },
           "seq_tree": {
               "1": {
                   "kp_node": {
                       "cnt": 1,
                       "max": 2,
                       "min": 2
                   }
               },
               "2": {
                   "kp_node": {
                       "cnt": 2,
                       "max": 26,
                       "min": 2
                   }
               },
               "3": {
                   "kp_node": {
                       "cnt": 28,
                       "max": 47,
                       "min": 6
                   }
               },
               "4": {
                   "kv_node": {
                       "cnt": 975,
                       "max": 15,
                       "min": 2
                   }
               }
           }
       }
   ```
   
   The top-level key is the depth, then comes the node type. `cnt` is the 
number of nodes at that level, `min` is the smallest node size (number of 
kvs/kps), and `max` is the largest size.
   
   It's not as shallow as we'd expect due to how complete_root works, and 
chunk_size is probably no longer the best choice (it doesn't account for 
compression). I was going to look into using a different chunk size, or a 
different one per node type (kps get more), but that's for a different PR.
   
   So caching the top 2 levels of nodes makes sense: there are not that many 
of them and they bring the biggest benefit. Caching the top 3 levels could also 
be an option, but maybe start smaller at first.
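As a rough back-of-the-envelope check, the `cnt` values from the stats above can be summed to see how many nodes each option would cover for this one shard copy. This is only a sketch: the helper name `nodes_in_top_levels` is made up for illustration, and it assumes one cache entry per node.

```python
# `cnt` per depth, copied from the stats output above (one shard copy).
id_tree = {1: 1, 2: 5, 3: 91, 4: 1450}
seq_tree = {1: 1, 2: 2, 3: 28, 4: 975}


def nodes_in_top_levels(cnt_by_depth, levels):
    """Number of nodes at depth <= levels (assumes one cache entry per node)."""
    return sum(cnt for depth, cnt in cnt_by_depth.items() if depth <= levels)


for name, tree in (("id_tree", id_tree), ("seq_tree", seq_tree)):
    for levels in (2, 3):
        print(name, "top", levels, "levels:", nodes_in_top_levels(tree, levels))
```

With these numbers the top 2 levels come to 6 nodes for the id tree and 3 for the seq tree, while the top 3 levels come to 97 and 31 respectively.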


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

