nickva commented on PR #5625:
URL: https://github.com/apache/couchdb/pull/5625#issuecomment-3202798994
I prettified a quick btree stats reporter I had, since we were wondering
what the tree looks like from above. This is a single shard copy of a q=8
database with 100k docs:
```json
{
  "sizes": {
    "active": 5812423,
    "external": 4040930,
    "file": 7188696,
    "id_tree": {
      "1": {
        "kp_node": {
          "cnt": 1,
          "max": 5,
          "min": 5
        }
      },
      "2": {
        "kp_node": {
          "cnt": 5,
          "max": 21,
          "min": 15
        }
      },
      "3": {
        "kp_node": {
          "cnt": 91,
          "max": 23,
          "min": 11
        }
      },
      "4": {
        "kv_node": {
          "cnt": 1450,
          "max": 15,
          "min": 1
        }
      }
    },
    "seq_tree": {
      "1": {
        "kp_node": {
          "cnt": 1,
          "max": 2,
          "min": 2
        }
      },
      "2": {
        "kp_node": {
          "cnt": 2,
          "max": 26,
          "min": 2
        }
      },
      "3": {
        "kp_node": {
          "cnt": 28,
          "max": 47,
          "min": 6
        }
      },
      "4": {
        "kv_node": {
          "cnt": 975,
          "max": 15,
          "min": 2
        }
      }
    }
  }
}
```
The keys are the depth (the root is depth 1), then the node type. `cnt` is
the number of nodes at that level, `min` is the smallest node size (number
of KVs/KPs), and `max` is the largest.
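The reporter itself isn't included here. As a rough illustration of what it computes, this is a minimal Python sketch that walks a hypothetical in-memory tree (tuples of `("kp_node", [(key, child), ...])` or `("kv_node", [(key, value), ...])`, not CouchDB's on-disk node format) and aggregates `cnt`/`min`/`max` per depth and node type, in the same shape as the stats above. The `btree_stats` name is made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory node model (not CouchDB's on-disk format):
#   ("kp_node", [(key, child_node), ...]) or ("kv_node", [(key, value), ...])
Node = Tuple[str, List[Tuple[object, object]]]


def btree_stats(root: Node) -> Dict[int, Dict[str, Dict[str, int]]]:
    """Return {depth: {node_type: {"cnt", "max", "min"}}}, root at depth 1."""
    stats: Dict[int, Dict[str, Dict[str, int]]] = {}
    stack = [(root, 1)]
    while stack:
        (node_type, entries), depth = stack.pop()
        size = len(entries)  # node size = number of KVs or KPs it holds
        level = stats.setdefault(depth, {})
        s = level.get(node_type)
        if s is None:
            level[node_type] = {"cnt": 1, "max": size, "min": size}
        else:
            s["cnt"] += 1
            s["max"] = max(s["max"], size)
            s["min"] = min(s["min"], size)
        # Only KP nodes point at children; descend one level for each.
        if node_type == "kp_node":
            for _key, child in entries:
                stack.append((child, depth + 1))
    return stats


leaf_a = ("kv_node", [("a", 1), ("b", 2)])
leaf_b = ("kv_node", [("c", 3)])
root = ("kp_node", [("b", leaf_a), ("c", leaf_b)])
print(btree_stats(root))
# {1: {'kp_node': {'cnt': 1, 'max': 2, 'min': 2}}, 2: {'kv_node': {'cnt': 2, 'max': 2, 'min': 1}}}
```

Depths come out as integer keys; serializing with `json.dumps` turns them into the string keys seen in the report.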
The tree isn't as shallow as we'd expect, due to how `complete_root` works,
and `chunk_size` is probably no longer the best choice, since it doesn't
account for compression.
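To see how the depth can end up higher than expected, here is a rough Python model of a bottom-up build. It assumes that `complete_root` keeps writing the current list of key-pointers as KP nodes until a single pointer remains, and it simplifies chunking to a fixed number of entries per node, whereas the real btree splits nodes by encoded size against `chunk_size`. `chunkify` and `build_depth` here are hypothetical helpers, not the Erlang functions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunkify(items: Sequence[T], max_per_node: int) -> List[List[T]]:
    """Split items into consecutive nodes of at most max_per_node entries.

    Simplification: the real btree splits by encoded byte size, not count.
    """
    return [list(items[i:i + max_per_node]) for i in range(0, len(items), max_per_node)]


def build_depth(num_kvs: int, kvs_per_leaf: int, kps_per_node: int) -> List[int]:
    """Return the number of nodes per depth, root first, for a bottom-up build."""
    leaves = chunkify(range(num_kvs), kvs_per_leaf)
    levels = [len(leaves)]
    # One key-pointer per node written at the level below.
    pointers = list(range(len(leaves)))
    # Assumed complete_root-style loop: keep writing KP nodes over the
    # current pointers until a single pointer, the root, remains.
    while len(pointers) > 1:
        nodes = chunkify(pointers, kps_per_node)
        levels.append(len(nodes))
        pointers = list(range(len(nodes)))
    return list(reversed(levels))


print(build_depth(12_500, 15, 23))  # [1, 2, 37, 834]
```

The inputs are illustrative: about 12,500 docs per shard (100k / q=8), with every node filled to the maxima seen in `id_tree` above (15 KVs per leaf, 23 KPs per inner node). Even with full nodes the model needs four levels, and the root holds only two pointers, because the root gets whatever is left over after the level below is chunked. Real nodes are not all full, so the actual node counts are higher.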
I was going to look into using a different chunk size, or a separate one per
node type (with KP nodes getting a larger one), but that's for a different PR.
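A per-node-type chunk size that also counts compression could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the thresholds in `CHUNK_SIZES` are made-up values, `repr` stands in for the real term encoding, `zlib` stands in for whatever compression the file actually uses, and `chunkify_by_size` is a hypothetical helper.

```python
import zlib
from typing import Dict, List, Sequence

# Hypothetical per-node-type thresholds in bytes: KP nodes get a bigger budget.
CHUNK_SIZES: Dict[str, int] = {"kv_node": 1024, "kp_node": 4096}


def encoded_size(entries: Sequence[object], compress: bool) -> int:
    """Approximate on-disk size of a node holding these entries."""
    raw = repr(list(entries)).encode("utf-8")  # stand-in for the real encoding
    return len(zlib.compress(raw)) if compress else len(raw)


def chunkify_by_size(entries: Sequence[object], node_type: str,
                     compress: bool = False) -> List[List[object]]:
    """Greedily pack entries into nodes that stay under the type's threshold."""
    limit = CHUNK_SIZES[node_type]
    chunks: List[List[object]] = []
    current: List[object] = []
    for entry in entries:
        candidate = current + [entry]
        # Start a new node once adding this entry would exceed the limit;
        # a single oversized entry still gets a node of its own.
        if current and encoded_size(candidate, compress) > limit:
            chunks.append(current)
            current = [entry]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


docs = [(f"doc-{i:05d}", "x" * 20) for i in range(200)]
for node_type in ("kv_node", "kp_node"):
    for compress in (False, True):
        print(node_type, compress, len(chunkify_by_size(docs, node_type, compress)))
```

Measuring the compressed size lets more entries fit under the same threshold when the data compresses well, so nodes get wider and the tree gets shallower; a larger KP threshold has a similar effect on the inner levels.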
So caching the top 2 levels of nodes makes sense: there aren't many of them,
and they bring the biggest benefit. The top 3 levels could also be an option,
but maybe start smaller at first.
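Summing the `cnt` fields from the stats above shows how small such a cache would be for this shard copy. A quick Python check (the `nodes_in_top_levels` helper is hypothetical, and only the `cnt` values are copied from the report):

```python
# Node counts per depth ("cnt") copied from the id_tree/seq_tree stats above.
STATS = {
    "id_tree": {"1": {"kp_node": {"cnt": 1}}, "2": {"kp_node": {"cnt": 5}},
                "3": {"kp_node": {"cnt": 91}}, "4": {"kv_node": {"cnt": 1450}}},
    "seq_tree": {"1": {"kp_node": {"cnt": 1}}, "2": {"kp_node": {"cnt": 2}},
                 "3": {"kp_node": {"cnt": 28}}, "4": {"kv_node": {"cnt": 975}}},
}


def nodes_in_top_levels(tree_stats: dict, levels: int) -> int:
    """Number of nodes a cache holding depths 1..levels would need."""
    return sum(
        info["cnt"]
        for depth, by_type in tree_stats.items()
        if int(depth) <= levels
        for info in by_type.values()
    )


for tree, tree_stats in STATS.items():
    print(tree, nodes_in_top_levels(tree_stats, 2), nodes_in_top_levels(tree_stats, 3))
# id_tree 6 97
# seq_tree 3 31
```

Caching two levels means 6 nodes for `id_tree` and 3 for `seq_tree`; going to three levels grows that to 97 and 31, which is still small but over ten times as many nodes.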