davisp opened a new pull request #504: Optimize writing KV node append writes
URL: https://github.com/apache/couchdb/pull/504

## Overview

As it turns out, the original change in COUCHDB-3298 ends up hurting disk usage when a view emits large amounts of data (i.e., more than half of the btree chunk size). The cause is that instead of writing single-element nodes, it prefers to write KV nodes with three elements. While normally we might prefer this in memory, it turns out that with our append-only storage this causes significantly more garbage on disk.

We can show this with a few trivial examples. Imagine we write KVs a through f. The two following patterns show the nodes as we write each new KV. A prime (') marks a node that has been written to disk again.

Before 3298:

    []
    [a]
    [a, b]
    [a, b]', [c]
    [a, b]', [c, d]
    [a, b]', [c, d]', [e]
    [a, b]', [c, d]', [e, f]

After 3298:

    []
    [a]
    [a, b]
    [a, b, c]
    [a, b]', [c, d]
    [a, b]', [c, d, e]
    [a, b]', [c, d]', [e, f]

The thing to realize here is which of these nodes end up as garbage. In the first example we end up with [a], [a, b], [c], [c, d], and [e] as nodes that have been orphaned, whereas in the second case we end up with [a], [a, b], [a, b, c], [c, d], and [c, d, e] as nodes that have been orphaned.

A quick aside: the reason that [a, b] and [c, d] are orphaned is due to how a btree update works. For instance, when adding c, we read [a, b] into memory, append c, and then during our node write we call chunkify, which gives us back [a, b], [c]. This leads us to write [a, b] a second time.

This patch changes the write function to recognize when we're merely appending KVs, which saves us this extra write and the garbage it generates. Its node patterns look like this:

    []
    [a]
    [a, b]
    [a, b], [c]
    [a, b], [c, d]
    [a, b], [c, d], [e]
    [a, b], [c, d], [e, f]

This means we only end up generating [a], [c], and [e] as garbage (with respect to KV nodes; KP nodes retain their historical behavior).
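The three write patterns above can be reproduced with a small toy model. The following Python sketch is only an illustration, not the CouchDB implementation: it assumes a node holds at most two KVs (`CHUNK`), whereas the real chunkify works on byte sizes, and the "after 3298" rule (allow three KVs, split 2+2 at four) is an assumption chosen to match the pattern listed in this description. The names `simulate`, `CHUNK`, and the strategy labels are hypothetical.

```python
CHUNK = 2  # assumed node capacity in KVs (the real threshold is a byte size)


def simulate(keys, strategy):
    """Append keys one at a time to a list of leaf nodes.

    Returns (live, garbage): the nodes still referenced at the end and
    the node versions that were written earlier but later orphaned.
    """
    if strategy not in ("before_3298", "after_3298", "patched"):
        raise ValueError(f"unknown strategy: {strategy}")

    live = []      # leaf nodes currently referenced by the tree
    garbage = []   # orphaned node versions left behind on disk

    for key in keys:
        if strategy == "patched" and live and len(live[-1]) == CHUNK:
            # Pure append onto a full node: keep the existing node as-is
            # and write only the new single-element node.
            live.append([key])
            continue

        # Otherwise the last node is read, extended, and written again,
        # which orphans the previous copy.
        last = live.pop() if live else []
        if last:
            garbage.append(last)
        merged = last + [key]

        if strategy == "after_3298":
            # Assumed rule: tolerate three KVs, split evenly at four.
            chunks = [merged] if len(merged) <= 3 else [merged[:2], merged[2:]]
        else:
            # Fixed-size chunks of CHUNK KVs.
            chunks = [merged[i:i + CHUNK] for i in range(0, len(merged), CHUNK)]
        live.extend(chunks)

    return live, garbage


if __name__ == "__main__":
    for strategy in ("before_3298", "after_3298", "patched"):
        live, garbage = simulate("abcdef", strategy)
        wasted = sum(len(node) for node in garbage)
        print(f"{strategy}: live={live} garbage={garbage} wasted_kvs={wasted}")
```

Under these assumptions, all three strategies end with the same live nodes and differ only in what they orphan. Summing the KVs held in orphaned nodes gives a rough sense of the wasted space in each case.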
## Testing recommendations

Normal `make check`

## JIRA issue number

This is related to COUCHDB-3298 and is follow-on work.

## Checklist

- [ ] Code is written and works correctly;
- [ ] Changes are covered by tests;
- [ ] Documentation reflects the changes;
