On May 14, 2010, at 11:09 AM, Wout Mertens wrote:
> Old thread I know, but I was wondering about a way to make compaction
> more fluid:
>
> On Dec 21, 2009, at 23:20, Damien Katz wrote:
>
>> I saw recently some issues people were having with compaction, and I
>> thought I'd get some thoughts down about ways to improve the compaction
>> code/experience.
>>
>> 1. Multi-process pipeline processing. Similar to the enhancements to the
>> view indexing, there are opportunities for pipelining operations instead
>> of the current read/write batch operations it does. This can reduce
>> memory usage and make compaction faster.
>> 2. Multiple disks/mount points. CouchDB could easily have 2 or more
>> database dirs, and each time it compacts, it copies the new database
>> file to another dir/disk/mountpoint. For servers with multiple disks
>> this will greatly smooth the copying, as the disk heads won't need to
>> seek between reads and writes.
>> 3. Better compaction algorithms. There are all sorts of clever things
>> that could be done to make the compaction faster. Right now it rebuilds
>> the database much as it would if clients were bulk updating it. This was
>> the simplest way to do it, but certainly not the fastest. There are a
>> lot of ways to make this much more efficient; they just take more work.
>> 4. Tracking wasted space. This can be used to determine a threshold for
>> compaction. We don't need to track with 100% accuracy how much disk
>> space is being wasted, but it would be a big improvement to at least
>> know how much disk space the raw docs take, and maybe to estimate the
>> size of the indexes necessary to support them in a freshly compacted
>> database.
>> 5. Better low-level file driver support. Because we are using the Erlang
>> built-in file system drivers, we don't have access to a lot of flags. If
>> we had our own drivers, one option we'd like is to bypass the OS cache
>> for reads and writes during compaction; caching is unnecessary for
>> compaction, and it can completely consume the cache with rarely accessed
>> data, evicting lots of recently used live data and greatly hurting the
>> performance of other databases.
>>
>> Anyway, just getting these thoughts out. More ideas and especially code
>> welcome.
>
> How about
>
> 6. Store the databases in multiple files. Instead of one really big file,
> use several big chunk-files of fixed maximum length. One chunk-file is
> "active" and receives writes. Once that chunk-file grows past a certain
> size, for example 25MB, start a new file. Then, at compaction time, you
> can do the compaction one chunk-file at a time.
> Possible optimization: if a certain chunk-file has no outdated documents
> (or only a small percentage), leave it alone.
>
> I'm armchair-programming here; I have only a vague idea of what the
> on-disk format looks like, but this could allow continuous compaction by
> only compacting (slowly) the completed chunk-files. Furthermore, it would
> allow spreading the database across multiple disks (since there are now
> multiple files per db), although one disk would still be receiving all
> the writes. A smart write scheduler could make sure different databases
> have different active disks. Possibly, multiple chunk-files could be
> active at the same time, providing all sorts of interesting failure
> scenarios ;-)
>
> Thoughts?
>
> Wout.
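A minimal sketch, in Erlang, of how the per-chunk compaction decision from point 6 (together with the wasted-space tracking from point 4) might be wired together. Everything here is hypothetical; none of these module, record, or field names exist in CouchDB:

%% Illustrative only: stand-ins for whatever per-chunk metadata a real
%% implementation would keep.
-module(chunk_compactor).
-export([chunks_to_compact/2]).

%% Hypothetical per-chunk bookkeeping: bytes on disk vs. bytes still
%% referenced by live document revisions, plus whether this chunk is the
%% one currently receiving writes.
-record(chunk, {path, total_bytes, live_bytes, active = false}).

%% Return the paths of completed chunk-files whose wasted-space ratio
%% exceeds Threshold; the active chunk and nearly-clean chunks are skipped.
chunks_to_compact(Chunks, Threshold) ->
    [C#chunk.path || C <- Chunks,
                     not C#chunk.active,
                     waste_ratio(C) > Threshold].

waste_ratio(#chunk{total_bytes = Total}) when Total =< 0 ->
    0.0;
waste_ratio(#chunk{total_bytes = Total, live_bytes = Live}) ->
    (Total - Live) / Total.

Under these assumptions the active chunk is never touched, and a completed chunk is only rewritten once its wasted-space ratio crosses the threshold, which is what would make the "leave mostly-clean chunks alone" optimization cheap.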
Hi Wout,

Robert Newson suggested the very same in the original thread. It's a solid idea, to be sure.

In related work, there's COUCHDB-738 (https://issues.apache.org/jira/browse/COUCHDB-738), where I wrote a patch to change the internal database format so that compaction can skip an extra lookup in the by_id tree. It's a huge win for write-once DBs with random docids -- something like a 6x improvement in compaction speed in one test. However, DBs with frequently edited documents become 35-40% larger both pre- and post-compaction. Damien has proposed a better alternative in that thread, which is a much bigger rewrite of the compaction algorithm.

Best, Adam
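For illustration, a rough sketch of the trade-off described above. This is not the actual COUCHDB-738 patch; every module, function, and field name below is invented:

%% Sketch only: contrasts a compaction pass that must consult the by_id
%% tree for every document with one whose by_seq entries already carry the
%% full revision info.
-module(by_seq_copy_sketch).
-export([compact_with_lookup/3, compact_without_lookup/2]).

%% Current-style pass: each by_seq entry only names the document, so the
%% compactor pays an extra by_id lookup per document to fetch its info.
compact_with_lookup(SeqEntries, ByIdTree, WriteFun) ->
    lists:foreach(
      fun(#{id := Id}) ->
              {ok, Info} = lookup_by_id(ByIdTree, Id),
              WriteFun(Info)
      end, SeqEntries).

%% Patched-style pass: the by_seq entries already carry the full info, so
%% compaction streams straight through with no by_id lookups. Storing that
%% info twice is why frequently edited databases grow larger.
compact_without_lookup(SeqEntries, WriteFun) ->
    lists:foreach(fun(#{info := Info}) -> WriteFun(Info) end, SeqEntries).

%% Stand-in for a by_id tree read; in a real database this is a disk seek,
%% which is the per-document cost the patch removes.
lookup_by_id(ByIdTree, Id) ->
    {ok, maps:get(Id, ByIdTree)}.

Whether the size penalty of the second approach is acceptable depends on the workload, which is part of why a larger rewrite of the compactor is the more attractive long-term option.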