On May 14, 2010, at 11:09 AM, Wout Mertens wrote:

> Old thread, I know, but I was wondering about a way to make compaction more
> fluid:
> 
> On Dec 21, 2009, at 23:20, Damien Katz wrote:
> 
>> I recently saw some issues people were having with compaction, and I
>> thought I'd get some thoughts down about ways to improve the compaction
>> code/experience.
>> 
>> 1. Multi-process pipeline processing. Similar to the enhancements to the
>> view indexing, there are opportunities for pipelining operations instead of
>> the batched read/write operations it does now. This can reduce memory usage
>> and make compaction faster.
>> 2. Multiple disks/mount points. CouchDB could easily have 2 or more database
>> dirs, and each time it compacts, it could copy the new database file to
>> another dir/disk/mountpoint. For servers with multiple disks this would
>> greatly smooth the copying, as the disk heads wouldn't need to seek between
>> reads and writes.
>> 3. Better compaction algorithms. There are all sorts of clever things that
>> could be done to make compaction faster. Right now it rebuilds the database
>> much as it would if clients were bulk updating it. This was the simplest way
>> to do it, but certainly not the fastest. There are a lot of ways to make
>> this much more efficient; they just take more work.
>> 4. Tracking wasted space. This could be used to determine a threshold for
>> compaction. We don't need to track with 100% accuracy how much disk space is
>> being wasted, but it would be a big improvement to at least know how much
>> disk space the raw docs take, and maybe estimate the size of the indexes
>> necessary to support them in a freshly compacted database.
>> 5. Better low-level file driver support. Because we are using the Erlang
>> built-in file system drivers, we don't have access to a lot of flags. If we
>> had our own drivers, one option we'd like is to bypass the OS cache for
>> reads and writes during compaction; caching is unnecessary for compaction,
>> and it can completely fill the cache with rarely accessed data, evicting
>> lots of recently used live data and greatly hurting the performance of other
>> databases.
>> 
>> Anyway, just getting these thoughts out. More ideas and especially code 
>> welcome.
> 
> 
> How about
> 
> 6. Store the databases in multiple files. Instead of one really big file, use 
> several big chunk-files of fixed maximum length. One chunk-file is "active" 
> and receives writes. Once that chunk-file grows past a certain size, for 
> example 25MB, start a new file. Then, at compaction time, you can do the 
> compaction one chunk-file at a time.
> Possible optimization: If a certain chunk-file has no outdated documents (or 
> only a small %), leave it alone.
> 
> I'm armchair-programming here; I have only a vague idea of what the on-disk
> format looks like, but this could allow continuous compaction by slowly
> compacting only the completed chunk-files. Furthermore, it would allow
> spreading the database across multiple disks (since there are now multiple 
> files per db), although one disk would still be receiving all the writes. A 
> smart write scheduler could make sure different databases have different 
> active disks. Possibly, multiple chunk-files could be active at the same 
> time, providing all sorts of interesting failure scenarios ;-)
> 
> Thoughts?
> 
> Wout.

Hi Wout,

Robert Newson suggested the very same thing in the original thread. It's a
solid idea, to be sure.
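
To make the idea a bit more concrete, here's a rough sketch of how per-chunk
compaction might be scheduled. This is purely illustrative Python, not
anything in trunk; the names (ChunkFile, waste_fraction, rewrite_live_docs)
and the 25% threshold are all made up for the example:

    import os

    WASTE_THRESHOLD = 0.25  # compact a chunk once ~25% of its bytes are garbage

    class ChunkFile:
        """One fixed-size segment of the database (hypothetical layout)."""
        def __init__(self, path):
            self.path = path
            self.live_bytes = 0    # bytes belonging to current doc revisions
            self.total_bytes = 0   # bytes ever written to this chunk

        def waste_fraction(self):
            if self.total_bytes == 0:
                return 0.0
            return 1.0 - (self.live_bytes / self.total_bytes)

    def compact_completed_chunks(chunks, active_chunk):
        """Compact only finished chunks whose waste crosses the threshold;
        chunks that are still mostly live data are left alone."""
        for chunk in chunks:
            if chunk is active_chunk:
                continue   # the active chunk is still receiving writes
            if chunk.waste_fraction() < WASTE_THRESHOLD:
                continue   # not enough garbage to be worth rewriting
            rewrite_live_docs(chunk)   # copy live revisions into the active chunk
            os.remove(chunk.path)      # then drop the old segment entirely

    def rewrite_live_docs(chunk):
        # Placeholder: a real implementation would walk the chunk, copy the
        # still-live doc revisions forward, and repoint the id/seq indexes
        # at the new locations.
        pass

It would also give you item 4 almost for free, since tracking live vs. total
bytes per chunk is exactly the wasted-space accounting Damien mentioned.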

In related work, there's COUCHDB-738

https://issues.apache.org/jira/browse/COUCHDB-738

I wrote a patch that changes the internal database format so that compaction
can skip an extra lookup in the by_id tree. It's a huge win for write-once DBs
with random docids -- something like a 6x improvement in compaction speed in
one test. However, DBs with frequently edited documents become 35-40% larger
both pre- and post-compaction.
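
Loosely, the trade-off looks like this (a Python-flavored sketch only; the
real patch is Erlang, and the record shapes and names here are invented):

    class NewFile:
        """Stand-in for the freshly compacted database file."""
        def __init__(self):
            self.docs = []

        def write(self, docid, full_info):
            self.docs.append((docid, full_info))

    def compact_with_id_lookup(by_seq, by_id, target):
        # Current format: the seq index entry is small, so every doc copied
        # costs an extra lookup in the by_id tree to fetch its full info.
        for seq in sorted(by_seq):
            docid = by_seq[seq]
            full_info = by_id[docid]   # the extra lookup the patch avoids
            target.write(docid, full_info)

    def compact_streaming(by_seq_full, target):
        # Patched format: the seq index entries carry the full doc info, so
        # the compactor streams straight through them. The cost is storing
        # that info twice (by_id and by_seq), which is why frequently edited
        # databases come out 35-40% larger.
        for seq in sorted(by_seq_full):
            docid, full_info = by_seq_full[seq]
            target.write(docid, full_info)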

Damien has proposed a better alternative in that thread, which is a much
bigger rewrite of the compaction algorithm.

Best,

Adam


