Hi Vladimir,

These are very interesting. A few comments:

- bulkload - you mean not by loading pre-created HFiles? If you do that there would be no compaction during the import as the files are simply moved into place.
- local compaction IO limit. Limiting the number of compaction threads (1 by default) is not good enough ... ? You can cause too much harm even with a single thread compacting per region server?
- rack IO throttle. We should add that to accommodate oversubscription at the ToR level.
- cluster wide compaction storms. Yeah, that's bad. Can be alleviated by spreading timed major compactions out. (In our clusters we set the interval to 1 week and the jitter to 1/2 week.)
- what do you think about off-peak compaction? We have that in part, as the compaction ratio can be set differently for off-peak hours.

Generally I like the idea of being able to pace compaction better. Do you want to file JIRAs for these? Doesn't mean you have to do all the work :)

-- Lars

________________________________
From: Vladimir Rodionov <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Friday, October 3, 2014 10:34 PM
Subject: Compactions nice to have features

I am thinking about the following:

1. Compaction On/Off per CF, Table, cluster. Both: minor and major.
   Good during bulk load.
   - Disable compaction for table 'T'
   - Load 1B rows
   - Enable compaction for table 'T'

2. Local Compaction I/O throttle
   Set I/O limit per RS

3. Rack Compaction I/O throttle
   Set I/O limit per server rack. Good to control uplink bandwidth.

4. Cluster Compaction I/O throttle
   Good to avoid compaction storms

-Vladimir Rodionov
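
For reference, a minimal sketch of how an interval plus jitter spreads timed major compactions out, using the 1 week interval and 1/2 week jitter mentioned in the reply. It assumes the jitter is expressed as a fraction of the interval (0.5 here), which is how the `hbase.hregion.majorcompaction` and `hbase.hregion.majorcompaction.jitter` settings are understood to work. The function name and the uniform draw are illustrative assumptions, not HBase's actual code.

```python
import random

WEEK_MS = 7 * 24 * 60 * 60 * 1000  # 1 week in milliseconds


def next_major_compaction_delay_ms(interval_ms, jitter_fraction, rng):
    """Hypothetical helper: pick a delay within interval +/- interval * jitter_fraction."""
    jitter_ms = round(interval_ms * jitter_fraction)
    # rng.random() is in [0, 1), so the result stays within
    # [interval_ms - jitter_ms, interval_ms + jitter_ms].
    return interval_ms + jitter_ms - round(2 * jitter_ms * rng.random())


if __name__ == "__main__":
    rng = random.Random(42)
    for store in range(5):
        delay = next_major_compaction_delay_ms(WEEK_MS, 0.5, rng)
        print(f"store {store}: next major compaction in {delay / 86_400_000:.2f} days")
```

With these values each store would be due somewhere between 3.5 and 10.5 days after its last major compaction, so the stores do not all come due at the same moment.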
