[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088332#comment-13088332 ]
Robert Newson commented on COUCHDB-1153:
----------------------------------------

As a small note, and one Paul mentioned in passing, it seems simpler if the percentage values were expressed as ratios instead. That is, 0.6 instead of "60%".

I'd also like more detail on the impact of periodically crawling all_dbs on a very active system. Having seen a significant negative impact of that approach in production, I remain skeptical that it's a viable approach. I can contribute a patch to hook actively updated databases, for example.

> Database and view index compaction daemon
> -----------------------------------------
>
>                 Key: COUCHDB-1153
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
>             Project: CouchDB
>          Issue Type: New Feature
>         Environment: trunk
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>            Priority: Minor
>              Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases
> and their views based on some configurable parameters. These parameters can
> be global or per database and are: minimum database fragmentation, minimum
> view fragmentation, allowed period and "strict_window" (whether an ongoing
> compaction should be canceled if it doesn't finish within the allowed
> period). These fragmentation values are based on the recently added
> "data_size" parameter to the database and view group information URIs
> (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I
> paste here:
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, therefore it's not worth compacting them.
> min_file_size = 131072
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                      of old data (and its supporting metadata) over the database
> ;                      file size is equal to or greater than this value, this
> ;                      database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                          (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                        of old data (and its supporting metadata) over the view
> ;                        index (view group) file size is equal to or greater than
> ;                        this value, then this view index compaction condition is
> ;                        satisfied. This value is computed as:
> ;
> ;                            (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained when
> ;                        querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;                HH:MM - HH:MM (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;                   period, it will be canceled if this parameter is set to "yes".
> ;                   It defaults to "no" and it's meaningful only if the *period*
> ;                   parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel.
> ;                              This is only useful on certain setups, for
> ;                              example when the database and view index
> ;                              directories point to different disks. It
> ;                              defaults to "no".
> ;
> ; Before a compaction is triggered, an estimation of how much free disk space is
> ; needed is computed. This estimation corresponds to 2 times the data size of
> ; the database or view index. When there's not enough free disk space to compact
> ; a particular database or view index, a warning message is logged.
> ;
> ; Examples:
> ;
> ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
> ;    The `foo` database is compacted if its fragmentation is 70% or more.
> ;    Any view index of this database is compacted only if its fragmentation
> ;    is 60% or more.
> ;
> ; 2) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00
> ;    Similar to the preceding example but a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM.
> ;
> ; 3) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00, strict_window = yes
> ;    Similar to the preceding example - a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM. If at
> ;    4 AM the database or one of its views is still compacting, the compaction
> ;    process will be canceled.
> ;
> ;_default = db_fragmentation = 70%, view_fragmentation = 60%, period = 23:00 - 04:00
> (from https://github.com/fdmanana/couchdb/compare/compaction_daemon#L0R195)
> The full patch is mostly a new module but also makes some minimal changes and
> a small refactoring to the view compaction code, not changing the current
> behaviour.
> Patch is at:
> https://github.com/fdmanana/couchdb/compare/compaction_daemon.patch
> By default the daemon is idle, without any configuration enabled. I'm open to
> suggestions on additional parameters and a better configuration system.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
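
For reference, the compaction conditions described in the quoted configuration comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the daemon's Erlang code: every function name here is hypothetical, the percentage is truncated to an integer, and the end of the allowed period is treated as exclusive. The actual patch may differ on any of these points.

```python
from datetime import time


def fragmentation(file_size: int, data_size: int) -> int:
    """(file_size - data_size) / file_size * 100, truncated to an integer.

    Truncation (rather than rounding) is an assumption of this sketch.
    """
    if file_size <= 0:
        return 0
    return (file_size - data_size) * 100 // file_size


def in_period(now: time, start: time, end: time) -> bool:
    """True if `now` is inside the allowed window [start, end).

    Handles windows that wrap past midnight, such as 23:00 - 04:00.
    Treating `end` as exclusive is an assumption of this sketch.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def needed_disk_space(data_size: int) -> int:
    """Free-space estimate from the description: 2 times the data size."""
    return 2 * data_size


def should_compact(file_size, data_size, threshold,
                   min_file_size=131072, now=None, period=None):
    """Combine the conditions: minimum file size, period, fragmentation."""
    if file_size < min_file_size:
        return False  # files below min_file_size are never compacted
    if period is not None and now is not None and not in_period(now, *period):
        return False  # outside the allowed window
    return fragmentation(file_size, data_size) >= threshold


if __name__ == "__main__":
    # Roughly the `_default` rule from the quoted example: 70%, 23:00 - 04:00.
    window = (time(23, 0), time(4, 0))
    print(should_compact(1_000_000, 300_000, 70, now=time(23, 30), period=window))
```

With a ratio-based configuration, as suggested in the comment, the threshold would be 0.7 and `fragmentation` would return a float instead of an integer percentage.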