[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090078#comment-13090078 ]

Jan Lehnardt commented on COUCHDB-1153:

Paul, this came across wrong, sorry for the added confusion and hurt feelings. My post was aimed at "let's get back to work", not at disregarding your efforts.

Just to close the loop: I was going through this ticket and dev@ looking for "open issues" and categorised them into "this code needs work" and "this code needs work and thus should not be in trunk at this point" (with the obvious goal of removing all of the latter ones, so we can retroactively have the patch in trunk okayed).

I used the catch-all "bikeshed" for the former category and hence totally disregarded your comments. I meant to say that these aren't blockers (in my opinion) that would warrant a revert of the patch; I didn't mean to imply that this is some opinion you pulled out of thin air.

Sorry, again. I hope we can put this to rest now.

> Database and view index compaction daemon
> -----------------------------------------
>
> Key: COUCHDB-1153
> URL: https://issues.apache.org/jira/browse/COUCHDB-1153
> Project: CouchDB
> Issue Type: New Feature
> Environment: trunk
> Reporter: Filipe Manana
> Assignee: Filipe Manana
> Priority: Minor
> Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases
> and their views based on some configurable parameters. These parameters can
> be global or per database and are: minimum database fragmentation, minimum
> view fragmentation, allowed period and "strict_window" (whether an ongoing
> compaction should be canceled if it doesn't finish within the allowed
> period). These fragmentation values are based on the recently added
> "data_size" parameter to the database and view group information URIs
> (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I
> paste here:
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, therefore it's not worth compacting them.
> min_file_size = 131072
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                      of old data (and its supporting metadata) over the database
> ;                      file size is equal to or greater than this value, this
> ;                      database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                          (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                        of old data (and its supporting metadata) over the view
> ;                        index (view group) file size is equal to or greater than
> ;                        this value, then this view index compaction condition is
> ;                        satisfied. This value is computed as:
> ;
> ;                            (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained when
> ;                        querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;                HH:MM - HH:MM (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;                   period, it will be canceled if this parameter is set to "yes".
> ;                   It defaults to "no" and it's meaningful only if the *period*
> ;                   parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel. This
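The fragmentation formula quoted above can be sketched in Python as follows. This is an illustration only, not the daemon's Erlang code: the function names are hypothetical, and truncating to an integer percentage (rather than rounding) is an assumption. The `min_file_size` default is the value shown in the quoted configuration.

```python
def fragmentation_percent(file_size: int, data_size: int) -> int:
    """Return (file_size - data_size) / file_size * 100 as an integer.

    Assumption: the percentage is truncated, not rounded.
    """
    if file_size <= 0:
        # An empty or missing file has nothing to reclaim.
        return 0
    return (file_size - data_size) * 100 // file_size


def needs_compaction(file_size: int, data_size: int,
                     threshold_percent: int,
                     min_file_size: int = 131072) -> bool:
    """Apply the min_file_size guard, then the fragmentation threshold."""
    if file_size < min_file_size:
        # Files smaller than min_file_size are never compacted.
        return False
    return fragmentation_percent(file_size, data_size) >= threshold_percent
```

For example, a 1,000,000-byte file holding 300,000 bytes of live data has 70% fragmentation, so it meets a `db_fragmentation` threshold of 70%.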
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089991#comment-13089991 ]

Paul Joseph Davis commented on COUCHDB-1153:

I've deleted at least four versions of a ranty comment. Basically it all boils down to this: Can we get past this meta discussion bullshit and get back to engineering yet?
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089978#comment-13089978 ]

Paul Joseph Davis commented on COUCHDB-1153:

Gonna make this in two comments. This one will be the non-ranty version.

@Jan

Waving your hands at something and calling it bike-shedding is not overly productive. I listed specific points on why I prefer 0.6 over "60%". If you would like to argue something along the lines of "I see your point, but I believe that the percent format will be more intuitive to users," that's fine. But to dismiss it as bike-shedding is a bit off-putting.

Discarding the point about reducing the redundancy of code as bike-shedding is equally off-putting. If someone asks me to review code and I make suggestions, I'd rather not expect people to walk by and say "that's just bike-shedding."

On the other hand, I do admit that the order of functions can come off as bike-shedding. Then again, I also feel obliged to apologize to other Erlangers when they mention they've looked at our code base.

@Filipe

No worries. couch_server is quite a hard bit of code to test in any sort of thorough manner at the scale that matters. I'm still mulling over how to test it so that I can have a way to decide if a fix has fixed anything or not. It's on my agenda to address in the next couple of weeks, so I'll make sure to ping you with anything I find.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089727#comment-13089727 ]

Filipe Manana commented on COUCHDB-1153:

Paul,

Thanks for your long analysis.

My use case test, with a fairly large number of databases, was under the scenario where they're constantly being updated with new documents, with delayed commits set to true (which makes couch_db:is_idle/1 return false very often) and several indexers running at the same time (not more than 25 indexers).

I agree this could be much better in all aspects, and that's the motivation for having it disabled by default and having details such as the periodicity of the scans configurable. The allowed period parameter also helps people ensure compactions will happen only in low-activity periods (not an uncommon case).

I would like to see all these details improved, together with a better configuration system (either using such a _meta object or _local documents per database, or whatever else), but that is far beyond this feature.
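A minimal sketch of how the allowed period mentioned above could be checked, using the `HH:MM - HH:MM` format from the ticket description. The function names are hypothetical, and two behaviours are assumptions rather than anything confirmed in this thread: that a window may wrap past midnight (for example 23:00 to 04:00), and that the end time is exclusive.

```python
from datetime import time
from typing import Tuple


def parse_period(period: str) -> Tuple[time, time]:
    """Parse 'HH:MM - HH:MM' into (start, end).

    datetime.time raises ValueError when HH is outside 0..23 or MM is
    outside 0..59, which matches the ranges given in the description.
    """
    start_text, end_text = (part.strip() for part in period.split("-"))

    def parse_hhmm(text: str) -> time:
        hours, minutes = text.split(":")
        return time(int(hours), int(minutes))

    return parse_hhmm(start_text), parse_hhmm(end_text)


def in_period(now: time, start: time, end: time) -> bool:
    """True if 'now' falls inside the allowed window."""
    if start <= end:
        return start <= now < end
    # Assumption: a window such as 23:00 - 04:00 wraps past midnight.
    return now >= start or now < end


def should_cancel(now: time, start: time, end: time,
                  strict_window: bool) -> bool:
    """A running compaction is canceled only when strict_window is set
    and the allowed period has ended."""
    return strict_window and not in_period(now, start, end)
```

With `strict_window` left at its default of "no", `should_cancel` is always false and a compaction that started inside the window is allowed to finish.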
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089512#comment-13089512 ]

Damien Katz commented on COUCHDB-1153:

Robert, Benoit, your issues can still be addressed. You can submit patches that improve upon Filipe's work. But telling Filipe to code the patch your way, without code, is not how this community works.

Filipe's work is a feature people care about, and any objections about correctness have been addressed. Switching the code to an evented model, or any other improvement, is welcome from you or any other community member, but users want this feature, and Filipe should not be expected to code it up to everyone else's expectations before any check-in can occur. Improvement can, and should, happen continuously.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089419#comment-13089419 ]

Jan Lehnardt commented on COUCHDB-1153:

Robert, I wasn't expecting anything else, but without being able to quantify either situation, any discussion is moot.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089391#comment-13089391 ]

Robert Newson commented on COUCHDB-1153:

For the record, my "interrupt is better than polling _all_dbs" thing is derived from operational experience.

> Database and view index compaction daemon
> -----------------------------------------
>
> Key: COUCHDB-1153
> URL: https://issues.apache.org/jira/browse/COUCHDB-1153
> Project: CouchDB
> Issue Type: New Feature
> Environment: trunk
> Reporter: Filipe Manana
> Assignee: Filipe Manana
> Priority: Minor
> Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases
> and their views based on some configurable parameters. These parameters can
> be global or per database and are: minimum database fragmentation, minimum
> view fragmentation, allowed period and "strict_window" (whether an ongoing
> compaction should be canceled if it doesn't finish within the allowed
> period). These fragmentation values are based on the recently added
> "data_size" parameter to the database and view group information URIs
> (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I
> paste here:
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, therefore it's not worth compacting them.
> min_file_size = 131072
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                      of old data (and its supporting metadata) over the database
> ;                      file size is equal to or greater than this value, this
> ;                      database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                          (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                        of old data (and its supporting metadata) over the view
> ;                        index (view group) file size is equal to or greater than
> ;                        this value, then this view index compaction condition is
> ;                        satisfied. This value is computed as:
> ;
> ;                            (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained when
> ;                        querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;                HH:MM - HH:MM (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;                   period, it will be canceled if this parameter is set to "yes".
> ;                   It defaults to "no" and it's meaningful only if the *period*
> ;                   parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel. This is only useful on
> ;                              certain setups, for example when the database
> ;                              and view index directories point to different
> ;                              disks. It defaults to "no".
> ;
> ; Before a compaction is triggered, an estimate of how much free disk space is
> ; needed is computed. This estimate corresponds to 2 times the data size of
> ; the database or view index. When there's not enough free disk space to compact
> ; a particular database or view index, a warning message is logged.
> ;
> ; Examples:
> ;
> ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
> ;    The `foo` database is compacted if its fragmentation is 70% or mor
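The rule format and the free-disk-space estimate described above can be sketched in Python as below. This is a hedged illustration, not the daemon's implementation: the function names are hypothetical, values are kept as raw strings, and treating "exactly enough space" as sufficient is an assumption.

```python
from typing import Dict, Tuple


def parse_rule(value: str) -> Dict[str, str]:
    """Split 'parameter=value [, parameter=value]*' into a dict.

    Values are returned as raw strings (for example '70%').
    """
    params = {}
    for item in value.split(","):
        name, _, raw = item.partition("=")
        params[name.strip()] = raw.strip()
    return params


def parse_rule_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split 'database_name = <rule>' at the first '=' only, because the
    rule itself contains further '=' signs."""
    db_name, _, value = line.partition("=")
    return db_name.strip(), parse_rule(value)


def enough_disk_space(data_size: int, free_bytes: int) -> bool:
    """The description estimates the space needed as 2 times the data size.

    Assumption: having exactly that much free space counts as enough.
    """
    return free_bytes >= 2 * data_size
```

In a real check, `free_bytes` could come from `shutil.disk_usage(path).free` for the directory that holds the database or view index file.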
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089380#comment-13089380 ]

Jan Lehnardt commented on COUCHDB-1153:
---------------------------------------

Thanks for the reminder, Benoit. Your -1 was one of the points that went over dev@ and got lost in this ticket. My rebuttal was that, while I agree that a per-db config would be nice, we don't currently have an infrastructure for that, and adding one is out of scope for this ticket. Once we have one, we should totally make use of it for this feature. I hope that convinces you to retract the -1. Sorry for not making this more explicit in this ticket earlier.

As for the practical concerns, we need real-world testing and benchmarks for problematic situations. No amount of armchair jiraing will get us any closer to a solution, so I'd again propose to retroactively OK that this patch lands in trunk.

> Database and view index compaction daemon
> -----------------------------------------
>
>             Key: COUCHDB-1153
>             URL: https://issues.apache.org/jira/browse/COUCHDB-1153
>         Project: CouchDB
>      Issue Type: New Feature
>     Environment: trunk
>        Reporter: Filipe Manana
>        Assignee: Filipe Manana
>        Priority: Minor
>          Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases
> and their views based on some configurable parameters. These parameters can
> be global or per database and are: minimum database fragmentation, minimum
> view fragmentation, allowed period and "strict_window" (whether an ongoing
> compaction should be canceled if it doesn't finish within the allowed
> period). These fragmentation values are based on the recently added
> "data_size" parameter to the database and view group information URIs
> (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I
> paste here:
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, therefore it's not worth compacting them.
> min_file_size = 131072
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
> ;     of old data (and its supporting metadata) over the database
> ;     file size is equal to or greater than this value, this
> ;     database compaction condition is satisfied.
> ;     This value is computed as:
> ;
> ;     (file_size - data_size) / file_size * 100
> ;
> ;     The data_size and file_size values can be obtained when
> ;     querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage), of the amount
> ;     of old data (and its supporting metadata) over the view
> ;     index (view group) file size is equal to or greater than
> ;     this value, then this view index compaction condition is
> ;     satisfied. This value is computed as:
> ;
> ;     (file_size - data_size) / file_size * 100
> ;
> ;     The data_size and file_size values can be obtained when
> ;     querying a view group's information URI
> ;     (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;     is allowed. This value must obey the following format:
> ;
> ;     HH:MM - HH:MM (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;     period, it will be canceled if this parameter is set to "yes".
> ;     It defaults to "no" and it's meaningful only if the *period*
> ;     parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;     compacted in parallel. This is only useful on
> ;     certain setups, like for example when the database
> ;     and view index directories point to different
> ;     disks. It defaults to "no".
> ;
> ; Before a compaction is triggered, an estimation of how much free disk space is
> ; needed is computed. This estimation corresponds to 2 times the data size of
> ; the database or view index. When there's not enough free disk space to compact
> ; a particular database or view index, a warning message is logged.
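The fragmentation rule in the ticket description can be sketched in a few lines. This is a minimal illustration in Python, not the daemon's Erlang code: the function names `fragmentation` and `needs_compaction` are hypothetical, and truncating the ratio to an integer is an assumption based on the phrase "integer percentage" in the description.

```python
def fragmentation(file_size: int, data_size: int) -> int:
    """Percentage of the file not occupied by live data.

    Follows the formula from the ticket:
    (file_size - data_size) / file_size * 100
    Truncation to an integer is an assumption of this sketch.
    """
    if file_size <= 0:
        return 0
    return (file_size - data_size) * 100 // file_size


def needs_compaction(file_size: int, data_size: int,
                     threshold_pct: int, min_file_size: int = 131072) -> bool:
    """Apply the min_file_size guard, then the fragmentation threshold."""
    if file_size < min_file_size:
        # Files below min_file_size are never compacted.
        return False
    return fragmentation(file_size, data_size) >= threshold_pct
```

With these definitions, a 1,000,000-byte file holding 400,000 bytes of live data has a fragmentation of 60 and meets a `db_fragmentation` threshold of 60, while a 1,000-byte file is skipped regardless of how fragmented it is.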
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089361#comment-13089361 ]

Benoit Chesneau commented on COUCHDB-1153:
------------------------------------------

My point hasn't been addressed either. I also cast a formal -1 that has been ignored. I'm not so happy with that, but it's not a big deal either.

One of the points I made matches rnewson's and davisp's concern. I would prefer this whole thing to be more evented/asynchronous and in any case (except on startup) not be based on polling db lists and such. Polling opens the door to a lot of expected problems. This daemon is also not the only one running around. I have the feeling we could have a generic service in couch handling a pool of workers that react to db events; different kinds of workers could be used for this and be useful for other purposes too. I will provide such a thing asap (developed for refuge), probably on Thursday with the release.

The second point is per-db config. Having it configured in an ini file is not the best thing to do. Having to parse n lines for n dbs is awkward. I would prefer this config to be set at the db level, like readers/admins. That part can of course be added later, but I would really prefer we handle it when 1.2 is out. I will open another ticket for this _meta thing.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089354#comment-13089354 ]

Jan Lehnardt commented on COUCHDB-1153:
---------------------------------------

Robert, thanks for raising the review concerns. I thought that between this ticket and dev@ they were all sorted out. I was wrong. Filipe, thanks for addressing Paul's initial concerns.

Let's look at the open items:

1. 0.6 vs 60%: total bikeshed. I like 60% better; I think it is more user-friendly. But this is in no way a blocker, and if someone feels strongly about this, I'm happy to be convinced to pass a patch.

2. background-sweep: I think both Paul's and Robert's concerns are valid. There's a reason this feature is disabled by default. We need to see some field testing before we know how it all works out in the end. I think it is futile to a) pretend we know what it looks like in practice* and b) aim for an "improvement" without having a reproducible benchmark.

Paul, of the four detailed points you raise, I only found the "receive timeout" one to be a non-bikeshed. Filipe, could you look at that one?

* I specifically don't want to discard the operational experience of the Cloudant folks; their input is very valuable here.

In summary: This is a major new feature. This is our first stab at it. It won't work perfectly in all conditions, neither the ones we have thought of nor the ones we haven't. But we'll never know if we don't put it out there. That's why it is disabled by default.

This isn't to say we should ship half-baked features to users. We shouldn't, but I believe this patch is in good enough shape to go out there and get tested. There are plenty of features we added over time that we had to refine later on; that's how this works. I don't see why this should suddenly be different for this specific feature.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089077#comment-13089077 ]

Paul Joseph Davis commented on COUCHDB-1153:
--------------------------------------------

@Filipe

Re your earlier comment: "It behaves fairly well, especially for the case where the number of databases is <= max_dbs_open."

Yes. That's to be expected. Re-reading my earlier comment, I wasn't as clear as I could have been. The issue here is that couch_server's LRU cache can easily turn into a table scan for every incoming open/create message that it receives. There are a couple of conditions that you need to satisfy for it to become noticeable, but when it happens it turns into a positive feedback loop that grinds couch_server to a halt and eventually crashes the VM due to running out of memory, because it can't process its mailbox fast enough.

The first condition is that you have a large number of active databases, near the max_dbs_open limit. The reason that this is important is that you need a large number of databases for which couch_db:is_idle/1 returns false. The way that couch_server's LRU works is by checking the least recently used DB, and if it isn't idle it scans for the next one. If there are lots of active dbs and a largish max_dbs_open setting, this can turn into a largish loop as it scans through ETS looking for an idle db.

I haven't tried triggering this on purpose yet, but if I were going to, I'd start by setting max_dbs_open to something like 1000 and opening 990 or so clients that are all listening to continuous changes, to make sure that is_idle returns false for most dbs. The test then would be to run a load test under this condition while the auto-compactor loops through all dbs. This would be especially painful where max_dbs_open covers say a 20% list of hot databases with some breathing room for the less often used databases.
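The scan cost described above can be illustrated with a simplified model. This is a sketch in Python under stated assumptions, not couch_server's actual code: the LRU is modeled as an `OrderedDict` with the least recently used database first, and `find_idle_victim` and the `is_idle` callback are hypothetical names standing in for the ETS scan and couch_db:is_idle/1.

```python
from collections import OrderedDict


def find_idle_victim(lru, is_idle):
    """Scan from the least recently used entry for the first idle database.

    Returns (victim_name_or_None, number_of_entries_inspected).
    """
    inspected = 0
    for name in lru:
        inspected += 1
        if is_idle(name):
            return name, inspected
    return None, inspected


# Scenario from the comment: max_dbs_open = 1000, 990 databases kept busy
# (for example by continuous _changes listeners), 10 idle ones used most recently.
lru = OrderedDict((f"db{i}", None) for i in range(1000))
active = {f"db{i}" for i in range(990)}
victim, inspected = find_idle_victim(lru, lambda name: name not in active)
```

In this scenario every open request that needs a free slot inspects 991 entries before it finds `db990`, which is the table-scan behaviour the comment warns about.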
Obviously the "correct" solution here is to fix couch_server to not suck, but a proper fix there is going to take some serious engineering and will require modifying some critical pieces of code. The worry with the auto-compactor is that it's going to make hitting this error condition more likely as it churns through a possibly large number of databases, eating up open db slots in couch_server's ets tables. Then again it may be fine, but it doesn't sound like anyone's addressed it.

Turning our attention to the patch itself, you've addressed most of what I commented on before, but there are still things that I'd like to see changed that don't relate to the performance questions:

* The issue with adding value formats like "60%" is that you have to spend time writing and maintaining code that's essentially useless. There's nothing that a % sign indicates that a simple comment wouldn't handle. And yet the parsing itself is prone to barfing on users if they happen to make a small typo or similar error. Specific error conditions that are apparent just from reading the code: " 60%", "60% ", "60%5", "60%%", etc. There's also no default clause when setting record members, which will puke on users as well.

* There are some record tricks you can use to remove some of the redundancy in this config code. Also, I've found it more sane to have two passes for this sort of thing. The first pass sets the values and the second pass enforces constraints. This makes things like the handling for #period much nicer.

* There's still no timeout on that receive clause for the parallel view builds. I understand there's a receive clause further down and "it could never happen" that we get stuck there. But we could, because there's no timeout.

* The order of function definitions in this module is giving me the rage eyes. But maybe I'm the only crotchety bastard that grumbles to himself about such things.
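The two-pass approach suggested in the list above (first collect the values, then enforce constraints) can be sketched as follows. This is an illustration in Python, not the patch's Erlang code: `parse_rule`, `validate_rule` and `KNOWN_KEYS` are hypothetical names, and the choice to trim surrounding whitespace while rejecting anything else after the number is an assumption of this sketch.

```python
import re

KNOWN_KEYS = {"db_fragmentation", "view_fragmentation", "period",
              "strict_window", "parallel_view_compaction"}
_PERCENT = re.compile(r"(\d{1,3})%?")


def parse_rule(text):
    """Pass 1: split 'key=value, key=value' into a dict of raw strings."""
    raw = {}
    for part in text.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        raw[key.strip()] = value.strip()
    return raw


def validate_rule(raw):
    """Pass 2: enforce constraints on the collected values."""
    unknown = set(raw) - KNOWN_KEYS
    if unknown:
        # Plays the role of the missing "default clause".
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    rule = dict(raw)
    for key in ("db_fragmentation", "view_fragmentation"):
        if key in raw:
            match = _PERCENT.fullmatch(raw[key])
            if match is None or int(match.group(1)) > 100:
                raise ValueError(f"{key}: bad percentage {raw[key]!r}")
            rule[key] = int(match.group(1))
    return rule
```

Because the whole value must match, inputs such as "60%5" and "60%%" fail with a clear error instead of crashing deeper in the code, and a misspelled parameter name is reported rather than silently ignored.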
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088676#comment-13088676 ]

Robert Newson commented on COUCHDB-1153:
----------------------------------------

The comment about percentages was just a minor thing and certainly not part of my earlier concerns. I still feel that "60%" is a UI-level thing, but it's not a blocker for me.

It sounds like you've tested under the conditions that concerned me, though your findings contradict my own. I also agree that it's a matter of tradeoffs. My proposal would be to have a much less frequent crawl of _all_dbs but to also build a higher priority queue of actively updated databases by hooking into db_update_notifier. This seems to work out quite well in practice. The infrequent background sweep would ensure that eventually everything gets compacted.

In any case, my objections were relatively minor and easily addressable; my main issue was the apparent disregard for consensus building in this instance. It turns out to have been a simple misunderstanding, fortunately.
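The proposal in this comment, a queue of recently updated databases checked often plus an infrequent full sweep, could look roughly like the sketch below. It is written in Python for illustration only: `CompactionScheduler`, `on_db_updated`, `next_batch` and the `sweep_every` setting are hypothetical names, and `on_db_updated` merely stands in for whatever a db_update_notifier hook would call.

```python
from collections import OrderedDict


class CompactionScheduler:
    """Check recently updated databases on every tick; sweep all of them rarely."""

    def __init__(self, sweep_every):
        self.sweep_every = sweep_every  # full sweep on every Nth tick (assumed knob)
        self.hot = OrderedDict()        # updated dbs, deduplicated, in arrival order
        self.ticks = 0

    def on_db_updated(self, name):
        # Repeated updates to the same database collapse into one entry.
        self.hot[name] = True

    def next_batch(self, list_all_dbs):
        """Return the database names to examine on this tick."""
        self.ticks += 1
        if self.ticks % self.sweep_every == 0:
            batch = list(list_all_dbs())  # infrequent background sweep
        else:
            batch = list(self.hot)        # only actively updated databases
        self.hot.clear()
        return batch
```

On most ticks only databases that actually changed are opened, so idle databases do not consume open-db slots, while the periodic sweep still guarantees that every database is eventually examined.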
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088483#comment-13088483 ]

Filipe Manana commented on COUCHDB-1153:
----------------------------------------

Thanks for your concerns, Robert.

Regarding the percentage, I hadn't seen any comment about it before. I think it's clearer to users what the ranges are when explicitly using percentages. Either way this is a very minor thing imho.

I've been testing this over the last 2 months or so in systems with a high number of databases where all of them are constantly being updated. It behaves fairly well, especially for the case where the number of databases is <= max_dbs_open. I haven't seen the case where a compaction lasts forever due to retries (2 to 5 retries were the worst scenario I think I ever saw).

There's always a tradeoff. Asking the filesystem for a list of .couch files and opening each respective database to check if it needs to be compacted has a price, yes, but there are also costs to not compacting databases often: databases (and views) with high amounts of wasted space are much less cache friendly, and the server can run out of disk space quickly, which is clearly undesirable as well.

This is far from perfect, yes (and so are many features of CouchDB or any other software). The periodicity of the scans is configurable, and doing such scans in systems with a small number of databases (<= max_dbs_open) is not a major problem. Such deployments aren't an uncommon case.

If you have a patch or concrete proposal, I'll be happy to review it and get it into the repository.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088332#comment-13088332 ]

Robert Newson commented on COUCHDB-1153:
----------------------------------------

As a small note, and one Paul mentioned in passing, it seems simpler if the percentage values were expressed as ratios instead. That is, 0.6 instead of "60%".

I'd also like more detail on the impact of periodically crawling all_dbs on a very active system. Having seen a significant negative impact of that approach in production, I remain skeptical that it's a viable approach. I can contribute a patch to hook actively updated databases, for example.
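The ticket description for this issue defines a `period` parameter in the form "HH:MM - HH:MM" and a `strict_window` flag that cancels a compaction still running after the period ends. A minimal sketch of that check follows, in Python rather than the daemon's Erlang. The names `parse_period`, `in_period` and `should_cancel` are hypothetical, and two behaviours are assumptions of this sketch rather than facts from the ticket: a window whose start is later than its end is treated as wrapping past midnight, and the end time is exclusive.

```python
import re
from datetime import time

_PERIOD = re.compile(r"(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})")


def parse_period(text):
    """Parse 'HH:MM - HH:MM' into (start, end) datetime.time values."""
    match = _PERIOD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"bad period {text!r}, expected HH:MM - HH:MM")
    h1, m1, h2, m2 = map(int, match.groups())
    # datetime.time raises ValueError for hours > 23 or minutes > 59.
    return time(h1, m1), time(h2, m2)


def in_period(now, start, end):
    """True if `now` falls inside the allowed window (end exclusive)."""
    if start <= end:
        return start <= now < end
    # Assumption: a window such as 23:00 - 04:00 wraps past midnight.
    return now >= start or now < end


def should_cancel(now, start, end, strict_window):
    """A running compaction is canceled only when strict_window is enabled."""
    return strict_window and not in_period(now, start, end)
```

For example, with `period = 23:00 - 04:00` a compaction still running at 04:30 would be canceled when `strict_window` is "yes" and left alone otherwise.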
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088331#comment-13088331 ]

Robert Newson commented on COUCHDB-1153:

I'm concerned that this landed on trunk without a follow-up review once you'd addressed Paul's concerns. Since we will all share the burden of maintenance once this is included in a release, a little more effort to gain consensus would have been appreciated.

> Database and view index compaction daemon
> -----------------------------------------
>
> Key: COUCHDB-1153
> URL: https://issues.apache.org/jira/browse/COUCHDB-1153
> Project: CouchDB
> Issue Type: New Feature
> Environment: trunk
> Reporter: Filipe Manana
> Assignee: Filipe Manana
> Priority: Minor
> Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases
> and their views based on some configurable parameters. These parameters can
> be global or per database and are: minimum database fragmentation, minimum
> view fragmentation, allowed period and "strict_window" (whether an ongoing
> compaction should be canceled if it doesn't finish within the allowed
> period). These fragmentation values are based on the recently added
> "data_size" parameter to the database and view group information URIs
> (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I
> paste here:
>
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, therefore it's not worth compacting them.
> min_file_size = 131072
>
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                      of old data (and its supporting metadata) over the database
> ;                      file size is equal to or greater than this value, this
> ;                      database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                          (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                        of old data (and its supporting metadata) over the view
> ;                        index (view group) file size is equal to or greater than
> ;                        this value, then this view index compaction condition is
> ;                        satisfied. This value is computed as:
> ;
> ;                            (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained when
> ;                        querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;                HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;                   period, it will be canceled if this parameter is set to "yes".
> ;                   It defaults to "no" and it's meaningful only if the *period*
> ;                   parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel. This is only useful on
> ;                              certain setups, like for example when the database
> ;                              and view index directories point to different
> ;                              disks. It defaults to "no".
> ;
> ; Before a compaction is triggered, an estimation of how much free disk space is
> ; needed is computed. This estimation corresponds to 2 times the data size of
> ; the database or view index. When there's not enough free disk space to compact
> ; a particular database or view index, a warning message is logged.
> ;
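To make the rules in the quoted default.ini comment concrete, here is a minimal Python sketch of how a single database rule might be evaluated. It is not the daemon's Erlang code. The function names, the floor rounding of the percentage, the inclusive window ends, the handling of a window that wraps past midnight, and the treatment of a missing db_fragmentation as "no threshold" are all assumptions made for illustration; only the formula, the rule format, the 2x disk estimate and the min_file_size default come from the text above.

```python
from datetime import time


def fragmentation(file_size: int, data_size: int) -> int:
    """(file_size - data_size) / file_size * 100 as an integer percentage.

    Floor rounding is an assumption; the description only says "integer".
    """
    return (file_size - data_size) * 100 // file_size


def parse_rule(value: str) -> dict:
    """Split 'parameter=value [, parameter=value]*' into a dict of strings."""
    rule = {}
    for part in value.split(","):
        key, _, val = part.partition("=")
        rule[key.strip()] = val.strip()
    return rule


def in_period(period: str, now: time) -> bool:
    """True if `now` lies inside 'HH:MM - HH:MM'.

    Inclusive ends and wrap-around past midnight are assumptions.
    """
    start_s, end_s = (p.strip() for p in period.split("-"))
    start = time(*map(int, start_s.split(":")))
    end = time(*map(int, end_s.split(":")))
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end  # window such as 23:00 - 04:00


def enough_disk_space(data_size: int, free_bytes: int) -> bool:
    """The estimate from the description: 2 times the data size."""
    return free_bytes >= 2 * data_size


def should_compact_db(rule: dict, file_size: int, data_size: int,
                      now: time, min_file_size: int = 131072) -> bool:
    """Hypothetical helper: all configured conditions must hold."""
    if file_size < min_file_size:
        return False  # very small files are never compacted
    if "period" in rule and not in_period(rule["period"], now):
        return False
    threshold = int(rule.get("db_fragmentation", "0%").rstrip("%"))
    return fragmentation(file_size, data_size) >= threshold


rule = parse_rule("db_fragmentation = 70%, period = 23:00 - 04:00")
print(should_compact_db(rule, 1_000_000, 250_000, time(1, 30)))
```

With a 1,000,000-byte file holding 250,000 bytes of live data the fragmentation is 75%, so the hypothetical rule above is satisfied at 01:30 and not at noon.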
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086096#comment-13086096 ]

Filipe Manana commented on COUCHDB-1153:

Paul, I'm addressing all the concerns pointed out before. Some of them are already done and can be tracked in individual commits:

https://github.com/fdmanana/couchdb/commits/compaction_daemon

The thing I'm not 100% sure about is how to handle the config and load/start of os_mon. I'm not that familiar with all those OTP structuring details. I've come up with this so far:

http://friendpaste.com/R43WflJ8r75MupvXuS98v

Using the args_file you mentioned makes sense, but we already have some stuff that could be moved into that new file, so I think it should go into a separate change. I'll help/do it; I just need to figure out exactly how to do it and integrate it into the build system / startup scripts.
Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
On Tue, Aug 16, 2011 at 2:58 AM, Benoit Chesneau wrote:
>
> Could be a local doc, but why didn't we take this path for this
> "_security" object? Also, since they are really "meta" information,
> I have the feeling it should be solved as a special member in the db
> file, just like the "_security" object.

I don't know why _security is like it is now; that predates me, and it's another topic :)

>
> Anyway what I really dislike is saving per db configuration in an ini
> file. Per db configuration should be done on the db. What if you have more
> than 100 dbs? Having 100 lines in an ini file to parse is awkward.

I don't think the common case is to have a separate compaction config for every single database. The fragmentation parameter, which is likely the most useful, is one you're unlikely to set to a different value for each of 100 databases (nor the period, for example).
For other things like the oauth tokens/secrets, the .ini system doesn't scale. But that's again another topic.

> This is just as simple as this line, creating a db creates an entry in
> a db index (or db file) that you can use later.
>
>> I suspect what you think is something like rather than scanning
>> periodically, to let the daemon be notified when a db (or view) can be
>> compacted?
>> At some point I considered reacting to db_updated events but this was
>> pretty much flooding the event handler (daemon).
>> Was this your idea?
>>
>
> Using db events is my idea yes. If it actually floods the db event
> handler (not sure why), then maybe we should fix it first?

The problem is that when you have many dbs in the system under a reasonable write load, the daemon (which is the receiver of db_updated events) receives too many messages. To know whether you need to compact the db after such a message, you need to open it, and opening it on every message is a big burden as well. I tried this on a system with 1024 databases being updated constantly.
It also doesn't deal with the startup case: a db with high fragmentation that is not updated for a long period won't have compaction started.

If someone can measure the current solution's impact and present another working alternative with a lower impact (and practical tests, not just theory), I would be the first one wanting to make the change asap.

>
> - benoit
>

--
Filipe David Manana,
fdman...@gmail.com, fdman...@apache.org

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
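The trade-off argued in the message above can be illustrated with a toy model. This is only a sketch in Python, not a measurement and not CouchDB code: the `Db` record, the two functions and all numbers are made up for illustration. It counts how many times a database would have to be opened during one check interval under each approach, and which databases each approach would flag for compaction.

```python
from dataclasses import dataclass


@dataclass
class Db:
    name: str
    fragmentation: int  # percent, hypothetical value
    updates: int        # writes seen during one check interval


def event_driven(dbs, threshold):
    """One open per db_updated event; databases with no writes are never inspected."""
    opens = sum(db.updates for db in dbs)
    flagged = {db.name for db in dbs
               if db.updates > 0 and db.fragmentation >= threshold}
    return opens, flagged


def periodic_scan(dbs, threshold):
    """One open per database per check interval, regardless of write load."""
    opens = len(dbs)
    flagged = {db.name for db in dbs if db.fragmentation >= threshold}
    return opens, flagged


dbs = [Db("busy", 80, 500), Db("idle", 90, 0), Db("clean", 10, 500)]
print(event_driven(dbs, 70))
print(periodic_scan(dbs, 70))
```

In this toy setup the event-driven variant opens databases 1000 times and still never looks at the fragmented but idle database, while the scan opens each database once and finds it. Real costs depend on how expensive a scan of all databases is, which is the concern raised elsewhere in the thread for hosted environments.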
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085620#comment-13085620 ]

Jan Lehnardt commented on COUCHDB-1153:

I'm generally with the others that we should have a per-database configuration system for various things, but I think it is out of scope for this ticket. In the meantime, using the global configuration system seems fine. Down the road, the server config system will hold factory defaults with a per-database override.
Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
On Tue, Aug 16, 2011 at 11:46 AM, Filipe David Manana wrote:
> On Tue, Aug 16, 2011 at 2:38 AM, Benoit Chesneau wrote:
>> On Tue, Aug 16, 2011 at 11:30 AM, Filipe Manana (JIRA) wrote:
>>>
>>> [ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085605#comment-13085605 ]
>>>
>>> Filipe Manana commented on COUCHDB-1153:
>>>
>>> I'm -1 on adding such a _meta thing.
>>
>> why?
>
> From your description, that _meta sounds like something that can be
> done with _local docs. But that is a whole separate discussion and
> topic I think.
>

Could be a local doc, but why didn't we take this path for this "_security" object? Also, since they are really "meta" information, I have the feeling it should be solved as a special member in the db file, just like the "_security" object.

Anyway what I really dislike is saving per db configuration in an ini file. Per db configuration should be done on the db. What if you have more than 100 dbs? Having 100 lines in an ini file to parse is awkward. Meta information (like security, db params, ...) should be saved in the db file and available at the same time. Since we already have this _security object that is available when you open the db, why not reuse it?

>>
>>> I don't understand either that idea of _changes nor how it can be applied.
>>
>> creating db, adding db document to dbs db., update -> update db document.
>
> You'll have to elaborate a lot more than that :) I'm not familiar with
> that bigcouch special db nor elasticsearch.
>
> Reacting to a changes feed of some database it's not something easy
> (the _replicator db is such a case and might have been the hardest
> thing i did ever for couch, really)
>

This is just as simple as this line: creating a db creates an entry in a db index (or db file) that you can use later.

> I suspect what you think is something like rather than scanning
> periodically, to let the daemon be notified when a db (or view) can be
> compacted?
> At some point I considered reacting to db_updated events but this was
> pretty much flooding the event handler (daemon).
> Was this your idea?
>

Using db events is my idea, yes. If it actually floods the db event handler (not sure why), then maybe we should fix it first?

- benoit
Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
On Tue, Aug 16, 2011 at 2:38 AM, Benoit Chesneau wrote:
> On Tue, Aug 16, 2011 at 11:30 AM, Filipe Manana (JIRA) wrote:
>>
>> [ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085605#comment-13085605 ]
>>
>> Filipe Manana commented on COUCHDB-1153:
>>
>> I'm -1 on adding such a _meta thing.
>
> why?

From your description, that _meta sounds like something that can be done with _local docs. But that is a whole separate discussion and topic, I think.

>
>> I don't understand either that idea of _changes nor how it can be applied.
>
> creating db, adding db document to dbs db., update -> update db document.

You'll have to elaborate a lot more than that :) I'm not familiar with that bigcouch special db nor with elasticsearch.

Reacting to a changes feed of some database is not something easy (the _replicator db is such a case and might have been the hardest thing I ever did for couch, really).

I suspect what you have in mind is, rather than scanning periodically, to let the daemon be notified when a db (or view) can be compacted?
At some point I considered reacting to db_updated events, but this was pretty much flooding the event handler (daemon).
Was this your idea?

>

--
Filipe David Manana,
fdman...@gmail.com, fdman...@apache.org

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085613#comment-13085613 ]

Benoit Chesneau commented on COUCHDB-1153:

Why not? I'm -1 on a -1 without any arguments. And... the security object is already used for such purposes. Annotating dbs is also something people want.

Creating a db -> create a db document, update -> update? Simple enough. It can be used by people who want to have a db listener for any purpose. (It also solves an old ticket.)
Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
On Tue, Aug 16, 2011 at 11:30 AM, Filipe Manana (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085605#comment-13085605 ]
>
> Filipe Manana commented on COUCHDB-1153:
>
> I'm -1 on adding such a _meta thing.

why?

> I don't understand either that idea of _changes nor how it can be applied.

creating db, adding db document to dbs db., update -> update db document.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085605#comment-13085605 ]

Filipe Manana commented on COUCHDB-1153:

I'm -1 on adding such a _meta thing.
I don't understand that idea of _changes either, nor how it can be applied.

An alternative is to store compaction settings in a _local document, e.g. _local/compaction, which could then also be used to store previous compaction running times and other stats. But determining whether compaction is needed then requires extra IO: one btree lookup (even if _local btrees are normally very small).
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085603#comment-13085603 ]

Benoit Chesneau commented on COUCHDB-1153:

About the _all_dbs scanning: maybe we could have a database maintaining the list of created dbs, like Cloudant does. Or Elasticsearch, for that purpose. Rather than scanning _all_dbs, it could react to _changes?
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085602#comment-13085602 ]

Benoit Chesneau commented on COUCHDB-1153:

I'm -1 on this patch. Passing db options in the ini file seems awkward. But I really like the idea of a daemon.

We should rather have these options saved when creating a db, via query parameters or headers. It may be the perfect time to transform this "_security" object into a "_meta" object used to save such db settings. So we could do:

create a db:

    PUT /db?db_fragmentation=

Update setting:

    PUT /db/_meta

Options could also be passed as a meta document when creating the db, rather than passing an empty body.

Note that the _meta object could later be used for other purposes by app developers to annotate a db, like some devs already do with this "_security" object.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085599#comment-13085599 ]

Robert Newson commented on COUCHDB-1153:
----------------------------------------

Could you hold off on this commit until after the srcmv? I'd really prefer to see it added as a separate, optional application, not core. Different environments will need quite different approaches to compaction scheduling.

It seems this patch causes a periodic scan of all_dbs? If so, I don't think that's going to fly in a hosted environment like Cloudant's (or, presumably, IrisCouch).

> Database and view index compaction daemon
> -----------------------------------------
>
>         Key: COUCHDB-1153
>         URL: https://issues.apache.org/jira/browse/COUCHDB-1153
>     Project: CouchDB
>  Issue Type: New Feature
> Environment: trunk
>    Reporter: Filipe Manana
>    Assignee: Filipe Manana
>    Priority: Minor
>      Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases
> and their views based on some configurable parameters. These parameters can
> be global or per database and are: minimum database fragmentation, minimum
> view fragmentation, allowed period and "strict_window" (whether an ongoing
> compaction should be canceled if it doesn't finish within the allowed
> period). These fragmentation values are based on the recently added
> "data_size" parameter to the database and view group information URIs
> (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I
> paste here:
>
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view
> ; indexes need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, therefore it is not worth compacting them.
> min_file_size = 131072
>
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
> ;                      of old data (and its supporting metadata) over the
> ;                      database file size is equal to or greater than this
> ;                      value, this database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                      (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage) of the amount
> ;                        of old data (and its supporting metadata) over the
> ;                        view index (view group) file size is equal to or
> ;                        greater than this value, then this view index
> ;                        compaction condition is satisfied. This value is
> ;                        computed as:
> ;
> ;                        (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained
> ;                        when querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;            HH:MM - HH:MM (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the
> ;                   allowed period, it will be canceled if this parameter is
> ;                   set to "yes". It defaults to "no" and it's meaningful only
> ;                   if the *period* parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel. This is only useful on
> ;                              certain setups, for example when the database
> ;                              and view index directories point to different
> ;                              disks. It defaults to "no".
> ;
> ; Before a compaction is triggered, an estimation of how much free disk space
> ; is needed is computed. This estimation corresponds to 2 times the data size
> ; of the database or view i
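
The fragmentation formula quoted above can be sketched in a few lines of Erlang. This is only an illustration, not the daemon's code: the module name frag_sketch and the helper should_compact/4 are hypothetical, and truncating the percentage with integer division is an assumption (the description only says "integer percentage").

```erlang
-module(frag_sketch).
-export([fragmentation/2, should_compact/4]).

%% Integer percentage of the file that is not live data:
%% (file_size - data_size) / file_size * 100, truncated to an integer.
%% Truncation (rather than rounding) is an assumption of this sketch.
fragmentation(FileSize, DataSize) when FileSize > 0, DataSize =< FileSize ->
    ((FileSize - DataSize) * 100) div FileSize.

%% Hypothetical helper: true when the file is at least MinFileSize bytes
%% and its fragmentation has reached the configured threshold.
should_compact(FileSize, DataSize, MinFileSize, Threshold) ->
    FileSize >= MinFileSize andalso
        fragmentation(FileSize, DataSize) >= Threshold.
```

For example, a 1,000,000 byte file holding 300,000 bytes of live data gives frag_sketch:fragmentation(1000000, 300000) = 70, so a rule such as db_fragmentation=70 would be satisfied, while a file below min_file_size is skipped regardless of its ratio.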
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085537#comment-13085537 ]

Filipe Manana commented on COUCHDB-1153:
----------------------------------------

Due to the lack of Jira inline comments, Paul's points are repeated below, each followed by my reply:

* You have a "Pid = spawn_link/1, MonRef = erlang:monitor(process, Pid)" sequence for the parallel view compactor. One of these is redundant. You want a link if you want the compactor_loop to exit when the view compactor crashes, or you want the monitor if you just want to know when it dies.

True. But if the compaction daemon crashes, I want the workers to stop. The spawn_linked process doesn't do much more than trigger view compactions and monitor the pids, so it is unlikely to exit with a reason other than normal.

* When you wait for the view compaction process to end there's no timeout. That means that the compactor loop could never move depending on whether the view compactor process exits or not.

I think you're talking about the parallel view compaction case. In that case the spawn-linked process doesn't use an after timeout clause because what it calls, maybe_compact_views/2, already accounts for the period/strict_window.

* You never flush monitor messages. This means the compact_loop process mailbox will slowly fill with messages over time causing hard to track memory leaks.

Yep, that was forgotten, for the case where the timers are triggered.

* Views don't seem to be checked to see if they need to be compacted if their database doesn't need to be.

Forgotten as well.

* View compaction holds open a reference to the database it's compacting views for. What happens if views haven't finished compacting before the main database compaction gets swapped out?

Actually the daemon doesn't need to have the db open when it triggers view compactions. As for the parallel case, yes, I considered it before for the long term case.

Thanks again, Paul.
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085523#comment-13085523 ]

Paul Joseph Davis commented on COUCHDB-1153:
--------------------------------------------

My thoughts on the loop were based on my daydreaming that there will quite possibly be feature requests to handle multiple simultaneous compactions. I tend to have better luck reacting to messages to maintain the state of a set of long running processes directly from the gen_server, rather than having this middleman process looping around accepting messages.

Also, the more I look at this compact_loop the more things I see wrong with it:

* You have a "Pid = spawn_link/1, MonRef = erlang:monitor(process, Pid)" sequence for the parallel view compactor. One of these is redundant. You want a link if you want the compactor_loop to exit when the view compactor crashes, or you want the monitor if you just want to know when it dies.
* When you wait for the view compaction process to end there's no timeout. That means that the compactor loop could never move depending on whether the view compactor process exits or not.
* You never flush monitor messages. This means the compact_loop process mailbox will slowly fill with messages over time, causing hard to track memory leaks.
* Views don't seem to be checked to see if they need to be compacted if their database doesn't need to be.
* View compaction holds open a reference to the database it's compacting views for. What happens if views haven't finished compacting before the main database compaction gets swapped out?

I'd prefer to either have os_mon in an app file or started as an app when the VM boots. If we're going to talk about moving towards being more OTP compliant, we should be trying to avoid adding more non-OTP bits when possible.

The important part: to trigger the couch_server issues you need to have a lot of active databases as well as a lot of load, so that try_close_lru turns into a table scan of that ets table. Adam rewrote couch_server quite a long time ago to replace this, so that requests for open databases turned into a single ets lookup on a public table, which helped quite a bit. It does, though, introduce the possibility of a race condition when opening a database that's just about to be shut. Since then other things have been fixed and couch_server has become a bottleneck again. I looked at it the other day and the only thing I came up with would require some non-trivial changes to the close semantics of databases.

I think the general approach here is quite good and I'm quite fine with leaving room for improvement. On the flip side, we need to avoid just pushing features into trunk without considering how we might be asked to improve them or what sort of maintenance cost they'll incur.
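
The timeout and monitor-flush points raised in this comment can be illustrated with a small Erlang sketch. It is not the patch's compact_loop: the module monitor_sketch and the function run_with_timeout/2 are hypothetical names, and it uses a monitor only, whereas the patch under review combines a link with a monitor.

```erlang
-module(monitor_sketch).
-export([run_with_timeout/2]).

%% Runs Fun in a monitored process and waits at most TimeoutMs for it.
%% Returns {ok, ExitReason} when the worker exits in time, or the atom
%% timeout after killing it.
run_with_timeout(Fun, TimeoutMs) when is_function(Fun, 0) ->
    {Pid, MonRef} = spawn_monitor(Fun),
    receive
        {'DOWN', MonRef, process, Pid, Reason} ->
            {ok, Reason}
    after TimeoutMs ->
        %% Stop the worker, then drop the monitor. The flush option also
        %% removes a 'DOWN' message that may already sit in the mailbox,
        %% so stale monitor messages do not accumulate.
        exit(Pid, kill),
        erlang:demonitor(MonRef, [flush]),
        timeout
    end.
```

If the worker must also die when its parent crashes, which is the behaviour Filipe asks for in his reply, one option is to start it with spawn_opt(Fun, [link, monitor]) instead of spawn_monitor/1; that keeps both properties explicit in a single call.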
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085507#comment-13085507 ]

Filipe Manana commented on COUCHDB-1153:
----------------------------------------

Thanks Paul.

Not sure what you mean by the loop weirdness. It doesn't seem complicated to me:

loop() -> do_stuff(), sleep(...), loop().

An alternative way to start os_mon (I really don't care) is to list it as a dependency in the .app file.

You're right about couch_server. It's part of the reason why autocompaction is disabled by default. However, I haven't yet seen a big issue with about 1000 databases. One approach would perhaps be to wait a bit before opening a db if it's not in the LRU cache.

Certainly there's a lot of room for improvement in auto compaction, and an initial implementation is unlikely to ever be perfect for all scenarios.
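
One way to read the "wait a bit" idea is a scan that pauses between databases. The Erlang sketch below is an assumption about what such throttling could look like, not code from the patch; scan_sketch, scan/3 and the CheckFun callback are hypothetical names.

```erlang
-module(scan_sketch).
-export([scan/3]).

%% Calls CheckFun on each database name, sleeping PauseMs milliseconds
%% between consecutive checks so that files are not opened back to back.
scan([], _PauseMs, _CheckFun) ->
    ok;
scan([DbName], _PauseMs, CheckFun) ->
    %% Last entry: no pause is needed afterwards.
    _ = CheckFun(DbName),
    ok;
scan([DbName | Rest], PauseMs, CheckFun) ->
    _ = CheckFun(DbName),
    timer:sleep(PauseMs),
    scan(Rest, PauseMs, CheckFun).
```

A fixed pause is the simplest possible policy; it bounds the rate at which databases are opened but does not react to load, so a real deployment might prefer something adaptive.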
[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085491#comment-13085491 ]

Paul Joseph Davis commented on COUCHDB-1153:
--------------------------------------------

A couple of notes so far.

I don't care much either way, but I would've just parsed proplists from Erlang terms in the config file, like we do for various other options, instead of creating the key=val syntax goop.

Never register anonymous config change functions. Always register functions using the M:F/A pattern. This has to do with how functions are called and code reloading. If modules aren't calling exported functions, it'll eventually cause random processes to crash when the code they were referring to is purged.

I'm not a super huge fan of how os_mon is being started. There's an -args_file command line switch that we might want to look into supporting for VM configuration.

The compact_loop thing seems kinda weird. A pattern I've had luck with lately is to use erlang:send_interval to replace loops like that. I'm not super concerned about this, but on first skim it looks like it could clean that loop's logic up a bit.

Also, I'm wondering if there should be some sort of throttling on how quickly the scan for databases to compact runs. The concern is that for installs that have non-trivial numbers of databases, this could start doing mean things to couch_server as well as start thrashing system resources by opening and closing a large number of files.
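
Two of the suggestions in this comment, interval messages instead of a hand-written loop and M:F/A callbacks instead of anonymous funs, can be sketched as follows. This is a minimal illustration under assumptions: tick_sketch, config_callback/0 and config_change/2 are hypothetical names, the callback arity is invented for the example, and the sketch uses timer:send_interval/2 from the standard timer module, which is the interval function I am sure exists.

```erlang
-module(tick_sketch).
-behaviour(gen_server).

-export([start_link/1, config_callback/0, config_change/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {timer, checks = 0}).

start_link(IntervalMs) ->
    gen_server:start_link(?MODULE, IntervalMs, []).

%% Returns an external fun (M:F/A form). It is resolved by name on every
%% call, so it keeps working after the module is reloaded, unlike an
%% anonymous fun, which is tied to the code version that created it.
config_callback() ->
    fun ?MODULE:config_change/2.

%% Hypothetical callback body; a real one would update the daemon state.
config_change(_Key, _Value) ->
    ok.

init(IntervalMs) ->
    %% The timer server sends 'check' to this process every IntervalMs.
    {ok, TRef} = timer:send_interval(IntervalMs, check),
    {ok, #state{timer = TRef}}.

handle_call(checks, _From, #state{checks = N} = State) ->
    {reply, N, State};
handle_call(_Request, _From, State) ->
    {reply, ignored, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(check, #state{checks = N} = State) ->
    %% Placeholder for the real scan of databases and view groups.
    {noreply, State#state{checks = N + 1}};
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, #state{timer = TRef}) ->
    _ = timer:cancel(TRef),
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

Because each tick arrives as an ordinary message, the gen_server stays responsive to calls and system messages between checks, which is harder to guarantee with a process that sleeps inside its own loop.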