[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085599#comment-13085599
 ] 

Robert Newson commented on COUCHDB-1153:


Could you hold off on this commit until after the srcmv? I'd really prefer to 
see it added as a separate, optional application rather than in core. Different 
environments will need quite different approaches to compaction scheduling.

It seems this patch causes a periodic scan of all_dbs? If so, I don't think 
that's going to fly in a hosted environment like Cloudant's (or, presumably, 
IrisCouch).

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and their views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter of the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:

 [compaction_daemon]
 ; The delay, in seconds, between each check for which databases and view
 ; indexes need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller than this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation, therefore it's not worth compacting them.
 min_file_size = 131072

 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and their respective view groups when all the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
 ;                      of old data (and its supporting metadata) over the
 ;                      database file size is equal to or greater than this
 ;                      value, this database compaction condition is satisfied.
 ;                      This value is computed as:
 ;
 ;                          (file_size - data_size) / file_size * 100
 ;
 ;                      The data_size and file_size values can be obtained when
 ;                      querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage) of the amount
 ;                        of old data (and its supporting metadata) over the
 ;                        view index (view group) file size is equal to or
 ;                        greater than this value, then this view index
 ;                        compaction condition is satisfied. This value is
 ;                        computed as:
 ;
 ;                            (file_size - data_size) / file_size * 100
 ;
 ;                        The data_size and file_size values can be obtained
 ;                        when querying a view group's information URI
 ;                        (GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period during which compaction of a database (and its view
 ;            groups) is allowed. This value must obey the following format:
 ;
 ;                HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the
 ;                   allowed period, it will be canceled if this parameter is
 ;                   set to yes. It defaults to no and it's meaningful only if
 ;                   the *period* parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;                              compacted in parallel. This is only useful on
 ;                              certain setups, for example when the database
 ;                              and view index directories point to different
 ;                              disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space
 ; is needed is computed. This estimation corresponds to 2 times the data size
 ; of the database or view index. When there's not enough free disk space to
 ; compact a particular database or view index, a warning message is logged.
 ;
 ; Examples:
 ;
 ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
 ;    The `foo` database is compacted if its fragmentation is 70% or more.
 ;    Any view index of this database is compacted if its fragmentation is
 ;    60% or more.
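To make the fragmentation formula above concrete: a database file of 1,048,576 bytes whose data_size is 262,144 bytes has a fragmentation of (1048576 - 262144) / 1048576 * 100 = 75%, so it would satisfy a db_fragmentation threshold of 70%. A hypothetical per-database rule combining the documented parameters (the database name and the values are invented purely for illustration) could look like this:

    [compactions]
    mydb = db_fragmentation = 70%, view_fragmentation = 60%, period = 23:00 - 05:00, strict_window = yes

With such a rule, mydb is compacted when its fragmentation reaches 70% (and its view groups when theirs reach 60%), only between 23:00 and 05:00, and a compaction still running at 05:00 would be canceled because strict_window is yes.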

[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085602#comment-13085602
 ] 

Benoit Chesneau commented on COUCHDB-1153:
--

I'm -1 on this patch. Passing db options in the ini file seems awkward. But I 
really like the idea of a daemon.

We should rather have these options saved when creating a db, via query 
parameters or headers. It may be the perfect time to turn the _security 
object into a _meta object used to store such db settings. So we could do:

create a db:

PUT /db?db_fragmentation= 

update a setting:

PUT /db/_meta 

Options could also be passed as a meta document when creating the db, rather 
than passing an empty body.

Note that the _meta object could later be used for other purposes by app 
developers to annotate a db, like some devs already do with the _security 
object.
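As a purely illustrative sketch of this proposal (the _meta endpoint does not exist, and the body shape below is invented), the idea would allow something like:

    PUT /db/_meta
    {
        "compaction": {
            "db_fragmentation": "70%",
            "view_fragmentation": "60%"
        }
    }

The point is that per-database settings would then live with the database itself instead of accumulating in the server's ini file.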


[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085603#comment-13085603
 ] 

Benoit Chesneau commented on COUCHDB-1153:
--

About the _all_dbs scanning: maybe we could have a database maintaining the 
created dbs, like Cloudant does (or Elasticsearch, for that purpose). Rather 
than scanning _all_dbs, it could react to _changes?
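A rough sketch of that idea, assuming a hypothetical dbs database holding one document per database (such a database is not a standard CouchDB feature): instead of polling _all_dbs, the daemon could follow its continuous changes feed, e.g.

    GET /dbs/_changes?feed=continuous&include_docs=true

and learn about created or updated databases as the events happen, without rescanning the full database list.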


[jira] [Commented] (COUCHDB-1012) Utility to help plugin developers manage paths

2011-08-16 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085607#comment-13085607
 ] 

Benoit Chesneau commented on COUCHDB-1012:
--

Most multi-platform programs provide both (pkg-config and their own config 
script), since pkg-config isn't installed by default on some platforms. Do you 
think it would be a problem to have both for CouchDB too?

Also, what is the status of this ticket? What actions should we take to close 
it in the near future?

 Utility to help plugin developers manage paths
 --

 Key: COUCHDB-1012
 URL: https://issues.apache.org/jira/browse/COUCHDB-1012
 Project: CouchDB
  Issue Type: New Feature
  Components: Build System
Reporter: Randall Leeds
Assignee: Randall Leeds
 Fix For: 1.2

 Attachments: 
 0001-add-couch-config-file-used-to-ease-the-build-of-plug.patch, 
 0001-add-couch-config-file-used-to-ease-the-build-of-plug.patch, 
 0001-support-pkg-config-for-plugins-COUCHDB-1012.patch


 Developers may want to write plugins (like GeoCouch) for CouchDB. Many hooks 
 in the configuration system allow loading arbitrary Erlang modules to handle 
 various internal tasks, but currently there is no straightforward and 
 portable way for developers of these plugins to discover the location of the 
 CouchDB library files.
 Two options that have been proposed are to use pkg-config or install a 
 separate script that could be invoked (e.g. as couch-config --erl-libs) to 
 discover important CouchDB installation paths.
 As far as I know the loudest argument against pkg-config is lack of support 
 for Windows.
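For illustration, a plugin build could consume either mechanism roughly like this (the pkg-config variable name and the couchdb.pc file are assumptions; only couch-config --erl-libs is taken from this ticket):

    # using the proposed couch-config script
    ERL_LIBS="$(couch-config --erl-libs)"

    # or, where pkg-config is available and a couchdb.pc is installed
    ERL_LIBS="$(pkg-config --variable=erlanglibdir couchdb)"

    export ERL_LIBS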

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau
On Tue, Aug 16, 2011 at 11:30 AM, Filipe Manana (JIRA) j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085605#comment-13085605
  ]

 Filipe Manana commented on COUCHDB-1153:
 

 I'm -1 on adding such a _meta thing.

why?


 I don't understand either that idea of _changes nor how it can be applied.

creating a db adds a db document to the dbs db; updating a db updates its db document.


[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085613#comment-13085613
 ] 

Benoit Chesneau commented on COUCHDB-1153:
--

Why not? I'm -1 on a -1 without any arguments. And... the _security object is 
already used for such purposes around. Annotating dbs is also something people 
want.


Creating a db creates a db document; updating it updates that document. Simple 
enough. It can be used by people who want a db listener for any purpose. (It 
would also solve an old ticket.)





Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Benoit Chesneau
On Tue, Aug 16, 2011 at 11:46 AM, Filipe David Manana
fdman...@apache.org wrote:
 On Tue, Aug 16, 2011 at 2:38 AM, Benoit Chesneau bchesn...@gmail.com wrote:
 On Tue, Aug 16, 2011 at 11:30 AM, Filipe Manana (JIRA) j...@apache.org 
 wrote:

    [ 
 https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085605#comment-13085605
  ]

 Filipe Manana commented on COUCHDB-1153:
 

 I'm -1 on adding such a _meta thing.

 why?

 From your description, that _meta sounds like something that can be
 done with _local docs. But that is a whole separate discussion and
 topic I think.


It could be a _local doc, but why didn't we take that path for the
_security object? Also, since this is really meta information, I have the
feeling it should be solved as a special member in the db file, just like
the _security object.

Anyway, what I really dislike is saving per-db configuration in an ini
file. Per-db configuration should be done on the db. What if you have more
than 100 dbs? Having 100 lines to parse in an ini file is awkward. Meta
information (like security, db params, ...) should be saved in the db file
and be available at the same time. Since we already have this _security
object, which is available when you open the db, why not reuse it?
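For context, the existing analogue he refers to is the _security endpoint, which already works this way (a real endpoint; the body below is just an example in the usual 1.x shape):

    GET /db/_security
    PUT /db/_security
    {"admins": {"names": ["bob"], "roles": []}, "readers": {"names": [], "roles": []}}

A _meta object would generalise that pattern: another special member kept in the database file itself, usable for compaction settings or other per-database annotations.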



 I don't understand either that idea of _changes nor how it can be applied.

 creating a db adds a db document to the dbs db; updating a db updates its db document.

 You'll have to elaborate a lot more than that :) I'm not familiar with
 that bigcouch special db nor with elasticsearch.

 Reacting to a changes feed of some database is not something easy
 (the _replicator db is such a case, and might have been the hardest
 thing I ever did for couch, really).



It is just as simple as that line: creating a db creates an entry in
a db index (or db file) that you can use later.

 I suspect what you think is something like: rather than scanning
 periodically, let the daemon be notified when a db (or view) can be
 compacted?
 At some point I considered reacting to db_updated events but this was
 pretty much flooding the event handler (daemon).
 Was this your idea?


Using db events is my idea, yes. If it actually floods the db event
handler (not sure why), then maybe we should fix that first?

- benoit


compaction plugin, auth handler, foo plugin couchdb core

2011-08-16 Thread Benoit Chesneau
Hi devs,

Today I see lot of interesting things coming in CouchDB, but also lot
of different interests and different usages.  Sometimes you need to
extend couch for your usage. But today if you except the current work
on the view engine by paul, the couchdb code become more and more an
aggregation of code fixing speci


Remove 1.0.2 release from Apache Mirrors

2011-08-16 Thread Jan Lehnardt
Hi,

in the spirit of keeping things clean I'd like to remove the 1.0.2 release from 
the mirrors and put it into the archive.

If nobody objects, I'll do this tomorrow.

Cheers
Jan
-- 



compaction plugin, auth handler, foo plugin couchdb core [resent]

2011-08-16 Thread Benoit Chesneau
Hi devs;

Today I see a lot of interesting things coming into CouchDB, but also a lot of
different interests and different usages. Sometimes you need to extend
couch for your usage. But today, if you set aside the current work on the
view engine by Paul, the couchdb code is becoming more and more monolithic, or
an aggregation of code adding some specific features/changes, while not
envisioning what could be done by others. Also, the way you have to
extend couchdb makes it difficult today to use/merge/... the different forks
around, like the ones done by cloudant, couchbase and even mine in
refuge/upondata (and probably some others too). Couch core should be
lighter and more open (in its strict sense).

For example, today the http layer(?), replicator(?), proxy, external
daemons, couchapp engine, rewriter, vhosts, compaction daemon and some auth
handlers could be available as plugins. couch_config could be more
generic and not rely on an ini file. More specifically, we could have
a couch core looking more like an mnesia alternative, and the couchdb
application, which could be couch core + plugins, distributed as a
standalone app (like couchdb is today). This would also maybe allow
cloudant, couchbase and others to reuse the same core rather than forking
it while adding their own plugins. Official plugins could also be
maintained as standalone projects, maybe.


I wish we could concentrate on that topic for 2.0.x and make it a
priority. That would imply defining what the couch core is, splitting the
code [1], and defining what a plugin is [2]. Maybe the couchdb app can also be
a full erlang release [3] built with autotools. I think that this
pluggable structure should be done, for example, before adding any new
daemon like the compaction daemon. Don't get me wrong, I really like the
idea of having a default compaction daemon in the couchdb app, and this is
just an example. But I also want the possibility to add my own, working
differently (or not), and this should be possible with the default couchdb
release; couch core imo should be more neutral.

Maybe we could start by opening tickets for the different tasks to track
them? What is blocking the split currently, since 1.0.3 is out? Do we
wait for the svn-to-git conversion?

- benoît

[1] https://github.com/davisp/couchdb-srcmv
[2] https://issues.apache.org/jira/browse/COUCHDB-1012
[3] http://www.erlang.org/doc/design_principles/release_structure.html
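To illustrate the "couch core + plugins" idea as an Erlang/OTP release [3], a purely hypothetical .rel file might look like the following (application names and version numbers are invented):

    {release, {"couchdb", "2.0.0"}, {erts, "5.8.4"},
     [{kernel, "2.14.4"},
      {stdlib, "1.17.4"},
      {sasl, "2.1.9.4"},
      {couch_core, "2.0.0"},                %% minimal core (storage, btree, config)
      {couch_httpd, "2.0.0"},               %% http layer as a plugin application
      {couch_replicator, "2.0.0"},          %% replicator as a plugin application
      {couch_compaction_daemon, "2.0.0"}    %% compaction daemon as a plugin application
     ]}.

Each plugin would then be an ordinary OTP application that a distributor (cloudant, couchbase, refuge, ...) could include in or leave out of its own release.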


Re: compaction plugin, auth handler, foo plugin couchdb core [resent]

2011-08-16 Thread Robert Newson
+1 on splitting into more focused and OTP compliant applications.
Separating core from httpd in particular.

Sent from my iPhone

On 16 Aug 2011, at 12:21, Benoit Chesneau bchesn...@gmail.com wrote:



Re: compaction plugin, auth handler, foo plugin couchdb core [resent]

2011-08-16 Thread Jan Lehnardt
Hi Benoit,

thanks for raising this again. I think we have a good plan to get started but
it wouldn't hurt to get a little more organised. I think the plan is as follows:

1. Move to git; this makes all the subsequent steps easier.

2. srcmv, reorganising the source code so we are prepared to do all the things
you mention and all the other things we talked about in the past :)

3. Profit.

--

As for my wish list, all of this after the git move:

We could release 1.2 based off of current trunk + a few of the more 
useful JIRA patches that we haven't committed yet.

After 1.2.x is branched, srcmv trunk and start the internal refactoring
and pluginnifying and release 1.3 off that.

At some point, merging patches between "before srcmv" and "after srcmv" is 
going to be a pain, so I'd like to keep that time as short 
as possible and thus keep the differences between 1.2 and 1.3 (given 
that these are the border cases) as small as possible.

Cheers
Jan
-- 


On Aug 16, 2011, at 1:20 PM, Benoit Chesneau wrote:




Re: Bringing automatic compaction into trunk

2011-08-16 Thread Jan Lehnardt
Good points Robert,

I replied inline and then hijacked the thread for a more general discussion, 
sorry about that  :)

On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:

 Filipe,
 
  This is neat, I can definitely see the utility of the approach. I do share 
 the concerns expressed in other comments with respect to the use of the 
 config file for per db compaction specs and the use of a compact_loop that 
 waits on config change messages when the ets table is empty. I don't think it 
 fully takes into account the use case of large numbers of small dbs and/or 
 some very large dbs interspersed with a lot of mid-size dbs.

As I said in the ticket, per-db config is desirable, but I think it is outside 
the scope of this ticket.

 Anyway I like it a lot, though I've only read the code for 1/2 an hour or 
 so. I also agree with others that the code base is reaching a point of being 
 a bit crufty, and it might be a good time, with the git migration etc., to 
 take a breath and commit to making some of these OTP-compliant changes and 
 design changes we've talked about.

Just curious, would it make a big difference to commit the patch before srcmv 
and migrate it with the rest of the code base, rather than letting it rot in 
JIRA and leaving it all to Filipe to keep updated?

I also fear that a srcmv'd release is still out a bit and I'd really like to 
see this one (and a few others) go into 1.2 (as per my previous mail to this 
list in another thread). While it isn't the absolute perfect solution in all 
cases, it is disabled by default and manual compaction strategies work as they 
did before. In the meantime, we can refine the rest of the system to make it 
more fully fledged and maybe even enable it by default a few versions down when 
we are all comfortable with it. I'm not very comfortable keeping good patches 
in JIRA and not trunk until they solve every little edge case. We haven't 
worked like this in the past and I don't think it is worth doing.

Cheers
Jan
-- 




 
 Regards,
 
 Bob
 
 
 
 
 
 On Aug 15, 2011, at 9:29 PM, Filipe David Manana wrote:
 
 Developers, users,
 
 It's been a while now since I opened a Jira ticket for it (
 https://issues.apache.org/jira/browse/COUCHDB-1153 ).
 I won't describe it in detail here since that's already done in the Jira 
 ticket.
 
 Unless there are objections, I would like to get this moving soon.
 
 Thanks
 
 
 -- 
 Filipe David Manana,
 fdman...@gmail.com, fdman...@apache.org
 
 Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.
 



Re: Bringing automatic compaction into trunk

2011-08-16 Thread Robert Newson
I'm -1 on the approach (as I understand it) taken by the scheduler as
it will be problematic in precisely the circumstance when you'd most
want auto compaction (large numbers of databases and views).

To this point -- "Just curious, would it make a big difference to commit
the patch before srcmv and migrate it with the rest of the code base
rather than letting it rot in JIRA and leave it all to Filipe to keep
it updated." -- I'm -∞ on any suggestion that code should be put in
trunk to stop it from rotting. Code should land when it's ready. I
hope we're all agreed on that and that this paragraph was redundant.

After srcmv, and then some work to OTP-ify each of the resultant
subdirs, we should add this as a separate application. We might also
mark it as beta in the first release to gather feedback from the
community.

I'll be accused of 'stop energy' within nanoseconds of this post so I
should end by saying I'm +1 on couchdb gaining the ability to
automatically compact its databases and views in principle.

B.

On 16 August 2011 13:19, Jan Lehnardt j...@apache.org wrote:




Re: Bringing automatic compaction into trunk

2011-08-16 Thread Jan Lehnardt

On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:

 I'm -1 on the approach (as I understand it) taken by the scheduler as
 it will be problematic in precisely the circumstance when you'd most
 want auto compaction (large numbers of databases and views).

As Filipe mentions in the ticket, this was tested with large numbers of
databases.

In addition, your "most want" assumption doesn't hold for the average
user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
plus that a piece of software doesn't start wasting a system resource without
cleaning up after itself. But this isn't even suggesting to enable it by
default. We have plenty of other features that need proper documentation
to be used correctly and that we are improving over time to make them
more obvious by removing common errors or odd behaviour.

 To this point Just curious, would it make a big difference to commit
 the patch before srcmv and migrate it with the rest of the code base
 rather than letting it rot in JIRA and leave it all to Filipe to keep
 it updated. -- I'm -∞ on any suggestion that code should be put in
 trunk to stop it from rotting. Code should land when it's ready. I
 hope we're all agreed on that and that this paragraph was redundant.

I was suggesting that the patch is ready enough for trunk and that
the level of readiness should not be "solves all possible cases", especially
for something that is disabled by default. If we take this to the extreme,
we'd never add any new features.

I'm not suggesting "it compiles for me, let's throw it into trunk".

 After srcmv, and then some work to OTP-ify each of the resultant
 subdirs, we should add this as a separate application. We might also
 mark it as beta in the first release to gather feedback from the
 community.

I don't see how that is any different from adding it before srcmv and
avoiding leaving the front-porting effort to a single person.

Ideally we'd already have srcmv done, but we don't and I don't want
to hold off progress for an architecture change.

 I'll be accused of 'stop energy' within nanoseconds of this post so I
 should end by saying I'm +1 on couchdb gaining the ability to
 automatically compact its databases and views in principle.

:)

Cheers
Jan
-- 


 



Re: Bringing automatic compaction into trunk

2011-08-16 Thread Robert Newson
All good points Jan, thanks.

Having large numbers of databases is one thing, but I'm focused on the
impact on ongoing operations with this running in the background. What
does it do to the user's experience to have all dbs scanned
periodically, etc.?

The reason I suggest doing it after the move, and in its own app, is
to reduce the work needed to not use this code in some circumstances
(Cloudant hosting, for example). Yes, it's a separate module and
disabled by default, but putting it in its own application will make
the separation much more explicit and preclude unintended
entanglements with core over time.

B.

On 16 August 2011 14:31, Jan Lehnardt j...@apache.org wrote:


Re: Bringing automatic compaction into trunk

2011-08-16 Thread Jan Lehnardt

On Aug 16, 2011, at 3:44 PM, Robert Newson wrote:

 All good points Jan, thanks.
 
 Having large numbers of databases is one thing, but I'm focused on the
 impact on ongoing operations with this running in the background. What
 does it do to the users experience to have all dbs scanned
 periodically, etc?
 
 The reason I suggest doing it after the move, and in its own app, is
 to reduce the work needed to not use this code in some circumstances
 (Cloudant hosting, for example). Yes, it's a separate module and
 disabled by default, but putting it in its own application will make
 the separation much more explicit and preclude unintended
 entanglements with core over time.

I think this is a valid concern, but I don't think it outweighs the
disadvantage. I'm happy to spend time to make sure this is properly
modular after srcmv.

Cheers
Jan
-- 


 

Re: Bringing automatic compaction into trunk

2011-08-16 Thread Robert Newson
Ok, let's see Paul's code concerns addressed first; it needs that
cleanup before it can hit trunk.

I'd still prefer to see an event-driven rather than polling approach,
e.g., hook into update_notifier and build a queue of databases that are
actively being written to (and therefore growing). A much lazier
background thing could compact databases that are inactive.

B.
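A minimal sketch of that event-driven approach (the couch_db_update_notifier module and the {updated, DbName} / {deleted, DbName} event shapes are assumptions based on the 1.x code base; the table and function names are invented for illustration):

    -module(active_db_tracker).
    -export([start/0]).

    %% Track databases that are actively being written to, instead of
    %% polling _all_dbs. A lazier background process would periodically
    %% take names from the active_dbs table, check their fragmentation
    %% and compact them, and separately sweep databases that went idle.
    start() ->
        active_dbs = ets:new(active_dbs, [named_table, public, set]),
        {ok, _Notifier} = couch_db_update_notifier:start_link(fun
            ({updated, DbName}) ->
                ets:insert(active_dbs, {DbName, os:timestamp()});
            ({deleted, DbName}) ->
                ets:delete(active_dbs, DbName);
            (_Event) ->
                ok
        end),
        ok.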

On 16 August 2011 14:48, Jan Lehnardt j...@apache.org wrote:

 On Aug 16, 2011, at 3:44 PM, Robert Newson wrote:

 All good points Jan, thanks.

 Having large numbers of databases is one thing, but I'm focused on the
 impact on ongoing operations with this running in the background. What
 does it do to the users experience to have all dbs scanned
 periodically, etc?

 The reason I suggest doing it after the move, and in its own app, is
 to reduce the work needed to not use this code in some circumstances
 (Cloudant hosting, for example). Yes, it's a separate module and
 disabled by default, but putting it in its own application will make
 the separation much more explicit and preclude unintended
 entanglements with core over time.

 I think this is a valid concern, but I don't think it outweighs the
 disadvantage. I'm happy to spend time to make sure this is properly
 modular after srcmv.

 Cheers
 Jan
 --



 B.

 On 16 August 2011 14:31, Jan Lehnardt j...@apache.org wrote:

 On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:

 I'm -1 on the approach (as I understand it) taken by the scheduler as
 it will be problematic in precisely the circumstance when you'd most
 want auto compaction (large numbers of databases and views).

 As Filipe mentions in the ticket, this was tested with large numbers of
 databases.

 In addition, your most want assumption doesn't hold for the average
 user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
 plus that a software doesn't start wasting a system resource without
 cleaning up after itself. But this isn't even suggesting to enable this by
 default. We have plenty of other features that need proper documentation
 to be used correctly and that we are improving over time to make them
 more obvious by removing common errors or odd behaviour.

 To this point Just curious, would it make a big difference to commit
 the patch before srcmv and migrate it with the rest of the code base
 rather than letting it rot in JIRA and leave it all to Filipe to keep
 it updated. -- I'm -∞ on any suggestion that code should be put in
 trunk to stop it from rotting. Code should land when it's ready. I
 hope we're all agreed on that and that this paragraph was redundant.

 I was suggesting that the the patch is ready enough for trunk and that
 the level of readiness should not be solves all possible cases. Especially
 for something that is disabled by default. If we take this to the extreme,
 we'd never add any new features.

 I'm not suggesting it compiles for me, lets throw it into trunk.

 After srcmv, and then some work to OTP-ify each of the resultant
 subdirs, we should add this as a separate application. We might also
 mark it as beta in the first release to gather feedback from the
 community.

 I don't see how that is any different from adding it before srcmv and
 avoiding leaving the front-porting effort to a single person.

 Ideally we'd already have srcmv done, but we don't and I don't want
 to hold off progress for an architecture change.

 I'll be accused of 'stop energy' within nanoseconds of this post so I
 should end by saying I'm +1 on couchdb gaining the ability to
 automatically compact its databases and views in principle.

 :)

 Cheers
 Jan
 --



 B.

 On 16 August 2011 13:19, Jan Lehnardt j...@apache.org wrote:
 Good points Robert,

 I replied inline and then hijacked the thread for a more general 
 discussion, sorry about that  :)

 On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:

 Filipe,

  This is neat, I can definitely see the utility of the approach. I do 
 share the concerns expressed in other comments with respect to the use 
 of the config file for per db compaction specs and the use of a 
 compact_loop that waits on config change messages when the ets table is 
 empty. I don't think it fully takes into account the use case of large 
 numbers of small dbs and/or some very large dbs interspersed with a lot 
 of mid-size dbs.

 As I said in the ticket, per-db config is desirable, but I think it is
 outside the scope of the ticket.

 Anyway I like it a lot, though I've only read the code for 1/2 an hour
 or so. I also agree with others that the code base is reaching a point
 of being a bit crufty and it might be a good time, with the git
 migration, etc., to take a breath and commit to making some of these OTP
 compliant changes and design changes we've talked about.

 Just curious, would it make a big difference to commit the patch before 
 srcmv and migrate it with the rest of the code base rather than letting 
 it rot in JIRA and leave it all to Filipe to keep it updated.


Re: Bringing automatic compaction into trunk

2011-08-16 Thread Jan Lehnardt

On Aug 16, 2011, at 4:00 PM, Robert Newson wrote:

 Ok, let's see Paul's code concerns addressed first; it needs that
 cleanup before it can hit trunk.
 
 I'd still prefer to see an event-driven rather than polling approach,
 e.g, hook into update_notifier and build a queue of databases that are
 actively being written to (and therefore growing). A much lazier
 background thing could compact databases that are inactive.

Jup, my discussion was assuming that all of that gets sorted out as an
implementation detail. Back to JIRA.

Cheers
Jan
-- 

 
 B.
 
 On 16 August 2011 14:48, Jan Lehnardt j...@apache.org wrote:
 
 On Aug 16, 2011, at 3:44 PM, Robert Newson wrote:
 
 All good points Jan, thanks.
 
 Having large numbers of databases is one thing, but I'm focused on the
 impact on ongoing operations with this running in the background. What
 does it do to the users experience to have all dbs scanned
 periodically, etc?
 
 The reason I suggest doing it after the move, and in its own app, is
 to reduce the work needed to not use this code in some circumstances
 (Cloudant hosting, for example). Yes, it's a separate module and
 disabled by default, but putting it in its own application will make
 the separation much more explicit and preclude unintended
 entanglements with core over time.
 
 I think this is a valid concern, but I don't think it outweighs the
 disadvantage. I'm happy to spend time to make sure this is properly
 modular after srcmv.
 
 Cheers
 Jan
 --
 
 
 
 B.
 
 On 16 August 2011 14:31, Jan Lehnardt j...@apache.org wrote:
 
 On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:
 
 I'm -1 on the approach (as I understand it) taken by the scheduler as
 it will be problematic in precisely the circumstance when you'd most
 want auto compaction (large numbers of databases and views).
 
 As Filipe mentions in the ticket, this was tested with large numbers of
 databases.
 
 In addition, your most want assumption doesn't hold for the average
 user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
 plus that a software doesn't start wasting a system resource without
 cleaning up after itself. But this isn't even suggesting to enable this by
 default. We have plenty of other features that need proper documentation
 to be used correctly and that we are improving over time to make them
 more obvious by removing common errors or odd behaviour.
 
 To this point Just curious, would it make a big difference to commit
 the patch before srcmv and migrate it with the rest of the code base
 rather than letting it rot in JIRA and leave it all to Filipe to keep
 it updated. -- I'm -∞ on any suggestion that code should be put in
 trunk to stop it from rotting. Code should land when it's ready. I
 hope we're all agreed on that and that this paragraph was redundant.
 
 I was suggesting that the patch is ready enough for trunk and that
 the level of readiness should not be solves all possible cases. 
 Especially
 for something that is disabled by default. If we take this to the extreme,
 we'd never add any new features.
 
 I'm not suggesting it compiles for me, lets throw it into trunk.
 
 After srcmv, and then some work to OTP-ify each of the resultant
 subdirs, we should add this as a separate application. We might also
 mark it as beta in the first release to gather feedback from the
 community.
 
 I don't see how that is any different from adding it before srcmv and
 avoiding leaving the front-porting effort to a single person.
 
 Ideally we'd already have srcmv done, but we don't and I don't want
 to hold off progress for an architecture change.
 
 I'll be accused of 'stop energy' within nanoseconds of this post so I
 should end by saying I'm +1 on couchdb gaining the ability to
 automatically compact its databases and views in principle.
 
 :)
 
 Cheers
 Jan
 --
 
 
 
 B.
 
 On 16 August 2011 13:19, Jan Lehnardt j...@apache.org wrote:
 Good points Robert,
 
 I replied inline and then hijacked the thread for a more general 
 discussion, sorry about that  :)
 
 On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:
 
 Filipe,
 
  This is neat, I can definitely see the utility of the approach. I do 
 share the concerns expressed in other comments with respect to the use 
 of the config file for per db compaction specs and the use of a 
 compact_loop that waits on config change messages when the ets table is 
 empty. I don't think it fully takes into account the use case of large 
 numbers of small dbs and/or some very large dbs interspersed with a lot 
 of mid-size dbs.
 
 As I said in the ticket, per-db config is desirable, but I think it is
 outside the scope of the ticket.
 
  Anyway I like it a lot though I've only read the code for 1/2 and hour 
 or so. I also agree with others that the code base is reaching a point 
 of being a bit crufty and it might be a good time with the git 
 migration, etc.. to take a breath and commit to making some of these 
 OTP compliant changes and design changes we've talked 

The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
One of the principal uses of the replicator is to make this database look like 
that one.  We're unable to do that in the general case today because of the 
combination of validation functions and out-of-order document transfers.  It's 
entirely possible for a document to be saved in the source DB prior to the 
installation of a ddoc containing a validation function that would have 
rejected the document, for the replicator to install the ddoc in the target DB 
before replicating the other document, and for the other document to then be 
rejected by the target DB.

I propose we add a role which allows a user to bypass validation, or else 
extend that privilege to the _admin role.  We should still validate updates by 
default and add a way (a new qs param, for instance) to indicate that 
validation should be skipped for a particular update.  Thoughts?
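
As a concrete shape for that opt-in (sketch only; skip_validation is the
proposed parameter from this thread, not an existing option), a write request
might look roughly like:

    import requests

    resp = requests.put(
        "http://admin:secret@localhost:5984/target_db/some_doc",
        params={"skip_validation": "true"},   # proposed opt-in, not an existing option
        json={"_id": "some_doc", "value": 42},
    )
    print(resp.status_code, resp.json())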

Adam

Re: The replicator needs a superuser mode

2011-08-16 Thread Robert Newson
+1 on the intention but we'll need to be careful. The use case is
specifically to allow verbatim migration of databases between servers.
A separate role makes sense as I'm not sure of the consequences of
explicitly granting this ability to the existing _admin role.

B.

On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because of 
 the combination of validation functions and out-of-order document transfers.  
 It's entirely possible for a document to be saved in the source DB prior to 
 the installation of a ddoc containing a validation function that would have 
 rejected the document, for the replicator to install the ddoc in the target 
 DB before replicating the other document, and for the other document to then 
 be rejected by the target DB.

 I propose we add a role which allows a user to bypass validation, or else 
 extend that privilege to the _admin role.  We should still validate updates 
 by default and add a way (a new qs param, for instance) to indicate that 
 validation should be skipped for a particular update.  Thoughts?

 Adam


Re: The replicator needs a superuser mode

2011-08-16 Thread Jan Lehnardt
This is only slightly related, but I'm dreaming of /db/_dump and /db/_restore
endpoints (the names don't matter, could be one with GET / PUT) that just ship
verbatim .couch files over HTTP. It would be for admins only, it would not be
incremental (although we might be able to add that), and I haven't yet thought
through all the concurrency and error case implications. The above solves more
than the proposed problem, and in a very different way, but I thought I'd throw
it into the mix.
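
A sketch of what that could look like from a client, with /_dump and /_restore
as purely hypothetical endpoints:

    import requests

    src = "http://admin:secret@source:5984/db/_dump"
    dst = "http://admin:secret@target:5984/db/_restore"

    with requests.get(src, stream=True) as dump:
        dump.raise_for_status()
        # Stream the verbatim file bytes straight into the restore endpoint.
        resp = requests.put(dst, data=dump.iter_content(chunk_size=1024 * 1024))
        print(resp.status_code)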

Cheers
Jan
-- 

On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:

 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.
 
 B.
 
 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because 
 of the combination of validation functions and out-of-order document 
 transfers.  It's entirely possible for a document to be saved in the source 
 DB prior to the installation of a ddoc containing a validation function that 
 would have rejected the document, for the replicator to install the ddoc in 
 the target DB before replicating the other document, and for the other 
 document to then be rejected by the target DB.
 
 I propose we add a role which allows a user to bypass validation, or else 
 extend that privilege to the _admin role.  We should still validate updates 
 by default and add a way (a new qs param, for instance) to indicate that 
 validation should be skipped for a particular update.  Thoughts?
 
 Adam



Re: The replicator needs a superuser mode

2011-08-16 Thread Ryan Ramage
 This is only slightly related, but I'm dreaming of /db/_dump and /db/_restore 
 endpoints (the names don't matter, could be one with GET / PUT) that just 
 ships verbatim .couch files over HTTP. It would be for admins only, it would 
 not be incremental (although we might be able to add that), and I haven't yet 
 thought through all the concurrency and error case implications, the above 
 solves more than the proposed problem and in a very different, but I thought 
 I throw it in the mix.


+1 on /db/_dump and /db/_restore endpoints!! Very beneficial to us
little people trying to make installers like couchapp-takeout, and
could even be used from futon to create a database from a remote db. I
am anecdotally noticing that using replication to create a local
database from a remote one with lots of attachments takes a long time,
is prone to timeouts, and gets stuck (been working with jhs on this).
Dump/restore will also be much faster, eliminating the many small requests.


Re: compaction plugin, auth handler, foo plugin couchdb core [resent]

2011-08-16 Thread Paul Davis
On Tue, Aug 16, 2011 at 6:45 AM, Jan Lehnardt j...@apache.org wrote:
 Hi Benoit,

 thanks for raising this again. I think we have a good plan to get started but
 it wouldn't hurt to get a little more organised. I think the plan is as 
 follows:

 1. Move to git; this makes all the subsequent steps easier.

 2. srcmv, reorganising the source code so we are prepared to do all the things
    you mention and all the other things we talked about in the past :)

 3. Profit.

 --

 As for my wish list, all this post the git move:

 We could release 1.2 based off of current trunk + a few of the more
 useful JIRA patches that we haven't committed yet.

 After 1.2.x is branched, srcmv trunk and start the internal refactoring
 and pluginnifying and release 1.3 off that.

 At some point merging between before and after srcmv merging
 patches is going to be a pain, so I'd like to keep that time as short
 as possible and thus keep the differences between 1.2 and 1.3 (given
 that these are the border cases) as small as possible.

 Cheers
 Jan
 --


Early morning pre-caffeine but this sounds like a pretty good idea to
my addled brain.


 On Aug 16, 2011, at 1:20 PM, Benoit Chesneau wrote:

 Hi devs;

 Today I see a lot of interesting things coming in CouchDB, but also a lot of
 different interests and different usages.  Sometimes you need to extend
 couch for your usage. But today, if you set aside the current work on the
 view engine by Paul, the couchdb code becomes more and more monolithic, or
 an aggregation of code adding some specific features/changes while not
 envisioning what could be done by others. Also, the way you have to
 extend couchdb makes it difficult today to use/merge/... different forks
 around, like the ones done by Cloudant, Couchbase and even mine in
 refuge/upondata (and probably some others too). Couch core should be
 lighter and more open (in its strict sense).

 For example, today the http layer(?), replicator(?), proxy, external
 daemons, couchapp engine, rewriter, vhosts, compaction daemon and some auth
 handlers could be available as plugins. couch_config could be more
 generic and not rely on an ini file. More specifically, we could have
 a couch core looking more like an mnesia alternative, and the couchdb
 application, which could be couch core + plugins, distributed as a
 standalone app (like couchdb is currently). This would also maybe allow
 Cloudant, Couchbase and others to reuse the same core rather than forking
 it while adding their own plugins. Official plugins could also be
 maintained as standalone projects, maybe.


 I wish we could concentrate on that topic for 2.0.x and make it a
 priority. That would imply defining what the couch core is, splitting the
 code [1], and defining what a plugin is [2]. Maybe the couchdb app can also be a
 full erlang release [3] built with autotools. I think that this
 pluggable structure should be in place, for example, before adding any new
 daemon like the compaction daemon. Don't get me wrong, I really like the
 idea of having a default compaction daemon in the couchdb app, and this is
 just an example. But I also want the possibility to add my own, working
 differently (or not), and this should be possible for the default couchdb
 release; couch core imo should be more neutral.

 Maybe we could start by opening tickets about the different tasks to track
 them? What is blocking the split currently, since 1.0.3 is out? Are
 we waiting for the svn-to-git conversion?

 - benoît

 [1] https://github.com/davisp/couchdb-srcmv
 [2] https://issues.apache.org/jira/browse/COUCHDB-1012
 [3] http://www.erlang.org/doc/design_principles/release_structure.html




Re: The replicator needs a superuser mode

2011-08-16 Thread Paul Davis
Me and Adam were just mulling over a similar endpoint the other night
that could be used to generate plain-text backups similar to what
couchdb-dump and couchdb-load were doing. With the idea that there
would be some special sauce to pipe from one _dump endpoint directly
into a different _load handler. The obvious downside was the lack of
incrementality. Seems like it'd be doable, but I'm not entirely certain of
the best method.

I was also considering this as our fool-proof, 100% reliable method for
migrating data between different CouchDB versions which we seem to
screw up fairly regularly.

+1 on the idea. Not sure about raw couch files as it limits the wider
usefulness (and we already have scp).

On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and /db/_restore 
 endpoints (the names don't matter, could be one with GET / PUT) that just 
 ships verbatim .couch files over HTTP. It would be for admins only, it would 
 not be incremental (although we might be able to add that), and I haven't yet 
 thought through all the concurrency and error case implications, the above 
 solves more than the proposed problem and in a very different, but I thought 
 I throw it in the mix.

 Cheers
 Jan
 --

 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:

 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.

 B.

 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because 
 of the combination of validation functions and out-of-order document 
 transfers.  It's entirely possible for a document to be saved in the source 
 DB prior to the installation of a ddoc containing a validation function 
 that would have rejected the document, for the replicator to install the 
 ddoc in the target DB before replicating the other document, and for the 
 other document to then be rejected by the target DB.

 I propose we add a role which allows a user to bypass validation, or else 
 extend that privilege to the _admin role.  We should still validate updates 
 by default and add a way (a new qs param, for instance) to indicate that 
 validation should be skipped for a particular update.  Thoughts?

 Adam




Re: Remove 1.0.2 release from Apache Mirrors

2011-08-16 Thread Paul Davis
Not only should no one object, but infrastructure would object to
people objecting. :D

Thanks for helping with and cleaning up this release. I'll try and not
be moving across country when I do the next one.

On Tue, Aug 16, 2011 at 5:35 AM, Jan Lehnardt j...@apache.org wrote:
 Hi,

 in the spirit of keeping things clean I'd like to remove the 1.0.2 release 
 from the mirrors and put it into the archive.

 If nobody objects, I'll do this tomorrow.

 Cheers
 Jan
 --




Re: The replicator needs a superuser mode

2011-08-16 Thread Nathan Vander Wilt
We've already got replication, _all_docs and some really robust on-disk 
consistency properties. For shuttling raw database files between servers, 
wouldn't rsync be more efficient (and fit better within existing sysadmin 
security/deployment structures)?
-nvw


On Aug 16, 2011, at 9:55 AM, Paul Davis wrote:
 Me and Adam were just mulling over a similar endpoint the other night
 that could be used to generate plain-text backups similar to what
 couchdb-dump and couchdb-load were doing. With the idea that there
 would be some special sauce to pipe from one _dump endpoint directly
 into a different _load handler. Obvious downfall was incremental-ness
 of this. Seems like it'd be doable, but I'm not entirely certain on
 the best method.
 
 I was also considering this as our full-proof 100% reliable method for
 migrating data between different CouchDB versions which we seem to
 screw up fairly regularly.
 
 +1 on the idea. Not sure about raw couch files as it limits the wider
 usefulness (and we already have scp).
 
 On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and 
 /db/_restore endpoints (the names don't matter, could be one with GET / PUT) 
 that just ships verbatim .couch files over HTTP. It would be for admins 
 only, it would not be incremental (although we might be able to add that), 
 and I haven't yet thought through all the concurrency and error case 
 implications, the above solves more than the proposed problem and in a very 
 different, but I thought I throw it in the mix.
 
 Cheers
 Jan
 --
 
 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:
 
 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.
 
 B.
 
 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because 
 of the combination of validation functions and out-of-order document 
 transfers.  It's entirely possible for a document to be saved in the 
 source DB prior to the installation of a ddoc containing a validation 
 function that would have rejected the document, for the replicator to 
 install the ddoc in the target DB before replicating the other document, 
 and for the other document to then be rejected by the target DB.
 
 I propose we add a role which allows a user to bypass validation, or else 
 extend that privilege to the _admin role.  We should still validate 
 updates by default and add a way (a new qs param, for instance) to 
 indicate that validation should be skipped for a particular update.  
 Thoughts?
 
 Adam
 
 



Re: The replicator needs a superuser mode

2011-08-16 Thread Jan Lehnardt
Neither rsync nor scp will allow me to do curl http://couch/db/_dump | curl
http://couch/db/_restore.

I acknowledge that similar solutions exist, but using the http transport allows 
for more fun things down the road.

See what we are doing with _changes today where DbUpdateNotifications nearly do 
the same thing.

Cheers
Jan
--

On 16.08.2011, at 19:13, Nathan Vander Wilt nate-li...@calftrail.com wrote:

 We've already got replication, _all_docs and some really robust on-disk 
 consistency properties. For shuttling raw database files between servers, 
 wouldn't rsync be more efficient (and fit better within existing sysadmin 
 security/deployment structures)?
 -nvw
 
 
 On Aug 16, 2011, at 9:55 AM, Paul Davis wrote:
 Me and Adam were just mulling over a similar endpoint the other night
 that could be used to generate plain-text backups similar to what
 couchdb-dump and couchdb-load were doing. With the idea that there
 would be some special sauce to pipe from one _dump endpoint directly
 into a different _load handler. Obvious downfall was incremental-ness
 of this. Seems like it'd be doable, but I'm not entirely certain on
 the best method.
 
 I was also considering this as our full-proof 100% reliable method for
 migrating data between different CouchDB versions which we seem to
 screw up fairly regularly.
 
 +1 on the idea. Not sure about raw couch files as it limits the wider
 usefulness (and we already have scp).
 
 On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and 
 /db/_restore endpoints (the names don't matter, could be one with GET / 
 PUT) that just ships verbatim .couch files over HTTP. It would be for 
 admins only, it would not be incremental (although we might be able to add 
 that), and I haven't yet thought through all the concurrency and error case 
 implications, the above solves more than the proposed problem and in a very 
 different, but I thought I throw it in the mix.
 
 Cheers
 Jan
 --
 
 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:
 
 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.
 
 B.
 
 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database 
 look like that one.  We're unable to do that in the general case today 
 because of the combination of validation functions and out-of-order 
 document transfers.  It's entirely possible for a document to be saved in 
 the source DB prior to the installation of a ddoc containing a validation 
 function that would have rejected the document, for the replicator to 
 install the ddoc in the target DB before replicating the other document, 
 and for the other document to then be rejected by the target DB.
 
 I propose we add a role which allows a user to bypass validation, or else 
 extend that privilege to the _admin role.  We should still validate 
 updates by default and add a way (a new qs param, for instance) to 
 indicate that validation should be skipped for a particular update.  
 Thoughts?
 
 Adam
 
 
 


Re: The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
Wow, this thread got hijacked a bit :)  Anyone object to the special role that 
has the skip validation superpower?

Adam

On Aug 16, 2011, at 1:51 PM, Jan Lehnardt wrote:

 Both rsync an scp won't allow me to do curl http://couch/db/_dump | curl 
 http://couch/db/_restore.
 
 I acknowledge that similar solutions exist, but using the http transport 
 allows for more fun things down the road.
 
 See what we are doing with _changes today where DbUpdateNotifications nearly 
 do the same thing.
 
 Cheers
 Jan
 --
 
 On 16.08.2011, at 19:13, Nathan Vander Wilt nate-li...@calftrail.com wrote:
 
 We've already got replication, _all_docs and some really robust on-disk 
 consistency properties. For shuttling raw database files between servers, 
 wouldn't rsync be more efficient (and fit better within existing sysadmin 
 security/deployment structures)?
 -nvw
 
 
 On Aug 16, 2011, at 9:55 AM, Paul Davis wrote:
 Me and Adam were just mulling over a similar endpoint the other night
 that could be used to generate plain-text backups similar to what
 couchdb-dump and couchdb-load were doing. With the idea that there
 would be some special sauce to pipe from one _dump endpoint directly
 into a different _load handler. Obvious downfall was incremental-ness
 of this. Seems like it'd be doable, but I'm not entirely certain on
 the best method.
 
 I was also considering this as our full-proof 100% reliable method for
 migrating data between different CouchDB versions which we seem to
 screw up fairly regularly.
 
 +1 on the idea. Not sure about raw couch files as it limits the wider
 usefulness (and we already have scp).
 
 On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and 
 /db/_restore endpoints (the names don't matter, could be one with GET / 
 PUT) that just ships verbatim .couch files over HTTP. It would be for 
 admins only, it would not be incremental (although we might be able to add 
 that), and I haven't yet thought through all the concurrency and error 
 case implications, the above solves more than the proposed problem and in 
 a very different, but I thought I throw it in the mix.
 
 Cheers
 Jan
 --
 
 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:
 
 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.
 
 B.
 
 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database 
 look like that one.  We're unable to do that in the general case today 
 because of the combination of validation functions and out-of-order 
 document transfers.  It's entirely possible for a document to be saved 
 in the source DB prior to the installation of a ddoc containing a 
 validation function that would have rejected the document, for the 
 replicator to install the ddoc in the target DB before replicating the 
 other document, and for the other document to then be rejected by the 
 target DB.
 
 I propose we add a role which allows a user to bypass validation, or 
 else extend that privilege to the _admin role.  We should still validate 
 updates by default and add a way (a new qs param, for instance) to 
 indicate that validation should be skipped for a particular update.  
 Thoughts?
 
 Adam
 
 
 



Re: The replicator needs a superuser mode

2011-08-16 Thread Robert Dionne
No objection, just the question of why the need for a new role, why not use 
admin?



On Aug 16, 2011, at 2:10 PM, Adam Kocoloski wrote:

 Wow, this thread got hijacked a bit :)  Anyone object to the special role 
 that has the skip validation superpower?
 
 Adam
 
 On Aug 16, 2011, at 1:51 PM, Jan Lehnardt wrote:
 
 Both rsync an scp won't allow me to do curl http://couch/db/_dump | curl 
 http://couch/db/_restore.
 
 I acknowledge that similar solutions exist, but using the http transport 
 allows for more fun things down the road.
 
 See what we are doing with _changes today where DbUpdateNotifications nearly 
 do the same thing.
 
 Cheers
 Jan
 --
 
 On 16.08.2011, at 19:13, Nathan Vander Wilt nate-li...@calftrail.com wrote:
 
 We've already got replication, _all_docs and some really robust on-disk 
 consistency properties. For shuttling raw database files between servers, 
 wouldn't rsync be more efficient (and fit better within existing sysadmin 
 security/deployment structures)?
 -nvw
 
 
 On Aug 16, 2011, at 9:55 AM, Paul Davis wrote:
 Me and Adam were just mulling over a similar endpoint the other night
 that could be used to generate plain-text backups similar to what
 couchdb-dump and couchdb-load were doing. With the idea that there
 would be some special sauce to pipe from one _dump endpoint directly
 into a different _load handler. Obvious downfall was incremental-ness
 of this. Seems like it'd be doable, but I'm not entirely certain on
 the best method.
 
 I was also considering this as our full-proof 100% reliable method for
 migrating data between different CouchDB versions which we seem to
 screw up fairly regularly.
 
 +1 on the idea. Not sure about raw couch files as it limits the wider
 usefulness (and we already have scp).
 
 On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and 
 /db/_restore endpoints (the names don't matter, could be one with GET / 
 PUT) that just ships verbatim .couch files over HTTP. It would be for 
 admins only, it would not be incremental (although we might be able to 
 add that), and I haven't yet thought through all the concurrency and 
 error case implications, the above solves more than the proposed problem 
 and in a very different, but I thought I throw it in the mix.
 
 Cheers
 Jan
 --
 
 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:
 
 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.
 
 B.
 
 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database 
 look like that one.  We're unable to do that in the general case today 
 because of the combination of validation functions and out-of-order 
 document transfers.  It's entirely possible for a document to be saved 
 in the source DB prior to the installation of a ddoc containing a 
 validation function that would have rejected the document, for the 
 replicator to install the ddoc in the target DB before replicating the 
 other document, and for the other document to then be rejected by the 
 target DB.
 
 I propose we add a role which allows a user to bypass validation, or 
 else extend that privilege to the _admin role.  We should still 
 validate updates by default and add a way (a new qs param, for 
 instance) to indicate that validation should be skipped for a 
 particular update.  Thoughts?
 
 Adam
 
 
 
 



Re: The replicator needs a superuser mode

2011-08-16 Thread Robert Newson
No objection to the special role. As in my opening statement, I would be
concerned about adding it to _admin without devoting more thought to
possible unintended consequences.

b.

On 16 August 2011 19:13, Robert Dionne dio...@dionne-associates.com wrote:
 No objection, just the question of why the need for a new role, why not use 
 admin?



 On Aug 16, 2011, at 2:10 PM, Adam Kocoloski wrote:

 Wow, this thread got hijacked a bit :)  Anyone object to the special role 
 that has the skip validation superpower?

 Adam

 On Aug 16, 2011, at 1:51 PM, Jan Lehnardt wrote:

 Both rsync an scp won't allow me to do curl http://couch/db/_dump | curl 
 http://couch/db/_restore.

 I acknowledge that similar solutions exist, but using the http transport 
 allows for more fun things down the road.

 See what we are doing with _changes today where DbUpdateNotifications 
 nearly do the same thing.

 Cheers
 Jan
 --

 On 16.08.2011, at 19:13, Nathan Vander Wilt nate-li...@calftrail.com 
 wrote:

 We've already got replication, _all_docs and some really robust on-disk 
 consistency properties. For shuttling raw database files between servers, 
 wouldn't rsync be more efficient (and fit better within existing sysadmin 
 security/deployment structures)?
 -nvw


 On Aug 16, 2011, at 9:55 AM, Paul Davis wrote:
 Me and Adam were just mulling over a similar endpoint the other night
 that could be used to generate plain-text backups similar to what
 couchdb-dump and couchdb-load were doing. With the idea that there
 would be some special sauce to pipe from one _dump endpoint directly
 into a different _load handler. Obvious downfall was incremental-ness
 of this. Seems like it'd be doable, but I'm not entirely certain on
 the best method.

 I was also considering this as our full-proof 100% reliable method for
 migrating data between different CouchDB versions which we seem to
 screw up fairly regularly.

 +1 on the idea. Not sure about raw couch files as it limits the wider
 usefulness (and we already have scp).

 On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and 
 /db/_restore endpoints (the names don't matter, could be one with GET / 
 PUT) that just ships verbatim .couch files over HTTP. It would be for 
 admins only, it would not be incremental (although we might be able to 
 add that), and I haven't yet thought through all the concurrency and 
 error case implications, the above solves more than the proposed problem 
 and in a very different, but I thought I throw it in the mix.

 Cheers
 Jan
 --

 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:

 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.

 B.

 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database 
 look like that one.  We're unable to do that in the general case 
 today because of the combination of validation functions and 
 out-of-order document transfers.  It's entirely possible for a 
 document to be saved in the source DB prior to the installation of a 
 ddoc containing a validation function that would have rejected the 
 document, for the replicator to install the ddoc in the target DB 
 before replicating the other document, and for the other document to 
 then be rejected by the target DB.

 I propose we add a role which allows a user to bypass validation, or 
 else extend that privilege to the _admin role.  We should still 
 validate updates by default and add a way (a new qs param, for 
 instance) to indicate that validation should be skipped for a 
 particular update.  Thoughts?

 Adam








Re: The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
Hmm, if we used a separate role we'd need a multi-step process to trigger the
replication:

1) create the database
2) have an admin grant the _skip_validation role on that DB to the replicator's 
user_ctx
3) trigger the replication

Kind of annoying.  It would certainly be simpler to allow _admins to do this
just by adding a skip_validation=true parameter to write requests.
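
For comparison, step 3 above might look roughly like this; "_skip_validation"
is only the role name proposed in this thread, and user_ctx stands in for
however the replication would carry that role:

    import requests

    requests.post(
        "http://admin:secret@localhost:5984/_replicate",
        json={
            "source": "http://source:5984/db",
            "target": "http://localhost:5984/db",
            # user_ctx carries the identity/roles the replication runs with;
            # the _skip_validation role itself does not exist yet.
            "user_ctx": {"name": "replicator", "roles": ["_skip_validation"]},
        },
    )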

Adam

On Aug 16, 2011, at 2:21 PM, Robert Newson wrote:

 no objection to special role. As in my opening statement, would be
 concerned about adding it to _admin without devoting more thought to
 possible unintended consequences.
 
 b.
 
 On 16 August 2011 19:13, Robert Dionne dio...@dionne-associates.com wrote:
 No objection, just the question of why the need for a new role, why not use 
 admin?
 
 
 
 On Aug 16, 2011, at 2:10 PM, Adam Kocoloski wrote:
 
 Wow, this thread got hijacked a bit :)  Anyone object to the special role 
 that has the skip validation superpower?
 
 Adam
 
 On Aug 16, 2011, at 1:51 PM, Jan Lehnardt wrote:
 
 Both rsync an scp won't allow me to do curl http://couch/db/_dump | curl 
 http://couch/db/_restore.
 
 I acknowledge that similar solutions exist, but using the http transport 
 allows for more fun things down the road.
 
 See what we are doing with _changes today where DbUpdateNotifications 
 nearly do the same thing.
 
 Cheers
 Jan
 --
 
 On 16.08.2011, at 19:13, Nathan Vander Wilt nate-li...@calftrail.com 
 wrote:
 
 We've already got replication, _all_docs and some really robust on-disk 
 consistency properties. For shuttling raw database files between servers, 
 wouldn't rsync be more efficient (and fit better within existing sysadmin 
 security/deployment structures)?
 -nvw
 
 
 On Aug 16, 2011, at 9:55 AM, Paul Davis wrote:
 Me and Adam were just mulling over a similar endpoint the other night
 that could be used to generate plain-text backups similar to what
 couchdb-dump and couchdb-load were doing. With the idea that there
 would be some special sauce to pipe from one _dump endpoint directly
 into a different _load handler. Obvious downfall was incremental-ness
 of this. Seems like it'd be doable, but I'm not entirely certain on
 the best method.
 
 I was also considering this as our full-proof 100% reliable method for
 migrating data between different CouchDB versions which we seem to
 screw up fairly regularly.
 
 +1 on the idea. Not sure about raw couch files as it limits the wider
 usefulness (and we already have scp).
 
 On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and 
 /db/_restore endpoints (the names don't matter, could be one with GET / 
 PUT) that just ships verbatim .couch files over HTTP. It would be for 
 admins only, it would not be incremental (although we might be able to 
 add that), and I haven't yet thought through all the concurrency and 
 error case implications, the above solves more than the proposed 
 problem and in a very different, but I thought I throw it in the mix.
 
 Cheers
 Jan
 --
 
 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:
 
 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.
 
 B.
 
 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database 
 look like that one.  We're unable to do that in the general case 
 today because of the combination of validation functions and 
 out-of-order document transfers.  It's entirely possible for a 
 document to be saved in the source DB prior to the installation of a 
 ddoc containing a validation function that would have rejected the 
 document, for the replicator to install the ddoc in the target DB 
 before replicating the other document, and for the other document to 
 then be rejected by the target DB.
 
 I propose we add a role which allows a user to bypass validation, or 
 else extend that privilege to the _admin role.  We should still 
 validate updates by default and add a way (a new qs param, for 
 instance) to indicate that validation should be skipped for a 
 particular update.  Thoughts?
 
 Adam
 
 
 
 
 
 



Re: The replicator needs a superuser mode

2011-08-16 Thread Jean-Pierre Fiset
I understand the issue brought by Adam since in our CouchDb application, there
is a need to have a replicator role, and the validation functions skip most of
the tests if that role is set for the current user.

On the other hand, at the current time, I am not in favour of making super
users for the sake of replication. Although it might solve the particular
problem stated, it removes the ability for a design document to enforce some
invariant properties of a database.

Since there is already a way to allow a replicator to perform any changes
(role + proper validation function), I do not see the need for this change.
Since the super replicator user removes the ability that a database has to
protect the consistency of its data, and there does not seem to be a
work-around, I would rather not see this change pushed to CouchDb.

JP

On 11-08-16 10:26 AM, Adam Kocoloski wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because of 
 the combination of validation functions and out-of-order document transfers.  
 It's entirely possible for a document to be saved in the source DB prior to 
 the installation of a ddoc containing a validation function that would have 
 rejected the document, for the replicator to install the ddoc in the target 
 DB before replicating the other document, and for the other document to then 
 be rejected by the target DB.
 
 I propose we add a role which allows a user to bypass validation, or else 
 extend that privilege to the _admin role.  We should still validate updates 
 by default and add a way (a new qs param, for instance) to indicate that 
 validation should be skipped for a particular update.  Thoughts?
 
 Adam



Re: Configuration Load Order

2011-08-16 Thread Noah Slater

On 16 Aug 2011, at 02:20, Jason Smith wrote:

 Is it possible to deprecate the .ini files as a configuration tool? In
 other words, tell the world: Configure CouchDB over HTTP via the
 /_config URLs, probably via Futon.

I think this proposal reaches too far.

Having the configuration in the ini files is good for a number of reasons. It 
allows you to configure CouchDB without actually running CouchDB. Following on 
from that, it allows you to rescue a CouchDB instance that is misbehaving, even 
if you are unable to access CouchDB. It lets sysadmins version the files, 
perform audits, as well as allowing them to easily integrate CouchDB within 
automatic deployment and configuration systems.

If there are certain types of things that people regularly want to do via 
CouchDB itself, such as URL handlers, or users, then I see no reason why this 
stuff shouldn't be moved to CouchDB itself. But this type of thing should 
probably be handled on a case-by-case basis. Anything which relates to the 
CouchDB server in a more general sense, should stay in the system configuration 
files.
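
Both routes, sketched side by side (the section/key pair here is just an
example value):

    import requests

    # Offline route: edit local.ini before CouchDB starts, e.g.
    #   [couchdb]
    #   delayed_commits = false

    # Online route: the /_config API on a running instance; values are JSON strings.
    requests.put(
        "http://admin:secret@localhost:5984/_config/couchdb/delayed_commits",
        json="false",
    )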

Re: The replicator needs a superuser mode

2011-08-16 Thread Paul Davis
On Tue, Aug 16, 2011 at 1:10 PM, Adam Kocoloski kocol...@apache.org wrote:
 Wow, this thread got hijacked a bit :)

You must be new here.


Re: Configuration Load Order

2011-08-16 Thread Noah Slater

On 16 Aug 2011, at 10:33, Benoit Chesneau wrote:

 Imo we shouldn't provide plaintext passwords at all. Maybe a safer
 option would be to let the admin create the first one via http, or put
 the hash in a password.ini file manually. If we are kind enough we
 could also provide a couchctl script allowing user management, config
 changes ... ?

This sounds like a decent proposal. Much like you have to use htpasswd to 
generate passwords for Apache httpd, we could bundle a script that lets you 
generate passwords for the CouchDB ini files, and then forbid the use of 
plaintext. This solves both the technical problem (I think?) and helps us
reinforce better security practices across the board.
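
A sketch of such a helper, assuming the salted-SHA1 shape that 1.x-era CouchDB
writes into the [admins] section (-hashed-<sha1(password + salt)>,<salt>); the
exact scheme should be checked against the CouchDB version in use:

    import hashlib
    import os
    import sys

    def admin_line(user, password):
        # Intended to match the "-hashed-<digest>,<salt>" shape described above.
        salt = os.urandom(16).hex()
        digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
        return "%s = -hashed-%s,%s" % (user, digest, salt)

    if __name__ == "__main__":
        print("[admins]")
        print(admin_line(sys.argv[1], sys.argv[2]))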

Re: Configuration Load Order

2011-08-16 Thread Jan Lehnardt

On Aug 16, 2011, at 8:31 PM, Noah Slater wrote:

 
 On 16 Aug 2011, at 10:33, Benoit Chesneau wrote:
 
 Imo we shouldn't at all provide plaintext passwords. Maybe a safer
 option would be to let the admin create the first one via http or put
 the hash in the a password.ini file manually. If we are enough kind we
 could also provide a couchctl script allowing user management, config
 changes ... ?
 
 This sounds like a decent proposal. Much like you have to use htpasswd to 
 generate passwords for Apache httpd, we could bundle a script that lets you 
 generate passwords for the CouchDB ini files, and then forbid the use of 
 plaintext. This solves both the technical problem (I think?) and helps us 
 re-enforce better security practices across the board.

Agreed.

Cheers
Jan
-- 



Re: The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
Hi Jean-Pierre, I'm not quite sure I follow that line of reasoning.  A user 
with _admin privileges on the database can easily remove any validation 
functions prior to writing today.  In my proposal skipping validation would 
require _admin rights and an explicit opt-in on a per-request basis.  What are 
you trying to guard against with those validation functions?  Best,

Adam

On Aug 16, 2011, at 2:29 PM, Jean-Pierre Fiset wrote:

 I understand the issue brought by Adam since in our CouchDb application, 
 there is a need to have a replicator role and the validation functions skip 
 most of the tests if the role is set for the current user.
 
 On the other hand, at the current time, I am not in favour of making super 
 users for the sake of replication. Although it might solve the particular 
 problem stated, it removes the ability for a design document to enforce some 
 invariant properties of a database.
 
 Since there is already a way to allow a replicator to perform any changes 
 (role + proper validation function), I do not see the need for this change. 
 Since the super replicator user removes the ability that a database has to 
 protect the consistency of its data, and that there does not seem to be a 
 work-around, I would rather not see this change pushed to CouchDb.
 
 JP
 
 On 11-08-16 10:26 AM, Adam Kocoloski wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because 
 of the combination of validation functions and out-of-order document 
 transfers.  It's entirely possible for a document to be saved in the source 
 DB prior to the installation of a ddoc containing a validation function that 
 would have rejected the document, for the replicator to install the ddoc in 
 the target DB before replicating the other document, and for the other 
 document to then be rejected by the target DB.
 
 I propose we add a role which allows a user to bypass validation, or else 
 extend that privilege to the _admin role.  We should still validate updates 
 by default and add a way (a new qs param, for instance) to indicate that 
 validation should be skipped for a particular update.  Thoughts?
 
 Adam
 



Re: Bringing automatic compaction into trunk

2011-08-16 Thread Damien Katz
Filipe is addressing Paul's concerns. As far as scanning vs. an evented 
architecture, I'd prefer to see Filipe's working code in place, and later 
replaced with a better alternative. We need to push the project forward; we
value useful, correct code first. It's easier to improve on it once it's in
place.

Also, I have no objections to a more modular architecture, I very much welcome 
it. But that work can happen concurrently with pushing forward the code and 
adding features the user community cares about.

-Damien


On Aug 16, 2011, at 7:00 AM, Robert Newson wrote:

 Ok, let's see Paul's code concerns addressed first; it needs that
 cleanup before it can hit trunk.
 
 I'd still prefer to see an event-driven rather than polling approach,
 e.g, hook into update_notifier and build a queue of databases that are
 actively being written to (and therefore growing). A much lazier
 background thing could compact databases that are inactive.
 
 B.
 
 On 16 August 2011 14:48, Jan Lehnardt j...@apache.org wrote:
 
 On Aug 16, 2011, at 3:44 PM, Robert Newson wrote:
 
 All good points Jan, thanks.
 
 Having large numbers of databases is one thing, but I'm focused on the
 impact on ongoing operations with this running in the background. What
 does it do to the users experience to have all dbs scanned
 periodically, etc?
 
 The reason I suggest doing it after the move, and in its own app, is
 to reduce the work needed to not use this code in some circumstances
 (Cloudant hosting, for example). Yes, it's a separate module and
 disabled by default, but putting it in its own application will make
 the separation much more explicit and preclude unintended
 entanglements with core over time.
 
 I think this is a valid concern, but I don't think it outweighs the
 disadvantage. I'm happy to spend time to make sure this is properly
 modular after srcmv.
 
 Cheers
 Jan
 --
 
 
 
 B.
 
 On 16 August 2011 14:31, Jan Lehnardt j...@apache.org wrote:
 
 On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:
 
 I'm -1 on the approach (as I understand it) taken by the scheduler as
 it will be problematic in precisely the circumstance when you'd most
 want auto compaction (large numbers of databases and views).
 
 As Filipe mentions in the ticket, this was tested with large numbers of
 databases.
 
 In addition, your most want assumption doesn't hold for the average
 user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
 plus that a software doesn't start wasting a system resource without
 cleaning up after itself. But this isn't even suggesting to enable this by
 default. We have plenty of other features that need proper documentation
 to be used correctly and that we are improving over time to make them
 more obvious by removing common errors or odd behaviour.
 
 To this point Just curious, would it make a big difference to commit
 the patch before srcmv and migrate it with the rest of the code base
 rather than letting it rot in JIRA and leave it all to Filipe to keep
 it updated. -- I'm -∞ on any suggestion that code should be put in
 trunk to stop it from rotting. Code should land when it's ready. I
 hope we're all agreed on that and that this paragraph was redundant.
 
 I was suggesting that the patch is ready enough for trunk and that
 the level of readiness should not be solves all possible cases. 
 Especially
 for something that is disabled by default. If we take this to the extreme,
 we'd never add any new features.
 
 I'm not suggesting it compiles for me, lets throw it into trunk.
 
 After srcmv, and then some work to OTP-ify each of the resultant
 subdirs, we should add this as a separate application. We might also
 mark it as beta in the first release to gather feedback from the
 community.
 
 I don't see how that is any different from adding it before srcmv and
 avoiding leaving the front-porting effort to a single person.
 
 Ideally we'd already have srcmv done, but we don't and I don't want
 to hold off progress for an architecture change.
 
 I'll be accused of 'stop energy' within nanoseconds of this post so I
 should end by saying I'm +1 on couchdb gaining the ability to
 automatically compact its databases and views in principle.
 
 :)
 
 Cheers
 Jan
 --
 
 
 
 B.
 
 On 16 August 2011 13:19, Jan Lehnardt j...@apache.org wrote:
 Good points Robert,
 
 I replied inline and then hijacked the thread for a more general 
 discussion, sorry about that  :)
 
 On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:
 
 Filipe,
 
  This is neat, I can definitely see the utility of the approach. I do 
 share the concerns expressed in other comments with respect to the use 
 of the config file for per db compaction specs and the use of a 
 compact_loop that waits on config change messages when the ets table is 
 empty. I don't think it fully takes into account the use case of large 
 numbers of small dbs and/or some very large dbs interspersed with a lot 
 of mid-size dbs.
 
 As I said in the ticket, per-db config 

Re: [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Filipe David Manana
On Tue, Aug 16, 2011 at 2:58 AM, Benoit Chesneau bchesn...@gmail.com wrote:

 Could be a local doc, but why didn't we take this path for the
 _security object? Also, since they are really meta information,
 I've the feeling it should be solved as a special member in the db
 file, just like the _security object.

I don't know why _security is the way it is now; that predates me, and
it's another topic :)


 Anyway, what I really dislike is saving per-db configuration in an ini
 file. Per-db configuration should be done on the db. What if you have more
 than 100 dbs? Having 100 lines in an ini file to parse is awkward.

I don't think the common case is to have a separate compaction config for
every single database.
For the fragmentation parameter, which is likely the most useful one, you're
unlikely to set a different value for each of 100 databases (nor the
period, for example).

For other things like the oauth tokens/secrets, the .ini system
doesn't scale. But that's again another topic.

 This is just as simple as this: creating a db creates an entry in
 a db index (or db file) that you can use later.

 I suspect what you're thinking of is something like: rather than scanning
 periodically, let the daemon be notified when a db (or view) can be
 compacted?
 At some point I considered reacting to db_updated events but this was
 pretty much flooding the event handler (daemon).
 Was this your idea?


 Using db events is my idea, yes.  If it actually floods the db event
 handler (not sure why), then maybe we should fix that first?

The problem is that when you have many dbs in the system, under a
reasonable write load, the daemon (which is the receiver of db_updated
events) receives too many messages. To know if you need to compact the
db after such a message, you need to open it, and opening it on every
message is a big burden as well.
I tried this on a system with 1024 databases being updated constantly.
It also doesn't deal with the startup case: a db with high fragmentation
that isn't updated for a long period never gets compaction started.

If someone can measure the current solution's impact and present
another working alternative with a lower impact (and practical tests,
not just theory) I would be the first one wanting to make the change
asap.


 - benoit




-- 
Filipe David Manana,
fdman...@gmail.com, fdman...@apache.org

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: The replicator needs a superuser mode

2011-08-16 Thread Randall Leeds
-1 on _skip_validation and new role

One can always write a validation document that considers the role, no? Why
can't users who need this functionality craft a validation function for this
purpose? This sounds like a blog post and not a database feature.
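
For instance (a sketch only; the "replicator" role name and the type check are
made up for illustration, and the design doc is pushed with Python here just
for brevity), a validation function along these lines already covers the case:

    import requests

    VALIDATE = """
    function(newDoc, oldDoc, userCtx) {
      // Trusted replication bypasses the application-level checks.
      if (userCtx.roles.indexOf('replicator') !== -1) {
        return;
      }
      if (!newDoc.type) {
        throw({forbidden: 'every document needs a type field'});
      }
    }
    """

    requests.put(
        "http://admin:secret@localhost:5984/target_db/_design/auth",
        json={"_id": "_design/auth", "validate_doc_update": VALIDATE},
    )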

+0 on _dump/_load

If it ships raw .couch files I'm totally against it because I think the HTTP
API should remain as independent of implementation details as possible. If
it is non-incremental I don't see significant benefit, unless it's just to
traverse the document index and ignore the sequence index as a way to skip
reads, but this seems like a weak argument. If it's incremental, well, then,
that's replication, and we already have that.

-Randall


On Tue, Aug 16, 2011 at 11:40, Adam Kocoloski kocol...@apache.org wrote:

 Hi Jean-Pierre, I'm not quite sure I follow that line of reasoning.  A user
 with _admin privileges on the database can easily remove any validation
 functions prior to writing today.  In my proposal skipping validation would
 require _admin rights and an explicit opt-in on a per-request basis.  What
 are you trying to guard against with those validation functions?  Best,

 Adam

 On Aug 16, 2011, at 2:29 PM, Jean-Pierre Fiset wrote:

  I understand the issue brought by Adam since in our CouchDb application,
 there is a need to have a replicator role and the validation functions skip
 most of the tests if the role is set for the current user.
 
  On the other hand, at the current time, I am not in favour of making
 super users for the sake of replication. Although it might solve the
 particular problem stated, it removes the ability for a design document to
 enforce some invariant properties of a database.
 
  Since there is already a way to allow a replicator to perform any
 changes (role + proper validation function), I do not see the need for this
 change. Since the super replicator user removes the ability that a database
 has to protect the consistency of its data, and that there does not seem to
 be a work-around, I would rather not see this change pushed to CouchDb.
 
  JP
 
  On 11-08-16 10:26 AM, Adam Kocoloski wrote:
  One of the principal uses of the replicator is to make this database
 look like that one.  We're unable to do that in the general case today
 because of the combination of validation functions and out-of-order document
 transfers.  It's entirely possible for a document to be saved in the source
 DB prior to the installation of a ddoc containing a validation function that
 would have rejected the document, for the replicator to install the ddoc in
 the target DB before replicating the other document, and for the other
 document to then be rejected by the target DB.
 
  I propose we add a role which allows a user to bypass validation, or
 else extend that privilege to the _admin role.  We should still validate
 updates by default and add a way (a new qs param, for instance) to indicate
 that validation should be skipped for a particular update.  Thoughts?
 
  Adam
 




[jira] [Created] (COUCHDB-1251) Factor out couch core and hook other components through a module system

2011-08-16 Thread Randall Leeds (JIRA)
Factor out couch core and hook other components through a module system


 Key: COUCHDB-1251
 URL: https://issues.apache.org/jira/browse/COUCHDB-1251
 Project: CouchDB
  Issue Type: Umbrella
  Components: Build System, Database Core
Reporter: Randall Leeds
 Fix For: 2.0


https://mail-archives.apache.org/mod_mbox/couchdb-dev/201108.mbox/browser

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: compaction plugin, auth handler, foo plugin couchdb core [resent]

2011-08-16 Thread Randall Leeds
On Tue, Aug 16, 2011 at 09:42, Paul Davis paul.joseph.da...@gmail.com wrote:

 On Tue, Aug 16, 2011 at 6:45 AM, Jan Lehnardt j...@apache.org wrote:
  Hi Benoit,
 
  thanks for raising this again. I think we have a good plan to get started
 but
  it wouldn't hurt to get a little more organised. I think the plan is as
 follows:
 
   1. Move to git, this makes all the subsequent steps easier.
 
  2. srcmv, reorganising the source code so we are prepared to do all the
 things
 you mention and all the other things we talked about in the past :)
 
  3. Profit.
 
  --
 
  As for my wish list, all this post the git move:
 
  We could release 1.2 based off of current trunk + a few of the more
  useful JIRA patches that we haven't committed yet.
 
  After 1.2.x is branched, srcmv trunk and start the internal refactoring
  and pluginnifying and release 1.3 off that.
 
   At some point, merging patches between the before-srcmv and after-srcmv
   trees is going to be a pain, so I'd like to keep that time as short
   as possible and thus keep the differences between 1.2 and 1.3 (given
   that these are the border cases) as small as possible.
 
  Cheers
  Jan
  --
 

 Early morning pre-caffeine but this sounds like a pretty good idea to
 my addled brain.


As an experiment in JIRA usage I created an umbrella task for this.
Please place tickets under this umbrella and we can start to break down the
sub-tasks we need to actually get this work done.
https://issues.apache.org/jira/browse/COUCHDB-1251

I set the due date as the 21st of December. Holiday season.
This should give us enough time to get 1.2 out the door and make some real
progress on these goals.

Again, this is an experiment. Sorry for those of you who hate process, but I
thought maybe injecting a bit of it here would stop the flow of e-mails and
focus us all collectively.

-Randall


Re: Configuration Load Order

2011-08-16 Thread Randall Leeds
On Tue, Aug 16, 2011 at 11:33, Jan Lehnardt j...@apache.org wrote:


 On Aug 16, 2011, at 8:31 PM, Noah Slater wrote:

 
  On 16 Aug 2011, at 10:33, Benoit Chesneau wrote:
 
  Imo we shouldn't provide plaintext passwords at all. Maybe a safer
  option would be to let the admin create the first one via http or put
  the hash in a password.ini file manually. If we are kind enough we
  could also provide a couchctl script allowing user management, config
  changes ... ?
 
  This sounds like a decent proposal. Much like you have to use htpasswd to
 generate passwords for Apache httpd, we could bundle a script that lets you
 generate passwords for the CouchDB ini files, and then forbid the use of
 plaintext. This solves both the technical problem (I think?) and helps us
 reinforce better security practices across the board.

 Agreed.


Agreed also. We still have a question about load and save order.
One idea would be to track the .ini file from whence an option came. If an
option comes from a local.ini or local.d/ file it could be updated in place.
If it comes from a default.ini or default.d/ file, updates should be placed
in local.ini. This would make the most sense to me.
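
(A tiny sketch of that save rule; the saveTargetFor helper and the sourceFile
field are hypothetical, and JavaScript is used purely for illustration:)

// Decide which file a changed option should be written to, assuming we
// recorded where each option was originally read from.
function saveTargetFor(option) {
  if (/(^|\/)local(\.ini$|\.d\/)/.test(option.sourceFile)) {
    return option.sourceFile; // came from local.ini or local.d/: update in place
  }
  return "local.ini";         // came from default.ini or default.d/: override here
}

// e.g. saveTargetFor({sourceFile: "/etc/couchdb/default.d/foo.ini"}) === "local.ini"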

I would also be in favor of enforcing a load order that supports a directory
structure like:
local.d/
  010-stuff.ini
  020-others.ini

We don't need to ship anything like that by default. I think right now we
take the load directories on the command line, no? It'd be nice if the order
of resolution within those directories was well specified.

-Randall


Re: Configuration Load Order

2011-08-16 Thread Robert Newson
Nice idea to have a separate htpasswd(-like) file. Passwords are
special; let's treat them accordingly.

B.

On 16 August 2011 23:03, Randall Leeds randall.le...@gmail.com wrote:
 On Tue, Aug 16, 2011 at 11:33, Jan Lehnardt j...@apache.org wrote:


 On Aug 16, 2011, at 8:31 PM, Noah Slater wrote:

 
  On 16 Aug 2011, at 10:33, Benoit Chesneau wrote:
 
   Imo we shouldn't provide plaintext passwords at all. Maybe a safer
   option would be to let the admin create the first one via http or put
   the hash in a password.ini file manually. If we are kind enough we
   could also provide a couchctl script allowing user management, config
   changes ... ?
  
   This sounds like a decent proposal. Much like you have to use htpasswd to
  generate passwords for Apache httpd, we could bundle a script that lets you
  generate passwords for the CouchDB ini files, and then forbid the use of
  plaintext. This solves both the technical problem (I think?) and helps us
  reinforce better security practices across the board.

 Agreed.


 Agreed also. We still have a question about load and save order.
 One idea would be to track the .ini file from whence an option came. If an
 option comes from a local.ini or local.d/ file it could be updated in place.
 If it comes from a default.ini or default.d/ file, updates should be placed
 in local.ini. This would make the most sense to me.

 I would also be in favor of enforcing a load order that supports a directory
 structure like:
 local.d/
  010-stuff.ini
  020-others.ini

 We don't need to ship anything like that by default. I think right now we
 take the load directories on the command line, no? It'd be nice if the order
 of resolution within those directories was well specified.

 -Randall



Re: compaction plugin, auth handler, foo plugin couchdb core [resent]

2011-08-16 Thread Filipe David Manana
On Tue, Aug 16, 2011 at 2:59 PM, Randall Leeds 
 As an experiment in JIRA usage I created an umbrella task for this.
 Please place tickets under this umbrella and we can start to break down the
 sub-tasks we need to actually get this work done.
 https://issues.apache.org/jira/browse/COUCHDB-1251

 I set the due date as the 21st of December. Holiday season.
 This should give us enough time to get 1.2 out the door and make some real
 progress on these goals.

Sounds like a good idea Randall.
Thanks for it.


 Again, this is an experiment. Sorry for those of you who hate process, but I
 thought maybe injecting a bit of it here would stop the flow of e-mails and
 focus us all collectively.

 -Randall




-- 
Filipe David Manana,
fdman...@gmail.com, fdman...@apache.org

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: The replicator needs a superuser mode

2011-08-16 Thread Paul Davis
On Tue, Aug 16, 2011 at 4:46 PM, Randall Leeds randall.le...@gmail.com wrote:
 -1 on _skip_validation and new role

 One can always write a validation document that considers the role, no? Why
 can't users who need this functionality craft a validation function for this
 purpose? This sounds like a blog post and not a database feature.

 +0 on _dump/_load

 If it ships raw .couch files I'm totally against it because I think the HTTP
 API should remain as independent of implementation details as possible. If
 it is non-incremental I don't see significant benefit, unless it's just to
 traverse the document index and ignore the sequence index as a way to skip
 reads, but this seems like a weak argument. If it's incremental, well, then,
 that's replication, and we already have that.


Think of plain text backups and last resort upgrade paths. Also, it
wouldn't have validation docs run on it  or anything of that nature.
I'm thinking basically of having a multipart/mime stream
representation of the database that follows the update sequence. And
the _dump would allow for a ?since= parameter that would make it
incremental. This would even give people the ability to do daily logs
and so on.
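
(A rough sketch of how a client might drive such an endpoint for incremental
backups; note that _dump and its ?since= parameter are only being proposed in
this thread, not an existing CouchDB API, and every name below is illustrative:)

// Hypothetical client for the *proposed* /db/_dump?since= endpoint (Node.js).
var http = require('http');
var fs = require('fs');

function dumpSince(seq) {
  // Each run would append a new incremental dump file, e.g. for daily backups.
  var out = fs.createWriteStream('backup-since-' + seq + '.mime');
  http.get('http://localhost:5984/dbname/_dump?since=' + seq, function (res) {
    res.pipe(out); // proposal: a multipart/mime stream in update-sequence order
  });
}

dumpSince(0); // a real client would persist the last update_seq between runs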

 -Randall


 On Tue, Aug 16, 2011 at 11:40, Adam Kocoloski kocol...@apache.org wrote:

 Hi Jean-Pierre, I'm not quite sure I follow that line of reasoning.  A user
 with _admin privileges on the database can easily remove any validation
 functions prior to writing today.  In my proposal skipping validation would
 require _admin rights and an explicit opt-in on a per-request basis.  What
 are you trying to guard against with those validation functions?  Best,

 Adam

 On Aug 16, 2011, at 2:29 PM, Jean-Pierre Fiset wrote:

  I understand the issue brought by Adam since in our CouchDb application,
 there is a need to have a replicator role and the validation functions skip
 most of the tests if the role is set for the current user.
 
  On the other hand, at the current time, I am not in favour of making
 super users for the sake of replication. Although it might solve the
 particular problem stated, it removes the ability for a design document to
 enforce some invariant properties of a database.
 
  Since there is already a way to allow a replicator to perform any
 changes (role + proper validation function), I do not see the need for this
 change. Since the super replicator user removes the ability that a database
 has to protect the consistency of its data, and that there does not seem to
 be a work-around, I would rather not see this change pushed to CouchDb.
 
  JP
 
  On 11-08-16 10:26 AM, Adam Kocoloski wrote:
  One of the principal uses of the replicator is to make this database
 look like that one.  We're unable to do that in the general case today
 because of the combination of validation functions and out-of-order document
 transfers.  It's entirely possible for a document to be saved in the source
 DB prior to the installation of a ddoc containing a validation function that
 would have rejected the document, for the replicator to install the ddoc in
 the target DB before replicating the other document, and for the other
 document to then be rejected by the target DB.
 
  I propose we add a role which allows a user to bypass validation, or
 else extend that privilege to the _admin role.  We should still validate
 updates by default and add a way (a new qs param, for instance) to indicate
 that validation should be skipped for a particular update.  Thoughts?
 
  Adam
 





Re: The replicator needs a superuser mode

2011-08-16 Thread Randall Leeds
On Tue, Aug 16, 2011 at 16:23, Paul Davis paul.joseph.da...@gmail.com wrote:

 On Tue, Aug 16, 2011 at 4:46 PM, Randall Leeds randall.le...@gmail.com
 wrote:
  -1 on _skip_validation and new role
 
  One can always write a validation document that considers the role, no?
 Why
  can't users who need this functionality craft a validation function for
 this
  purpose? This sounds like a blog post and not a database feature.
 
  +0 on _dump/_load
 
  If it ships raw .couch files I'm totally against it because I think the
 HTTP
  API should remain as independent of implementation details as possible.
 If
  it is non-incremental I don't see significant benefit, unless it's just
 to
  traverse the document index and ignore the sequence index as a way to
 skip
  reads, but this seems like a weak argument. If it's incremental, well,
 then,
  that's replication, and we already have that.
 

 Think of plain text backups and last resort upgrade paths. Also, it
 wouldn't have validation docs run on it  or anything of that nature.
 I'm thinking basically of having a multipart/mime stream
 representation of the database that follows the update sequence. And
 the _dump would allow for a ?since= parameter that would make it
 incremental. This would even give people the ability to do daily logs
 and so on.


Right-o. I don't feel strongly about it, like I said, and think it could be
easily crafted as a plugin if we get *that* situation sorted out.
How's my assessment of the need for a special role or validation skipping,
though? Am I right that one could just create a smart validation function?



  -Randall
 
 
  On Tue, Aug 16, 2011 at 11:40, Adam Kocoloski kocol...@apache.org
 wrote:
 
  Hi Jean-Pierre, I'm not quite sure I follow that line of reasoning.  A
 user
  with _admin privileges on the database can easily remove any validation
  functions prior to writing today.  In my proposal skipping validation
 would
  require _admin rights and an explicit opt-in on a per-request basis.
  What
  are you trying to guard against with those validation functions?  Best,
 
  Adam
 
  On Aug 16, 2011, at 2:29 PM, Jean-Pierre Fiset wrote:
 
   I understand the issue brought by Adam since in our CouchDb
 application,
  there is a need to have a replicator role and the validation functions
 skip
  most of the tests if the role is set for the current user.
  
   On the other hand, at the current time, I am not in favour of making
  super users for the sake of replication. Although it might solve the
  particular problem stated, it removes the ability for a design document
 to
  enforce some invariant properties of a database.
  
   Since there is already a way to allow a replicator to perform any
  changes (role + proper validation function), I do not see the need for
 this
  change. Since the super replicator user removes the ability that a
 database
  has to protect the consistency of its data, and that there does not seem
 to
  be a work-around, I would rather not see this change pushed to CouchDb.
  
   JP
  
   On 11-08-16 10:26 AM, Adam Kocoloski wrote:
   One of the principal uses of the replicator is to make this database
  look like that one.  We're unable to do that in the general case today
  because of the combination of validation functions and out-of-order
 document
  transfers.  It's entirely possible for a document to be saved in the
 source
  DB prior to the installation of a ddoc containing a validation function
 that
  would have rejected the document, for the replicator to install the ddoc
 in
  the target DB before replicating the other document, and for the other
  document to then be rejected by the target DB.
  
   I propose we add a role which allows a user to bypass validation, or
  else extend that privilege to the _admin role.  We should still validate
  updates by default and add a way (a new qs param, for instance) to
 indicate
  that validation should be skipped for a particular update.  Thoughts?
  
   Adam
  
 
 
 



Bug or my lack of understanding? Reduce output must shrink more rapidly

2011-08-16 Thread Chris Stockton
Hello,

I have been able to reduce a complex case where a certain sized
document within our application causes "Reduce output must shrink more
rapidly" errors, and I am not sure I understand why. I spent a great
deal of time making sure I have stripped the database, the documents
and the views to the bare minimum to make it easy to reproduce. I
would really appreciate it if anyone could give me some insight into what
is causing this and whether a fix exists, be it ini settings etc. I
apologize in advance if this is my lack of understanding of views or how
they work, and for this email being a bit long, but I think it is
required to express the issue in case it is indeed a bug.

Kind Regards,

-Chris

--Reproduce steps--

1) CouchDB production release 1.1.0

2) Create a fresh database

3) Create the following design document

  {
   "_id": "_design/test",
   "_rev": "1-19eb11313c2602a00f0105f78202d1f3",
   "views": {
       "Grid": {
           "map": "function(doc) {\n  emit(\"result\", doc.data);\n}",
           "reduce": "function(keys, values, rereduce) {\n  var container = {};\n\n  if(!rereduce) {\n    for(var value in values) {\n      for(var col in values[value]) {\n        if(values[value]) {\n          if(!container[col]) {\n            container[col] = {\n              total: 0\n            };\n          }\n\n          container[col].total++;\n        }\n      }\n    }\n  } else {\n    for(var reduced in values) {\n      for(var col in values[reduced]) {\n        if(!container[col]) {\n          container[col] = {\n            total: 0\n          };\n        }\n\n        container[col].total += values[reduced][col].total;\n      }\n    }\n  }\n\n  return container;\n}"
       }
   },
   "language": "javascript"
  }

4) Create the following regular document (any id is okay)
  {
   "_id": "4334dff68f2283e6e8739eabb40a4e7a",
   "_rev": "24-524e9c9ebeaf88962f41e3a940788610",
   "data": {
       "C003089": "c1",
       "C006990": "c2",
       "C009996": "c3",
       "C012132": "c4",
       "C015574": "c5",
       "C018908": "c6",
       "C021545": "c7",
       "C024392": "c8",
       "C027281": "c9",
       "C030392": "c10",
       "C033457": null,
       "C036671": null,
       "C039663": null,
       "C042967": null,
       "C045398": null,
       "C048160": null,
       "C051924": null,
       "C054920": null,
       "C057239": null,
       "C060993": null,
       "C063309": null,
       "C066352": null,
       "C069003": null,
       "C072467": null,
       "C075210": null
   }
  }

5) Call the view, just a typical call no arguments
  http://SERVER:5984/db_24/_design/test/_view/Grid

6) Verify the response is CORRECT
  
  {"rows":[{"key":null,"value":{"C003089":{"total":1},"C006990":{"total":1},
  "C009996":{"total":1},"C012132":{"total":1},"C015574":{"total":1},
  "C018908":{"total":1},"C021545":{"total":1},"C024392":{"total":1},
  "C027281":{"total":1},"C030392":{"total":1},"C033457":{"total":1},
  "C036671":{"total":1},"C039663":{"total":1},"C042967":{"total":1},
  "C045398":{"total":1},"C048160":{"total":1},"C051924":{"total":1},
  "C054920":{"total":1},"C057239":{"total":1},"C060993":{"total":1},
  "C063309":{"total":1},"C066352":{"total":1},"C069003":{"total":1},
  "C072467":{"total":1},"C075210":{"total":1}}}]}

7) Now, delete the previous document and add the following:
  {
   "_id": "4334dff68f2283e6e8739eabb40a4e7a",
   "_rev": "24-524e9c9ebeaf88962f41e3a940788610",
   "data": {
       "C003089": "c1",
       "C006990": "c2",
       "C009996": "c3",
       "C012132": "c4",
       "C015574": "c5",
       "C018908": "c6",
       "C021545": "c7",
       "C024392": "c8",
       "C027281": "c9",
       "C030392": "c10",
       "C033457": null,
       "C036671": null,
       "C039663": null,
       "C042967": null,
       "C045398": null,
       "C048160": null,
       "C051924": null,
       "C054920": null,
       "C057239": null,
       "C060993": null,
       "C063309": null,
       "C066352": null,
       "C069003": null,
       "C072467": null,
       "C075210": null,
       "C078387": null
   }
  }

8) Note that all we did was add a single property to the end of
data, now run the same view again

9) Notice the error:
  {"error":"reduce_overflow_error","reason":"Reduce output must shrink
more rapidly: Current output:
'[{\"C003089\":{\"total\":1},\"C006990\":{\"total\":1},\"C009996\":{\"total\":1},\"C012132\":{\"total\":1},\"C015574\":'...
(first 100 of 575 bytes)"}

10) I am confused because all I did is add a single property, not sure
how this affects the reduce function?


Re: The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
On Aug 16, 2011, at 5:46 PM, Randall Leeds wrote:

 -1 on _skip_validation and new role
 
 One can always write a validation document that considers the role, no? Why
 can't users who need this functionality craft a validation function for this
 purpose? This sounds like a blog post and not a database feature.

Blech, really?

Q: What request do I issue to guarantee all my documents are stored in this 
other database?

A: Unpossible.

Practically speaking we need it at Cloudant because we use replication to move 
users' databases between clusters.  If it's not seen as generally useful that's 
ok, just surprising.  Best,

Adam

Re: The replicator needs a superuser mode

2011-08-16 Thread Randall Leeds
On Tue, Aug 16, 2011 at 17:03, Adam Kocoloski kocol...@apache.org wrote:

 On Aug 16, 2011, at 5:46 PM, Randall Leeds wrote:

  -1 on _skip_validation and new role
 
  One can always write a validation document that considers the role, no?
 Why
  can't users who need this functionality craft a validation function for
 this
  purpose? This sounds like a blog post and not a database feature.

 Blech, really?

 Q: What request do I issue to guarantee all my documents are stored in this
 other database?

 A: Unpossible.

 Practically speaking we need it at Cloudant because we use replication to
 move users' databases between clusters.  If it's not seen as generally
 useful that's ok, just surprising.  Best,


I understand the motivation a little better now. I'm not sure it's generally
useful. I think _dump/_load might be, but I'd rather see users craft around
validation as part of their replication strategy rather than increase the
query option population.

I'm not sure I'm against admin user context bypassing validation docs,
though.


Re: Bug or my lack of understanding? Reduce output must shrink more rapidly

2011-08-16 Thread Randall Leeds
On Tue, Aug 16, 2011 at 17:03, Chris Stockton chrisstockto...@gmail.com wrote:

 [Chris's original message, including the full reproduce steps above, snipped.]


Since you are collecting and creating keys in the output object, adding
this single property made the output of reduce larger. CouchDB tries to
detect reduce functions that don't actually reduce the data. If you know for
sure that you are working with a bounded set of properties whose occurrences
you would like to sum, you may set reduce_limit = false in your configuration.
The default is true so that users don't shoot themselves in the foot
(especially because you cannot cancel a run-away reduce if you don't have
access to the machine to kill the process).


Re: Bug or my lack of understanding? Reduce output must shrink more rapidly

2011-08-16 Thread Chris Stockton
Hello,

On Tue, Aug 16, 2011 at 5:37 PM, Randall Leeds randall.le...@gmail.com wrote:
 On Tue, Aug 16, 2011 at 17:03, Chris Stockton 
 chrisstockto...@gmail.comwrote:

 Since you are collecting and creating keys in the output object creating
 this single property made the output of reduce larger. CouchDB tries to
 detect reduce functions that don't actually reduce the data. If you know for
 sure that you are working with a bounded set of properties whose occurrences
 you would like to sum you may set reduce_limit=false in your configuration.
 The default is true so that users don't shoot themselves in the foot
 (especially because you cannot cancel a run-away reduce if you don't have
 access to the machine to kill the process).


Thanks Randall for your reply. I changed my view call to [1] and oddly
it still gives the same error; maybe I am doing something wrong? I
didn't see anything about reduce_limit anywhere on the CouchDB wiki.
Long term that kind of scares me a little bit, though: if for
some reason we ran across some new data that caused an infinite reduce
due to a bug, our couchdbs would all get crippled. Do I have any other
options here?

It would be great if I could impose a size limit for reduce, or even a
minimum size limit, as it is odd to trigger a reduce error on the
very first record; making it run at least 100 times would be a
good test of whether the data is shrinking or at least remaining
constant. I'm not sure what to suggest here beyond that, I just think it
doesn't feel quite right; maybe someone has a better suggestion.

[1] http://server:59841/db_24/_design/test/_view/Grid?reduce_limit=false


Re: Configuration Load Order

2011-08-16 Thread Jason Smith
On Wed, Aug 17, 2011 at 5:03 AM, Randall Leeds randall.le...@gmail.com wrote:
 On Tue, Aug 16, 2011 at 11:33, Jan Lehnardt j...@apache.org wrote:


 On Aug 16, 2011, at 8:31 PM, Noah Slater wrote:

 
  On 16 Aug 2011, at 10:33, Benoit Chesneau wrote:
 
   Imo we shouldn't provide plaintext passwords at all. Maybe a safer
   option would be to let the admin create the first one via http or put
   the hash in a password.ini file manually. If we are kind enough we
   could also provide a couchctl script allowing user management, config
   changes ... ?
  
   This sounds like a decent proposal. Much like you have to use htpasswd to
  generate passwords for Apache httpd, we could bundle a script that lets you
  generate passwords for the CouchDB ini files, and then forbid the use of
  plaintext. This solves both the technical problem (I think?) and helps us
  reinforce better security practices across the board.

 Agreed.


 Agreed also. We still have a question about load and save order.
 One idea would be to track the .ini file from whence an option came. If an
 option comes from a local.ini or local.d/ file it could be updated in place.
 If it comes from a default.ini or default.d/ file, updates should be placed
 in local.ini. This would make the most sense to me.

 I would also be in favor of enforcing a load order that supports a directory
 structure like:
 local.d/
  010-stuff.ini
  020-others.ini

IMHO, this is madness.

The American quip goes: the professor who never even ran for dog
catcher presumes to tell the president how to do his job. Developers
who spend all day in ./utils/run pontificate about good daemon
behavior in an OS or distribution.

(I don't *really* believe this. I know several of you are responsible
for production couches, but that is the flash-bulb image in my mind.)
I don't feel strongly on the matter, just want to share a sysadmin's
perspective. Any of the proposals would be an improvement, so I'm
net-happy.

Some final apologist thoughts:

My proposal is already implemented. Now I say promote HTTP config
(Futon) over .ini files when possible. Integrators, packagers, and
advanced sysadmins can attack the .ini files just as before.

CouchDB stores versioned data, with a powerful validation and audit
tool (potentially, I'm thinking about validate_doc_update and log()).
Now we are invoking use cases of versioning the config, and auditing
it. Wow! My point is not that the config (or some of it) should be in
a database, but that the config should (1) *lose* complexity over
time, not gain it; and (2) be deprecated as an implementation detail,
or just for advanced users.

Config files that change themselves are bizarre and scary. If that's
what we've got, fine, but make it as simple as possible.

Admins, passwords, and non-bootstrappy configuration over HTTP seem
more Couch-like, more of the web, and more relaxed.

Take a MySQL admin, or an admin of Drupal, Wordpress, Moodle, Joomla,
or pretty much any big PHP application. Tell them this: You have to
get CouchDB up in the first place. So you edit some config files. Once
it's up, you connect with your client/browser. It assumes you are an
admin, and you complete installation over that interface. They would
respond: Yeah, sure.

I do not buy the misbehaving Couch scenario. Firstly, how common is
that? After installation and confirmation, daemons get pretty stable.
If a misconfiguration totally destroys the couch, well, the config files are
still plain text. As before, load emacs and go for it!

Finally, I am basically happy with the Couch config. It's quirky but
not too bad. I only hope to share a fresh perspective: the viewpoint
of people for whom couch is just another daemon, like MySQL or httpd
or cron.

-- 
Iris Couch


Re: The replicator needs a superuser mode

2011-08-16 Thread Jason Smith
On Tue, Aug 16, 2011 at 10:24 PM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and /db/_restore 
 endpoints

Jan, I also had that dream at CouchOne, but now I think it is a very bad idea.

A database is a URL. Every URL is different. Cloning URL_A to URL_B is
tempting, but fundamentally anti-CouchDB.

There is a reason the security object does not replicate. Every URL
(or origin) is a different security environment, and it is meaningless
or wrong to apply A's security object to B's database.

Validation functions decide what to allow based on userCtx and secObj.
Both of those change (generally) with the URL. Cloning one database to
another IMO spits in the face of the architecture and philosophy of
replication.

IMHO, cloning a *database* is not desirable. Long-term, you really
want to replicate a database.

Cloning a *couch* (GET /_dump, PUT /_restore) would be awesome. That
is the right abstraction level. Among other reasons, it can include
the config. Maybe that is mission creep.

-- 
Iris Couch


Re: Bug or my lack of understanding? Reduce output must shrink more rapidly

2011-08-16 Thread Randall Leeds
On Tue, Aug 16, 2011 at 17:53, Chris Stockton chrisstockto...@gmail.com wrote:

 Hello,

 On Tue, Aug 16, 2011 at 5:37 PM, Randall Leeds randall.le...@gmail.com
 wrote:
  On Tue, Aug 16, 2011 at 17:03, Chris Stockton chrisstockto...@gmail.com
 wrote:
 
  Since you are collecting and creating keys in the output object creating
  this single property made the output of reduce larger. CouchDB tries to
  detect reduce functions that don't actually reduce the data. If you know
 for
  sure that you are working with a bounded set of properties whose
 occurrences
  you would like to sum you may set reduce_limit=false in your
 configuration.
  The default is true so that users don't shoot themselves in the foot
  (especially because you cannot cancel a run-away reduce if you don't have
  access to the machine to kill the process).
 

 Thanks Randall for your reply, I changed my view call to [1] and oddly
 it still gives the same error, maybe I am doing something wrong? I
 didn't see anywhere on couchdb wiki anything for reduce_limit.
 Although I think long term that kind of scares me a little bit, if for
 some reason we ran across some new data that caused a infinite reduce
 due to a bug, our couchdbs would all get crippled, do I have any other
 options here?

 It would be great if I could impose a size limit for reduce, or even a
 minimum size limit, as it is odd to trigger a reduce error on the
 first record, making it have to run at least 100 times should be a
 good test to see if the data is shrinking or at least remaining
 constant. Not sure what to suggest here beyond that, I just think it
 doesn't feel quite right, maybe someone has some better suggestion.

 [1] http://server:59841/db_24/_design/test/_view/Grid?reduce_limit=false


After this I'll tell you about how you change that setting, but you should
consider restructuring your map/reduce:

For example, instead of building an object with these counts in memory and
trying to reduce them over reduce/rereduce, just emit multiple rows.

map:

function(doc) {
  // one row per column in the document's "data" object
  for (var col in doc.data) {
    emit(col, 1);
  }
}

reduce:

_sum

This way you can use the built-in reduction by specifying just the string
"_sum" as your reduce, which is much more efficient than doing it yourself.
Also, you don't hit the reduce limit.

Anyway, in case you *do* work with your own installation and want to break
the reduce limit sometime, here's how:

If you look in default.ini you will see the section [query_server_config]
with reduce_limit = true.
You could put something like this in your local.ini:

[query_server_config]
reduce_limit = false

If you don't have access to the box you should be able to issue:
PUT http://server/_config/query_server_config/reduce_limit
The body of the request should be the quoted JSON string "false".

For example, with cURL, you might do:
curl -X PUT -H "Content-Type: application/json" -d '"false"' \
  http://server/_config/query_server_config/reduce_limit
(Note that the data here is single- and double-quoted to ensure the double
quotes are passed as part of the body and not removed by the shell.)

If you get an error, e.g., because you're using IrisCouch or something other
service which locks down the installation a bit, you'll have to contact
their support.


Re: The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
On Aug 16, 2011, at 8:20 PM, Randall Leeds wrote:

 On Tue, Aug 16, 2011 at 17:03, Adam Kocoloski kocol...@apache.org wrote:
 
 On Aug 16, 2011, at 5:46 PM, Randall Leeds wrote:
 
 -1 on _skip_validation and new role
 
 One can always write a validation document that considers the role, no?
 Why
 can't users who need this functionality craft a validation function for
 this
 purpose? This sounds like a blog post and not a database feature.
 
 Blech, really?
 
 Q: What request do I issue to guarantee all my documents are stored in this
 other database?
 
 A: Unpossible.
 
 Practically speaking we need it at Cloudant because we use replication to
 move users' databases between clusters.  If it's not seen as generally
 useful that's ok, just surprising.  Best,
 
 
 I understand the motivation a little better now. I'm not sure it's generally
 useful. I think _dump/_load might be, but I'd rather see users craft around
 validation as part of their replication strategy rather than increase the
 query option population.
 
 I'm not sure I'm against admin user context bypassing validation docs,
 though.

That's interesting.  It sounds like you're motivated to minimize the surface 
area of the API.  I can respect that.  I'm not sure I like _admins 
automatically bypassing validation, though, because we already require _admin 
to update _design docs, so it's not as if we can make the use of _admin 
particularly rare.  Will think on it.  Best,

Adam

[jira] [Updated] (COUCHDB-1246) CouchJS process spawned and not killed on each Reduce Overflow Error

2011-08-16 Thread Filipe Manana (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filipe Manana updated COUCHDB-1246:
---

Attachment: os_pool_trunk.patch

Paul, as we were discussing on IRC, I have a reproducible case where querying a 
view just hangs and the OS process pool fills up, blocking subsequent requests.

I attach a WIP patch here, which adds an etap test (your patch makes this 
test fail).


 CouchJS process spawned and not killed on each Reduce Overflow Error
 

 Key: COUCHDB-1246
 URL: https://issues.apache.org/jira/browse/COUCHDB-1246
 Project: CouchDB
  Issue Type: Bug
  Components: JavaScript View Server
Affects Versions: 1.1
 Environment: Linux Debian Squeeze
 [query_server_config]
 reduce_limit = true
 os_process_limit = 25
Reporter: Michael Newman
 Attachments: COUCHDB-1246.patch, categories, os_pool_trunk.patch


 Running the view attached results in a reduce_overflow_error. For each 
 reduce_overflow_error a process of /usr/lib/couchdb/bin/couchjs 
 /usr/share/couchdb/server/main.js starts running. Once this gets to 25, which 
 is the os_process_limit by default, all views result in a server error: 
 timeout {gen_server,call,[couch_query_servers,{get_proc,javascript}]}
 As far as I can tell, these processes and the non-response from the views 
 will continue until couch is restarted.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COUCHDB-1246) CouchJS process spawned and not killed on each Reduce Overflow Error

2011-08-16 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086083#comment-13086083
 ] 

Paul Joseph Davis commented on COUCHDB-1246:


Filipe,

Awesome work on the test. When I stared at it last I was contemplating writing 
a huge monolithic thing to setup a view and do all that crazy stuff. Knowing we 
can work down at the os process level should allow us to get a better handle on 
this.

I'll try looking at this closer tonight or tomorrow.





Re: The replicator needs a superuser mode

2011-08-16 Thread Dale Harvey
On 17 August 2011 02:47, Adam Kocoloski kocol...@apache.org wrote:

 On Aug 16, 2011, at 8:20 PM, Randall Leeds wrote:

  On Tue, Aug 16, 2011 at 17:03, Adam Kocoloski kocol...@apache.org
 wrote:
 
  On Aug 16, 2011, at 5:46 PM, Randall Leeds wrote:
 
  -1 on _skip_validation and new role
 
  One can always write a validation document that considers the role, no?
  Why
  can't users who need this functionality craft a validation function for
  this
  purpose? This sounds like a blog post and not a database feature.
 
  Blech, really?
 
  Q: What request do I issue to guarantee all my documents are stored in
 this
  other database?
 
  A: Unpossible.
 
  Practically speaking we need it at Cloudant because we use replication
 to
  move users' databases between clusters.  If it's not seen as generally
  useful that's ok, just surprising.  Best,
 
 
  I understand the motivation a little better now. I'm not sure it's
 generally
  useful. I think _dump/_load might be, but I'd rather see users craft
 around
  validation as part of their replication strategy rather than increase the
  query option population.
 
  I'm not sure I'm against admin user context bypassing validation docs,
  though.

 That's interesting.  It sounds like you're motivated to minimize the
 surface area of the API.  I can respect that.  I'm not sure I like _admins
 automatically bypassing validation, though, because we already require
 _admin to update _design docs, so it's not as if we can make the use of
 _admin particularly rare.  Will think on it.  Best,

 Adam


Just to point out a very useful use case for the /_dump and /_load endpoints: on
mobile we need to ship preloaded data / applications. I originally curl'd
design docs and PUT them on startup, but the resulting files are large and
startup time is slow, and replicating isn't an option.

Now we use .couch files to preload the data; however, all my stuff is on a hosted
server where I don't have access to scp (I can just copy them down to servers
where I can access the .couch files, but I'm speaking on behalf of new users /
making things as easy as possible).


Re: The replicator needs a superuser mode

2011-08-16 Thread Jason Smith
On Wed, Aug 17, 2011 at 7:03 AM, Adam Kocoloski kocol...@apache.org wrote:
 On Aug 16, 2011, at 5:46 PM, Randall Leeds wrote:

 -1 on _skip_validation and new role

 One can always write a validation document that considers the role, no? Why
 can't users who need this functionality craft a validation function for this
 purpose? This sounds like a blog post and not a database feature.

 Blech, really?

 Q: What request do I issue to guarantee all my documents are stored in this 
 other database?

 A: Unpossible.

 Practically speaking we need it at Cloudant because we use replication to 
 move users' databases between clusters.  If it's not seen as generally useful 
 that's ok, just surprising.  Best,

Adam, I'm conflicted. It feels presumptuous to disagree with you and
the developers, which I've done a lot recently.

Also, I too struggle with migrating data, verbatim, between servers
(between couches, and also between Linux boxes).

But to guarantee all my documents are stored in this other database
is actually incoherent. It is IMHO anti-CouchDB.

Validation functions, user accounts (which change from couch to
couch), and security objects (which also change from db to db, and
couch to couch) all come together to decide whether a change is
approved (valid). That is very powerful, and very fundamental.
Providing this guarantee betrays the promise that Couch makes to
developers.

People are using validation functions for government compliance, to
meet regulatory requirements (SOX, HIPAA). IIRC, you are proposing a
query parameter for Couch to disregard those instructions.

Validation functions confirm not only authorization, but also
well-formedness of the documents. So, again, in the real world, where
many people use _admin accounts, adding a ?force=true parameter sounds
dangerous.

Do you worry whether, in the wild, people will use it more and more,
like logging in to your workstation as root/Administrator? It
eliminates daily annoyances but it is actually very risky behavior.

Finally, yes, an admin can ultimately circumvent validation functions.
But to me, that is the checks and balances of real life. If you forget
your BIOS password, you can physically open the box and move a jumper.

I do agree about the need to move opaque data around. I disagree that
a query parameter should allow it. I feel the hosting provider pain.
The customer creates _design/angry with validate_doc_update:

function(newDoc, oldDoc, userCtx, secObj) {
  throw {forbidden: "I am _design/angry and I hate all documents!"};
}

And now I am responsible for replicating their data, unmolested, all
over the place.

-- 
Iris Couch


Re: The replicator needs a superuser mode

2011-08-16 Thread Jason Smith
On Tue, Aug 16, 2011 at 9:26 PM, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because of 
 the combination of validation functions and out-of-order document transfers.  
 It's entirely possible for a document to be saved in the source DB prior to 
 the installation of a ddoc containing a validation function that would have 
 rejected the document, for the replicator to install the ddoc in the target 
 DB before replicating the other document, and for the other document to then 
 be rejected by the target DB.

Somebody asked about this on Stack Overflow. It was a very simple but
challenging question, but now I can't find it. Basically, he made your
point above.

Aren't you identifying two problems, though?

1. Sometimes you need to ignore validation to just make a nice, clean copy.
2. Replication batches (an optimization) are disobeying the change
sequence, which can screw up the replica.

I responded to #1 already.

But my feeling about #2 is that the optimization goes too far.
Replication batches should always have boundaries immediately before
and after design documents. In other words, batch all you want, but
design documents [1] must always be in a batch size of 1. That will
retain the semantics.

[1] Actually, the only ddocs needing their own private batches are
those with a validate_doc_update field.
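
(A rough sketch of the batching rule proposed above, with made-up helper names
and JavaScript purely for illustration; this is not how the actual replicator
is structured:)

// Per the proposal: only design docs that carry a validator travel alone.
function needsPrivateBatch(doc) {
  return doc._id.indexOf('_design/') === 0 &&
         typeof doc.validate_doc_update !== 'undefined';
}

// Split an ordered list of changed docs into batches, flushing before
// and after any doc that needs a private batch.
function makeBatches(docs, maxBatchSize) {
  var batches = [], current = [];
  docs.forEach(function (doc) {
    if (needsPrivateBatch(doc)) {
      if (current.length) { batches.push(current); current = []; }
      batches.push([doc]); // the validating ddoc gets a batch of size 1
    } else {
      current.push(doc);
      if (current.length === maxBatchSize) { batches.push(current); current = []; }
    }
  });
  if (current.length) { batches.push(current); }
  return batches;
}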

-- 
Iris Couch


Re: The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
On Aug 16, 2011, at 10:23 PM, Jason Smith wrote:

 On Wed, Aug 17, 2011 at 7:03 AM, Adam Kocoloski kocol...@apache.org wrote:
 On Aug 16, 2011, at 5:46 PM, Randall Leeds wrote:
 
 -1 on _skip_validation and new role
 
 One can always write a validation document that considers the role, no? Why
 can't users who need this functionality craft a validation function for this
 purpose? This sounds like a blog post and not a database feature.
 
 Blech, really?
 
 Q: What request do I issue to guarantee all my documents are stored in this 
 other database?
 
 A: Unpossible.
 
 Practically speaking we need it at Cloudant because we use replication to 
 move users' databases between clusters.  If it's not seen as generally 
 useful that's ok, just surprising.  Best,
 
 Adam, I'm conflicted. It feels presumptuous to disagree with you and
 the developers, which I've done a lot recently.
 
 Also, I too struggle with migrating data, verbatim, between servers
 (between couches, and also between Linux boxes).
 
 But to guarantee all my documents are stored in this other database
 is actually incoherent. It is IMHO anti-CouchDB.

Hi Jason, we're going to have to disagree on this one.  Replication is really 
flexible and can do lots of things that database replication has not 
historically been able to do, but I think it's a sad state of affairs that it's 
not possible to use replication to create a replica of an arbitrary database.

 Validation functions, user accounts (which change from couch to
 couch), and security objects (which also change from db to db, and
 couch to couch) all come together to decide whether a change is
 approved (valid). That is very powerful, and very fundamental.
 Providing this guarantee betrays the promise that Couch makes to
 developers.

No, it doesn't.  The guarantee presumes you have _admin access to the target 
database.  Developers shouldn't give that out, just like they shouldn't give 
out root access to the server itself.

 People are using validation functions for government compliance, to
 meet regulatory requirements (SOX, HIPAA). IIRC, you are proposing a
 query parameter for Couch to disregard those instructions.

Only if you have _admin access to the database, in which case you can already 
bypass validation or do whatever else you want to the data in that database if 
you're so inclined.

 Validation functions confirm not only authorization, but also
 well-formedness of the documents. So, again, in the real world, where
 many people use _admin accounts, adding a ?force=true parameter sounds
 dangerous.

Well, yes, it would be dangerous to use on every request.

 Do you worry whether, in the wild, people will use it more and more,
 like logging in to your workstation as root/Administrator? It
 eliminates daily annoyances but it is actually very risky behavior.

Meh.  If they choose to bypass their own validation functions that's their 
concern.  I don't lose sleep over it.

 Finally, yes, an admin can ultimately circumvent validation functions.
 But to me, that is the checks and balances of real life. If you forget
 your BIOS password, you can physically open the box and move a jumper.
 
 I do agree about the need to move opaque data around. I disagree that
 a query parameter should allow it. I feel the hosting provider pain.
 The customer creates _design/angry with validate_doc_update:
 
 function(newDoc, oldDoc, userCtx, secObj) {
 throw {forbidden: "I am _design/angry and I hate all documents!"};
 }
 
 And now I am responsible for replicating their data, unmolested, all
 over the place.
 
 -- 
 Iris Couch



[jira] [Updated] (COUCHDB-1246) CouchJS process spawned and not killed on each Reduce Overflow Error

2011-08-16 Thread Filipe Manana (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filipe Manana updated COUCHDB-1246:
---

Attachment: os_pool_trunk.patch

Thanks Paul. I'll probably be updating the patch one or two times until then.





Re: The replicator needs a superuser mode

2011-08-16 Thread Adam Kocoloski
On Aug 16, 2011, at 10:31 PM, Jason Smith wrote:

 On Tue, Aug 16, 2011 at 9:26 PM, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because 
 of the combination of validation functions and out-of-order document 
 transfers.  It's entirely possible for a document to be saved in the source 
 DB prior to the installation of a ddoc containing a validation function that 
 would have rejected the document, for the replicator to install the ddoc in 
 the target DB before replicating the other document, and for the other 
 document to then be rejected by the target DB.
 
 Somebody asked about this on Stack Overflow. It was a very simple but
 challenging question, but now I can't find it. Basically, he made your
 point above.
 
 Aren't you identifying two problems, though?
 
 1. Sometimes you need to ignore validation to just make a nice, clean copy.
 2. Replication batches (an optimization) are disobeying the change
 sequence, which can screw up the replica.

As far as I know the only reason one needs to ignore validation to make a nice 
clean copy is because the replicator does not guarantee the updates are applied 
on the target in the order they were received on the source.  It's all one 
issue to me.

 I responded to #1 already.
 
 But my feeling about #2 is that the optimization goes too far.
 replication batches should always have boundaries immediately before
 and after design documents. In other words, batch all you want, but
 design documents [1] must always be in a batch size of 1. That will
 retain the semantics.
 
 [1] Actually, the only ddocs needing their own private batches are
 those with a validate_doc_update field.

My standard retort to transaction boundaries is that there is no global 
ordering of events in a distributed system.  A clustered CouchDB can try to 
build a vector clock out of the change sequences of the individual servers and 
stick to that merged sequence during replication, but even then the ddoc entry 
in the feed could be concurrent with several other updates.  I rather like 
that the replicator aggressively mixes up the ordering of updates because it 
prevents us from making choices in the single-server case that aren't sensible 
in a cluster.

By the way, I don't consider this line of discussion presumptuous in the least. 
 Cheers,

Adam



[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-16 Thread Filipe Manana (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086096#comment-13086096
 ] 

Filipe Manana commented on COUCHDB-1153:


Paul,

I'm addressing all the concerns pointed before. Some of them are already done 
and be tracked in individual commits:
https://github.com/fdmanana/couchdb/commits/compaction_daemon

The thing I'm not 100% sure is how to make the config and load/start of os_mon. 
I'm not that familiar with all those OTP structuring details. I've come up with 
this so far:

http://friendpaste.com/R43WflJ8r75MupvXuS98v

Using the args_file you mentioned makes sense, but we already have some stuff 
that could be moved into that new file, so I think it should go into a separate 
change. I'll help / do it; I just need to figure out exactly how to do it and 
integrate it into the build system / startup scripts.


 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and their views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view indexes
 ; need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller than this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation, therefore it's not worth compacting them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and their respective view groups when all the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
 ;  of old data (and its supporting metadata) over the database
 ;  file size is equal to or greater than this value, this
 ;  database compaction condition is satisfied.
 ;  This value is computed as:
 ;
 ;   (file_size - data_size) / file_size * 100
 ;
 ;  The data_size and file_size values can be obtained when
 ;  querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage) of the amount
 ;of old data (and its supporting metadata) over the view
 ;index (view group) file size is equal to or greater than
 ;this value, then this view index compaction condition is
 ;satisfied. This value is computed as:
 ;
 ;(file_size - data_size) / file_size * 100
 ;
 ;The data_size and file_size values can be obtained when
 ;querying a view group's information URI
 ;(GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;is allowed. This value must obey the following format:
 ;
 ;HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the allowed
 ;   period, it will be canceled if this parameter is set to yes.
 ;   It defaults to no and it's meaningful only if the *period*
 ;   parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;  compacted in parallel. This is only useful on
 ;  certain setups, like for example when the database
 ;  and view index directories point to different
 ;  disks. It defaults to no.
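
For reference, a small sketch of how the conditions above could be evaluated from the information URIs; the database and design document names, the 70/60 thresholds, and the period are placeholders, and the size lookup checks both file_size and disk_size since the exact field name depends on the CouchDB version. This is not the daemon's actual code.

# Sketch of the compaction conditions documented above; placeholder names.
import json
import urllib.request
from datetime import datetime

COUCH = "http://localhost:5984"
MIN_FILE_SIZE = 131072            # mirrors min_file_size above

def info(path):
    """GET a CouchDB information URI and decode the JSON body."""
    with urllib.request.urlopen(COUCH + path) as resp:
        return json.loads(resp.read())

def fragmented(stats, threshold):
    """(file_size - data_size) / file_size * 100 >= threshold, as above.
    The on-disk size key differs between versions, so try both names."""
    file_size = stats.get("file_size") or stats.get("disk_size")
    data_size = stats.get("data_size")
    if not file_size or data_size is None or file_size < MIN_FILE_SIZE:
        return False
    return (file_size - data_size) / file_size * 100 >= threshold

def in_period(period, now):
    """True if 'now' falls inside an allowed "HH:MM - HH:MM" window
    (the window may wrap past midnight)."""
    start, end = [tuple(map(int, p.strip().split(":"))) for p in period.split("-")]
    t = (now.hour, now.minute)
    return start <= t < end if start <= end else (t >= start or t < end)

db_ok = fragmented(info("/mydb"), 70)
view_ok = fragmented(info("/mydb/_design/myapp/_info").get("view_index", {}), 60)
if db_ok and view_ok and in_period("01:00 - 05:00", datetime.now()):
    print("all compaction conditions satisfied for mydb")
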
 

Re: The replicator needs a superuser mode

2011-08-16 Thread Jason Smith
On Wed, Aug 17, 2011 at 9:49 AM, Adam Kocoloski kocol...@apache.org wrote:
 On Aug 16, 2011, at 10:31 PM, Jason Smith wrote:

 On Tue, Aug 16, 2011 at 9:26 PM, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to make this database look 
 like that one.  We're unable to do that in the general case today because 
 of the combination of validation functions and out-of-order document 
 transfers.  It's entirely possible for a document to be saved in the source 
 DB prior to the installation of a ddoc containing a validation function 
 that would have rejected the document, for the replicator to install the 
 ddoc in the target DB before replicating the other document, and for the 
 other document to then be rejected by the target DB.

 Somebody asked about this on Stack Overflow. It was a very simple but
 challenging question, but now I can't find it. Basically, he made your
 point above.

 Aren't you identifying two problems, though?

 1. Sometimes you need to ignore validation to just make a nice, clean copy.
 2. Replication batches (an optimization) are disobeying the change
 sequence, which can screw up the replica.

 As far as I know the only reason one needs to ignore validation to make a 
 nice clean copy is because the replicator does not guarantee the updates are 
 applied on the target in the order they were received on the source.  It's 
 all one issue to me.

 I responded to #1 already.

 But my feeling about #2 is that the optimization goes too far.
 Replication batches should always have boundaries immediately before
 and after design documents. In other words, batch all you want, but
 design documents [1] must always be in a batch size of 1. That will
 retain the semantics.

 [1] Actually, the only ddocs needing their own private batches are
 those with a validate_doc_update field.

 My standard retort to transaction boundaries is that there is no global 
 ordering of events in a distributed system.  A clustered CouchDB can try to 
 build a vector clock out of the change sequences of the individual servers 
 and stick to that merged sequence during replication, but even then the ddoc 
 entry in the feed could be concurrent with several other updates.  I rather 
 like that the replicator aggressively mixes up the ordering of updates 
 because it prevents us from making choices in the single-server case that 
 aren't sensible in a cluster.

That is interesting. So if it is crucial that an application enforce
transaction semantics, then that application can go ahead and
understand the distribution architecture, and it can confirm that a
ddoc is committed and distributed among all nodes, and then it can
make subsequent changes or replications.

Or, written as a dialogue:

Developer: My application knows or cares that Couch is distributed.
Developer: My application depends on a validation function applying universally.
Developer: But my application won't bother to confirm that it's been
fully pushed before I make changes or replications.
Adam: WTF?

Snark aside, it's an excellent point. Thanks.

-- 
Iris Couch


Re: The replicator needs a superuser mode

2011-08-16 Thread Jason Smith
tl;dr response here, philosophical musings below.

1. The requirements are real, it's reasonable to want to copy from A to B
2. Replication is a whole worldview, adding ?force=true breaks that worldview
3. Dump and restore sounds more appropriate

On Wed, Aug 17, 2011 at 9:34 AM, Adam Kocoloski kocol...@apache.org wrote:
 But to guarantee all my documents are stored in this other database
 is actually incoherent. It is IMHO anti-CouchDB.

 Hi Jason, we're going to have to disagree on this one.  Replication is really 
 flexible and can do lots of things that database replication has not 
 historically been able to do, but I think it's a sad state of affairs that 
 it's not possible to use replication to create a replica of an arbitrary 
 database.

True. I agree with the requirements, but the solution raises a red flag.

My understanding of couch:

There is no such thing as a database (or data set) clone. There is no
such thing as a database copy. There is no such thing as two databases
with the same document. It's like Pauli's exclusion principle. Sure,
maybe the doc and rev history are the same, but the _security object,
the authentication environment, and the URI are different. That
(generally) affects how applications and validation work.

Put another way, this idea is a leaky abstraction. I much prefer Jan's
_dump and _restore idea. It has some difficulties, but it is *not*
replication. It's something totally different. In the universe of a
database, replication always follows the rules. In the universe of a
Couch, sure, sometimes you need to clone data around. There's an
appropriate action for each abstraction layer.

The nice thing about _dump and _restore, and also rsync, is that you
make full, opaque clones (not replicas!). You can't merge or splice
data sets. Once you are talking about merging data, or pulling out a
subset, now you are in database land, not couch land, and you have to
follow the rules of replication.

-- 
Iris Couch


Re: The replicator needs a superuser mode

2011-08-16 Thread Randall Leeds
On Tue, Aug 16, 2011 at 20:37, Jason Smith j...@iriscouch.com wrote:

 The nice thing about _dump and _restore, and also rsync, is that you
 make full, opaque clones (not replicas!). You can't merge or splice
 data sets. Once you are talking about merging data, or pulling out a
 subset, now you are in database land, not couch land, and you have to
 follow the rules of replication.


Yeah, this is what I'm thinking, too. Except I'd reverse couch and database
:)


[jira] [Updated] (COUCHDB-1246) CouchJS process spawned and not killed on each Reduce Overflow Error

2011-08-16 Thread Filipe Manana (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filipe Manana updated COUCHDB-1246:
---

Attachment: os_pool_trunk.patch

 CouchJS process spawned and not killed on each Reduce Overflow Error
 

 Key: COUCHDB-1246
 URL: https://issues.apache.org/jira/browse/COUCHDB-1246
 Project: CouchDB
  Issue Type: Bug
  Components: JavaScript View Server
Affects Versions: 1.1
 Environment: Linux Debian Squeeze
 [query_server_config]
 reduce_limit = true
 os_process_limit = 25
Reporter: Michael Newman
 Attachments: COUCHDB-1246.patch, categories, os_pool_trunk.patch, 
 os_pool_trunk.patch, os_pool_trunk.patch


 Running the view attached results in a reduce_overflow_error. For each 
 reduce_overflow_error a process of /usr/lib/couchdb/bin/couchjs 
 /usr/share/couchdb/server/main.js starts running. Once this gets to 25, which 
 is the os_process_limit by default, all views result in a server error: 
 timeout {gen_server,call,[couch_query_servers,{get_proc,javascript}]}
 As far as I can tell, these processes and the non-response from the views 
 will continue until couch is restarted.
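
A rough reproduction sketch of the failure mode described above, assuming a local CouchDB with reduce_limit = true and the default os_process_limit of 25; the database, design document, and view names are hypothetical:

# Rough reproduction sketch for this report; hypothetical names, assumes a
# local CouchDB with reduce_limit = true and os_process_limit = 25.
import urllib.error
import urllib.request

COUCH = "http://localhost:5984"
OS_PROCESS_LIMIT = 25

def get(path):
    """GET a path, returning (status, body text); errors are captured, not raised."""
    try:
        with urllib.request.urlopen(COUCH + path, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()
    except Exception as err:          # timeouts, dropped connections, ...
        return None, str(err)

# Query the view whose reduce output grows without bound. With reduce_limit
# enabled each call should fail with reduce_overflow_error; per this report,
# each failure also leaves a couchjs process behind.
for i in range(OS_PROCESS_LIMIT + 1):
    status, body = get("/mydb/_design/myapp/_view/by_category?group=true")
    print(i, status, body)

# Once the leaked processes reach os_process_limit, even an unrelated,
# well-behaved view reportedly times out with the gen_server call error.
print(get("/mydb/_design/myapp/_view/healthy_view"))
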
