[jira] Updated: (COUCHDB-597) Replication tasks crash.

2010-02-27 Thread Randall Leeds (JIRA)

[ https://issues.apache.org/jira/browse/COUCHDB-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds updated COUCHDB-597:
--

Attachment: 597_fixes.patch

Corrects problems with continuous replication timeouts introduced by r916518 
and r916868.

> Replication tasks crash.
> 
>
> Key: COUCHDB-597
> URL: https://issues.apache.org/jira/browse/COUCHDB-597
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Affects Versions: 0.11
>Reporter: Robert Newson
> Fix For: 0.11
>
> Attachments: 
> 0001-changes-replication-timeouts-and-att.-fixes-COUCHDB-.patch, 
> 597_fixes.patch, couchdb_597.patch
>
>
> If I kick off 10 replication tasks in quick succession, occasionally one or 
> two of the replication tasks will die and not be resumed. It seems that the 
> stat tracking is a little buggy, and under stress can eventually cause a 
> permanent failure of the supervised replication task:
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [<0.80.0>] {error_report,<0.30.0>,
> {<0.80.0>,supervisor_report,
>  [{supervisor,{local,couch_rep_sup}},
>   {errorContext,shutdown_error},
>   {reason,killed},
>   {offender,
>   [{pid,<0.6700.11>},
>{name,"fcbb13200a1618cf983b347f4d2c9835+create_target"},
>{mfa,
>{gen_server,start_link,
>[couch_rep,
> ["fcbb13200a1618cf983b347f4d2c9835",
>  {[{<<"create_target">>,true},
>{<<"source">>,<<"http://node:5984/perf-p2";>>},
>{<<"target">>,<<"perf-p2">>}]},
>  {user_ctx,null,[<<"_admin">>]}],
> []]}},
>{restart_type,temporary},
>{shutdown,1},
>{child_type,worker}]}]}}
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [emulator] Error in process 
> <0.6705.11> with exit value: 
> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}




Re: /_all_dbs and security

2010-02-27 Thread J Chris Anderson

On Feb 27, 2010, at 10:32 AM, Filipe David Manana wrote:

> Dear devs,
> 
> Currently, the URI handler for /_all_dbs just lists, recursively, all the db
> files in the database dir (parameter database_dir of the .ini file).
> 
> Since we now have a _security object per DB (I dunno why it's not a regular
> doc), which allows restricting access to each DB, that code is no longer
> adequate. It makes sense for this handler to return only the list of DBs the
> user has access to.
> 
> It's through this URI that, for example, Futon lists the available DBs.
> 
> There's a ticket for this: https://issues.apache.org/jira/browse/COUCHDB-661
> 
> That solution is acceptable only if the number of DBs on the server is "just"
> up to about 10,000 or so. I tested with 7,500 DBs, each occupying about 1 MB
> and holding 100 docs, and the response time for _all_dbs was about 4 seconds
> (more details in the comments of that ticket).
> 
> The problem is that for each DB file found, one has to read its header and
> then read its _security object to figure out whether the session user can
> access that DB. Therefore, we have 2 disk read operations for each DB file;
> one million DBs would imply two million disk reads.
> 
> Obviously, an efficient solution for this would be a view that maps users
> to DBs. I have an incomplete idea for this; what I have in mind is the
> following:
> 
> 1) Have a special DB, named "_dbs" for example, which would contain meta
> information about every available DB (like the meta tables in Oracle, SQL
> Server, and so on).
> 
> 2) That DB would contain a doc for each available DB. Each doc would contain
> the reader names and roles associated with the corresponding DB (this is the
> only kind of info we need for _all_dbs).
> 
> 3) We would have a view, like Brian Candler suggested in a comment to that
> ticket, that emits keys like:
>     emit(['name',name],db)
>     emit(['role',role],db)
> 
> 4) For DBs whose _security object has empty lists for both the reader names
> and the reader roles, we would emit a special role, "_public" for example.
> 
> 5) Whenever the _security object of a DB is updated, we would update the
> corresponding reader names and roles in the _dbs DB.
> 

This is the best reason I've heard for making it a security document. I wonder 
how much slower the 7.5k-DB scan proceeds when it has to look up documents 
instead of linked objects? Do you mind adding a doc-read to the tight loop just 
to see what it does to performance?

The 7.5k benchmark won't matter once we have a _dbs db, but the cost it 
exposes will be proportional to the cost incurred on opening any db for any 
operation, and thus significant.



> I thought of some issues (for which I don't have a solution):
> 
> 1) If a user just copies DB files from elsewhere (another server or a
> backup, e.g.) into the DBs directory, how do we detect them? Scanning all
> DB files at startup and taking proper action would be potentially slow.
> Also, if a DB file is copied in while CouchDB is running, I dunno how to
> detect it. The only idea I have now is: every time a DB file is opened (due
> to a user request), we check whether _dbs has a corresponding entry and, if
> not, take proper action.
> 
> 2) If a user deletes a DB file manually (e.g., rm db_file.couch), how do we
> detect it and remove the corresponding entry in _dbs?
> 
> 3) If a user restores a DB file backup containing an old _security object,
> we need to detect that and update the entry in _dbs. A way to do this would
> be to store the DB's update seq number in the corresponding doc in _dbs and
> then use the same idea as in 1).
> 
> These are very preliminary ideas.
> 
> I would like to collect suggestions from all of you on how to implement this
> efficiently and know if you can point out any other problems I haven't
> thought about.
> 
> thanks
> 
> best regards,
> 
> -- 
> Filipe David Manana,
> fdman...@gmail.com
> PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B
> 
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."



[jira] Updated: (COUCHDB-639) Make replication profit from attachment compression and improve push replication for large attachments

2010-02-27 Thread Filipe Manana (JIRA)

[ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
--

Attachment: (was: rep-att-comp-and-multipart-trunk-4.patch)

> Make replication profit from attachment compression and improve push 
> replication for large attachments
> 
>
> Key: COUCHDB-639
> URL: https://issues.apache.org/jira/browse/COUCHDB-639
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 0.11
> Environment: trunk
>Reporter: Filipe Manana
> Attachments: rep-att-comp-and-multipart-trunk.patch
>
>
> At the moment, for compressed attachments, the replication uncompresses and 
> then recompresses the attachments, which is a waste of CPU time.
> Push replication is also not reliable for very large attachments (500 MB+, 
> for example). Currently it sends the attachments inlined in the respective 
> JSON doc. Not only does this require too much RAM, it also wastes CPU time 
> doing the base64 encoding of the attachment (plus a decompression if the 
> attachment is compressed).
> The following patch (rep-att-comp-and-multipart-trunk*.patch) addresses both 
> issues. Docs containing attachments are now streamed to the target remote DB 
> using the multipart doc streaming feature provided by couch_doc.erl, and 
> compressed attachments are not uncompressed and re-compressed during the 
> replication.
> JavaScript tests included.
> Previously, replicating a DB containing 2 docs with attachments of 100 MB 
> and 500 MB caused the Erlang VM to consume nearly 1.2 GB of RAM on my 
> system. With the patch applied, it uses about 130 MB.
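
For context, a minimal sketch of the inlined representation the current push
replicator sends; the field names follow CouchDB's _attachments convention,
but the document and values here are made up:

    // Hypothetical doc with an inlined attachment: the attachment body
    // travels base64-encoded inside the JSON, so a 500 MB file becomes an
    // even larger (~33% bigger) in-memory string before it hits the wire.
    var docWithInlineAttachment = {
      _id: "mydoc",
      _rev: "1-abc123",                // made-up revision
      _attachments: {
        "video.mp4": {
          content_type: "video/mp4",
          data: "AAAAHGZ0eXBtcDQy..."  // base64 of the entire file
        }
      }
    };

The multipart approach instead sends the JSON doc and the raw attachment
bytes as separate MIME parts, avoiding both the base64 inflation and the
full in-memory copy.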




[jira] Updated: (COUCHDB-639) Make replication profit from attachment compression and improve push replication for large attachments

2010-02-27 Thread Filipe Manana (JIRA)

[ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
--

Attachment: rep-att-comp-and-multipart-trunk.patch

A one-line change: added a missing call to couch_util:url_encode/1 with a doc 
ID as the parameter.
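
(The patch itself is Erlang; as a rough JavaScript illustration of why the
encoding matters, assuming a plain doc whose ID contains a slash, e.g. "a/b":)

    var docId = "a/b";  // hypothetical doc ID containing a "/"

    var broken = "http://node:5984/db/" + docId;
    // -> http://node:5984/db/a/b, which the server reads as attachment
    //    "b" of doc "a" -- the wrong resource.

    var fixed = "http://node:5984/db/" + encodeURIComponent(docId);
    // -> http://node:5984/db/a%2Fb, the intended document.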

> Make replication profit from attachment compression and improve push 
> replication for large attachments
> 
>
> Key: COUCHDB-639
> URL: https://issues.apache.org/jira/browse/COUCHDB-639
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 0.11
> Environment: trunk
>Reporter: Filipe Manana
> Attachments: rep-att-comp-and-multipart-trunk.patch
>
>
> At the moment, for compressed attachments, the replication uncompresses and 
> then recompresses the attachments, which is a waste of CPU time.
> Push replication is also not reliable for very large attachments (500 MB+, 
> for example). Currently it sends the attachments inlined in the respective 
> JSON doc. Not only does this require too much RAM, it also wastes CPU time 
> doing the base64 encoding of the attachment (plus a decompression if the 
> attachment is compressed).
> The following patch (rep-att-comp-and-multipart-trunk*.patch) addresses both 
> issues. Docs containing attachments are now streamed to the target remote DB 
> using the multipart doc streaming feature provided by couch_doc.erl, and 
> compressed attachments are not uncompressed and re-compressed during the 
> replication.
> JavaScript tests included.
> Previously, replicating a DB containing 2 docs with attachments of 100 MB 
> and 500 MB caused the Erlang VM to consume nearly 1.2 GB of RAM on my 
> system. With the patch applied, it uses about 130 MB.




/_all_dbs and security

2010-02-27 Thread Filipe David Manana
Dear devs,

Currently, the URI handler for /_all_dbs just lists, recursively, all the db
files in the database dir (parameter database_dir of the .ini file).

Since we now have a _security object per DB (I dunno why it's not a regular
doc), which allows restricting access to each DB, that code is no longer
adequate. It makes sense for this handler to return only the list of DBs the
user has access to.

It's through this URI that, for example, Futon lists the available DBs.

There's a ticket for this: https://issues.apache.org/jira/browse/COUCHDB-661

That solution is acceptable only if the number of DBs on the server is "just"
up to about 10,000 or so. I tested with 7,500 DBs, each occupying about 1 MB
and holding 100 docs, and the response time for _all_dbs was about 4 seconds
(more details in the comments of that ticket).

The problem is that for each DB file found, one has to read its header and
then read its _security object to figure out whether the session user can
access that DB. Therefore, we have 2 disk read operations for each DB file;
one million DBs would imply two million disk reads.

Obviously, an efficient solution for this would be a view that maps users to
DBs. I have an incomplete idea for this; what I have in mind is the
following:

1) Have a special DB, named "_dbs" for example, which would contain meta
information about every available DB (like the meta tables in Oracle, SQL
Server, and so on).

2) That DB would contain a doc for each available DB. Each doc would contain
the reader names and roles associated with the corresponding DB (this is the
only kind of info we need for _all_dbs).

3) We would have a view, like Brian Candler suggested in a comment to that
ticket, that emits keys like (a fuller sketch follows this list):

    emit(['name',name],db)
    emit(['role',role],db)

4) For DBs whose _security object has empty lists for both the reader names
and the reader roles, we would emit a special role, "_public" for example.

5) Whenever the _security object of a DB is updated, we would update the
corresponding reader names and roles in the _dbs DB.
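
A fuller sketch of that view (the _dbs doc layout and the field names below
are illustrative assumptions, not a settled design):

    // Assumed layout of a doc in the special "_dbs" database:
    //   { _id: "perf-p2",
    //     update_seq: 42,   // see issue 3) below
    //     readers: { names: ["alice"], roles: ["qa"] } }
    //
    // Map function: each DB is emitted once per reader name and once per
    // reader role; unrestricted DBs go under the special "_public" role.
    function(doc) {
      var names = (doc.readers && doc.readers.names) || [];
      var roles = (doc.readers && doc.readers.roles) || [];
      if (names.length === 0 && roles.length === 0) {
        emit(['role', '_public'], doc._id);
      } else {
        for (var i = 0; i < names.length; i++) {
          emit(['name', names[i]], doc._id);
        }
        for (var j = 0; j < roles.length; j++) {
          emit(['role', roles[j]], doc._id);
        }
      }
    }

With such a view, _all_dbs for a session user becomes a handful of indexed
lookups: one query for ['name', username], one for each of the user's roles,
and one for ['role', '_public'], instead of two disk reads per DB file.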

I thought of some issues (for which I don't have a solution):

1) If a user just copies DB files from elsewhere (another server or a
backup, e.g.) into the DBs directory, how do we detect them? Scanning all
DB files at startup and taking proper action would be potentially slow.
Also, if a DB file is copied in while CouchDB is running, I dunno how to
detect it. The only idea I have now is: every time a DB file is opened (due
to a user request), we check whether _dbs has a corresponding entry and, if
not, take proper action.

2) If a user deletes a DB file manually (e.g., rm db_file.couch), how do we
detect it and remove the corresponding entry in _dbs?

3) If a user restores a DB file backup containing an old _security object,
we need to detect that and update the entry in _dbs. A way to do this would
be to store the DB's update seq number in the corresponding doc in _dbs and
then use the same idea as in 1).

These are very preliminary ideas.

I would like to collect suggestions from all of you on how to implement this
efficiently and know if you can point out any other problems I haven't
thought about.

thanks

best regards,

-- 
Filipe David Manana,
fdman...@gmail.com
PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."


Re: Releasing 0.11, Request for Comments

2010-02-27 Thread Filipe David Manana
On Sat, Feb 27, 2010 at 12:22 AM, J Chris Anderson wrote:

>
> On Feb 26, 2010, at 7:00 AM, Filipe David Manana wrote:
>
> > I would still like to see ticket 639 in 0.11.
> >
>
> I'm reading 639 and it seems like a great patch. But it's a little bit big
> and I can't tell for certain that there isn't something subtle I'm not
> seeing. I think the best course of action would be to apply it just after
> the 0.11 release, so it gets some field testing before it goes out in a
> release.
>
> It's not a new feature so it is still a candidate for 0.11.1 / 1.0. I'd
> just feel better about introducing a large patch to the code base if it has
> time to get used in a few different settings before being baked into a
> release.
>

True, it's a little bit big.
In that case I don't mind having it in a 0.11.1 release.

cheers



>
> Chris
>
> > Adam told me, via IRC, he will review the patch by the end of this week.
> >
> > I vote +1 on it.
> >
> > cheers
> >
> > On Fri, Feb 26, 2010 at 3:44 PM, Noah Slater wrote:
> >
> >> Nope, you can send payment to any email.
> >>
> >> As long as the recipient can click on the link in the email, they can
> >> deposit it into any account.
> >>
> >> On 26 Feb 2010, at 14:37, till wrote:
> >>
> >>> I always knew secretly open source worked like that. ;)
> >>>
> >>> Btw, I know it's not so subtle, but you need to include your PayPal
> >>> email... :D
> >>>
> >>> http://www.youtube.com/watch?v=Ui86peQZ74s
> >>>
> >>> On Fri, Feb 26, 2010 at 3:26 PM, Noah Slater wrote:
>  For the list's benefit, Randall sent me $2 via PayPal, suggesting I buy a
>  candy bar.
> 
>  Thanks Randall, I may unofficially call this the Randall release. Kinda
>  got a ring to it, that.
> 
>  If anyone else wants to sponsor the release, you know what to do.
> 
>  On 25 Feb 2010, at 20:25, Noah Slater wrote:
> 
> > Send $1 to nsla...@tumbolia.org and I will cut it.
> >
> > On 25 Feb 2010, at 20:13, Randall Leeds wrote:
> >
> >> +1
> >> Cut it!
> >>
> >> On Feb 25, 2010 12:08 PM, "Paul Davis" wrote:
> >>
> >>> So really all of the patches brought up in the last few hours are good
> >>> candidates for 1.0.
> >> +1
> >
> 
> 
> >>
> >>
> >
> >
> > --
> > Filipe David Manana,
> > fdman...@gmail.com
> > PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B
> >
> > "Reasonable men adapt themselves to the world.
> > Unreasonable men adapt the world to themselves.
> > That's why all progress depends on unreasonable men."
>
>


-- 
Filipe David Manana,
fdman...@gmail.com
PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."


[jira] Created: (COUCHDB-676) trailing slash in path cannot be recovered by external process

2010-02-27 Thread Andrew Straw (JIRA)
trailing slash in path cannot be recovered by external process
--

 Key: COUCHDB-676
 URL: https://issues.apache.org/jira/browse/COUCHDB-676
 Project: CouchDB
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: 0.11
Reporter: Andrew Straw


I modified the example given in the ExternalProcesses wiki page to return 
exactly the request line and queried the external process with both a trailing 
slash ( http://127.0.0.1:5984/test/_test/ ) and no trailing slash ( 
http://127.0.0.1:5984/test/_test ). The request line is exactly the same for 
these two cases.
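
A quick way to see this is an echo external. The sketch below follows the
line-delimited JSON protocol CouchDB speaks to externals (one request object
per stdin line, with the path pre-split into segments); the exact field
names are from memory, so treat them as assumptions:

    // echo.js -- hypothetical external that echoes the request's path.
    var readline = require('readline');
    var rl = readline.createInterface({ input: process.stdin });

    rl.on('line', function (line) {
      var req = JSON.parse(line);
      // For both http://127.0.0.1:5984/test/_test and .../test/_test/ the
      // path arrives as ["test", "_test"]: the trailing slash has already
      // been discarded, so the external cannot tell the two URLs apart.
      process.stdout.write(JSON.stringify({
        code: 200,
        headers: {"Content-Type": "application/json"},
        body: JSON.stringify(req.path)
      }) + "\n");
    });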

This is problematic because external processes may treat these cases 
differently. For example, Django's CommonMiddleware class redirects any "path" 
to "path/" if there is a view at "path/" but not at "path", which is usually 
the case for a default view of a Django app. When using Django with 
couchdb-wsgi, this results in an infinite redirect loop because although Django 
issues a redirect to "path/", couchdb-wsgi emits "path" as the path.
