Re: POST with _id

2009-08-13 Thread Kevin Jackson
 Another +1 here too - that has bitten me before...

+1 from me, we've also hit that one

Kev


[jira] Updated: (COUCHDB-69) Allow selective retaining of older revisions to a document

2009-08-13 Thread Jason Davies (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Davies updated COUCHDB-69:


Attachment: history_revs.patch

First stab at allowing old revisions to survive compaction *and* be replicated. 
 Note that this required a change to the by_seq B-tree.

I'm still working on making this configurable: currently you need to set the 
HISTORY_ENABLED macro to true in couch_db.hrl for this to be turned on.

 Allow selective retaining of older revisions to a document
 --

 Key: COUCHDB-69
 URL: https://issues.apache.org/jira/browse/COUCHDB-69
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
 Environment: All
Reporter: Jan Lehnardt
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10

 Attachments: history_revs.patch


 At the moment, compaction gets rid of all old revisions of a document, and 
 replication likewise deals only with the latest revision. It would be nice if 
 it were possible to specify a list of revisions to keep around that do not get 
 compacted away and that do get replicated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-69) Allow selective retaining of older revisions to a document

2009-08-13 Thread Jason Davies (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Davies updated COUCHDB-69:


Attachment: history_revs.2.patch

Updated patch allowing per-db configuration.

 Allow selective retaining of older revisions to a document
 --

 Key: COUCHDB-69
 URL: https://issues.apache.org/jira/browse/COUCHDB-69
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
 Environment: All
Reporter: Jan Lehnardt
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10

 Attachments: history_revs.2.patch, history_revs.patch


 At the moment, compaction gets rid of all old revisions of a document, and 
 replication likewise deals only with the latest revision. It would be nice if 
 it were possible to specify a list of revisions to keep around that do not get 
 compacted away and that do get replicated.




[jira] Updated: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Newson updated COUCHDB-465:
--

Attachment: sequence_id.patch

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Created: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)
Produce sequential, but unique, document id's
-

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch

Currently, if the client does not specify an id (POST'ing a single document or 
using _bulk_docs) a random 16 byte value is created. This kind of key is 
particularly brutal on b+tree updates and the append-only nature of couchdb 
files.

Attached is a patch to change this to a two-part identifier. The first part is 
a random 12 byte value and the remainder is a counter. The random prefix is 
rerandomized when the counter reaches its maximum. The rollover in the patch is 
at 16 million but can obviously be changed. The upshot is that the b+tree is 
updated in a better fashion, which should lead to performance benefits.
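The two-part scheme described above can be sketched as follows. This is a rough illustration of the described behavior, not the attached patch's actual Erlang code; the class name, hex layout, and 8-hex-digit counter width are assumptions.

```python
import os

class SequentialIdGenerator:
    """Illustrative sketch: a random 12-byte prefix (as hex) plus an
    incrementing counter, with the prefix re-randomized at rollover."""
    ROLLOVER = 16_000_000  # the rollover point stated in the description

    def __init__(self):
        self._reseed()

    def _reseed(self):
        self._prefix = os.urandom(12).hex()  # 24 hex characters
        self._counter = 0

    def next_id(self):
        if self._counter >= self.ROLLOVER:
            self._reseed()  # fresh random prefix, counter restarts
        doc_id = self._prefix + format(self._counter, "08x")
        self._counter += 1
        return doc_id
```

Consecutive ids share the prefix and differ only in the counter suffix, so successive b+tree inserts land next to each other instead of at random leaves.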




[jira] Updated: (COUCHDB-449) Turn off delayed commits by default

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski updated COUCHDB-449:
---

Attachment: delayed_commits_v1.patch

Here's a patch to make delayed_commits a server-wide config option.  The 
setting looks like

[couchdb]
delayed_commits = true

and defaults to false.  If finer-grained control is required users can override 
the default by setting the X-Couch-Full-Commit header to true or false.

Jan mentioned enabling delayed_commits for the test suite.  I didn't do this.
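The precedence implied above (an explicit request header beats the server-wide ini default) could be sketched like this; the function name and boolean handling are illustrative, not the patch's code:

```python
def full_commit_required(header_value, delayed_commits=False):
    """header_value: the X-Couch-Full-Commit request header, or None if
    the client didn't send one; delayed_commits: the server-wide
    [couchdb] delayed_commits setting (defaults to false)."""
    if header_value is not None:
        # An explicit header overrides the server-wide setting.
        return header_value.strip().lower() == "true"
    # No header: do a full commit unless delayed commits are enabled.
    return not delayed_commits
```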

 Turn off delayed commits by default
 ---

 Key: COUCHDB-449
 URL: https://issues.apache.org/jira/browse/COUCHDB-449
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9, 0.9.1
Reporter: Jan Lehnardt
Priority: Blocker
 Fix For: 0.10

 Attachments: delayed_commits_v1.patch


 Delayed commits make CouchDB significantly faster. They also open a one 
 second window for data loss. In 0.9 and trunk, delayed commits are enabled by 
 default and can be overridden with HTTP headers and an explicit API call to 
 flush the write buffer. I suggest to turn off delayed commits by default and 
 use the same overrides to enable it per request. A per-database option is 
 possible, too.
 One concern is developer workflow speed. The setting affects the test suite 
 performance significantly. I'd opt to change couch.js to set the appropriate 
 header to enable delayed commits for tests.
 CouchDB should guarantee data safety first and speed second, with sensible 
 overrides.




[jira] Created: (COUCHDB-466) couchdb oauth doesn't work behind reverse proxy

2009-08-13 Thread Benoit Chesneau (JIRA)
couchdb oauth doesn't work behind reverse proxy
---

 Key: COUCHDB-466
 URL: https://issues.apache.org/jira/browse/COUCHDB-466
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.10
Reporter: Benoit Chesneau
 Fix For: 0.10
 Attachments: x_forwarded_host.diff

Currently OAuth doesn't work behind a reverse proxy because the signature is 
based on the Host header. Reverse proxies like Apache and Lighttpd pass the 
proxied server a header that tells it which host was forwarded: Apache sends 
X-Forwarded-Host, Lighttpd sends X-Host, etc.

The attached patch fixes this issue by checking whether a custom forwarded-host 
header is present and, if so, using it as the Host. If it isn't present, the 
Host header is used, falling back on socket detection as it does currently. 
All tests pass.
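The lookup order described here might look like the following in outline. The header name is an assumption based on the patch filename (x_forwarded_host.diff); the real change lives in CouchDB's Erlang HTTP layer.

```python
def effective_host(headers, socket_host, forwarded_header="X-Forwarded-Host"):
    """Return the host to use when computing the OAuth signature:
    the forwarded-host header if present, else the Host header,
    else socket-based detection (illustrative sketch)."""
    return (headers.get(forwarded_header)
            or headers.get("Host")
            or socket_host)
```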




[jira] Updated: (COUCHDB-466) couchdb oauth doesn't work behind reverse proxy

2009-08-13 Thread Benoit Chesneau (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Chesneau updated COUCHDB-466:


Attachment: x_forwarded_host.diff

 couchdb oauth doesn't work behind reverse proxy
 ---

 Key: COUCHDB-466
 URL: https://issues.apache.org/jira/browse/COUCHDB-466
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.10
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: x_forwarded_host.diff


 Currently OAuth doesn't work behind a reverse proxy because the signature is 
 based on the Host header. Reverse proxies like Apache and Lighttpd pass the 
 proxied server a header that tells it which host was forwarded: Apache sends 
 X-Forwarded-Host, Lighttpd sends X-Host, etc.
 The attached patch fixes this issue by checking whether a custom forwarded-host 
 header is present and, if so, using it as the Host. If it isn't present, the 
 Host header is used, falling back on socket detection as it does currently. 
 All tests pass.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742907#action_12742907
 ] 

Adam Kocoloski commented on COUCHDB-465:


Nice work, Robert!  I'm +1 on this patch.

One concern is the guessability of IDs, but if users are really concerned about 
that they can always generate their own.

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742911#action_12742911
 ] 

Robert Newson commented on COUCHDB-465:
---

Thanks!

Guessability is a concern, which means this might need to be switchable. 
Perhaps couch_seq_generator becomes couch_id_generator and an ini file chooses 
between the two strategies, defaulting to the safest, but worst-case, new_uuid 
behavior. To get good keys for b+tree insertion necessarily makes them more 
guessable as they'd have to be close to existing keys by design.

I do owe some quantitative benchmarking to support the assertions in the 
description. I did a 10k insertion test with a small document, {"content": 
"hello"}, and the average insertion rate per document was 2ms with random and 1ms 
with the patch. This was more to prove that I'd changed *something* rather than 
a measure of the actual improvement. I would expect to see improved insertion 
rates across a lot of scenarios, less difference between uncompacted and 
compacted size (barring document updates and deletes) as less of the b+tree is 
rewritten, and a smaller post-compaction size vs random. The exact extent of 
these improvements should be established by a decent benchmark. 



 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-296) A built-in conflicts view

2009-08-13 Thread Zachary Zolton (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742923#action_12742923
 ] 

Zachary Zolton commented on COUCHDB-296:


I think this is a duplicate of COUCHDB-462 "track conflict count in db_info 
(was built-in conflicts view)".

So, we should probably close this out, right?

 A built-in conflicts view
 -

 Key: COUCHDB-296
 URL: https://issues.apache.org/jira/browse/COUCHDB-296
 Project: CouchDB
  Issue Type: New Feature
  Components: Database Core
Affects Versions: 0.9
Reporter: Dirkjan Ochtman
Priority: Minor
 Fix For: 0.10


 It would be great if CouchDB came with a built-in db/_conflicts view. It 
 could have code like the current test/view_conflicts.js.




[jira] Commented: (COUCHDB-422) PUT to _local/doc_id creates document '_local'

2009-08-13 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742939#action_12742939
 ] 

Adam Kocoloski commented on COUCHDB-422:


Thanks for this report, Eric.  I think _local docs should behave like _design 
docs; that is, we should accept a PUT to _local/foo just fine.

 PUT to _local/doc_id creates document '_local'
 --

 Key: COUCHDB-422
 URL: https://issues.apache.org/jira/browse/COUCHDB-422
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.10
 Environment: Ubuntu 9.04
Reporter: eric casteleijn
Priority: Minor
 Fix For: 0.10


 After davisp's revision r796246 doing a put to a document id like 
 '_local/doc_id' results in a visible, and apparently non-local document with 
 id '_local'. When escaping the slash as '%2F' everything works as before, and 
 expected, i.e. a local document with the above id is created.
 To test:
 curl -X PUT -d '{"foo": "bar"}' http://127.0.0.1:5987/db1/_local/yokal
 result:
 {"ok":true,"id":"_local","rev":"1-770307fe8d4210bab8ec65c59983e03c"}




[jira] Assigned: (COUCHDB-422) PUT to _local/doc_id creates document '_local'

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski reassigned COUCHDB-422:
--

Assignee: Paul Joseph Davis

Paul, can you check into this sometime?

 PUT to _local/doc_id creates document '_local'
 --

 Key: COUCHDB-422
 URL: https://issues.apache.org/jira/browse/COUCHDB-422
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.10
 Environment: Ubuntu 9.04
Reporter: eric casteleijn
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10


 After davisp's revision r796246 doing a put to a document id like 
 '_local/doc_id' results in a visible, and apparently non-local document with 
 id '_local'. When escaping the slash as '%2F' everything works as before, and 
 expected, i.e. a local document with the above id is created.
 To test:
 curl -X PUT -d '{"foo": "bar"}' http://127.0.0.1:5987/db1/_local/yokal
 result:
 {"ok":true,"id":"_local","rev":"1-770307fe8d4210bab8ec65c59983e03c"}




[jira] Closed: (COUCHDB-419) Replication HTTP requests lack User-Agent and Accept headers

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski closed COUCHDB-419.
--

   Resolution: Fixed
Fix Version/s: 0.10

Thanks Henri, CouchDB should now be sending these headers with replication 
requests.



 Replication HTTP requests lack User-Agent and Accept headers
 

 Key: COUCHDB-419
 URL: https://issues.apache.org/jira/browse/COUCHDB-419
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Henri Bergius
Assignee: Adam Kocoloski
 Fix For: 0.10


 Currently when making replication HTTP requests to a remote CouchDB instance, 
 CouchDB makes the HTTP requests anonymously, without providing any 
 information that it is a CouchDB instance and that it wants to receive JSON.
 Examples of what this could be:
 User-Agent: CouchDB/0.9.0
 Accept: application/json
 See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
 It would be good to add these headers to HTTP requests made by CouchDB, as it 
 makes it easier for other systems like Midgard to support the replication 
 protocol:
 http://bergie.iki.fi/blog/couchdb_and_midgard_talking_with_each_other/




[jira] Updated: (COUCHDB-427) Trying to replicate a database from an old format increase cpu usage up to 100%

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski updated COUCHDB-427:
---

Attachment: duplicate_attachments_crash.txt

Confirmed that this is still a problem on trunk.  I'm attaching the full crash 
dump, but the key stacktrace is

[error] [<0.69.0>] ** Generic server <0.69.0> terminating 
** Last message in was {'EXIT',<0.89.0>,
   {{nocatch,
{bad_request,<<"Duplicate attachments">>}},
[{couch_db,check_dup_atts,1},
 {couch_db,sort_and_check_atts,1},
 {couch_db,'-update_docs/4-lc$^3/1-3-',2},
 {couch_db,'-update_docs/4-lc$^2/1-2-',2},
 {couch_db,'-update_docs/4-lc$^2/1-2-',2},
 {couch_db,update_docs,4},
 {couch_rep_writer,writer_loop,3}]}}

 Trying to replicate a database from an old format increase cpu usage up to 
 100%
 ---

 Key: COUCHDB-427
 URL: https://issues.apache.org/jira/browse/COUCHDB-427
 Project: CouchDB
  Issue Type: Bug
 Environment: osx, ubuntu, openbsd
Reporter: Benoit Chesneau
Priority: Critical
 Fix For: 0.10

 Attachments: duplicate_attachments_crash.txt


 When you try to replicate a database from an old version of CouchDB to the 
 latest, CPU usage increases to 100% and beyond instead of the replication just hanging.
 You can try to replicate from http://benoitc.im/b to latest trunk or 0.9.1 to 
 replicate this issue.




[jira] Updated: (COUCHDB-461) attachments - handle correctly chunked encoding and Content-Length, send attachments unchunked

2009-08-13 Thread Benoit Chesneau (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Chesneau updated COUCHDB-461:


Attachment: attachments_put_length.diff
attachments_get_length.diff

I split the previous patch in two to make it easier to review. The first patch 
allows CouchDB to send attachments (GET) without chunked encoding; a fixed 
Content-Length is sent instead.

The second patch improves the handling of PUT attachments and follows the 
standard: CouchDB first looks at the Transfer-Encoding header and, if it is 
chunked, uses chunked decoding; if there is no Transfer-Encoding header, 
Content-Length is used. With this patch CouchDB now behaves like other current 
HTTP servers, which fixes the 500 errors some clients get when CouchDB sits 
behind a proxy such as Apache.
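The header precedence described here matches RFC 2616 (Transfer-Encoding takes priority over Content-Length) and can be sketched as follows; this is an illustration of the described logic, not the patch itself:

```python
def request_body_framing(headers):
    """Decide how to read a request body: chunked decoding if
    Transfer-Encoding says chunked, else a fixed Content-Length,
    else no body-framing information (sketch of the described logic)."""
    te = headers.get("Transfer-Encoding", "").lower()
    if "chunked" in te:
        return ("chunked", None)
    if "Content-Length" in headers:
        return ("identity", int(headers["Content-Length"]))
    return ("none", None)
```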

 attachments - handle correctly chunked encoding and Content-Length, send 
 attachments unchunked
 --

 Key: COUCHDB-461
 URL: https://issues.apache.org/jira/browse/COUCHDB-461
 Project: CouchDB
  Issue Type: Bug
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: attachments_get_length.diff, 
 attachments_put_length.diff, chunked.diff, chunked2.diff


 This patch allows couchdb to send attachments unchunked: instead, a fixed 
 Content-Length is sent and the content is streamed. It also fixes attachment 
 PUTs by first detecting whether the encoding is chunked and otherwise using 
 the Content-Length, which is the standard way to do it. 




Re: POST with _id

2009-08-13 Thread Chris Anderson
On Thu, Aug 13, 2009 at 12:21 AM, Kevin Jackson foamd...@gmail.com wrote:
 Another +1 here too - that has bitten me before...

 +1 from me, we've also hit that one


Is there a Jira ticket open for this? I can easily imagine this thread
being lost to the sands of time.




-- 
Chris Anderson
http://jchrisa.net
http://couch.io


Re: [jira] Created: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Brian Candler
On Thu, Aug 13, 2009 at 09:06:14AM -0700, Robert Newson (JIRA) wrote:
 Attached is a patch to change this to a two-part identifier. The first
 part is a random 12 byte value and the remainder is a counter. The random
 prefix is rerandomized when the counter reaches its maximum. The rollover
 in the patch is at 16 million but can obviously be changed. The upshot is
 that the b+tree is updated in a better fashion, which should lead to
 performance benefits.

I'd like to suggest an alternative algorithm for consideration.

- first 48 bits of the UUID is the time, in milliseconds, since 1 Jan 1970

- remaining 80 bits starts as a random value and increments from there,
  for example when doing a _bulk_docs insert (*)

I have been using this algorithm for a while, generated client-side - it's
in my 'couchtiny' ruby client.

I did it this way so as to get monotonically-increasing doc ids; a view with
equal keys will sort them in order of insertion into the DB. It also avoids
having to keep a separate created_at timestamp field, because you can just
get it from the id.

  def created_at
Time.at(id[0,12].to_i(16) / 1000.0) rescue nil
  end

Of course, the fact I generate uids like this demonstrates that there's no
one-size-fits-all solution, but I just thought it was worth mentioning
because you should get the B-tree insertion boost as a side-effect too.

Regards,

Brian.

(*) It's your choice whether you want to re-randomize this when the next
millisecond comes along, or just leave it to increment as a serial number.
Even if you have multiple servers inserting documents into the same
database, the chances of them using the same serial number within the same
millisecond are infinitessimal, as long as they all start from an
independent random point within the 2^80 possibilities.

Wrapping would be very rare, but what I currently do is re-randomize for
each bulk insert, and choose a starting random value which is more than 2^32
away from the ceiling.
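In Python terms, the scheme sketched above (48-bit millisecond timestamp plus an 80-bit randomly seeded serial) might look like this; the class and plumbing are illustrative, not the couchtiny implementation:

```python
import os
import time

class TimePrefixedIds:
    """Sketch: ids sort by creation time because the first 12 hex chars
    are a 48-bit millisecond timestamp; the 80-bit suffix is seeded
    randomly and incremented per id."""

    def __init__(self):
        self._serial = int.from_bytes(os.urandom(10), "big")  # 80 bits

    def next_id(self):
        millis = int(time.time() * 1000) & ((1 << 48) - 1)
        self._serial = (self._serial + 1) % (1 << 80)
        return format(millis, "012x") + format(self._serial, "020x")

    @staticmethod
    def created_at(doc_id):
        # Recover the creation time from the id, as in the Ruby snippet.
        return int(doc_id[:12], 16) / 1000.0
```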


[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Brian Candler (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742970#action_12742970
 ] 

Brian Candler commented on COUCHDB-465:
---


I'd like to suggest an alternative algorithm for consideration.

- first 48 bits of the UUID is the time, in milliseconds, since 1 Jan 1970

- remaining 80 bits starts as a random value and increments from there,
  for example when doing a _bulk_docs insert (*)

I have been using this algorithm for a while, generated client-side - it's
in my 'couchtiny' ruby client.

I did it this way so as to get monotonically-increasing doc ids; a view with
equal keys will sort them in order of insertion into the DB. It also avoids
having to keep a separate created_at timestamp field, because you can just
get it from the id.

  def created_at
Time.at(id[0,12].to_i(16) / 1000.0) rescue nil
  end

Of course, the fact I generate uids like this demonstrates that there's no
one-size-fits-all solution, but I just thought it was worth mentioning
because you should get the B-tree insertion boost as a side-effect too.

Regards,

Brian.

(*) It's your choice whether you want to re-randomize this when the next
millisecond comes along, or just leave it to increment as a serial number.
Even if you have multiple servers inserting documents into the same
database, the chances of them using the same serial number within the same
millisecond are infinitessimal, as long as they all start from an
independent random point within the 2^80 possibilities.

Wrapping would be very rare, but what I currently do is re-randomize for
each bulk insert, and choose a starting random value which is more than 2^32
away from the ceiling.


 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742982#action_12742982
 ] 

Robert Newson commented on COUCHDB-465:
---


Another interesting algorithm. I could change the patch so there's a 
couch_id_generator where the algorithm is configurable, defaulting to the 
current one, if that would move things along?


 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Updated: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Newson updated COUCHDB-465:
--

Attachment: uuid_generator.patch


I renamed couch_seq_generator to couch_uuid_generator. It supports two 
algorithms; the original random one and the new random+sequential. It defaults 
to random.

To configure it you need a new ini block:

[uuid]
algorithm=(random|sequence)
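A sketch of the dispatch this setting implies (a hypothetical Python stand-in for the Erlang module; the 16-million rollover comes from the earlier patch description):

```python
import os

class UuidGenerator:
    """Illustrative: pick an id algorithm from the [uuid] algorithm
    ini setting, defaulting to "random"."""

    def __init__(self, algorithm="random"):
        self.algorithm = algorithm
        self._prefix = os.urandom(12).hex()
        self._counter = 0

    def next_id(self):
        if self.algorithm == "random":
            return os.urandom(16).hex()  # original behavior, the default
        # "sequence": random prefix + counter, reseeded at rollover
        if self._counter >= 16_000_000:
            self._prefix, self._counter = os.urandom(12).hex(), 0
        self._counter += 1
        return self._prefix + format(self._counter, "08x")
```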



 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch, uuid_generator.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Joan Touzet (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743029#action_12743029
 ] 

Joan Touzet commented on COUCHDB-465:
-

This is a great patch, and solves the problem of having to do it in client-side 
logic. +1 from me too!

It looks like Brian's solution above is intended to allow _all_docs to return 
all documents in chronological order, thus getting a time-sorted view for 
free, i.e. without an extra field per document, extra view to maintain and 
update, extra view storage on the disk, etc. I admit I did the same for myself 
;) but it isn't necessarily a consideration for everyone. For example, in a 
replication situation, you'd need to be sure your clocks were well 
synchronized, and that you didn't have collisions in the prefix portion.

Perhaps providing a mechanism to declare your own function to override one of 
the two defaults (random, or rnewson's) would indeed be the best way forward, 
and the wiki could have a HOWTO with a set of small recipes on alternative 
approaches?

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch, uuid_generator.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-422) PUT to _local/doc_id creates document '_local'

2009-08-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743044#action_12743044
 ] 

Paul Joseph Davis commented on COUCHDB-422:
---

Checked this out and I can fairly easily make the patch with a caveat. Is it 
just me or do _local docs not allow attachments?

 PUT to _local/doc_id creates document '_local'
 --

 Key: COUCHDB-422
 URL: https://issues.apache.org/jira/browse/COUCHDB-422
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.10
 Environment: Ubuntu 9.04
Reporter: eric casteleijn
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10


 After davisp's revision r796246 doing a put to a document id like 
 '_local/doc_id' results in a visible, and apparently non-local document with 
 id '_local'. When escaping the slash as '%2F' everything works as before, and 
 expected, i.e. a local document with the above id is created.
 To test:
 curl -X PUT -d '{"foo": "bar"}' http://127.0.0.1:5987/db1/_local/yokal
 result:
 {"ok":true,"id":"_local","rev":"1-770307fe8d4210bab8ec65c59983e03c"}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-461) attachments - handle correctly chunked encoding and Content-Length, send attachments unchunked

2009-08-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743047#action_12743047
 ] 

Paul Joseph Davis commented on COUCHDB-461:
---

Benoit, can you comment on how CouchDB is currently breaking the HTTP spec so 
we have a record of it? IIRC, it was that we should expect chunked transfers 
by default instead of rejecting them, but I'd like to have the reason 
written down.

Also, I'd suggest that for attachments we don't use chunked transfers, because 
we should never need them. Unless someone can give me a use case that absolutely 
requires receiving chunked attachments, I'd vote to remove them and use a 
straight-up Content-Length.

 attachments - handle correctly chunked encoding and Content-Length, send 
 attachments unchunked
 --

 Key: COUCHDB-461
 URL: https://issues.apache.org/jira/browse/COUCHDB-461
 Project: CouchDB
  Issue Type: Bug
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: attachments_get_length.diff, 
 attachments_put_length.diff, chunked.diff, chunked2.diff


 This patch allows CouchDB to send attachments unchunked: instead, the 
 Content-Length is fixed and the content is streamed. It also fixes attachment 
 PUTs by first detecting whether the encoding is chunked and then testing the 
 length, which is the standard way to do it. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743049#action_12743049
 ] 

Paul Joseph Davis commented on COUCHDB-465:
---

Just to throw something out in the interest of complicating things, should we 
consider a query string override for the configured default algorithm as well?

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch, uuid_generator.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method

2009-08-13 Thread Chris Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743063#action_12743063
 ] 

Chris Anderson commented on COUCHDB-194:


Eric,

Patches are totally welcome on this. _by_seq probably hasn't gotten much 
attention lately, as I think it's deprecated in favor of _changes.

It's worth noting that doc ids are not collated with ICU when they are in the 
_all_docs view, so there are some places where the collation rules can differ.

If you are able to prepare a test case illustrating the change you'd like, it 
probably won't be hard to find someone to finish the patch.



 [startkey, endkey[: provide a right-open range selection method
 ---

 Key: COUCHDB-194
 URL: https://issues.apache.org/jira/browse/COUCHDB-194
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.10
Reporter: Maximillian Dornseif
Priority: Blocker
 Fix For: 0.10


 While writing something about using CouchDB I came across the issue of slice 
 indexes (called startkey and endkey in CouchDB lingo).
 I found no exact definition of startkey and endkey anywhere in the 
 documentation. Testing reveals that for access on _all_docs and on views,
 documents are returned in the interval
 [startkey, endkey] (startkey <= k <= endkey).
 I don't know if this was a conscious design decision, but I'd like to promote 
 a slightly different interpretation (and thus an API change):
 [startkey, endkey[ (startkey <= k < endkey).
 Both approaches are valid and used in the real world. Ruby uses the inclusive 
 (right-closed in math speak) first approach:
  >> l = [1,2,3,4]
  >> l.slice(1,2)
  => [2, 3]
 Python uses the exclusive (right-open in math speak) second approach:
  >>> l = [1,2,3,4]
  >>> l[1:2]
  [2]
 For array indices both work fine and which one to prefer is mostly an issue 
 of habit. In spoken language both approaches are used: "Have the software 
 done until Saturday" probably means right-open to the client and right-closed 
 to the coder.
 But if you are working with keys that are more than array indexes, then 
 right-open is much easier to handle. That is because you have to *guess* the 
 biggest value you want to get. The Wiki at 
 http://wiki.apache.org/couchdb/View_collation contains an example of that 
 problem:
 It is suggested that you use
 startkey=_design/&endkey=_design/Z
 or
 startkey=_design/&endkey=_design/\u″
 to get a list of all design documents - also the replication system in the db 
 core uses the same hack.
 This breaks if a design document is named "ZTop" or 
 "Iñtërnâtiônàlizætiøn". Such names might be unlikely, but we are computer 
 scientists; "unlikely" is a bad approach to software engineering.
 The thing we really want to ask CouchDB for is all documents with 
 keys starting with '_design/'.
 This is basically impossible to do with right-closed intervals. We could use 
 startkey=_design/&endkey=_design0 ('0' is the ASCII character after '/') 
 and this will work fine ... until there is actually a document with the key 
 "_design0" in the system. Unlikely, but ...
 To make selection by intervals reliable, clients currently have to guess the 
 last key (the "Z" approach) or use the first key not to include (the 
 "_design0" approach) and then post-process the result to remove the last 
 element returned if it exactly matches the given endkey value.
 If CouchDB changed to a right-open interval approach, post-processing 
 would go away in most cases. See 
 http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
  for two real world examples.
 At least for string keys and float keys, changing the meaning to [startkey, 
 endkey[ would allow selections like
 * all strings starting with 'abc'
 * all numbers between 10.5 and 11
 It would also hopefully not break too much existing code. Since the notion of 
 endkey already seems to be considered fishy (see the "Z" approach), most 
 code seems to try to avoid the issue. For example 
 'startkey=_design/&endkey=_design/Z' would still work unless you 
 have a design document named exactly "Z".
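The difference between the current right-closed interval and the proposed right-open one can be sketched in plain Python; `select_closed` and `select_half_open` are hypothetical helper names used only for illustration, not CouchDB APIs:

```python
def select_closed(keys, startkey, endkey):
    # Current semantics: right-closed interval [startkey, endkey]
    return [k for k in sorted(keys) if startkey <= k <= endkey]

def select_half_open(keys, startkey, endkey):
    # Proposed semantics: right-open interval [startkey, endkey[
    return [k for k in sorted(keys) if startkey <= k < endkey]

docs = ["_design/Z", "_design/ZTop", "_design/a", "_design0", "zebra"]

# The right-closed "_design0" trick wrongly includes a real document
# that happens to be named "_design0":
assert select_closed(docs, "_design/", "_design0") == \
    ["_design/Z", "_design/ZTop", "_design/a", "_design0"]

# The right-open interval expresses "all keys starting with _design/"
# exactly, with no post-processing:
assert select_half_open(docs, "_design/", "_design0") == \
    ["_design/Z", "_design/ZTop", "_design/a"]
```

Note that "_design/ZTop" and "_design/a" are both captured by either interval; the two semantics differ only in whether the endkey itself is returned.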

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-461) attachments - handle correctly chunked encoding and Content-Length, send attachments unchunked

2009-08-13 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743064#action_12743064
 ] 

Benoit Chesneau commented on COUCHDB-461:
-

Well, according to the spec:

"All HTTP/1.1 applications MUST be able to receive and decode the chunked 
transfer-coding, and MUST ignore chunk-extension extensions they do not 
understand."

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html

Also, for the ordering of chunked vs. Content-Length, according to the spec:

"Messages MUST NOT include both a Content-Length header field and a non-identity 
transfer-coding. If the message does include a non-identity transfer-coding, 
the Content-Length MUST be ignored."

http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

I think chunked encoding is still good for some purposes: it allows the client 
to size buffers dynamically and to ignore bad chunks, which could save some 
bandwidth/time.
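For context, the chunked transfer-coding the spec refers to frames each chunk with a hexadecimal size line and ends with a zero-size chunk. A minimal sketch of the wire format (the helper name and chunk size are made up for illustration):

```python
def encode_chunked(data, chunk_size=8):
    """Encode bytes with the HTTP/1.1 chunked transfer-coding
    (RFC 2616 section 3.6.1): each chunk is <hex-size>\\r\\n<data>\\r\\n,
    terminated by a zero-size chunk and a blank line."""
    out = b""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    return out + b"0\r\n\r\n"

body = encode_chunked(b"hello couchdb")
# Two chunks of 8 and 5 bytes, then the terminating zero-size chunk:
assert body == b"8\r\nhello co\r\n5\r\nuchdb\r\n0\r\n\r\n"
```

This also shows why the total size is unknown until the final zero-size chunk arrives, whereas a fixed Content-Length announces it up front.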

 attachments - handle correctly chunked encoding and Content-Length, send 
 attachments unchunked
 --

 Key: COUCHDB-461
 URL: https://issues.apache.org/jira/browse/COUCHDB-461
 Project: CouchDB
  Issue Type: Bug
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: attachments_get_length.diff, 
 attachments_put_length.diff, chunked.diff, chunked2.diff


 This patch allows CouchDB to send attachments unchunked: instead, the 
 Content-Length is fixed and the content is streamed. It also fixes attachment 
 PUTs by first detecting whether the encoding is chunked and then testing the 
 length, which is the standard way to do it. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-427) Trying to replicate a database from an old format increase cpu usage up to 100%

2009-08-13 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743068#action_12743068
 ] 

Adam Kocoloski commented on COUCHDB-427:


So we identified the document that has duplicate attachments as

http://benoitc.im/b/7cb06e5de28327c7fc81c7028bece5a3

and indeed it does have three attachments with the same name. I'm not sure what 
the next step is here.  We certainly don't want replication to break, but this 
seems like such an edge case that I'm not sure it's worth putting in special 
code in the replicator to deal with it.

Damien, the check_dup_atts code is your stuff, right?  Do you know how Benoit 
could've ended up with 3 identically-named attachments in the past?

 Trying to replicate a database from an old format increase cpu usage up to 
 100%
 ---

 Key: COUCHDB-427
 URL: https://issues.apache.org/jira/browse/COUCHDB-427
 Project: CouchDB
  Issue Type: Bug
 Environment: osx, ubuntu, openbsd
Reporter: Benoit Chesneau
Priority: Critical
 Fix For: 0.10

 Attachments: duplicate_attachments_crash.txt


 When you try to replicate a database from an old version of couchdb to the 
 latest, the cpu usage increases up to 100% and more instead of just hanging.
 You can try to replicate from http://benoitc.im/b to latest trunk or 0.9.1 to 
 reproduce this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.