Re: POST with _id

2009-08-13 Thread Kevin Jackson
 Another +1 here too - that has bitten me before...

+1 from me, we've also hit that one

Kev


[jira] Updated: (COUCHDB-69) Allow selective retaining of older revisions to a document

2009-08-13 Thread Jason Davies (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Davies updated COUCHDB-69:


Attachment: history_revs.patch

First stab at allowing old revisions to survive compaction *and* be replicated. 
 Note that this required a change to the by_seq B-tree.

I'm still working on making this configurable: currently you need to set the 
HISTORY_ENABLED macro to true in couch_db.hrl for this to be turned on.

 Allow selective retaining of older revisions to a document
 --

 Key: COUCHDB-69
 URL: https://issues.apache.org/jira/browse/COUCHDB-69
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
 Environment: All
Reporter: Jan Lehnardt
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10

 Attachments: history_revs.patch


 At the moment, compaction gets rid of all old revisions of a document, and 
 replication likewise deals only with the latest revision. It would be nice if 
 it were possible to specify a list of revisions to keep around that do not get 
 compacted away and that do get replicated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-69) Allow selective retaining of older revisions to a document

2009-08-13 Thread Jason Davies (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Davies updated COUCHDB-69:


Attachment: history_revs.2.patch

Updated patch allowing per-db configuration.

 Allow selective retaining of older revisions to a document
 --

 Key: COUCHDB-69
 URL: https://issues.apache.org/jira/browse/COUCHDB-69
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
 Environment: All
Reporter: Jan Lehnardt
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10

 Attachments: history_revs.2.patch, history_revs.patch


 At the moment, compaction gets rid of all old revisions of a document, and 
 replication likewise deals only with the latest revision. It would be nice if 
 it were possible to specify a list of revisions to keep around that do not get 
 compacted away and that do get replicated.




[jira] Updated: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Newson updated COUCHDB-465:
--

Attachment: sequence_id.patch

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Created: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)
Produce sequential, but unique, document id's
-

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch

Currently, if the client does not specify an id (POST'ing a single document or 
using _bulk_docs) a random 16 byte value is created. This kind of key is 
particularly brutal on b+tree updates and the append-only nature of couchdb 
files.

Attached is a patch to change this to a two-part identifier. The first part is 
a random 12 byte value and the remainder is a counter. The random prefix is 
rerandomized when the counter reaches its maximum. The rollover in the patch is 
at 16 million but can obviously be changed. The upshot is that the b+tree is 
updated in a better fashion, which should lead to performance benefits.
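The two-part scheme described above can be sketched as follows. This is a rough illustration of the described behavior, not the attached patch's actual Erlang code; the class name, hex layout, and 8-hex-digit counter width are assumptions.

```python
import os

class SequentialIdGenerator:
    """Illustrative sketch: a random 12-byte prefix (as hex) plus an
    incrementing counter, with the prefix re-randomized at rollover."""
    ROLLOVER = 16_000_000  # the rollover point stated in the description

    def __init__(self):
        self._reseed()

    def _reseed(self):
        self._prefix = os.urandom(12).hex()  # 24 hex characters
        self._counter = 0

    def next_id(self):
        if self._counter >= self.ROLLOVER:
            self._reseed()  # fresh random prefix, counter restarts
        doc_id = self._prefix + format(self._counter, "08x")
        self._counter += 1
        return doc_id
```

Consecutive ids share the prefix and differ only in the counter suffix, so successive b+tree inserts land next to each other instead of at random leaves.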




[jira] Updated: (COUCHDB-449) Turn off delayed commits by default

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski updated COUCHDB-449:
---

Attachment: delayed_commits_v1.patch

Here's a patch to make delayed_commits a server-wide config option.  The 
setting looks like

[couchdb]
delayed_commits = true

and defaults to false.  If finer-grained control is required users can override 
the default by setting the X-Couch-Full-Commit header to true or false.

Jan mentioned enabling delayed_commits for the test suite.  I didn't do this.
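The precedence implied above (an explicit request header beats the server-wide ini default) could be sketched like this; the function name and boolean handling are illustrative, not the patch's code:

```python
def full_commit_required(header_value, delayed_commits=False):
    """header_value: the X-Couch-Full-Commit request header, or None if
    the client didn't send one; delayed_commits: the server-wide
    [couchdb] delayed_commits setting (defaults to false)."""
    if header_value is not None:
        # An explicit header overrides the server-wide setting.
        return header_value.strip().lower() == "true"
    # No header: do a full commit unless delayed commits are enabled.
    return not delayed_commits
```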

 Turn off delayed commits by default
 ---

 Key: COUCHDB-449
 URL: https://issues.apache.org/jira/browse/COUCHDB-449
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9, 0.9.1
Reporter: Jan Lehnardt
Priority: Blocker
 Fix For: 0.10

 Attachments: delayed_commits_v1.patch


 Delayed commits make CouchDB significantly faster. They also open a one 
 second window for data loss. In 0.9 and trunk, delayed commits are enabled by 
 default and can be overridden with HTTP headers and an explicit API call to 
 flush the write buffer. I suggest to turn off delayed commits by default and 
 use the same overrides to enable it per request. A per-database option is 
 possible, too.
 One concern is developer workflow speed. The setting affects the test suite 
 performance significantly. I'd opt to change couch.js to set the appropriate 
 header to enable delayed commits for tests.
 CouchDB should guarantee data safety first and speed second, with sensible 
 overrides.




[jira] Created: (COUCHDB-466) couchdb oauth doesn't work behind reverse proxy

2009-08-13 Thread Benoit Chesneau (JIRA)
couchdb oauth doesn't work behind reverse proxy
---

 Key: COUCHDB-466
 URL: https://issues.apache.org/jira/browse/COUCHDB-466
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.10
Reporter: Benoit Chesneau
 Fix For: 0.10
 Attachments: x_forwarded_host.diff

Currently OAuth doesn't work behind a reverse proxy because the signature is 
based on the Host header. Reverse proxies like Apache and Lighttpd pass the 
proxied server a header that tells it which host was forwarded: Apache sends 
X-Forwarded-Host, Lighttpd sends X-Host, etc.

The attached patch fixes this issue by checking whether a custom forwarded-host 
header is present and, if so, using it as the Host. If it isn't present, the 
Host header is used, falling back on socket detection as it does currently. 
All tests pass.
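The lookup order described here might look like the following in outline. The header name is an assumption based on the patch filename (x_forwarded_host.diff); the real change lives in CouchDB's Erlang HTTP layer.

```python
def effective_host(headers, socket_host, forwarded_header="X-Forwarded-Host"):
    """Return the host to use when computing the OAuth signature:
    the forwarded-host header if present, else the Host header,
    else socket-based detection (illustrative sketch)."""
    return (headers.get(forwarded_header)
            or headers.get("Host")
            or socket_host)
```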




[jira] Updated: (COUCHDB-466) couchdb oauth doesn't work behind reverse proxy

2009-08-13 Thread Benoit Chesneau (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Chesneau updated COUCHDB-466:


Attachment: x_forwarded_host.diff

 couchdb oauth doesn't work behind reverse proxy
 ---

 Key: COUCHDB-466
 URL: https://issues.apache.org/jira/browse/COUCHDB-466
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.10
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: x_forwarded_host.diff


 Currently OAuth doesn't work behind a reverse proxy because the signature is 
 based on the Host header. Reverse proxies like Apache and Lighttpd pass the 
 proxied server a header that tells it which host was forwarded: Apache sends 
 X-Forwarded-Host, Lighttpd sends X-Host, etc.
 The attached patch fixes this issue by checking whether a custom forwarded-host 
 header is present and, if so, using it as the Host. If it isn't present, the 
 Host header is used, falling back on socket detection as it does currently. 
 All tests pass.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742907#action_12742907
 ] 

Adam Kocoloski commented on COUCHDB-465:


Nice work, Robert!  I'm +1 on this patch.

One concern is the guessability of IDs, but if users are really concerned about 
that they can always generate their own.

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742911#action_12742911
 ] 

Robert Newson commented on COUCHDB-465:
---

Thanks!

Guessability is a concern, which means this might need to be switchable. 
Perhaps couch_seq_generator becomes couch_id_generator and an ini file chooses 
between the two strategies, defaulting to the safest, but worst-case, new_uuid 
behavior. To get good keys for b+tree insertion necessarily makes them more 
guessable as they'd have to be close to existing keys by design.

I do owe some quantitative benchmarking to support the assertions in the 
description. I did a 10k insertion test with a small document, {"content": 
"hello"}, and the average insertion rate per document was 2ms with random and 1ms 
with the patch. This was more to prove that I'd changed *something* rather than 
a measure of the actual improvement. I would expect to see improved insertion 
rates across a lot of scenarios, less difference between uncompacted and 
compacted size (barring document updates and deletes) as less of the b+tree is 
rewritten, and a smaller post-compaction size vs random. The exact extent of 
these improvements should be established by a decent benchmark. 



 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-296) A built-in conflicts view

2009-08-13 Thread Zachary Zolton (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742923#action_12742923
 ] 

Zachary Zolton commented on COUCHDB-296:


I think this is a duplicate of COUCHDB-462 "track conflict count in db_info 
(was built-in conflicts view)".

So, we should probably close this out, right?

 A built-in conflicts view
 -

 Key: COUCHDB-296
 URL: https://issues.apache.org/jira/browse/COUCHDB-296
 Project: CouchDB
  Issue Type: New Feature
  Components: Database Core
Affects Versions: 0.9
Reporter: Dirkjan Ochtman
Priority: Minor
 Fix For: 0.10


 It would be great if CouchDB came with a built-in db/_conflicts view. It 
 could have code like the current test/view_conflicts.js.




[jira] Commented: (COUCHDB-422) PUT to _local/doc_id creates document '_local'

2009-08-13 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742939#action_12742939
 ] 

Adam Kocoloski commented on COUCHDB-422:


Thanks for this report, Eric.  I think _local docs should behave like _design 
docs; that is, we should accept a PUT to _local/foo just fine.

 PUT to _local/doc_id creates document '_local'
 --

 Key: COUCHDB-422
 URL: https://issues.apache.org/jira/browse/COUCHDB-422
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.10
 Environment: Ubuntu 9.04
Reporter: eric casteleijn
Priority: Minor
 Fix For: 0.10


 After davisp's revision r796246 doing a put to a document id like 
 '_local/doc_id' results in a visible, and apparently non-local document with 
 id '_local'. When escaping the slash as '%2F' everything works as before, and 
 expected, i.e. a local document with the above id is created.
 To test:
 curl -X PUT -d '{"foo": "bar"}' http://127.0.0.1:5987/db1/_local/yokal
 result:
 {"ok":true,"id":"_local","rev":"1-770307fe8d4210bab8ec65c59983e03c"}




[jira] Assigned: (COUCHDB-422) PUT to _local/doc_id creates document '_local'

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski reassigned COUCHDB-422:
--

Assignee: Paul Joseph Davis

Paul, can you check into this sometime?

 PUT to _local/doc_id creates document '_local'
 --

 Key: COUCHDB-422
 URL: https://issues.apache.org/jira/browse/COUCHDB-422
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.10
 Environment: Ubuntu 9.04
Reporter: eric casteleijn
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10


 After davisp's revision r796246 doing a put to a document id like 
 '_local/doc_id' results in a visible, and apparently non-local document with 
 id '_local'. When escaping the slash as '%2F' everything works as before, and 
 expected, i.e. a local document with the above id is created.
 To test:
 curl -X PUT -d '{"foo": "bar"}' http://127.0.0.1:5987/db1/_local/yokal
 result:
 {"ok":true,"id":"_local","rev":"1-770307fe8d4210bab8ec65c59983e03c"}




[jira] Closed: (COUCHDB-419) Replication HTTP requests lack User-Agent and Accept headers

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski closed COUCHDB-419.
--

   Resolution: Fixed
Fix Version/s: 0.10

Thanks Henri, CouchDB should now be sending these headers with replication 
requests.



 Replication HTTP requests lack User-Agent and Accept headers
 

 Key: COUCHDB-419
 URL: https://issues.apache.org/jira/browse/COUCHDB-419
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Henri Bergius
Assignee: Adam Kocoloski
 Fix For: 0.10


 Currently when making replication HTTP requests to a remote CouchDB instance, 
 CouchDB makes the HTTP requests anonymously, without providing any 
 information that it is a CouchDB instance and that it wants to receive JSON.
 Examples of what this could be:
 User-Agent: CouchDB/0.9.0
 Accept: application/json
 See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
 It would be good to add these headers to HTTP requests made by CouchDB, as it 
 makes it easier for other systems like Midgard to support the replication 
 protocol:
 http://bergie.iki.fi/blog/couchdb_and_midgard_talking_with_each_other/




[jira] Updated: (COUCHDB-427) Trying to replicate a database from an old format increase cpu usage up to 100%

2009-08-13 Thread Adam Kocoloski (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski updated COUCHDB-427:
---

Attachment: duplicate_attachments_crash.txt

Confirmed that this is still a problem on trunk.  I'm attaching the full crash 
dump, but the key stacktrace is

[error] [<0.69.0>] ** Generic server <0.69.0> terminating 
** Last message in was {'EXIT',<0.89.0>,
   {{nocatch,
{bad_request,<<"Duplicate attachments">>}},
[{couch_db,check_dup_atts,1},
 {couch_db,sort_and_check_atts,1},
 {couch_db,'-update_docs/4-lc$^3/1-3-',2},
 {couch_db,'-update_docs/4-lc$^2/1-2-',2},
 {couch_db,'-update_docs/4-lc$^2/1-2-',2},
 {couch_db,update_docs,4},
 {couch_rep_writer,writer_loop,3}]}}

 Trying to replicate a database from an old format increase cpu usage up to 
 100%
 ---

 Key: COUCHDB-427
 URL: https://issues.apache.org/jira/browse/COUCHDB-427
 Project: CouchDB
  Issue Type: Bug
 Environment: osx, ubuntu, openbsd
Reporter: Benoit Chesneau
Priority: Critical
 Fix For: 0.10

 Attachments: duplicate_attachments_crash.txt


 When you try to replicate a database from an old version of CouchDB to the 
 latest, CPU usage increases to 100% and beyond instead of the replication just hanging.
 You can try to replicate from http://benoitc.im/b to latest trunk or 0.9.1 to 
 replicate this issue.




[jira] Updated: (COUCHDB-461) attachments - handle correctly chunked encoding and Content-Length, send attachments unchunked

2009-08-13 Thread Benoit Chesneau (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Chesneau updated COUCHDB-461:


Attachment: attachments_put_length.diff
attachments_get_length.diff

I split the previous patch in two to make it easier to review. The first patch 
allows CouchDB to send attachments (GET) without chunked encoding; a fixed 
Content-Length is sent instead.

The second patch improves the handling of PUT attachments and follows the 
standard: CouchDB first looks at the Transfer-Encoding header and, if it is 
chunked, uses chunked decoding; if there is no Transfer-Encoding header, 
Content-Length is used. With this patch CouchDB now behaves like other current 
HTTP servers, which fixes the 500 errors some clients get when CouchDB sits 
behind a proxy such as Apache.
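The header precedence described here matches RFC 2616 (Transfer-Encoding takes priority over Content-Length) and can be sketched as follows; this is an illustration of the described logic, not the patch itself:

```python
def request_body_framing(headers):
    """Decide how to read a request body: chunked decoding if
    Transfer-Encoding says chunked, else a fixed Content-Length,
    else no body-framing information (sketch of the described logic)."""
    te = headers.get("Transfer-Encoding", "").lower()
    if "chunked" in te:
        return ("chunked", None)
    if "Content-Length" in headers:
        return ("identity", int(headers["Content-Length"]))
    return ("none", None)
```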

 attachments - handle correctly chunked encoding and Content-Length, send 
 attachments unchunked
 --

 Key: COUCHDB-461
 URL: https://issues.apache.org/jira/browse/COUCHDB-461
 Project: CouchDB
  Issue Type: Bug
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: attachments_get_length.diff, 
 attachments_put_length.diff, chunked.diff, chunked2.diff


 This patch allows couchdb to send attachments unchunked: instead, a fixed 
 Content-Length is sent and the content is streamed. It also fixes attachment 
 PUTs by first detecting whether the encoding is chunked and otherwise using 
 the Content-Length, which is the standard way to do it. 




Re: POST with _id

2009-08-13 Thread Chris Anderson
On Thu, Aug 13, 2009 at 12:21 AM, Kevin Jackson foamd...@gmail.com wrote:
 Another +1 here too - that has bitten me before...

 +1 from me, we've also hit that one


Is there a Jira ticket open for this? I can easily imagine this thread
being lost to the sands of time.




-- 
Chris Anderson
http://jchrisa.net
http://couch.io


Re: [jira] Created: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Brian Candler
On Thu, Aug 13, 2009 at 09:06:14AM -0700, Robert Newson (JIRA) wrote:
 Attached is a patch to change this to a two-part identifier. The first
 part is a random 12 byte value and the remainder is a counter. The random
 prefix is rerandomized when the counter reaches its maximum. The rollover
 in the patch is at 16 million but can obviously be changed. The upshot is
 that the b+tree is updated in a better fashion, which should lead to
 performance benefits.

I'd like to suggest an alternative algorithm for consideration.

- first 48 bits of the UUID is the time, in milliseconds, since 1 Jan 1970

- remaining 80 bits starts as a random value and increments from there,
  for example when doing a _bulk_docs insert (*)

I have been using this algorithm for a while, generated client-side - it's
in my 'couchtiny' ruby client.

I did it this way so as to get monotonically-increasing doc ids; a view with
equal keys will sort them in order of insertion into the DB. It also avoids
having to keep a separate created_at timestamp field, because you can just
get it from the id.

  def created_at
Time.at(id[0,12].to_i(16) / 1000.0) rescue nil
  end

Of course, the fact I generate uids like this demonstrates that there's no
one-size-fits-all solution, but I just thought it was worth mentioning
because you should get the B-tree insertion boost as a side-effect too.

Regards,

Brian.

(*) It's your choice whether you want to re-randomize this when the next
millisecond comes along, or just leave it to increment as a serial number.
Even if you have multiple servers inserting documents into the same
database, the chances of them using the same serial number within the same
millisecond are infinitessimal, as long as they all start from an
independent random point within the 2^80 possibilities.

Wrapping would be very rare, but what I currently do is re-randomize for
each bulk insert, and choose a starting random value which is more than 2^32
away from the ceiling.
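In Python terms, the scheme sketched above (48-bit millisecond timestamp plus an 80-bit randomly seeded serial) might look like this; the class and plumbing are illustrative, not the couchtiny implementation:

```python
import os
import time

class TimePrefixedIds:
    """Sketch: ids sort by creation time because the first 12 hex chars
    are a 48-bit millisecond timestamp; the 80-bit suffix is seeded
    randomly and incremented per id."""

    def __init__(self):
        self._serial = int.from_bytes(os.urandom(10), "big")  # 80 bits

    def next_id(self):
        millis = int(time.time() * 1000) & ((1 << 48) - 1)
        self._serial = (self._serial + 1) % (1 << 80)
        return format(millis, "012x") + format(self._serial, "020x")

    @staticmethod
    def created_at(doc_id):
        # Recover the creation time from the id, as in the Ruby snippet.
        return int(doc_id[:12], 16) / 1000.0
```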


[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Brian Candler (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742970#action_12742970
 ] 

Brian Candler commented on COUCHDB-465:
---


I'd like to suggest an alternative algorithm for consideration.

- first 48 bits of the UUID is the time, in milliseconds, since 1 Jan 1970

- remaining 80 bits starts as a random value and increments from there,
  for example when doing a _bulk_docs insert (*)

I have been using this algorithm for a while, generated client-side - it's
in my 'couchtiny' ruby client.

I did it this way so as to get monotonically-increasing doc ids; a view with
equal keys will sort them in order of insertion into the DB. It also avoids
having to keep a separate created_at timestamp field, because you can just
get it from the id.

  def created_at
Time.at(id[0,12].to_i(16) / 1000.0) rescue nil
  end

Of course, the fact I generate uids like this demonstrates that there's no
one-size-fits-all solution, but I just thought it was worth mentioning
because you should get the B-tree insertion boost as a side-effect too.

Regards,

Brian.

(*) It's your choice whether you want to re-randomize this when the next
millisecond comes along, or just leave it to increment as a serial number.
Even if you have multiple servers inserting documents into the same
database, the chances of them using the same serial number within the same
millisecond are infinitessimal, as long as they all start from an
independent random point within the 2^80 possibilities.

Wrapping would be very rare, but what I currently do is re-randomize for
each bulk insert, and choose a starting random value which is more than 2^32
away from the ceiling.


 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742982#action_12742982
 ] 

Robert Newson commented on COUCHDB-465:
---


Another interesting algorithm. I could change the patch so there's a 
couch_id_generator where the algorithm is configurable, defaulting to the 
current one, if that would move things along?


 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Updated: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Robert Newson (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Newson updated COUCHDB-465:
--

Attachment: uuid_generator.patch


I renamed couch_seq_generator to couch_uuid_generator. It supports two 
algorithms; the original random one and the new random+sequential. It defaults 
to random.

To configure it you need a new ini block:

[uuid]
algorithm=(random|sequence)
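A sketch of the dispatch this setting implies (a hypothetical Python stand-in for the Erlang module; the 16-million rollover comes from the earlier patch description):

```python
import os

class UuidGenerator:
    """Illustrative: pick an id algorithm from the [uuid] algorithm
    ini setting, defaulting to "random"."""

    def __init__(self, algorithm="random"):
        self.algorithm = algorithm
        self._prefix = os.urandom(12).hex()
        self._counter = 0

    def next_id(self):
        if self.algorithm == "random":
            return os.urandom(16).hex()  # original behavior, the default
        # "sequence": random prefix + counter, reseeded at rollover
        if self._counter >= 16_000_000:
            self._prefix, self._counter = os.urandom(12).hex(), 0
        self._counter += 1
        return self._prefix + format(self._counter, "08x")
```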



 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch, uuid_generator.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Joan Touzet (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743029#action_12743029
 ] 

Joan Touzet commented on COUCHDB-465:
-

This is a great patch, and solves the problem of having to do it in client-side 
logic. +1 from me too!

It looks like Brian's solution above is intended to allow _all_docs to return 
all documents in chronological order, thus getting a time-sorted view for 
free, i.e. without an extra field per document, extra view to maintain and 
update, extra view storage on the disk, etc. I admit I did the same for myself 
;) but it isn't necessarily a consideration for everyone. For example, in a 
replication situation, you'd need to be sure your clocks were well 
synchronized, and that you didn't have collisions in the prefix portion.

Perhaps providing a mechanism to declare your own function to override one of 
the two defaults (random, or rnewson's) would indeed be the best way forward, 
and the wiki could have a HOWTO with a set of small recipes on alternative 
approaches?

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch, uuid_generator.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.




[jira] Commented: (COUCHDB-422) PUT to _local/doc_id creates document '_local'

2009-08-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743044#action_12743044
 ] 

Paul Joseph Davis commented on COUCHDB-422:
---

Checked this out and I can fairly easily make the patch with a caveat. Is it 
just me or do _local docs not allow attachments?

 PUT to _local/doc_id creates document '_local'
 --

 Key: COUCHDB-422
 URL: https://issues.apache.org/jira/browse/COUCHDB-422
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.10
 Environment: Ubuntu 9.04
Reporter: eric casteleijn
Assignee: Paul Joseph Davis
Priority: Minor
 Fix For: 0.10


 After davisp's revision r796246 doing a put to a document id like 
 '_local/doc_id' results in a visible, and apparently non-local document with 
 id '_local'. When escaping the slash as '%2F' everything works as before, and 
 expected, i.e. a local document with the above id is created.
 To test:
 curl -X PUT -d '{"foo": "bar"}' http://127.0.0.1:5987/db1/_local/yokal
 result:
 {"ok":true,"id":"_local","rev":"1-770307fe8d4210bab8ec65c59983e03c"}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-461) attachments - handle correctly chunked encoding and Content-Length, send attachments unchunked

2009-08-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743047#action_12743047
 ] 

Paul Joseph Davis commented on COUCHDB-461:
---

Benoit, can you comment on how CouchDB is currently breaking the HTTP spec so 
we have a record of it? IIRC, it was that we should expect chunked transfers 
by default instead of rejecting them, but I'd like to have the reason 
written down.

Also, I'd suggest that for attachments we don't use chunked transfers, because 
we should never need them. Unless someone can give me a use case that absolutely 
requires receiving chunked attachments, I'd vote to remove them and use a 
straight-up Content-Length.

 attachments - handle correctly chunked encoding and Content-Length, send 
 attachments unchunked
 --

 Key: COUCHDB-461
 URL: https://issues.apache.org/jira/browse/COUCHDB-461
 Project: CouchDB
  Issue Type: Bug
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: attachments_get_length.diff, 
 attachments_put_length.diff, chunked.diff, chunked2.diff


 This patch allows CouchDB to send attachments unchunked: instead, the 
 Content-Length is fixed and the content is streamed. It also fixes attachment 
 PUTs by first detecting whether the encoding is chunked and then testing the 
 length, which is the standard way to do it. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's

2009-08-13 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743049#action_12743049
 ] 

Paul Joseph Davis commented on COUCHDB-465:
---

Just to throw something out in the interest of complicating things, should we 
consider a query string override for the configured default algorithm as well?

 Produce sequential, but unique, document id's
 -

 Key: COUCHDB-465
 URL: https://issues.apache.org/jira/browse/COUCHDB-465
 Project: CouchDB
  Issue Type: Improvement
Reporter: Robert Newson
 Attachments: sequence_id.patch, uuid_generator.patch


 Currently, if the client does not specify an id (POST'ing a single document 
 or using _bulk_docs) a random 16 byte value is created. This kind of key is 
 particularly brutal on b+tree updates and the append-only nature of couchdb 
 files.
 Attached is a patch to change this to a two-part identifier. The first part 
 is a random 12 byte value and the remainder is a counter. The random prefix 
 is rerandomized when the counter reaches its maximum. The rollover in the 
 patch is at 16 million but can obviously be changed. The upshot is that the 
 b+tree is updated in a better fashion, which should lead to performance 
 benefits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method

2009-08-13 Thread Chris Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743063#action_12743063
 ] 

Chris Anderson commented on COUCHDB-194:


Eric,

Patches are totally welcome on this. _by_seq probably hasn't gotten much 
attention lately, as I think it's deprecated in favor of _changes.

It's worth noting that doc ids are not collated with ICU when they are in the 
_all_docs view, so there are some places where the collation rules can differ.

If you are able to prepare a test case illustrating the change you'd like, it 
probably won't be hard to find someone to finish the patch.



 [startkey, endkey[: provide a right-open range selection method
 ---

 Key: COUCHDB-194
 URL: https://issues.apache.org/jira/browse/COUCHDB-194
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.10
Reporter: Maximillian Dornseif
Priority: Blocker
 Fix For: 0.10


 While writing something about using CouchDB I came across the issue of slice 
 indexes (called startkey and endkey in CouchDB lingo).
 I found no exact definition of startkey and endkey anywhere in the 
 documentation. Testing reveals that for access on _all_docs and on views,
 documents are returned in the interval
 [startkey, endkey] (startkey <= k <= endkey).
 I don't know if this was a conscious design decision, but I'd like to promote 
 a slightly different interpretation (and thus an API change):
 [startkey, endkey[ (startkey <= k < endkey).
 Both approaches are valid and used in the real world. Ruby uses the inclusive 
 (right-closed in math speak) first approach:
  >> l = [1,2,3,4]
  >> l.slice(1,2)
  => [2, 3]
 Python uses the exclusive (right-open in math speak) second approach:
  >>> l = [1,2,3,4]
  >>> l[1:2]
  [2]
 For array indices both work fine and which one to prefer is mostly an issue 
 of habit. In spoken language both approaches are used: "Have the software 
 done until Saturday" probably means right-open to the client and right-closed 
 to the coder.
 But if you are working with keys that are more than array indexes, then 
 right-open is much easier to handle. That is because you have to *guess* the 
 biggest value you want to get. The Wiki at 
 http://wiki.apache.org/couchdb/View_collation contains an example of that 
 problem:
 It is suggested that you use
 startkey=_design/&endkey=_design/Z
 or
 startkey=_design/&endkey=_design/\u″
 to get a list of all design documents - also the replication system in the db 
 core uses the same hack.
 This breaks if a design document is named "ZTop" or 
 "Iñtërnâtiônàlizætiøn". Such names might be unlikely, but we are computer 
 scientists; "unlikely" is a bad approach to software engineering.
 The thing we really want to ask CouchDB for is all documents with 
 keys starting with '_design/'.
 This is basically impossible to do with right-closed intervals. We could use 
 startkey=_design/&endkey=_design0 ('0' is the ASCII character after '/') 
 and this will work fine ... until there is actually a document with the key 
 "_design0" in the system. Unlikely, but ...
 To make selection by intervals reliable, clients currently have to guess the 
 last key (the "Z" approach) or use the first key not to include (the 
 "_design0" approach) and then post-process the result to remove the last 
 element returned if it exactly matches the given endkey value.
 If CouchDB changed to a right-open interval approach, post-processing 
 would go away in most cases. See 
 http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
  for two real world examples.
 At least for string keys and float keys, changing the meaning to [startkey, 
 endkey[ would allow selections like
 * all strings starting with 'abc'
 * all numbers between 10.5 and 11
 It would also hopefully not break too much existing code. Since the notion of 
 endkey already seems to be considered fishy (see the "Z" approach), most 
 code seems to try to avoid the issue. For example 
 'startkey=_design/&endkey=_design/Z' would still work unless you 
 have a design document named exactly "Z".
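The difference between the current right-closed interval and the proposed right-open one can be sketched in plain Python; `select_closed` and `select_half_open` are hypothetical helper names used only for illustration, not CouchDB APIs:

```python
def select_closed(keys, startkey, endkey):
    # Current semantics: right-closed interval [startkey, endkey]
    return [k for k in sorted(keys) if startkey <= k <= endkey]

def select_half_open(keys, startkey, endkey):
    # Proposed semantics: right-open interval [startkey, endkey[
    return [k for k in sorted(keys) if startkey <= k < endkey]

docs = ["_design/Z", "_design/ZTop", "_design/a", "_design0", "zebra"]

# The right-closed "_design0" trick wrongly includes a real document
# that happens to be named "_design0":
assert select_closed(docs, "_design/", "_design0") == \
    ["_design/Z", "_design/ZTop", "_design/a", "_design0"]

# The right-open interval expresses "all keys starting with _design/"
# exactly, with no post-processing:
assert select_half_open(docs, "_design/", "_design0") == \
    ["_design/Z", "_design/ZTop", "_design/a"]
```

Note that "_design/ZTop" and "_design/a" are both captured by either interval; the two semantics differ only in whether the endkey itself is returned.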

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-461) attachments - handle correctly chunked encoding and Content-Length, send attachments unchunked

2009-08-13 Thread Benoit Chesneau (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743064#action_12743064
 ] 

Benoit Chesneau commented on COUCHDB-461:
-

Well, according to the spec:

"All HTTP/1.1 applications MUST be able to receive and decode the chunked 
transfer-coding, and MUST ignore chunk-extension extensions they do not 
understand."

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html

Also, for the ordering of chunked vs. Content-Length, according to the spec:

"Messages MUST NOT include both a Content-Length header field and a non-identity 
transfer-coding. If the message does include a non-identity transfer-coding, 
the Content-Length MUST be ignored."

http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

I think chunked encoding is still good for some purposes: it allows the client 
to size buffers dynamically and to ignore bad chunks, which could save some 
bandwidth/time.
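For context, the chunked transfer-coding the spec refers to frames each chunk with a hexadecimal size line and ends with a zero-size chunk. A minimal sketch of the wire format (the helper name and chunk size are made up for illustration):

```python
def encode_chunked(data, chunk_size=8):
    """Encode bytes with the HTTP/1.1 chunked transfer-coding
    (RFC 2616 section 3.6.1): each chunk is <hex-size>\\r\\n<data>\\r\\n,
    terminated by a zero-size chunk and a blank line."""
    out = b""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    return out + b"0\r\n\r\n"

body = encode_chunked(b"hello couchdb")
# Two chunks of 8 and 5 bytes, then the terminating zero-size chunk:
assert body == b"8\r\nhello co\r\n5\r\nuchdb\r\n0\r\n\r\n"
```

This also shows why the total size is unknown until the final zero-size chunk arrives, whereas a fixed Content-Length announces it up front.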

 attachments - handle correctly chunked encoding and Content-Length, send 
 attachments unchunked
 --

 Key: COUCHDB-461
 URL: https://issues.apache.org/jira/browse/COUCHDB-461
 Project: CouchDB
  Issue Type: Bug
Reporter: Benoit Chesneau
 Fix For: 0.10

 Attachments: attachments_get_length.diff, 
 attachments_put_length.diff, chunked.diff, chunked2.diff


 This patch allows CouchDB to send attachments unchunked: instead, the 
 Content-Length is fixed and the content is streamed. It also fixes attachment 
 PUTs by first detecting whether the encoding is chunked and then testing the 
 length, which is the standard way to do it. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-427) Trying to replicate a database from an old format increase cpu usage up to 100%

2009-08-13 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743068#action_12743068
 ] 

Adam Kocoloski commented on COUCHDB-427:


So we identified the document that has duplicate attachments as

http://benoitc.im/b/7cb06e5de28327c7fc81c7028bece5a3

and indeed it does have three attachments with the same name. I'm not sure what 
the next step is here.  We certainly don't want replication to break, but this 
seems like such an edge case that I'm not sure it's worth putting in special 
code in the replicator to deal with it.

Damien, the check_dup_atts code is your stuff, right?  Do you know how Benoit 
could've ended up with 3 identically-named attachments in the past?

 Trying to replicate a database from an old format increase cpu usage up to 
 100%
 ---

 Key: COUCHDB-427
 URL: https://issues.apache.org/jira/browse/COUCHDB-427
 Project: CouchDB
  Issue Type: Bug
 Environment: osx, ubuntu, openbsd
Reporter: Benoit Chesneau
Priority: Critical
 Fix For: 0.10

 Attachments: duplicate_attachments_crash.txt


 When you try to replicate a database from an old version of couchdb to the 
 latest, the cpu usage increases up to 100% and more instead of just hanging.
 You can try to replicate from http://benoitc.im/b to latest trunk or 0.9.1 to 
 reproduce this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.