Re: Advice on policy merging non-committer branches

2012-02-28 Thread Paul Davis
On Tue, Feb 28, 2012 at 9:57 PM, Jason Smith  wrote:
> I would like to merge a branch from a non-committer[1]. The log shows
> a non-apache author, but an apache committer.
>
> What is the policy regarding this? I was thinking the following:
>
> 1. Merge freely and promiscuously from anybody in my GitHub (or
> whatever) repo (community engagement)

Not quite. More below.

> 2. As the branch nears time for "promotion," ask the non-committer to
> git format-patch and attach to JIRA, signing (checking) the license
> transfer.

Unnecessary.

> 3. With that settled, either git rebase or `git am` (I'm unclear about
> this). The point is, get an @apache.org committer id on each commit.

Unnecessary.

> 4. Push where appropriate into the ASF repo
>

Included in discussion of 1 below.

> Questions:
>
> Must the non-committer attach the exact same commit id? Or is it
> sufficient that it merely be the same diff (delta)? (I changed the ID
> when I rebased his commit and added my email to the committer header.)
>

No. Commit SHAs are in no way important from a license perspective.
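
A rough illustration of why the ID changed in the rebase, assuming a SHA-1 repository and an unsigned commit: Git derives the commit ID by hashing the commit object, and that object includes the committer line, so touching that header yields a new ID even when the tree (the content) is identical. The names, e-mail addresses, timestamps, and parent ID below are made up for the sketch; the tree is Git's well-known empty tree.

```python
import hashlib

# Hypothetical values, for illustration only.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree
PARENT = "0" * 40                                        # placeholder parent ID
AUTHOR = "Some Contributor <contributor@example.org> 1330400000 +0000"
COMMITTER = "Some Committer <committer@apache.org> 1330460000 +0000"


def commit_id(tree, parent, author, committer, message):
    """Hash a commit object the way Git does for SHA-1 repositories."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {author}\n"
        f"committer {committer}\n"
        f"\n{message}"
    ).encode("utf-8")
    # Git prefixes the object with its type and byte length before hashing.
    header = f"commit {len(body)}\0".encode("utf-8")
    return hashlib.sha1(header + body).hexdigest()


# Same tree, same author, same message; only the committer header differs.
original = commit_id(EMPTY_TREE, PARENT, AUTHOR, AUTHOR, "Fix requested_path\n")
rebased = commit_id(EMPTY_TREE, PARENT, AUTHOR, COMMITTER, "Fix requested_path\n")
print(original != rebased)
```

Both calls hash the same tree, so the contributed content is the same; only the metadata, and therefore the ID, differs.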

> Before the JIRA license agreement, may we push non-committers' code to
> the repo at all?
>

Kinda, see below.

> Before the JIRA license agreement, may we push non-committers' code to
> the more official branches: master, 1.2.x, etc.?
>
> May we push whatever we want so long as the license agreement is
> signed (checked) before voting on a release artifact?

For the last two questions, definitely not. Never push code to ASF
hardware that you're not 100% certain is OK to be in the repository.
That doesn't necessarily mean that it has to have the ASF license
attached, but if you don't know that it can be in the repo, don't push
it.

First things first: as a committer you have to remember the ICLA that
you signed. It's your responsibility to make sure that all code you
push to the repository is compliant with ASF policies and the legal
aspects those entail.

Before Git, the general policy we used in CouchDB was to request that
non-trivial patches be submitted to JIRA and have people click the
checkbox. While this captures the general intent of things, it has
been declared an official position of the board that this is
unnecessary for accepting contributions. It has also been decided that
the committer and author fields do not have to be tied to specific
Apache accounts.

The policy as it stands now is that we must be able to demonstrate
that there was a clear intent for the code in question to be
contributed. While there hasn't been an official position on how to
demonstrate intent, I think there are a couple of things that are fairly
obvious:

Traditional:

1. Same as always: Anything submitted to JIRA. The checkbox has been
declared unnecessary, though I think the input field is required,
and if someone said "not-intended for inclusion" we should just
clarify whether that was an accident or not.

2. Patches submitted to a mailing list.

New with Git:

3. If someone posts a link to a publicly available Git branch with
language indicating their intent for it to be included, then we should
feel free to add the repo as a remote and yank it in. While not
absolutely necessary, it might be a good idea to rewrite the commit
message to reference either the email or the original contributed
commit sha (in case of a rebase) so that we can link the two.

4. Jukka Zitting has recently been doing work on connecting GitHub
Pull Requests to the dev@ mailing lists. Assuming this is the case, I
think we should feel free to take any code submitted in this manner.
Thus our old "Submit that to JIRA" would become "Send us a Pull
Request".

In contrast, we shouldn't feel free to just find code in a random
GitHub fork and push that onto ASF hardware. If there's something we
see that we want then we should ask for clarification (plus that's
only polite).
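
As a sketch of the suggestion in item 3 (rewriting the commit message so the pushed commit can be linked back to the contribution), something like the following could build the amended message. The "Original-commit:" and "Contributed-via:" labels are invented for this example; they are not an ASF or Git convention, and the function name is hypothetical.

```python
def annotate_message(message, original_sha=None, reference=None):
    """Return `message` with provenance lines appended.

    `original_sha` is the contributor's commit ID before any rebase;
    `reference` is a pointer to where intent was stated (for example a
    mailing-list post). Both labels below are made up for this sketch.
    """
    notes = []
    if original_sha:
        notes.append(f"Original-commit: {original_sha}")
    if reference:
        notes.append(f"Contributed-via: {reference}")
    if not notes:
        return message  # nothing to record, leave the message untouched
    return message.rstrip("\n") + "\n\n" + "\n".join(notes) + "\n"


print(annotate_message(
    "Testing requested_path for vhosts\n",
    original_sha="e9417480e2ce160f359d9508dcec3d4e56045a60",
    reference="COUCHDB-1416",
))
```

The resulting text could then be used when amending or rebasing the commit, so that the original SHA survives even though the new commit gets a different ID.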

Bottom line, as a committer you're responsible for the code that you
push to the repository. If you're not sure about a specific patch or
situation, bring it up on dev@ or a similar venue and we can run it up
the flagpole until we find an answer.


[jira] [Reopened] (COUCHDB-1416) the requested_path that is passed to a show is wrong on a vhost with a path

2012-02-28 Thread Adam Kocoloski (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kocoloski reopened COUCHDB-1416:
-


+JasonSmith: Actually, would anybody who is able please reopen COUCHDB-1416 so 
that Ryan can add an attachment. Thanks!

> the requested_path that is passed to a show is wrong on a vhost with a path 
> 
>
>                 Key: COUCHDB-1416
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1416
>             Project: CouchDB
>          Issue Type: Bug
>          Components: HTTP Interface
>    Affects Versions: 1.2
>            Reporter: Ryan Ramage
>            Priority: Minor
>         Attachments: 
> A_0001-Testing-requested_path-for-various-combinations-of-r.patch, 
> A_0002-Compatibility-with-the-CLI-test-runner.patch, 
> A_0003-Store-the-entire-requested-path-in-x-couchdb-vhost-f.patch, 
> A_0004-For-a-vhost-correctly-reflect-true-requested-path.patch
>
>
> In a show or list, it is impossible to construct a full url that an end user 
> could use to re-request the resource, given the various combinations of 
> vhosts and rewrites. 
> The major one is if the vhost contains a path component, this path 
> information is not passed to the show at all. 
> I have created three tests that highlight the condition, currently failing 
> for one test, with the two passing to prevent regressions.
> The commit can be found here:
> https://github.com/ryanramage/couchdb/commit/e9417480e2ce160f359d9508dcec3d4e56045a60
> I have talked this over with JasonSmith and bennoitc on #couchdb and they 
> asked me to write the tests and raise the jira. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COUCHDB-1416) the requested_path that is passed to a show is wrong on a vhost with a path

2012-02-28 Thread Jason Smith (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218890#comment-13218890
 ] 

Jason Smith commented on COUCHDB-1416:
--

@Jan, would you kindly re-open this ticket (I cannot) so that Ryan can add an 
attachment? Thanks.

@Ryan, would you please run this:

git format-patch 47c81f4c25f5f9ec4ef60c4ea638d77118b9a9ee -1

And attach the 0001-Testing-*.patch file to this ticket and click the copyright 
agreement.





Advice on policy merging non-committer branches

2012-02-28 Thread Jason Smith
I would like to merge a branch from a non-committer[1]. The log shows
a non-apache author, but an apache committer.

What is the policy regarding this? I was thinking the following:

1. Merge freely and promiscuously from anybody in my GitHub (or
whatever) repo (community engagement)
2. As the branch nears time for "promotion," ask the non-committer to
git format-patch and attach to JIRA, signing (checking) the license
transfer.
3. With that settled, either git rebase or `git am` (I'm unclear about
this). The point is, get an @apache.org committer id on each commit.
4. Push where appropriate into the ASF repo

Questions:

Must the non-committer attach the exact same commit id? Or is it
sufficient that it merely be the same diff (delta)? (I changed the ID
when I rebased his commit and added my email to the committer header.)

Before the JIRA license agreement, may we push non-committers' code to
the repo at all?

Before the JIRA license agreement, may we push non-committers' code to
the more official branches: master, 1.2.x, etc.?

May we push whatever we want so long as the license agreement is
signed (checked) before voting on a release artifact?

[1]: https://github.com/jhs/couchdb/commit/1451ee57f2afdade5b24c3fb4ae37efadf9ef1ed


Re: Please report your indexing speed

2012-02-28 Thread Jan Lehnardt
One more report.

I got suspicious of the rather short runtimes, so I picked the
default_doc and ran it at 500k:

bench_R14B04_1.1.1_default_doc.tpl.log 2m19.139s
bench_R14B04_1.2.x_default_doc.tpl.log 2m18.875s

It seems to me that we need more variation in what we test:
more OSs and larger ddocs, like the one Stefan linked to. Can
anyone help provide this?

Cheers
Jan
-- 


On Feb 29, 2012, at 00:23 , Jan Lehnardt wrote:

> For Robert Newson, avoiding bulk inserts to populate the dbs:
> 
> bench_R14B04_1.1.1_default_doc.tpl.log 0m19.692s
> bench_R14B04_1.2.x_default_doc.tpl.log 0m17.033s
> 
> bench_R14B04_1.1.1_nested_6k.tpl.log   1m31.393s
> bench_R14B04_1.2.x_nested_6k.tpl.log   0m42.010s
> 
> bench_R14B04_1.1.1_small_doc.tpl.log   0m8.103s
> bench_R14B04_1.2.x_small_doc.tpl.log   0m10.597s
> 
> bench_R14B04_1.1.1_wow.tpl.log 0m33.944s
> bench_R14B04_1.2.x_wow.tpl.log 0m27.087s
> 
> (Just R14B04, full logs available on demand)
> 
> Cheers
> Jan
> -- 
> 
> 
> On Feb 28, 2012, at 23:09 , Jan Lehnardt wrote:
> 
>> Same story, but spinning disk, 5400rpm:
>> 
>> bench_R14B04_1.1.1_default_doc.tpl.log 0m19.175s
>> bench_R14B04_1.2.x_default_doc.tpl.log 0m16.821s
>> bench_R15B_1.1.1_default_doc.tpl.log   0m13.050s
>> bench_R15B_1.2.x_default_doc.tpl.log   0m13.292s
>> 
>> bench_R14B04_1.1.1_nested_6k.tpl.log   1m26.941s
>> bench_R14B04_1.2.x_nested_6k.tpl.log   0m39.178s
>> bench_R15B_1.1.1_nested_6k.tpl.log 0m47.766s
>> bench_R15B_1.2.x_nested_6k.tpl.log 0m31.697s
>> 
>> bench_R14B04_1.1.1_small_doc.tpl.log   1m19.851s
>> bench_R14B04_1.2.x_small_doc.tpl.log   1m43.057s
>> bench_R15B_1.1.1_small_doc.tpl.log 0m52.249s
>> bench_R15B_1.2.x_small_doc.tpl.log 1m8.195s
>> 
>> bench_R14B04_1.1.1_wow.tpl.log 0m29.589s
>> bench_R14B04_1.2.x_wow.tpl.log 0m24.867s
>> bench_R15B_1.1.1_wow.tpl.log   0m20.171s
>> bench_R15B_1.2.x_wow.tpl.log   0m18.800s
>> 
>> Full logs at http://jan.prima.de/slow_couch/rust/
>> 
>> Cheers
>> Jan
>> -- 
>> 
>> 
>> On Feb 28, 2012, at 21:22 , Jan Lehnardt wrote:
>> 
>>> 
>>> # tl;dr:
>>> 
>>> bench_R14B04_1.1.1_default_doc.tpl.log 0m18.749s
>>> bench_R14B04_1.2.x_default_doc.tpl.log 0m16.304s
>>> bench_R15B_1.1.1_default_doc.tpl.log   0m12.946s
>>> bench_R15B_1.2.x_default_doc.tpl.log   0m13.616s
>>> 
>>> bench_R14B04_1.1.1_nested_6k.tpl.log   1m27.267s
>>> bench_R14B04_1.2.x_nested_6k.tpl.log   0m37.910s
>>> bench_R15B_1.1.1_nested_6k.tpl.log 0m46.963s
>>> bench_R15B_1.2.x_nested_6k.tpl.log 0m33.011s
>>> 
>>> bench_R14B04_1.1.1_small_doc.tpl.log   1m17.212s
>>> bench_R14B04_1.2.x_small_doc.tpl.log   1m41.383s
>>> bench_R15B_1.1.1_small_doc.tpl.log 0m52.858s
>>> bench_R15B_1.2.x_small_doc.tpl.log 1m9.043s
>>> 
>>> bench_R14B04_1.1.1_wow.tpl.log 0m29.842s
>>> bench_R14B04_1.2.x_wow.tpl.log 0m24.178s
>>> bench_R15B_1.1.1_wow.tpl.log   0m20.493s
>>> bench_R15B_1.2.x_wow.tpl.log   0m19.584s
>>> 
>>> (Full logs at [5])
>>> 
>>> 
>>> # Description
>>> 
>>> All of these are on Mac OS X 10.7.3 on an SSD.
>>> 
>>> I'll be running the same set on spinning disk and then Robert N asked
>>> me to populate the DBs without using bulk docs. Since that's gonna take
>>> a while, I'll probably run this overnight.
>>> 
>>> All of the results are generated by my fork of Jason's slow_couchdb[1]
>>> and Filipe's seatoncouch[2].
>>> 
>>> The changes I've made: the small_doc test now runs with 500k instead
>>> of 50k docs, and I added .view files to match the tpl files in
>>> seatoncouch/templates/* so we can have similar views use the different
>>> doc structures.
>>> 
>>> I also added two scripts to orchestrate the above testing in a more
>>> automated fashion. They also allow you to run the full matrix yourself.
>>> All you need is to set up homebrew to allow `brew switch erlang R14B04` and
>>> R15B (which is controlled in matrix.sh[3]) and have a git checkout of the
>>> CouchDB sources that allows you to do `git checkout 1.1.1` or `1.2.x`
>>> (which is controlled in runner.sh[4]; adjust the path to the git checkout
>>> there as well).
>>> 
>>> matrix.sh also allows you to specify which docs to run.
>>> 
>>> Please shout if you need any more info about this test run or how to
>>> run this yourself.
>>> 
>>> 
>>> # Analysis
>>> 
>>> Inconclusive. I'd like to run this on larger dbs in general to see if
>>> there are more differences that shake out, and I have yet to run this
>>> on a spinning disk, let alone another OS* or more complex view functions
>>> or larger design docs (like the one Stefan had).
>>> 
>>> * It shouldn't be too much work to port slow_couchdb to other OSs, I'll
>>> definitely be looking into that, but we can do with every bit of help :)
>>> 
>>> So far, I'm happy to conclude that while there are definitely provable
>>> differences, 

Re: Please report your indexing speed

2012-02-28 Thread Jan Lehnardt
For Robert Newson, avoiding bulk inserts to populate the dbs:

bench_R14B04_1.1.1_default_doc.tpl.log 0m19.692s
bench_R14B04_1.2.x_default_doc.tpl.log 0m17.033s

bench_R14B04_1.1.1_nested_6k.tpl.log   1m31.393s
bench_R14B04_1.2.x_nested_6k.tpl.log   0m42.010s

bench_R14B04_1.1.1_small_doc.tpl.log   0m8.103s
bench_R14B04_1.2.x_small_doc.tpl.log   0m10.597s

bench_R14B04_1.1.1_wow.tpl.log 0m33.944s
bench_R14B04_1.2.x_wow.tpl.log 0m27.087s

(Just R14B04, full logs available on demand)

Cheers
Jan
-- 


On Feb 28, 2012, at 23:09 , Jan Lehnardt wrote:

> Same story, but spinning disk, 5400rpm:
> 
> bench_R14B04_1.1.1_default_doc.tpl.log 0m19.175s
> bench_R14B04_1.2.x_default_doc.tpl.log 0m16.821s
> bench_R15B_1.1.1_default_doc.tpl.log   0m13.050s
> bench_R15B_1.2.x_default_doc.tpl.log   0m13.292s
> 
> bench_R14B04_1.1.1_nested_6k.tpl.log   1m26.941s
> bench_R14B04_1.2.x_nested_6k.tpl.log   0m39.178s
> bench_R15B_1.1.1_nested_6k.tpl.log 0m47.766s
> bench_R15B_1.2.x_nested_6k.tpl.log 0m31.697s
> 
> bench_R14B04_1.1.1_small_doc.tpl.log   1m19.851s
> bench_R14B04_1.2.x_small_doc.tpl.log   1m43.057s
> bench_R15B_1.1.1_small_doc.tpl.log 0m52.249s
> bench_R15B_1.2.x_small_doc.tpl.log 1m8.195s
> 
> bench_R14B04_1.1.1_wow.tpl.log 0m29.589s
> bench_R14B04_1.2.x_wow.tpl.log 0m24.867s
> bench_R15B_1.1.1_wow.tpl.log   0m20.171s
> bench_R15B_1.2.x_wow.tpl.log   0m18.800s
> 
> Full logs at http://jan.prima.de/slow_couch/rust/
> 
> Cheers
> Jan
> -- 
> 
> 
> On Feb 28, 2012, at 21:22 , Jan Lehnardt wrote:
> 
>> 
>> # tl;dr:
>> 
>> bench_R14B04_1.1.1_default_doc.tpl.log 0m18.749s
>> bench_R14B04_1.2.x_default_doc.tpl.log 0m16.304s
>> bench_R15B_1.1.1_default_doc.tpl.log   0m12.946s
>> bench_R15B_1.2.x_default_doc.tpl.log   0m13.616s
>> 
>> bench_R14B04_1.1.1_nested_6k.tpl.log   1m27.267s
>> bench_R14B04_1.2.x_nested_6k.tpl.log   0m37.910s
>> bench_R15B_1.1.1_nested_6k.tpl.log 0m46.963s
>> bench_R15B_1.2.x_nested_6k.tpl.log 0m33.011s
>> 
>> bench_R14B04_1.1.1_small_doc.tpl.log   1m17.212s
>> bench_R14B04_1.2.x_small_doc.tpl.log   1m41.383s
>> bench_R15B_1.1.1_small_doc.tpl.log 0m52.858s
>> bench_R15B_1.2.x_small_doc.tpl.log 1m9.043s
>> 
>> bench_R14B04_1.1.1_wow.tpl.log 0m29.842s
>> bench_R14B04_1.2.x_wow.tpl.log 0m24.178s
>> bench_R15B_1.1.1_wow.tpl.log   0m20.493s
>> bench_R15B_1.2.x_wow.tpl.log   0m19.584s
>> 
>> (Full logs at [5])
>> 
>> 
>> # Description
>> 
>> All of these are on Mac OS X 10.7.3 on an SSD.
>> 
>> I'll be running the same set on spinning disk and then Robert N asked
>> me to populate the DBs without using bulk docs. Since that's gonna take
>> a while, I'll probably run this overnight.
>> 
>> All of the results are generated by my fork of Jason's slow_couchdb[1]
>> and Filipe's seatoncouch[2].
>> 
>> The changes I've made: the small_doc test now runs with 500k instead
>> of 50k docs, and I added .view files to match the tpl files in
>> seatoncouch/templates/* so we can have similar views use the different
>> doc structures.
>> 
>> I also added two scripts to orchestrate the above testing in a more
>> automated fashion. They also allow you to run the full matrix yourself.
>> All you need is to set up homebrew to allow `brew switch erlang R14B04` and
>> R15B (which is controlled in matrix.sh[3]) and have a git checkout of the
>> CouchDB sources that allows you to do `git checkout 1.1.1` or `1.2.x`
>> (which is controlled in runner.sh[4]; adjust the path to the git checkout
>> there as well).
>> 
>> matrix.sh also allows you to specify which docs to run.
>> 
>> Please shout if you need any more info about this test run or how to
>> run this yourself.
>> 
>> 
>> # Analysis
>> 
>> Inconclusive. I'd like to run this on larger dbs in general to see if
>> there are more differences that shake out, and I have yet to run this
>> on a spinning disk, let alone another OS* or more complex view functions
>> or larger design docs (like the one Stefan had).
>> 
>> * It shouldn't be too much work to port slow_couchdb to other OSs, I'll
>> definitely be looking into that, but we can do with every bit of help :)
>> 
>> So far, I'm happy to conclude that while there are definitely provable
>> differences, we can live with them.
>> 
>> Cheers
>> Jan
>> -- 
>> 
>> 
>> [1]: https://github.com/janl/slow_couchdb
>> [2]: https://github.com/janl/seatoncouch
>> [3]: https://github.com/janl/slow_couchdb/blob/master/matrix.sh
>> [4]: https://github.com/janl/slow_couchdb/blob/master/runner.sh
>> [5]: http://jan.prima.de/slow_couch/ssd/
>> 
>> 
>> On Feb 28, 2012, at 18:53 , Filipe David Manana wrote:
>> 
>>> Jason, repeated my last test with the 1Kb docs (
>>> https://gist.github.com/1930804, map function
>>> http://friendpaste.com/5C99aqXocN6N6H1BAYIigs ) to cover branch 1.1.x
>>

Re: Please report your indexing speed

2012-02-28 Thread Jan Lehnardt
Same story, but spinning disk, 5400rpm:

bench_R14B04_1.1.1_default_doc.tpl.log 0m19.175s
bench_R14B04_1.2.x_default_doc.tpl.log 0m16.821s
bench_R15B_1.1.1_default_doc.tpl.log   0m13.050s
bench_R15B_1.2.x_default_doc.tpl.log   0m13.292s

bench_R14B04_1.1.1_nested_6k.tpl.log   1m26.941s
bench_R14B04_1.2.x_nested_6k.tpl.log   0m39.178s
bench_R15B_1.1.1_nested_6k.tpl.log 0m47.766s
bench_R15B_1.2.x_nested_6k.tpl.log 0m31.697s

bench_R14B04_1.1.1_small_doc.tpl.log   1m19.851s
bench_R14B04_1.2.x_small_doc.tpl.log   1m43.057s
bench_R15B_1.1.1_small_doc.tpl.log 0m52.249s
bench_R15B_1.2.x_small_doc.tpl.log 1m8.195s

bench_R14B04_1.1.1_wow.tpl.log 0m29.589s
bench_R14B04_1.2.x_wow.tpl.log 0m24.867s
bench_R15B_1.1.1_wow.tpl.log   0m20.171s
bench_R15B_1.2.x_wow.tpl.log   0m18.800s

Full logs at http://jan.prima.de/slow_couch/rust/

Cheers
Jan
-- 


On Feb 28, 2012, at 21:22 , Jan Lehnardt wrote:

> 
> # tl;dr:
> 
> bench_R14B04_1.1.1_default_doc.tpl.log 0m18.749s
> bench_R14B04_1.2.x_default_doc.tpl.log 0m16.304s
> bench_R15B_1.1.1_default_doc.tpl.log   0m12.946s
> bench_R15B_1.2.x_default_doc.tpl.log   0m13.616s
> 
> bench_R14B04_1.1.1_nested_6k.tpl.log   1m27.267s
> bench_R14B04_1.2.x_nested_6k.tpl.log   0m37.910s
> bench_R15B_1.1.1_nested_6k.tpl.log 0m46.963s
> bench_R15B_1.2.x_nested_6k.tpl.log 0m33.011s
> 
> bench_R14B04_1.1.1_small_doc.tpl.log   1m17.212s
> bench_R14B04_1.2.x_small_doc.tpl.log   1m41.383s
> bench_R15B_1.1.1_small_doc.tpl.log 0m52.858s
> bench_R15B_1.2.x_small_doc.tpl.log 1m9.043s
> 
> bench_R14B04_1.1.1_wow.tpl.log 0m29.842s
> bench_R14B04_1.2.x_wow.tpl.log 0m24.178s
> bench_R15B_1.1.1_wow.tpl.log   0m20.493s
> bench_R15B_1.2.x_wow.tpl.log   0m19.584s
> 
> (Full logs at [5])
> 
> 
> # Description
> 
> All of these are on Mac OS X 10.7.3 on an SSD.
> 
> I'll be running the same set on spinning disk and then Robert N asked
> me to populate the DBs without using bulk docs. Since that's gonna take
> a while, I'll probably run this overnight.
> 
> All of the results are generated by my fork of Jason's slow_couchdb[1]
> and Filipe's seatoncouch[2].
> 
> The changes I've made: the small_doc test now runs with 500k instead
> of 50k docs, and I added .view files to match the tpl files in
> seatoncouch/templates/* so we can have similar views use the different
> doc structures.
> 
> I also added two scripts to orchestrate the above testing in a more
> automated fashion. They also allow you to run the full matrix yourself.
> All you need is to set up homebrew to allow `brew switch erlang R14B04` and
> R15B (which is controlled in matrix.sh[3]) and have a git checkout of the
> CouchDB sources that allows you to do `git checkout 1.1.1` or `1.2.x`
> (which is controlled in runner.sh[4]; adjust the path to the git checkout
> there as well).
> 
> matrix.sh also allows you to specify which docs to run.
> 
> Please shout if you need any more info about this test run or how to
> run this yourself.
> 
> 
> # Analysis
> 
> Inconclusive. I'd like to run this on larger dbs in general to see if
> there are more differences that shake out, and I have yet to run this
> on a spinning disk, let alone another OS* or more complex view functions
> or larger design docs (like the one Stefan had).
> 
> * It shouldn't be too much work to port slow_couchdb to other OSs, I'll
> definitely be looking into that, but we can do with every bit of help :)
> 
> So far, I'm happy to conclude that while there are definitely provable
> differences, we can live with them.
> 
> Cheers
> Jan
> -- 
> 
> 
> [1]: https://github.com/janl/slow_couchdb
> [2]: https://github.com/janl/seatoncouch
> [3]: https://github.com/janl/slow_couchdb/blob/master/matrix.sh
> [4]: https://github.com/janl/slow_couchdb/blob/master/runner.sh
> [5]: http://jan.prima.de/slow_couch/ssd/
> 
> 
> On Feb 28, 2012, at 18:53 , Filipe David Manana wrote:
> 
>> Jason, repeated my last test with the 1Kb docs (
>> https://gist.github.com/1930804, map function
>> http://friendpaste.com/5C99aqXocN6N6H1BAYIigs ) to cover branch 1.1.x
>> as well. Here are the full results (also in
>> https://gist.github.com/1930807):
>> 
>> 
>> Before COUCHDB-1186
>> 
>> fdmanana 23:21:05 ~/git/hub/slow_couchdb (master)> docs=50
>> batch=5000 ./bench.sh wow.tpl
>> Server: CouchDB/1.2.0a-a68a792-git (Erlang OTP/R14B03)
>> {"couchdb":"Welcome","version":"1.2.0a-a68a792-git"}
>> 
>> [INFO] Created DB named `db1'
>> [INFO] Uploaded 5000 documents via _bulk_docs
>> ()
>> [INFO] Uploaded 5000 documents via _bulk_docs
>> Building view.
>> {"total_rows":50,"offset":0,"rows":[
>> {"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":

Re: Please report your indexing speed

2012-02-28 Thread Jan Lehnardt

# tl;dr:

bench_R14B04_1.1.1_default_doc.tpl.log 0m18.749s
bench_R14B04_1.2.x_default_doc.tpl.log 0m16.304s
bench_R15B_1.1.1_default_doc.tpl.log   0m12.946s
bench_R15B_1.2.x_default_doc.tpl.log   0m13.616s

bench_R14B04_1.1.1_nested_6k.tpl.log   1m27.267s
bench_R14B04_1.2.x_nested_6k.tpl.log   0m37.910s
bench_R15B_1.1.1_nested_6k.tpl.log 0m46.963s
bench_R15B_1.2.x_nested_6k.tpl.log 0m33.011s

bench_R14B04_1.1.1_small_doc.tpl.log   1m17.212s
bench_R14B04_1.2.x_small_doc.tpl.log   1m41.383s
bench_R15B_1.1.1_small_doc.tpl.log 0m52.858s
bench_R15B_1.2.x_small_doc.tpl.log 1m9.043s

bench_R14B04_1.1.1_wow.tpl.log 0m29.842s
bench_R14B04_1.2.x_wow.tpl.log 0m24.178s
bench_R15B_1.1.1_wow.tpl.log   0m20.493s
bench_R15B_1.2.x_wow.tpl.log   0m19.584s

(Full logs at [5])
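
To make the numbers above easier to compare, here is a small sketch that parses lines in the format of this table and computes how long 1.2.x took relative to 1.1.1 for each Erlang release and doc template (a ratio above 1 means 1.2.x was slower). It only uses the R14B04 figures quoted above; the function names and the line pattern are assumptions of this sketch, not part of slow_couchdb.

```python
import re

# Matches e.g. "bench_R14B04_1.2.x_small_doc.tpl.log   1m41.383s"
LINE = re.compile(
    r"^bench_(R\d+B\d*)_(\d+\.\d+\.[\dx]+)_(\w+)\.tpl\.log\s+(\d+)m([\d.]+)s$"
)

REPORT = """\
bench_R14B04_1.1.1_default_doc.tpl.log 0m18.749s
bench_R14B04_1.2.x_default_doc.tpl.log 0m16.304s
bench_R14B04_1.1.1_nested_6k.tpl.log   1m27.267s
bench_R14B04_1.2.x_nested_6k.tpl.log   0m37.910s
bench_R14B04_1.1.1_small_doc.tpl.log   1m17.212s
bench_R14B04_1.2.x_small_doc.tpl.log   1m41.383s
bench_R14B04_1.1.1_wow.tpl.log 0m29.842s
bench_R14B04_1.2.x_wow.tpl.log 0m24.178s
"""


def parse(report):
    """Map (erlang, doc, couchdb_version) to wall-clock seconds."""
    results = {}
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if match:
            erlang, couch, doc, minutes, seconds = match.groups()
            results[(erlang, doc, couch)] = int(minutes) * 60 + float(seconds)
    return results


def ratios(results, old="1.1.1", new="1.2.x"):
    """Return time(new) / time(old) for every (erlang, doc) with both runs."""
    out = {}
    for (erlang, doc, couch), elapsed in results.items():
        if couch == old and (erlang, doc, new) in results:
            out[(erlang, doc)] = results[(erlang, doc, new)] / elapsed
    return out


for (erlang, doc), ratio in sorted(ratios(parse(REPORT)).items()):
    print(f"{erlang} {doc}: 1.2.x took {ratio:.2f}x the 1.1.1 time")
```

On these figures small_doc is the one template where 1.2.x comes out slower than 1.1.1, while nested_6k is markedly faster.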


# Description

All of these are on Mac OS X 10.7.3 on an SSD.

I'll be running the same set on spinning disk and then Robert N asked
me to populate the DBs without using bulk docs. Since that's gonna take
a while, I'll probably run this overnight.

All of the results are generated by my fork of Jason's slow_couchdb[1]
and Filipe's seatoncouch[2].

The changes I've made: the small_doc test now runs with 500k instead
of 50k docs, and I added .view files to match the tpl files in
seatoncouch/templates/* so we can have similar views use the different
doc structures.

I also added two scripts to orchestrate the above testing in a more
automated fashion. They also allow you to run the full matrix yourself.
All you need is to set up homebrew to allow `brew switch erlang R14B04` and
R15B (which is controlled in matrix.sh[3]) and have a git checkout of the
CouchDB sources that allows you to do `git checkout 1.1.1` or `1.2.x`
(which is controlled in runner.sh[4]; adjust the path to the git checkout
there as well).

matrix.sh also allows you to specify which docs to run.
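
For readers who want a picture of the matrix before opening the scripts, here is a dry-run sketch that only enumerates the combinations and the commands named above. It is not the contents of matrix.sh or runner.sh: the function name is hypothetical, the `./bench.sh <template>` form is borrowed from elsewhere in this thread, and a real run would presumably also need to rebuild CouchDB after each checkout.

```python
from itertools import product

ERLANGS = ["R14B04", "R15B"]
BRANCHES = ["1.1.1", "1.2.x"]
DOCS = ["default_doc", "nested_6k", "small_doc", "wow"]


def plan(erlangs=ERLANGS, branches=BRANCHES, docs=DOCS):
    """List every (Erlang, CouchDB, doc) run with the commands it implies.

    Nothing is executed; the result is just a description of the matrix.
    """
    steps = []
    for erlang, branch, doc in product(erlangs, branches, docs):
        steps.append({
            "log": f"bench_{erlang}_{branch}_{doc}.tpl.log",
            "commands": [
                f"brew switch erlang {erlang}",
                f"git checkout {branch}",
                f"./bench.sh {doc}.tpl",
            ],
        })
    return steps


# Restrict the run to a single doc template, as matrix.sh is said to allow.
for step in plan(docs=["wow"]):
    print(step["log"], "<-", " && ".join(step["commands"]))
```

Passing a shorter `docs` list mirrors the option to choose which docs to run; the full default matrix is 2 Erlang releases x 2 CouchDB versions x 4 templates.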

Please shout if you need any more info about this test run or how to
run this yourself.


# Analysis

Inconclusive. I'd like to run this on larger dbs in general to see if
there are more differences that shake out, and I have yet to run this
on a spinning disk, let alone another OS* or more complex view functions
or larger design docs (like the one Stefan had).

* It shouldn't be too much work to port slow_couchdb to other OSs, I'll
definitely be looking into that, but we can do with every bit of help :)

So far, I'm happy to conclude that while there are definitely provable
differences, we can live with them.

Cheers
Jan
-- 


[1]: https://github.com/janl/slow_couchdb
[2]: https://github.com/janl/seatoncouch
[3]: https://github.com/janl/slow_couchdb/blob/master/matrix.sh
[4]: https://github.com/janl/slow_couchdb/blob/master/runner.sh
[5]: http://jan.prima.de/slow_couch/ssd/


On Feb 28, 2012, at 18:53 , Filipe David Manana wrote:

> Jason, repeated my last test with the 1Kb docs (
> https://gist.github.com/1930804, map function
> http://friendpaste.com/5C99aqXocN6N6H1BAYIigs ) to cover branch 1.1.x
> as well. Here are the full results (also in
> https://gist.github.com/1930807):
> 
> 
> Before COUCHDB-1186
> 
> fdmanana 23:21:05 ~/git/hub/slow_couchdb (master)> docs=50
> batch=5000 ./bench.sh wow.tpl
> Server: CouchDB/1.2.0a-a68a792-git (Erlang OTP/R14B03)
> {"couchdb":"Welcome","version":"1.2.0a-a68a792-git"}
> 
> [INFO] Created DB named `db1'
> [INFO] Uploaded 5000 documents via _bulk_docs
> ()
> [INFO] Uploaded 5000 documents via _bulk_docs
> Building view.
> {"total_rows":50,"offset":0,"rows":[
> {"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
> ]}
> 
> real  5m6.676s
> user  0m0.009s
> sys   0m0.010s
> 
> 
> After COUCHDB-1186
> 
> fdmanana 23:50:07 ~/git/hub/slow_couchdb (master)> docs=50
> batch=5000 ./bench.sh wow.tpl
> Server: CouchDB/1.2.0a-f023052-git (Erlang OTP/R14B03)
> {"couchdb":"Welcome","version":"1.2.0a-f023052-git"}
> 
> [INFO] Created DB named `db1'
> [INFO] Uploaded 5000 documents via _bulk_docs
> ()
> [INFO] Uploaded 5000 documents via _bulk_docs
> Building view.
> {"total_rows":50,"offset":0,"rows":[
> {"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
> ]}
> 
> real  5m1.395s
> user  0m0.008s
> sys   0m0.010s
> 
> 
> After COUCHDB-1186 + better queueing patch
> (http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w)
> 
> fdmanana 00:14:25 ~/git/hub/slow_couchdb (master)> docs=50
> batch=5000 ./bench.sh wow.tpl
> Server: CouchDB/1.2.0a-f023052-git (Erlang OTP/R14B03)
> {"couchdb":"Welcome","version":"1.2.0a-f023052-git"}
> 
> [INFO] Created DB named `db1'
> [INFO] Uploaded 5000 documents via _bulk_docs
> (..

Re: Please report your indexing speed

2012-02-28 Thread Filipe David Manana
Jason, repeated my last test with the 1Kb docs (
https://gist.github.com/1930804, map function
http://friendpaste.com/5C99aqXocN6N6H1BAYIigs ) to cover branch 1.1.x
as well. Here are the full results (also in
https://gist.github.com/1930807):


Before COUCHDB-1186

fdmanana 23:21:05 ~/git/hub/slow_couchdb (master)> docs=50
batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0a-a68a792-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0a-a68a792-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
()
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":50,"offset":0,"rows":[
{"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
]}

real  5m6.676s
user  0m0.009s
sys   0m0.010s


After COUCHDB-1186

fdmanana 23:50:07 ~/git/hub/slow_couchdb (master)> docs=50
batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0a-f023052-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0a-f023052-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
()
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":50,"offset":0,"rows":[
{"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
]}

real  5m1.395s
user  0m0.008s
sys   0m0.010s


After COUCHDB-1186 + better queueing patch
(http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w)

fdmanana 00:14:25 ~/git/hub/slow_couchdb (master)> docs=50
batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0a-f023052-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0a-f023052-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
()
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":50,"offset":0,"rows":[
{"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
]}

real  4m48.175s
user  0m0.008s
sys   0m0.009s


CouchDB branch 1.1.x

fdmanana 08:16:58 ~/git/hub/slow_couchdb (master)> docs=50
batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.1.2a785d32f-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.1.2a785d32f-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
()
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":50,"offset":0,"rows":[
{"id":"0001c0a1-edcb-4dbc-aa9d-533c73d980cb","key":["dwarf","assassin"],"value":[{"x":62038.32,"y":105825.29},{"x":90713.13,"y":128570.97},{"x":43836.37,"y":80517.12},{"x":71610.97,"y":143739.99},{"x":86038.39,"y":84731.8}]}
]}

real    5m44.374s
user    0m0.008s
sys     0m0.010s


Disk model is APPLE SSD TS128C, quad-core machine, 8 GB of RAM.



On Tue, Feb 28, 2012 at 5:17 AM, Jason Smith  wrote:
> Forgive the clean new thread. Hopefully it will not remain so.
>
> If you can, would you please clone https://github.com/jhs/slow_couchdb
>
> And build whatever Erlangs and CouchDB checkouts you see fit, and run
> the test. For example:
>
>    docs=50 ./bench.sh small_doc.tpl
>
> That should run the test and, God willing, upload the results to a
> couch in the cloud. We should be able to use that information to
> identify who you are, whether you are on SSD, what Erlang and Couch
> build, and how fast it ran. Modulo bugs.



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


[jira] [Commented] (COUCHDB-1275) Futon's recent database list doesn't decode slashes in database names

2012-02-28 Thread Jan Lehnardt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218241#comment-13218241
 ] 

Jan Lehnardt commented on COUCHDB-1275:
---

yeah :)

> Futon's recent database list doesn't decode slashes in database names
> ---------------------------------------------------------------------
>
>                 Key: COUCHDB-1275
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1275
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Futon
>    Affects Versions: 1.1
>            Reporter: Jan Lehnardt
>            Priority: Minor
>
> Create a database with a slash in its name; Futon will go to the database view 
> automatically and add it to the recent databases list. The list will display 
> the encoded %2f instead of the /.
> Here's a quick fix: http://friendpaste.com/1WORPAfSY5MUyoisaAQtZB
> I tested it for XSS but I may have overlooked something and I'd appreciate a 
> review.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: [VOTE] Apache CouchDB 1.2.0 release, second round

2012-02-28 Thread Noah Slater
On Tue, Feb 28, 2012 at 10:05 AM, Benoit Chesneau wrote:
>
> Also Noah, Jan: what is the status of this vote? Should we consider it
> as aborted or paused?


As far as I can tell, we have not identified, for sure, a release blocking
issue. Once we are sure that there is a release blocking issue, I will
abort the vote. But I am not aborting it simply because it's taking a while
to get clear on the issues. :)


Please report your indexing speed

2012-02-28 Thread Jason Smith
Forgive the clean new thread. Hopefully it will not remain so.

If you can, would you please clone https://github.com/jhs/slow_couchdb

And build whatever Erlangs and CouchDB checkouts you see fit, and run
the test. For example:

docs=50 ./bench.sh small_doc.tpl

That should run the test and, God willing, upload the results to a
couch in the cloud. We should be able to use that information to
identify who you are, whether you are on SSD, what Erlang and Couch
build, and how fast it ran. Modulo bugs.


Re: [VOTE] Apache CouchDB 1.2.0 release, second round

2012-02-28 Thread Robert Newson
I'm running my script on an EC2 node with spinning media, and the numbers
come out the same for 1.1.1 vs 1.2. The only time I've seen a slowdown
with a scripted approach is my original one, which didn't use bulk
docs. :/

B.

On 28 February 2012 11:33, Bob Dionne  wrote:
> Filipe,
>
> This additional patch looks good, though I haven't tested it. Interesting 
> comment about R15B; I did notice a difference with BigCouch in terms of some 
> of the internal race conditions we see at times. Perhaps there are some 
> performance changes relating to that. I also recently upgraded from the 
> MacBook Pro to an MBA, so who knows.
>
> I ran Jason and Bob's scripts a bit last night and saw similar slowdowns 
> between 1.1 and 1.2, though as reported elsewhere, with larger docs it's less 
> of an issue. In this patch[1] there's clearly a savings in avoiding the 
> decode call, but I wonder how often that case obtains compared to the others. 
> If {cmd, CMD} dominates then there is an additional overhead incurred, however 
> small it might be. Perhaps this explains why the benefits appear for larger 
> docs only.
>
> Anyway, just speculation from the code.
>
> Regards,
>
> Bob
>
> [1] https://github.com/fdmanana/couchdb/commit/cce325378723c863f05cca21
>
> On Feb 27, 2012, at 11:33 AM, Filipe David Manana wrote:
>
>> I just tried Jason's script (modified it to use 500 000 docs instead
>> of 50 000) against 1.2.x and 1.1.1, using OTP R14B03. Here's my
>> results:
>>
>> 1.2.x:
>>
>> $ port=5984 ./test.sh
>> "none"
>> Filling db.
>> done
>> HTTP/1.1 200 OK
>> Server: CouchDB/1.2.0 (Erlang OTP/R14B03)
>> Date: Mon, 27 Feb 2012 16:08:43 GMT
>> Content-Type: text/plain; charset=utf-8
>> Content-Length: 252
>> Cache-Control: must-revalidate
>>
>> {"db_name":"db1","doc_count":51,"doc_del_count":0,"update_seq":51,"purge_seq":0,"compact_running":false,"disk_size":130494577,"data_size":130490673,"instance_start_time":"1330358830830086","disk_format_version":6,"committed_update_seq":51}
>> Building view.
>>
>> real  1m5.725s
>> user  0m0.006s
>> sys   0m0.005s
>> done
>>
>>
>> 1.1.1:
>>
>> $ port=5984 ./test.sh
>> ""
>> Filling db.
>> done
>> HTTP/1.1 200 OK
>> Server: CouchDB/1.1.2a785d32f-git (Erlang OTP/R14B03)
>> Date: Mon, 27 Feb 2012 16:15:33 GMT
>> Content-Type: text/plain;charset=utf-8
>> Content-Length: 230
>> Cache-Control: must-revalidate
>>
>> {"db_name":"db1","doc_count":51,"doc_del_count":0,"update_seq":51,"purge_seq":0,"compact_running":false,"disk_size":122142818,"instance_start_time":"1330359233327316","disk_format_version":5,"committed_update_seq":51}
>> Building view.
>>
>> real  1m4.249s
>> user  0m0.006s
>> sys   0m0.005s
>> done
>>
>>
>> I don't see any significant difference there.
>>
>> Regarding COUCHDB-1186, the only thing that might cause some
>> non-determinism and affect performance is the queuing/dequeuing. Depending
>> on timings, it's possible the writer is dequeuing fewer items per
>> dequeue operation and therefore inserting smaller batches into the
>> btree. The following small change ensures larger batches (while still
>> respecting the queue max size/item count):
>>
>> http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w
>>
>> Running the test with this change:
>>
>> $ port=5984 ./test.sh
>> "none"
>> Filling db.
>> done
>> HTTP/1.1 200 OK
>> Server: CouchDB/1.2.0 (Erlang OTP/R14B03)
>> Date: Mon, 27 Feb 2012 16:23:20 GMT
>> Content-Type: text/plain; charset=utf-8
>> Content-Length: 252
>> Cache-Control: must-revalidate
>>
>> {"db_name":"db1","doc_count":51,"doc_del_count":0,"update_seq":51,"purge_seq":0,"compact_running":false,"disk_size":130494577,"data_size":130490673,"instance_start_time":"1330359706846104","disk_format_version":6,"committed_update_seq":51}
>> Building view.
>>
>> real  0m49.762s
>> user  0m0.006s
>> sys   0m0.005s
>> done
>>
>>
>> If there's no objection, I'll push that patch.
>>
>> Also, another note: I noticed some time ago that with master, using OTP
>> R15B I got a performance drop of 10% to 15% compared to using master
>> with OTP R14B04. Maybe it applies to 1.2.x as well.
>>
>>
>> On Mon, Feb 27, 2012 at 5:33 AM, Robert Newson  wrote:
>>> Bob D, can you give more details on the data set you're testing?
>>> Number of docs, size/complexity of docs, etc? Basically, enough info
>>> that I could write a script to automate building an equivalent
>>> database.
>>>
>>> I wrote a quick bash script to make a database and time a view build
>>> here: http://friendpaste.com/7kBiKJn3uX1KiGJAFPv4nK
>>>
>>> B.
>>>
>>> On 27 February 2012 13:15, Jan Lehnardt  wrote:

>>>> On Feb 27, 2012, at 12:58 , Bob Dionne wrote:
>>>>
>>>>> Thanks for the clarification. I hope I'm not conflating things by 
>>>>> continuing the discussion here, I thought that's what you requested?
>>>>
>>>> The discussion we had on IRC was regarding collecting more data items for 
>>>> the performance regression before we start to draw conclusions.
>>>>
>>>> My intention here is to understand wh

Re: [VOTE] Apache CouchDB 1.2.0 release, second round

2012-02-28 Thread Benoit Chesneau
On Tue, Feb 28, 2012 at 11:05 AM, Benoit Chesneau  wrote:
> On Tue, Feb 28, 2012 at 4:49 AM, Paul Davis  
> wrote
>>
>> Yeah, I've seen the btree behave quite differently on SSD's vs HDD's
>> (same code had drastically different runtime characteristics).
>>
>> In other words, can we get a report of what type of disk everyone is running 
>> on?
>>
> + 1 .
>
> We are actually polluting this vote thread, and the ticket about view
> speedups, which may or may not be related :) Maybe we could open a ticket
> to collect all the feedback and tests we have?

N / Y ?


Re: [VOTE] Apache CouchDB 1.2.0 release, second round

2012-02-28 Thread Bob Dionne
Filipe,

This additional patch looks good, though I haven't tested it. Interesting 
comment about R15B; I did notice a difference with BigCouch in terms of some of 
the internal race conditions we see at times. Perhaps there are some 
performance changes relating to that. I also recently upgraded from the MacBook 
Pro to an MBA, so who knows.

I ran Jason and Bob's scripts a bit last night and saw similar slowdowns 
between 1.1 and 1.2, though as reported elsewhere, with larger docs it's less of 
an issue. In this patch[1] there's clearly a savings in avoiding the decode 
call, but I wonder how often that case obtains compared to the others. If {cmd, 
CMD} dominates then there is an additional overhead incurred, however small it 
might be. Perhaps this explains why the benefits appear for larger docs only.

Anyway, just speculation from the code.

Regards,

Bob

[1] https://github.com/fdmanana/couchdb/commit/cce325378723c863f05cca21
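The queueing point raised in the quoted message below (the writer dequeuing fewer items per operation and so inserting smaller batches into the btree) can be illustrated with a small sketch. This is not CouchDB's Erlang work queue; `WorkQueue`, `drain` and the batch limits are hypothetical names chosen only to show how the per-dequeue limit changes the number and size of batches the writer sees.

```javascript
// Hypothetical work queue: dequeue(max) returns up to `max` queued items.
class WorkQueue {
  constructor() {
    this.items = [];
  }
  enqueue(item) {
    this.items.push(item);
  }
  dequeue(max) {
    return this.items.splice(0, max);
  }
}

// Drain the queue and record the size of each batch handed to the writer.
function drain(queue, maxPerDequeue) {
  const batchSizes = [];
  let batch;
  while ((batch = queue.dequeue(maxPerDequeue)).length > 0) {
    // A real writer would insert `batch` into the btree at this point.
    batchSizes.push(batch.length);
  }
  return batchSizes;
}

// Build a queue holding n placeholder items.
function filledQueue(n) {
  const queue = new WorkQueue();
  for (let i = 0; i < n; i++) queue.enqueue(i);
  return queue;
}

console.log(drain(filledQueue(10), 2));  // many small batches
console.log(drain(filledQueue(10), 10)); // one large batch
```

With the same ten items, the smaller limit leads to five separate writes and the larger limit to one, which is the effect the patch is said to aim for.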

On Feb 27, 2012, at 11:33 AM, Filipe David Manana wrote:

> I just tried Jason's script (modified it to use 500 000 docs instead
> of 50 000) against 1.2.x and 1.1.1, using OTP R14B03. Here's my
> results:
> 
> 1.2.x:
> 
> $ port=5984 ./test.sh
> "none"
> Filling db.
> done
> HTTP/1.1 200 OK
> Server: CouchDB/1.2.0 (Erlang OTP/R14B03)
> Date: Mon, 27 Feb 2012 16:08:43 GMT
> Content-Type: text/plain; charset=utf-8
> Content-Length: 252
> Cache-Control: must-revalidate
> 
> {"db_name":"db1","doc_count":51,"doc_del_count":0,"update_seq":51,"purge_seq":0,"compact_running":false,"disk_size":130494577,"data_size":130490673,"instance_start_time":"1330358830830086","disk_format_version":6,"committed_update_seq":51}
> Building view.
> 
> real  1m5.725s
> user  0m0.006s
> sys   0m0.005s
> done
> 
> 
> 1.1.1:
> 
> $ port=5984 ./test.sh
> ""
> Filling db.
> done
> HTTP/1.1 200 OK
> Server: CouchDB/1.1.2a785d32f-git (Erlang OTP/R14B03)
> Date: Mon, 27 Feb 2012 16:15:33 GMT
> Content-Type: text/plain;charset=utf-8
> Content-Length: 230
> Cache-Control: must-revalidate
> 
> {"db_name":"db1","doc_count":51,"doc_del_count":0,"update_seq":51,"purge_seq":0,"compact_running":false,"disk_size":122142818,"instance_start_time":"1330359233327316","disk_format_version":5,"committed_update_seq":51}
> Building view.
> 
> real  1m4.249s
> user  0m0.006s
> sys   0m0.005s
> done
> 
> 
> I don't see any significant difference there.
> 
> Regarding COUCHDB-1186, the only thing that might cause some
> non-determinism and affect performance is the queuing/dequeuing. Depending
> on timings, it's possible the writer is dequeuing fewer items per
> dequeue operation and therefore inserting smaller batches into the
> btree. The following small change ensures larger batches (while still
> respecting the queue max size/item count):
> 
> http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w
> 
> Running the test with this change:
> 
> $ port=5984 ./test.sh
> "none"
> Filling db.
> done
> HTTP/1.1 200 OK
> Server: CouchDB/1.2.0 (Erlang OTP/R14B03)
> Date: Mon, 27 Feb 2012 16:23:20 GMT
> Content-Type: text/plain; charset=utf-8
> Content-Length: 252
> Cache-Control: must-revalidate
> 
> {"db_name":"db1","doc_count":51,"doc_del_count":0,"update_seq":51,"purge_seq":0,"compact_running":false,"disk_size":130494577,"data_size":130490673,"instance_start_time":"1330359706846104","disk_format_version":6,"committed_update_seq":51}
> Building view.
> 
> real  0m49.762s
> user  0m0.006s
> sys   0m0.005s
> done
> 
> 
> If there's no objection, I'll push that patch.
> 
> Also, another note: I noticed some time ago that with master, using OTP
> R15B I got a performance drop of 10% to 15% compared to using master
> with OTP R14B04. Maybe it applies to 1.2.x as well.
> 
> 
> On Mon, Feb 27, 2012 at 5:33 AM, Robert Newson  wrote:
>> Bob D, can you give more details on the data set you're testing?
>> Number of docs, size/complexity of docs, etc? Basically, enough info
>> that I could write a script to automate building an equivalent
>> database.
>> 
>> I wrote a quick bash script to make a database and time a view build
>> here: http://friendpaste.com/7kBiKJn3uX1KiGJAFPv4nK
>> 
>> B.
>> 
>> On 27 February 2012 13:15, Jan Lehnardt  wrote:
>>> 
>>> On Feb 27, 2012, at 12:58 , Bob Dionne wrote:
>>> 
>>>> Thanks for the clarification. I hope I'm not conflating things by 
>>>> continuing the discussion here, I thought that's what you requested?
>>> 
>>> The discussion we had on IRC was regarding collecting more data items for 
>>> the performance regression before we start to draw conclusions.
>>> 
>>> My intention here is to understand what needs doing before we can release 
>>> 1.2.0.
>>> 
>>> I'll reply inline for the other issues.
>>> 
>>>> I just downloaded the release candidate again to start fresh. "make 
>>>> distcheck" hangs on this step:
>>>> 
>>>> /Users/bitdiddle/Downloads/apache-couchdb-1.2.0/apache-couchdb-1.2.0/_build/../test/etap/150-invalid-view-seq.t
>>>>  . 6/?
>>>> 
>>>> Just stops completely. Th

Re: feasibility of a design doc option to use the "ddoc new"/"ddoc " based protocol for map and reduce as well

2012-02-28 Thread Ronny Pfannschmidt

On 02/28/2012 11:24 AM, Benoit Chesneau wrote:
> On Tue, Feb 28, 2012 at 11:09 AM, Jason Smith  wrote:
>> On Tue, Feb 28, 2012 at 10:05 AM, Alexander Shorin  wrote:
>>> Hi Ronny,
>>>
>>> Invalidating views by ddoc _rev change is a very bad idea - your 2M docs
>>> database will have to be reindexed on each ddoc update: by adding an
>>> attachment or changing a show function. Wait, what's the reason for
>>> views to be invalidated in this case?
>>
>> Ronny, please correct me if I am wrong.
>>
>> But I think the reason is to allow using the *entire* design document
>> to help build views. If so, the _rev invalidation is one thing, but
>> changing CouchDB to send the entire ddoc will be a more substantial
>> change.
>>
>> At any rate, this is why some example failing unit tests might clarify
>> the objective.
>
> Why not add a version property to your ddoc changes?

i started to realize that a better workaround could actually just
put the data required for my view server's view handling into the 
doc.views.lib attribute

then changes to that would automatically invalidate the views without 
breaking everything


i will investigate how to lay out my ddocs to get that behavior

-- Ronny



Re: feasibility of a design doc option to use the "ddoc new"/"ddoc " based protocol for map and reduce as well

2012-02-28 Thread Benoit Chesneau
On Tue, Feb 28, 2012 at 11:09 AM, Jason Smith  wrote:
> On Tue, Feb 28, 2012 at 10:05 AM, Alexander Shorin  wrote:
>> Hi Ronny,
>>
>> Invalidating views by ddoc _rev change is a very bad idea - your 2M docs
>> database will have to be reindexed on each ddoc update: by adding an
>> attachment or changing a show function. Wait, what's the reason for
>> views to be invalidated in this case?
>
> Ronny, please correct me if I am wrong.
>
> But I think the reason is to allow using the *entire* design document
> to help build views. If so, the _rev invalidation is one thing, but
> changing CouchDB to send the entire ddoc will be a more substantial
> change.
>
> At any rate, this is why some example failing unit tests might clarify
> the objective.
>

Why not add a version property to your ddoc changes?

- benoît


Re: feasibility of a design doc option to use the "ddoc new"/"ddoc " based protocol for map and reduce as well

2012-02-28 Thread Jason Smith
On Tue, Feb 28, 2012 at 10:05 AM, Alexander Shorin  wrote:
> Hi Ronny,
>
> Invalidating views by ddoc _rev change is a very bad idea - your 2M docs
> database will have to be reindexed on each ddoc update: by adding an
> attachment or changing a show function. Wait, what's the reason for
> views to be invalidated in this case?

Ronny, please correct me if I am wrong.

But I think the reason is to allow using the *entire* design document
to help build views. If so, the _rev invalidation is one thing, but
changing CouchDB to send the entire ddoc will be a more substantial
change.

At any rate, this is why some example failing unit tests might clarify
the objective.

-- 
Iris Couch


Re: [VOTE] Apache CouchDB 1.2.0 release, second round

2012-02-28 Thread Benoit Chesneau
On Tue, Feb 28, 2012 at 4:49 AM, Paul Davis  wrote
>
> Yeah, I've seen the btree behave quite differently on SSD's vs HDD's
> (same code had drastically different runtime characteristics).
>
> In other words, can we get a report of what type of disk everyone is running 
> on?
>
+ 1 .

We are actually polluting this vote thread, and the ticket about view
speedups, which may or may not be related :) Maybe we could open a ticket
to collect all the feedback and tests we have?

Also Noah, Jan: what is the status of this vote? Should we consider it
as aborted or paused?

- benoit


Re: feasibility of a design doc option to use the "ddoc new"/"ddoc " based protocol for map and reduce as well

2012-02-28 Thread Alexander Shorin
Hi Ronny,

Invalidating views by ddoc _rev change is a very bad idea - your 2M docs
database will have to be reindexed on each ddoc update: by adding an
attachment or changing a show function. Wait, what's the reason for
views to be invalidated in this case?


--
,,,^..^,,,



On Tue, Feb 28, 2012 at 12:45 PM, Ronny Pfannschmidt
 wrote:
> On 02/28/2012 04:09 AM, Jason Smith wrote:
>>
>> On Tue, Feb 28, 2012 at 7:12 AM, Ronny Pfannschmidt
>>   wrote:
>>>
>>> Hi,
>>>
>>> while trying to build a view server for ddocs that validate/support
>>> documents as FSM (Finite State Machine)
>>> i hit an inherent limit of the protocol,
>>>
>>> map and reduce don't get the full ddoc, but only a part of it,
>>> which means my view server can't actually work with the full ddoc
>>> unless i do some weird hacks, and end up in a situation,
>>> where i circumvent proper view invalidation
>>>
>>> if map/reduce were allowed to opt in for using the newer protocol for
>>> accessing functions,
>>> my problem would go away
>>>
>>> as for view invalidation, a simple variant could just use the _rev,
>>> a more sophisticated one would take a hash of parts of the document
>>> (using excludes/includes defined in options as well)
>>
>>
>> Hi, Ronny. Are you aware that the contents of .views.lib are sent to
>> the view server? At least with Javascript, the idea is that CommonJS
>> modules can go in there.
>>
>> Maybe that can help as a workaround for now.
>>
>
> Hi Jason,
>
> rather than just a workaround,
> i would like to know the likelihood of accepting a patch that implements the
> view option + using _rev as invalidation hint
>
> also i can't find docs on the protocol that's being used for exchanging
> CommonJS of views to the viewserver
>
> -- Ronny


Re: feasibility of a design doc option to use the "ddoc new"/"ddoc " based protocol for map and reduce as well

2012-02-28 Thread Ronny Pfannschmidt

On 02/28/2012 04:09 AM, Jason Smith wrote:
> On Tue, Feb 28, 2012 at 7:12 AM, Ronny Pfannschmidt
>  wrote:
>> Hi,
>>
>> while trying to build a view server for ddocs that validate/support
>> documents as FSM (Finite State Machine)
>> i hit an inherent limit of the protocol,
>>
>> map and reduce don't get the full ddoc, but only a part of it,
>> which means my view server can't actually work with the full ddoc
>> unless i do some weird hacks, and end up in a situation,
>> where i circumvent proper view invalidation
>>
>> if map/reduce were allowed to opt in for using the newer protocol for
>> accessing functions,
>> my problem would go away
>>
>> as for view invalidation, a simple variant could just use the _rev,
>> a more sophisticated one would take a hash of parts of the document
>> (using excludes/includes defined in options as well)
>
> Hi, Ronny. Are you aware that the contents of .views.lib are sent to
> the view server? At least with Javascript, the idea is that CommonJS
> modules can go in there.
>
> Maybe that can help as a workaround for now.

Hi Jason,

rather than just a workaround,
i would like to know the likelihood of accepting a patch that implements 
the view option + using _rev as invalidation hint


also i can't find docs on the protocol that's being used for exchanging 
CommonJS of views to the viewserver
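The two invalidation variants mentioned in this thread (using the whole _rev, or hashing only selected parts of the design document) can be compared with a rough sketch. `viewSignature` and the `includes` list are hypothetical and do not describe how CouchDB computes its own view signatures; MD5 is used only as a convenient digest, and the sample design document is made up.

```javascript
const crypto = require("crypto");

// Hypothetical helper: hash only the ddoc fields named in `includes`.
// JSON.stringify preserves object key order, so reordering keys changes the hash.
function viewSignature(ddoc, includes) {
  const picked = includes.map((field) => [field, ddoc[field]]);
  return crypto.createHash("md5").update(JSON.stringify(picked)).digest("hex");
}

// Made-up design document with one view and one show function.
const ddoc = {
  _id: "_design/example",
  _rev: "1-abc",
  views: { by_id: { map: "function(doc) { emit(doc._id, null); }" } },
  shows: { plain: "function(doc, req) { return doc._id; }" },
};

// Editing a show function bumps _rev but leaves the views untouched.
const edited = Object.assign({}, ddoc, {
  _rev: "2-def",
  shows: { plain: "function(doc, req) { return 'id: ' + doc._id; }" },
});

// Hashing only `views`: the signature is unchanged, so no reindex would be needed.
console.log(viewSignature(ddoc, ["views"]) === viewSignature(edited, ["views"]));
// Keying on `_rev`: any edit to the ddoc changes the signature.
console.log(viewSignature(ddoc, ["_rev"]) === viewSignature(edited, ["_rev"]));
```

The sketch shows why keying on _rev alone forces a rebuild for unrelated edits such as show functions or attachments, while a hash over chosen fields does not.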


-- Ronny


[jira] [Commented] (COUCHDB-1186) Speedups in the view indexer

2012-02-28 Thread Filipe Manana (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218002#comment-13218002
 ] 

Filipe Manana commented on COUCHDB-1186:


My replies in the following development mailing list thread:

http://mail-archives.apache.org/mod_mbox/couchdb-dev/201202.mbox/%3CCA%2BY%2B4475J_wPbiC%3Dg2R6CcqUfQ-_V6TTTxV2iS4xTbz9a10%2BXw%40mail.gmail.com%3E



> Speedups in the view indexer
> ----------------------------
>
>                 Key: COUCHDB-1186
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1186
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>
> The patches at [1] and [2] do 2 distinct optimizations to the view indexer
> 1) Use a NIF to implement couch_view:less_json/2;
> 2) Multiple small optimizations to couch_view_updater - the main one is to 
> decode the view server's JSON only in the updater's write process, avoiding 2 
> EJSON term copying phases (couch_os_process -> updater processes and writes 
> work queue)
> [1] - 
> https://github.com/fdmanana/couchdb/commit/3935a4a991abc32132c078e908dbc11925605602
> [2] - 
> https://github.com/fdmanana/couchdb/commit/cce325378723c863f05cca2192ac7bd58eedde1c
> Using these 2 patches, I've seen significant improvements to view generation 
> time. Here I present as example the databases at:
> A) http://fdmanana.couchone.com/indexer_test_2
> B) http://fdmanana.couchone.com/indexer_test_3
> ## Trunk
> ### Database A
> $ time curl http://localhost:5985/indexer_test_2/_design/test/_view/view1?limit=1
> {"total_rows":1102400,"offset":0,"rows":[
> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
> ]}
> real  19m46.007s
> user  0m0.024s
> sys   0m0.020s
> ### Database B
> $ time curl http://localhost:5985/indexer_test_3/_design/test/_view/view1?limit=1
> {"total_rows":1102400,"offset":0,"rows":[
> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
> ]}
> real  21m41.958s
> user  0m0.004s
> sys   0m0.028s
> ## Trunk + the 2 patches
> ### Database A
>   $ time curl http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>   {"total_rows":1102400,"offset":0,"rows":[
>   {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>   ]}
>   real  16m1.820s
>   user  0m0.000s
>   sys   0m0.028s
>   (versus 19m46s with trunk)
> ### Database B
>   $ time curl http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>   {"total_rows":1102400,"offset":0,"rows":[
>   {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>   ]}
>   real  17m22.778s
>   user  0m0.020s
>   sys   0m0.016s
>   (versus 21m41s with trunk)
> Repeating these tests, always clearing my OS/fs cache before running them 
> (via `echo 3 > /proc/sys/vm/drop_caches`), I always get about the same 
> relative differences.





Re: [VOTE] Apache CouchDB 1.2.0 release, second round

2012-02-28 Thread Filipe David Manana
Jason, I ran some more tests with larger documents (template is
https://gist.github.com/1930804) and a different map function:

function(doc) {
   emit([doc.type, doc.category], doc.nested.coords);
}

(patch http://friendpaste.com/5C99aqXocN6N6H1BAYIigs)
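To make the shape of the emitted rows concrete, here is a small sketch that runs the same map logic outside CouchDB. In the real JavaScript view server emit() is a global; here it is passed in as a parameter, and `runMap` plus the sample document are hypothetical, shaped only after the rows shown in the benchmark output.

```javascript
// Minimal stand-in for the view server: collect whatever the map function emits.
function runMap(mapFn, doc) {
  const rows = [];
  const emit = (key, value) => rows.push({ id: doc._id, key, value });
  mapFn(doc, emit);
  return rows;
}

// Same body as the map function above, with emit injected explicitly.
function mapTypeCategory(doc, emit) {
  emit([doc.type, doc.category], doc.nested.coords);
}

// Hypothetical document; field values are made up.
const sampleDoc = {
  _id: "example-doc",
  type: "dwarf",
  category: "assassin",
  nested: { coords: [{ x: 1.5, y: 2.5 }] },
};

const rows = runMap(mapTypeCategory, sampleDoc);
console.log(JSON.stringify(rows));
```

Each document yields one row whose key is the [type, category] pair and whose value is the whole coords array, so the emitted values grow with the document size.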

Here are the results I got ( https://gist.github.com/1930807 ):


Before COUCHDB-1186

fdmanana 23:21:05 ~/git/hub/slow_couchdb (master)> docs=50
batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0a-a68a792-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0a-a68a792-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
()
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":50,"offset":0,"rows":[
{"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
]}

real    5m6.676s
user    0m0.009s
sys     0m0.010s


After COUCHDB-1186

fdmanana 23:50:07 ~/git/hub/slow_couchdb (master)> docs=50
batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0a-f023052-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0a-f023052-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
()
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":50,"offset":0,"rows":[
{"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
]}

real    5m1.395s
user    0m0.008s
sys     0m0.010s


After COUCHDB-1186 + better queueing patch (
http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w )

fdmanana 00:14:25 ~/git/hub/slow_couchdb (master)> docs=50
batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0a-f023052-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0a-f023052-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
()
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":50,"offset":0,"rows":[
{"id":"00051ef7-d735-48d7-9ba8-5a21a86e8d57","key":["dwarf","assassin"],"value":[{"x":31227.35,"y":31529.73},{"x":116667.85,"y":82008.25},{"x":224.11,"y":36652.41},{"x":128565.95,"y":6780.2},{"x":165230.43,"y":176208.63}]}
]}

real    4m48.175s
user    0m0.008s
sys     0m0.009s


Disk model is APPLE SSD TS128C, quad-core machine, 8 GB of RAM.

Unfortunately I don't have access to the machine I used for the tests
in COUCHDB-1186 (spinning disk, Linux) before next week.
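For comparing the three runs above, the "real" values can be converted to seconds and expressed relative to the first run. This is only an arithmetic sketch; `parseTime` and `reduction` are hypothetical helper names, and the inputs are the wall-clock times reported in this message.

```javascript
// Convert a bash `time` value such as "5m6.676s" into seconds.
function parseTime(text) {
  const match = /^(\d+)m(\d+(?:\.\d+)?)s$/.exec(text);
  if (!match) throw new Error("unrecognized time: " + text);
  return Number(match[1]) * 60 + Number(match[2]);
}

// Relative reduction in wall-clock time, as a fraction of the baseline.
function reduction(baseline, candidate) {
  const base = parseTime(baseline);
  return (base - parseTime(candidate)) / base;
}

// "real" values copied from the three runs above.
const runs = {
  "before COUCHDB-1186": "5m6.676s",
  "after COUCHDB-1186": "5m1.395s",
  "after COUCHDB-1186 + queueing patch": "4m48.175s",
};

for (const [label, value] of Object.entries(runs)) {
  const pct = (reduction(runs["before COUCHDB-1186"], value) * 100).toFixed(1);
  console.log(label + ": " + parseTime(value).toFixed(3) + "s, " + pct + "% less than the first run");
}
```

On these numbers the patched builds come out roughly 2% and 6% faster than the first run; each figure is a single run, so small differences should be read with care.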


On Mon, Feb 27, 2012 at 7:49 PM, Paul Davis  wrote:
> On Mon, Feb 27, 2012 at 7:18 PM, Filipe David Manana
>  wrote:
>> Jason, can't reproduce those results, not even close:
>>
>> http://friendpaste.com/1L4pHH8WQchaLIMCWhKX9Z
>>
>> Before COUCHDB-1186
>>
>> fdmanana 16:58:02 ~/git/hub/slow_couchdb (master)> docs=50
>> batch=5 ./bench.sh small_doc.tpl
>> Server: CouchDB/1.2.0a-a68a792-git (Erlang OTP/R14B03)
>> {"couchdb":"Welcome","version":"1.2.0a-a68a792-git"}
>>
>> [INFO] Created DB named `db1'
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> Building view.
>> {"total_rows":50,"offset":0,"rows":[
>> {"id":"doc1","key":1,"value":1}
>> ]}
>>
>> real    0m56.241s
>> user    0m0.006s
>> sys     0m0.005s
>>
>>
>> After COUCHDB-1186
>>
>> fdmanana 17:02:02 ~/git/hub/slow_couchdb (master)> docs=50
>> batch=5 ./bench.sh small_doc.tpl
>> Server: CouchDB/1.2.0a-f023052-git (Erlang OTP/R14B03)
>> {"couchdb":"Welcome","version":"1.2.0a-f023052-git"}
>>
>> [INFO] Created DB named `db1'
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> [INFO] Uploaded 5 documents via _bulk_docs
>> Building view.
>> {"total_rows":50,"offset":0,"rows":[
>> {"id":"doc1","key":1,"value":1}
>> ]}
>>
>> real    1m11.694s
>> user    0m0.006s
>> sys     0m0.005s
>> fdmanana 17:06:01 ~/git/hub/slow_couchdb (master)>
>>
>>
>> 1.2.0a-f023052-git with patch
>> http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w  applied on top
>>
>> fdmanana 17:06:53 ~/git/hub/slow_couchdb (master)> docs

Re: Managing Git identities?

2012-02-28 Thread Jason Smith
On Tue, Feb 28, 2012 at 5:02 AM, Robert Newson  wrote:
> For my part, I don't set user.email in my global .gitconfig because
> I've often committed with the wrong address. Leaving it undefined then
> gives you a warning when you commit. I then set the right local value
> and run `git commit --amend --reset-author`. Pretty sure our apache repo
> insists on apache.org addresses too.


Whoa.


Adopted. Thanks.

-- 
Iris Couch