[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-07 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis updated COUCHDB-495:
--

Attachment: numbers-davisp.txt
            perf.py
            R13B1-uca-bif.patch

I implemented a BIF for the UCA algorithm to see how much that would give us in 
terms of a speed boost. As it turns out, not a whole lot. The numbers I 
collected are attached as a file.

I ran three tests: the current method that uses couch_erl_driver, the UCA 
collation BIF, and plain < on binaries (all three without changing anything 
else about less_json).
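
For anyone who wants to reproduce this kind of side-by-side comparison outside 
CouchDB, here is a rough Python sketch of the harness shape. The two strategies 
are hypothetical stand-ins (raw byte comparison and a casefolded comparison), 
not couch_erl_driver or the UCA BIF, and whatever timings it produces say 
nothing about CouchDB itself.

```python
import random
import timeit
from functools import cmp_to_key


def make_keys(n, seed=0):
    """Generate n pseudo-random 8-byte lowercase keys (made-up test data)."""
    rng = random.Random(seed)
    return [bytes(rng.randrange(97, 123) for _ in range(8)) for _ in range(n)]


def cmp_from_less(less):
    """Build a three-way comparator out of a boolean less function."""
    def cmp(a, b):
        if less(a, b):
            return -1
        if less(b, a):
            return 1
        return 0
    return cmp


def time_sort(less, keys, repeat=3):
    """Best-of-`repeat` wall time for sorting keys with the given less."""
    key_fn = cmp_to_key(cmp_from_less(less))
    return min(timeit.repeat(lambda: sorted(keys, key=key_fn),
                             number=1, repeat=repeat))


# Hypothetical strategies; swap in whatever comparators you want to measure.
STRATEGIES = {
    "raw_bytes": lambda a, b: a < b,
    "decoded_casefold": lambda a, b: a.decode().casefold() < b.decode().casefold(),
}


def run(n=2000):
    """Time every strategy on the same key set and return {name: seconds}."""
    keys = make_keys(n)
    return {name: time_sort(less, keys) for name, less in STRATEGIES.items()}


if __name__ == "__main__":
    for name, seconds in run().items():
        print(f"{name}: {seconds:.4f}s")
```

Sorting the same key set under every strategy keeps the comparison fair; only 
the less function changes between runs.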

That is all.

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
> Issue Type: Improvement
> Components: JavaScript View Server
> Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff, couch_perf.py, less_json.patch, 
> numbers-davisp.txt, outputv.patch, perf.py, R13B1-uca-bif.patch, 
> term_collate.diff
>
>
> Devs,
> Damien's identified view collation as the most significant bottleneck for the 
> view generation. We've done some testing, and some preliminary patches, and 
> the upshot seems to be that even removing ICU from the collator is not a 
> significant boost. What does speed things up greatly is using raw Erlang term 
> comparison. E.g., using fun(A, B) -> A < B end instead of 
> couch_view:less_json provides a roughly 2x speedup.
> However, the patch is challenging for a few reasons: Making the collation 
> strategy switchable at all is tough. It's actually quite easy to get an 
> alternate less function into the btree writer (all you've got to do is set it 
> in couch_view_group:init_group). The hard part is propagating the same less 
> function to the PassedEndFun. There's a secondary problem that when you use 
> raw term comparison, a lot of terms turn out to come before nil, and after 
> {}, which we use as artificial first and last terms in the less_json 
> function. So just switching to raw collation alone will leave you with a view 
> with unreachable rows.
> I tried two different approaches to the problem last night, and both of them 
> led to (instructive) dead ends. I'll attach them for illustration purposes.
> The next line of attack we think should be tried is this:
> First - remove _all_docs_by_seq, as it is just adding complexity to the 
> problem, and has been deprecated by _changes anyway. Along the same lines, 
> _all_docs should no longer use couch_httpd_view:make_view_fold_fun as it has 
> completely different collation needs than make_view_fold_fun. We'll end up 
> duplicating a little code in the _all_docs implementation, but it should be 
> worth it because it will make the other work much simpler.
> Once those changes have laid the groundwork, the next step is to change 
> make_view_fold_fun and couch_view:fold, so that couch_view:fold (and the 
> btree beneath it), rather than make_view_fold_fun, is responsible for 
> detecting when we've passed the endkey. That means make_passed_end_fun and 
> all references to PassedEnd and PassedEndFun will be stripped from 
> couch_httpd_view and moved to couch_btree.
> couch_view:fold (and the underlying btree) will need to accept not just a 
> start, but also an endkey. This will make it much easier to use the less fun 
> that is stored on View#view.btree#btree.less to determine PassedEnd funs. 
> This will move some complexity to the btree code from the view code, but will 
> keep the concerns more aligned. This also means that the btree will need to 
> accept not only an endkey for folds, but also an inclusive_end parameter.
> Once we have all these refactorings done, it will be easy to make the less 
> fun for an index configurable, as both the index writer and the index reader 
> will look for it in the same place (on the #btree record).
> My aim is to start a discussion and get someone excited to work on this 
> patch. Think of all the fast-views glory you'll get! Please ask questions and 
> otherwise force me to clarify the above discussion.
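
To make the proposed shape concrete, here is a minimal Python sketch (not 
CouchDB code) of an index that owns its less function and whose fold accepts an 
end key and an inclusive_end flag, so the writer and the reader cannot disagree 
about ordering. The class and parameter names are hypothetical, and using None 
for "no bound" instead of sentinel keys is my own assumption rather than 
something the ticket specifies.

```python
from functools import cmp_to_key


class Index:
    """Hypothetical stand-in for a btree that stores its own less function."""

    def __init__(self, less, rows=()):
        self.less = less

        def cmp(a, b):
            # Rows are (key, value) pairs; order them by key using `less`.
            if less(a[0], b[0]):
                return -1
            if less(b[0], a[0]):
                return 1
            return 0

        self.rows = sorted(rows, key=cmp_to_key(cmp))

    def fold(self, fun, acc, start_key=None, end_key=None, inclusive_end=True):
        """Fold fun(key, value, acc) over rows in [start_key, end_key].

        The index itself decides when the end key has been passed, always
        using the same less function that ordered the rows.
        """
        for key, value in self.rows:
            if start_key is not None and self.less(key, start_key):
                continue  # still before the start of the range
            if end_key is not None:
                if self.less(end_key, key):
                    break  # strictly past the end key
                if not inclusive_end and not self.less(key, end_key):
                    break  # key equals end key and the end is exclusive
            acc = fun(key, value, acc)
        return acc


def collect_keys(key, value, acc):
    """Example fold function: accumulate the keys that were visited."""
    return acc + [key]


idx = Index(lambda a, b: a < b, [(k, str(k)) for k in (5, 3, 1, 4, 2)])
print(idx.fold(collect_keys, [], start_key=2, end_key=4))
print(idx.fold(collect_keys, [], start_key=2, end_key=4, inclusive_end=False))
```

Because the range check lives next to the stored less function, swapping in a 
different collation only requires constructing the index with a different 
function.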

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-07 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis updated COUCHDB-495:
--

Attachment: (was: perf.py)

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
> Issue Type: Improvement
> Components: JavaScript View Server
> Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff, couch_perf.py, less_json.patch, 
> outputv.patch, term_collate.diff
>
>



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-06 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis updated COUCHDB-495:
--

Attachment: perf.py

A new perf test script. Uses couchdbkit because couchdb-python is broken.

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
>  Issue Type: Improvement
>  Components: JavaScript View Server
>Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff, couch_perf.py, less_json.patch, 
> outputv.patch, perf.py, term_collate.diff
>
>



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-03 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis updated COUCHDB-495:
--

Attachment: less_json.patch

An optimization for less_json to minimize the number of calls required. Uses 
traditional -1, 0, 1 internally so we don't have to call "A < B ? -1 : (B < A 
? 1 : 0)"

Improved runtime on the perf test from 14 minutes to about 12.
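
As an illustration of why a three-way result saves calls, here is a small 
Python sketch (not the patch itself, and not Erlang) that compares two 
sequences element by element. With only a boolean less, every pair of equal 
elements costs two calls; with a -1/0/1 comparator it costs one. All function 
names below are made up for the example.

```python
class CallCounter:
    """Counts how many element comparisons were made."""

    def __init__(self):
        self.n = 0


def counting_less(counter):
    """Boolean less on elements that records each call."""
    def less(x, y):
        counter.n += 1
        return x < y
    return less


def counting_cmp(counter):
    """Three-way comparison on elements that records each call."""
    def cmp(x, y):
        counter.n += 1
        return -1 if x < y else (1 if x > y else 0)
    return cmp


def less_seq(a, b, less):
    """Sequence ordering built from a boolean less: up to two calls per pair."""
    for x, y in zip(a, b):
        if less(x, y):
            return True
        if less(y, x):
            return False
    return len(a) < len(b)


def cmp_seq(a, b, cmp):
    """Sequence ordering built from a three-way cmp: one call per pair."""
    for x, y in zip(a, b):
        c = cmp(x, y)
        if c != 0:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))


less_calls, cmp_calls = CallCounter(), CallCounter()
print(less_seq([1, 2, 3], [1, 2, 4], counting_less(less_calls)), less_calls.n)
print(cmp_seq([1, 2, 3], [1, 2, 4], counting_cmp(cmp_calls)), cmp_calls.n)
```

The saving grows with the length of the shared prefix, which is the common case 
for compound view keys that differ only in their last element.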

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
> Issue Type: Improvement
> Components: JavaScript View Server
> Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff, couch_perf.py, less_json.patch, 
> outputv.patch, term_collate.diff
>
>



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-03 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis updated COUCHDB-495:
--

Attachment: outputv.patch

Damien suggested trying to use the outputv driver callback instead of the 
control callback to avoid copying binaries in and out of the VM. So I did. It's 
made things dog slow.

Hand-wavy explanation: maybe too many strings are < 64 bytes and thus incur 
memcpy wrath? No idea.

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
> Issue Type: Improvement
> Components: JavaScript View Server
> Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff, couch_perf.py, outputv.patch, 
> term_collate.diff
>
>



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-03 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz updated COUCHDB-495:


Attachment: couch_perf.py

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
> Issue Type: Improvement
> Components: JavaScript View Server
> Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff, couch_perf.py, term_collate.diff
>
>



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-03 Thread Chris Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Anderson updated COUCHDB-495:
---

Attachment: term_collate.diff

This patch (term_collate.diff) gives an option to use raw Erlang term 
comparisons for the entire collation function. It is much faster than ICU 
collation (about 2x) and removes the single largest view generation bottleneck. 
It significantly changes view collation behavior (when the option is used) but 
that is to be expected.

However, it is not very clean, and it is incomplete. I've noted most of the 
shortcomings in the ticket description so I won't restate them.

If you plan to work on the bug, I'd suggest you NOT start by applying this 
patch, but rather, read and understand it, so you can see why I suggest the 
refactorings listed in the ticket. This patch is basically a dead end, but a 
good learning experience...
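
For readers who have not hit the sentinel problem mentioned in the ticket, here 
is a small Python model of it. It assumes Erlang's documented cross-type term 
order (number < atom < ... < tuple < list < binary), models only those five 
types, and falls back to Python's own ordering within a type, which is only an 
approximation of Erlang. The Atom class and the helper names are invented for 
the example.

```python
class Atom:
    """Stand-in for an Erlang atom such as nil."""

    def __init__(self, name):
        self.name = name


# Subset of Erlang's cross-type order, lowest first.
# References, funs, ports, pids, and maps are left out of this model.
TYPE_ORDER = [(int, float), Atom, tuple, list, bytes]


def type_rank(term):
    """Position of the term's type in the simplified Erlang type order."""
    for rank, types in enumerate(TYPE_ORDER):
        if isinstance(term, types):
            return rank
    raise TypeError(f"type not modelled: {type(term).__name__}")


def raw_less(a, b):
    """Approximate raw term comparison: type rank first, then value."""
    ra, rb = type_rank(a), type_rank(b)
    if ra != rb:
        return ra < rb
    if isinstance(a, Atom):
        return a.name < b.name
    return a < b  # simplification: Python ordering within a type


FIRST = Atom("nil")  # artificial first key used by less_json
LAST = ()            # {} in Erlang, the artificial last key


def reachable(key):
    """True if key falls inside [FIRST, LAST] under raw comparison."""
    return not raw_less(key, FIRST) and not raw_less(LAST, key)


for key in (1, 3.5, [1, 2], b"abc"):
    print(repr(key), reachable(key))
```

Numbers sort before the nil atom, and lists and binaries sort after the empty 
tuple, so under raw comparison those keys fall outside the range the sentinels 
were meant to span.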

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
> Issue Type: Improvement
> Components: JavaScript View Server
> Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff, term_collate.diff
>
>



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-03 Thread Chris Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Anderson updated COUCHDB-495:
---

Attachment: binary_collate.diff

This patch (binary_collate.diff) provides an option to skip ICU for string 
comparisons in view collation. Instead you can use "raw" collation, which just 
compares strings using Erlang's built-in < operator. The rest of JSON collation 
is untouched by the patch.

It's decently clean, and works. It's a bit incomplete (only 100% implemented 
for temp_views) but what matters is that it's not much faster. So there's no 
point in proceeding down this road.

Attached for illustration purposes.
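
To show what "raw" string collation changes for users, here is a tiny Python 
sketch. Byte-wise comparison puts every uppercase ASCII letter before every 
lowercase one, while ICU-style collation interleaves them. The second sort key 
is only a rough stand-in I made up for the example; it is not ICU or the UCA 
and only behaves sensibly for plain ASCII letters.

```python
words = ["b", "A", "a", "B"]

# Raw collation: compare the UTF-8 bytes directly.
raw_order = sorted(words, key=lambda s: s.encode("utf-8"))


def approx_collation_key(s):
    """Rough approximation of a collator: letters first, lowercase wins ties."""
    return (s.casefold(), s.swapcase())


approx_order = sorted(words, key=approx_collation_key)

print(raw_order)
print(approx_order)
```

Any view whose consumers rely on case-interleaved ordering would see its rows 
move when switched to raw comparison, which is worth keeping in mind even if 
the speedup had been larger.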

> Make views twice as fast
> 
>
> Key: COUCHDB-495
> URL: https://issues.apache.org/jira/browse/COUCHDB-495
> Project: CouchDB
>  Issue Type: Improvement
>  Components: JavaScript View Server
>Reporter: Chris Anderson
> Fix For: 0.11
>
> Attachments: binary_collate.diff
>
>