On 15/05/2009 4:47 AM, Brian Candler wrote:
On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
(1) people who are storing large documents in CouchDB but not indexing them
at all (I guess this is possible, e.g. if the doc ids are well-known or
stored in other documents, but this isn't the most common way of working)

The proposal would exclude a document from *all* views in a particular
design doc. So you're only going to get a benefit from this if you have a
large number of documents (or a number of large documents) which are not
required to be indexed in any view in that design doc.

Yep - and that is the point. Consider Jan's example, which filtered on doc['type']. If a database had (say) 10 potential values of 'type', then any filter that cares about a single type only needs to see roughly 1 in 10 of those documents.
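To make the 1-in-10 case concrete, here is a minimal sketch of a map function that only cares about one value of doc.type. The type names, the document set and the emit() stub are assumptions made up for illustration (a real view server supplies emit() itself); the point is only that, without a filter, every document still goes through the JSON round trip even though most of them emit nothing.

```javascript
// Stand-in for CouchDB's emit(); a real view server provides this for you.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// A typical map function that only cares about one value of doc.type.
// 'type3' is a hypothetical type name.
function map(doc) {
  if (doc.type === 'type3') {
    emit(doc._id, null);
  }
}

// Hypothetical database: 1000 docs spread evenly over 10 types.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: 'doc' + i, type: 'type' + (i % 10) });
}

// Without a filter, every doc is encoded, shipped and decoded
// before the map function gets a chance to reject it.
let decoded = 0;
for (const doc of docs) {
  const received = JSON.parse(JSON.stringify(doc)); // the JSON round trip
  decoded++;
  map(received);
}

console.log(decoded + ' docs decoded, ' + rows.length + ' rows emitted');
```

With an even spread over 10 types, nine out of ten round trips here are wasted work as far as this view is concerned.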

Taking this to its extreme, we tested Jan's patch on a view which matches very few documents in a large database. Rebuilding that view with a filter was 18 times faster than without the filter. We put this down to the fact that the filter managed to avoid the json encode/decode step for the vast majority of the docs in the database. IOW, on my test database, 6 minutes are spent before the filters can actually do anything (i.e. that is just the json processing), whereas using the filter to avoid that json step brings it down to 20 seconds.
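As a rough illustration of where the saving comes from, the sketch below counts JSON round trips with and without a pre-filter, and checks the arithmetic on the timings above. This is not Jan's patch, whose interface isn't shown here; the wanted() predicate, the roundTrips() helper and the document set are all hypothetical, and in CouchDB itself the filtering would happen before the doc ever reaches the JavaScript view server.

```javascript
// Hypothetical documents: 1000 docs spread evenly over 10 types.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  _id: 'doc' + i,
  type: 'type' + (i % 10),
}));

// Hypothetical pre-filter, evaluated before any JSON encoding happens.
const wanted = (doc) => doc.type === 'type3';

// Count the JSON round trips needed to feed the view server.
function roundTrips(allDocs, filter) {
  let n = 0;
  for (const doc of allDocs) {
    if (filter && !filter(doc)) continue; // skipped: never encoded or decoded
    JSON.parse(JSON.stringify(doc));
    n++;
  }
  return n;
}

const unfiltered = roundTrips(docs, null);
const filtered = roundTrips(docs, wanted);

// Timings reported above: 6 minutes without the filter, 20 seconds with it.
const speedup = (6 * 60) / 20;

console.log(unfiltered, filtered, speedup);
```

The fewer documents a view matches, the closer the rebuild gets to skipping the JSON step entirely, which is consistent with the 18x figure coming from a view that matches very few documents.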

So while not everyone will be able to see such significant speedups, many may find it extremely useful.

And it's reasonable, given that (as I understand it) each document is
already only passed once to the view server, in order to be indexed by all
the views in that design document.

I agree there is lots that can and should be done to speed up views that do indeed care about most of the docs - such views spend relatively less time in the json encode step and more time in the interpreter. As an experiment, I "ported" one of our views that does look at most of the docs from javascript to erlangview, and the performance increase was far more modest (20% maybe). I suspect the javascript interpreter is faster than erlang, so there will probably be a level of view complexity where using javascript *increases* view performance over erlang, even when factoring in the json processing...

Cheers,

Mark
