Drat... I may actually have just come from a place where knowing how to keep my doc types in separate databases (and being able to speed up the map-reduce churn of a reduce-with-group query using view filters) would have saved me a TON of work!
Urgh... At worst, I'll put it in my blog... :^(

On Thu, May 14, 2009 at 8:25 PM, Mark Hammond <skippy.hamm...@gmail.com> wrote:
> On 15/05/2009 4:47 AM, Brian Candler wrote:
>>
>> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>>>
>>> (1) people who are storing large documents in CouchDB but not indexing
>>> them at all (I guess this is possible, e.g. if the doc ids are well-known or
>>> stored in other documents, but this isn't the most common way of working)
>>
>> The proposal would exclude a document from *all* views in a particular
>> design doc. So you're only going to get a benefit from this if you have a
>> large number of documents (or a number of large documents) which are not
>> required to be indexed in any view in that design doc.
>
> Yep - and that is the point. Consider Jan's example, where it was filtering
> on doc['type']. If a database had (say) 10 potential values of 'type', then
> all filters that only care about a single type will only care about 1 in 10
> of those documents.
>
> Taking this to its extreme, we tested Jan's patch on a view which matches
> very few document in a large database. Rebuilding that view with a filter
> was 18 times faster than without the filter. We put this down to the fact
> the filter managed to avoid the json encode/decode step for the vast
> majority of the docs in the database. IOW, on my test database, 6 minutes
> is spent before the filters can actually do anything (ie, that is just the
> json processing), whereas using the filter to avoid that json step brings it
> down to 20 seconds.
>
> So while not everyone will be able to see such significant speedups, many
> may find it extremely useful.
>
>> And it's reasonable, given that (as I understand it) each document is
>> already only passed once to the view server, in order to be indexed by all
>> the views in that design document.
>
> I agree there is lots that can and should be done to speed up views that do
> indeed care about most of the docs - such views spend less time relatively
> in the json encode step and more time in the interpreter. As an experiment,
> I "ported" one of our views that does look at most of the docs from
> javascript to erlangview, and the performance increase was far more modest
> (20% maybe). I suspect the javascript interpreter is faster than erlang, so
> I suspect that there will be a level of view complexity where using
> javascript *increases* view performance over erlang, even when factoring in
> the json processing...
>
> Cheers,
>
> Mark
>
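
For anyone skimming the thread, here is a rough sketch of why a filter on doc['type'] can save so much work. It is a toy simulation in plain JavaScript, not CouchDB code and not the API from Jan's patch: `buildView`, `sendToViewServer`, `preFilter` and the `type0`..`type9` values are all made-up names for illustration. It only counts how many documents pay the JSON round trip with and without a filter applied first, using the "10 values of type" case from Mark's mail.

```javascript
// Hypothetical simulation, NOT CouchDB internals: every name here is invented.
// It counts how many docs pay the JSON encode/decode cost with and
// without a pre-filter on `type`.

// Toy database: 1000 docs spread evenly over 10 `type` values.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: "doc" + i, type: "type" + (i % 10), body: "x".repeat(100) });
}

// An ordinary map function that only cares about one type.
function mapFn(doc, emit) {
  if (doc.type === "type3") {
    emit(doc._id, null);
  }
}

// Stand-in for the trip to the view server: serialise, then parse again.
function sendToViewServer(doc, stats) {
  stats.jsonRoundTrips += 1;
  return JSON.parse(JSON.stringify(doc));
}

// Index all docs; the optional `preFilter` runs before the JSON step.
function buildView(allDocs, preFilter) {
  const stats = { jsonRoundTrips: 0, rows: [] };
  const emit = (key, value) => stats.rows.push({ key, value });
  for (const doc of allDocs) {
    if (preFilter && !preFilter(doc)) continue; // skipped before any JSON work
    mapFn(sendToViewServer(doc, stats), emit);
  }
  return stats;
}

const unfiltered = buildView(docs, null);
const filtered = buildView(docs, (doc) => doc.type === "type3");

console.log(unfiltered.jsonRoundTrips, filtered.jsonRoundTrips); // 1000 100
console.log(unfiltered.rows.length, filtered.rows.length);       // 100 100
```

Both runs produce the same 100 rows, but the filtered run serialises only 1 document in 10. The sketch says nothing about real timings; Mark's 6 minutes versus 20 seconds (360 s / 20 s = 18x) came from a view that matched far fewer documents than 1 in 10.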