Re: View Filter
Just 2 cents

On Fri, May 15, 2009 at 09:38, Brian Candler wrote:
[cut]
> It might be possible to make the feature more general though. For example,
> suppose each view had its own filter, and the erlang server took the *union*
> of those filters to work out which documents to send. Then, when sending a
> document, it sent a list of which views to process it with. This could be
> used to simplify the view code by removing the doc.type test, whilst getting
> the performance benefit automatically.

+1. I don't think this optimization is premature; the need for it has
emerged from the experience of many users. It seems sane to limit the
number of docs sent to the view server and so speed up the calculation of
view indexes.

> Example:
>
> views:{
>   view1:{
>     filter:[{type:"foo"}],
>     map:...
>   }
>   view2:{
>     filter:[{type:"foo"},{type:"bar"}],
>     map:...
>   }
> }
>
> When a document of type foo is sent, it would be sent to the view engine
> with a list ["view1","view2"] of the views to be invoked on it. A document
> of type bar would have ["view2"]. A document of type baz would not be sent
> at all.
>
> But maybe this is too complicated, and going further down this route ends up
> with an erlang view server anyway.

+1 ;-]

cheers,
Wojtek
Re: View Filter
On 14 May 2009, at 16:53, Zachary Zolton wrote:

> Please reiterate, Brian. I'm not quite sure what you're getting at here:
>
>> (1) people who are storing large documents in CouchDB but not indexing
>> them at all (I guess this is possible, e.g. if the doc ids are
>> well-known or stored in other documents, but this isn't the most common
>> way of working)
>
> I do agree, though, that only being able to filter at the design doc
> level limits the utility of view filtering.
>
> Given that a design doc is supposed to be an application's "view" of the
> database, would we want to encourage folks to make a different design
> doc for each type of data they store in the database?

No! :) The patch is only meant for "post-application". I.e. you create
views and design docs as it makes sense for your application and then if,
and only if, you end up in a situation where this patch gives you speed,
you use it. I don't want to encourage a "one view per design doc" setup.

> My gut says "one design doc per application", but I could be all mixed
> up!

Your gut is correct ;) But some apps might need more than one design doc.

Cheers
Jan
--

> Cheers,
> Zach
>
> On Thu, May 14, 2009 at 2:43 AM, Brian Candler wrote:
>> On Wed, May 13, 2009 at 09:41:29AM -0500, Zachary Zolton wrote:
>>> So, this sounds like a big win for those who like to store many
>>> document types in the same database with a "type discriminator" field.
>>
>> ... but only if all views in the same design doc are filtered by the
>> same set of types. That is, you can only use it to exclude documents
>> which are not used by *any* view. Therefore the benefit is for:
>>
>> (1) people who are storing large documents in CouchDB but not indexing
>> them at all (I guess this is possible, e.g. if the doc ids are
>> well-known or stored in other documents, but this isn't the most common
>> way of working)
>>
>> (2) people who have a separate design document for each "type" of
>> object. They would most likely get the same or better performance
>> benefit by having a single design document with all their views.
>>
>> I also think there are other pinch-points in view generation which need
>> working on, although perhaps they are not as quick wins as this one.
>>
>> For example, on my old Thinkpad X30 (mobile P3 1.2GHz), I can insert a
>> set of 1300 documents in ~2 secs using _bulk_docs. However the first
>> view request (generating ~6000 keys) takes around 35 seconds to respond.
>>
>> Regards,
>>
>> Brian.
Re: View Filter
On 15 May 2009, at 09:38, Brian Candler wrote:

> On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
>>> The proposal would exclude a document from *all* views in a particular
>>> design doc. So you're only going to get a benefit from this if you
>>> have a large number of documents (or a number of large documents)
>>> which are not required to be indexed in any view in that design doc.
>>
>> Yep - and that is the point. Consider Jan's example, where it was
>> filtering on doc['type']. If a database had (say) 10 potential values
>> of 'type', then all filters that only care about a single type will
>> only care about 1 in 10 of those documents.
>
> Sure, as long as *none* of the views in that design document care about
> a significant proportion of the documents.
>
> It's unusual that people will have docs which are completely unindexed,
> so I think this patch mainly helps in the case where the user has 10
> separate design documents, each of which is only interested in documents
> of one type.
>
> Of course, that's a perfectly legitimate way of using CouchDB, and I
> don't oppose this change at all.
>
> It might be possible to make the feature more general though. For
> example, suppose each view had its own filter, and the erlang server
> took the *union* of those filters to work out which documents to send.
> Then, when sending a document, it sent a list of which views to process
> it with. This could be used to simplify the view code by removing the
> doc.type test, whilst getting the performance benefit automatically.

As I said in the original mail, this wouldn't be possible without a major
rewrite of the view server, and I'd rather not do that in the light of
other, more important changes.

Cheers
Jan
--

> Example:
>
> views:{
>   view1:{
>     filter:[{type:"foo"}],
>     map:...
>   }
>   view2:{
>     filter:[{type:"foo"},{type:"bar"}],
>     map:...
>   }
> }
>
> When a document of type foo is sent, it would be sent to the view engine
> with a list ["view1","view2"] of the views to be invoked on it. A
> document of type bar would have ["view2"]. A document of type baz would
> not be sent at all.
>
> But maybe this is too complicated, and going further down this route
> ends up with an erlang view server anyway.
>
>> Taking this to its extreme, we tested Jan's patch on a view which
>> matches very few documents in a large database. Rebuilding that view
>> with a filter was 18 times faster than without the filter. We put this
>> down to the fact the filter managed to avoid the json encode/decode
>> step for the vast majority of the docs in the database.
>
> You also avoided sending the docs over the socket and waiting for the
> response. So maybe latency is also part of the problem. Depends whether
> the view server interface does any sort of pipelining of requests.
>
> Regards,
>
> Brian.
Re: View Filter
Jan Lehnardt wrote:
> Hi,
>
> [..]
>
> I wrote a patch* that introduces the concept of a view filter.
> A new design doc option acts as a document filter and
> prevents a doc from getting serialized and sent to the view
> server. This is useful to avoid unnecessary computation
> when using views that use the `if(doc.type == "foo") {…`
> pattern.
>
> [..]

Nice patch. I would love to see that change in CouchDB, since some of my
CouchDB-based projects could really use it. I keep my views in different
design documents for maintainability reasons and thus would benefit from
it *a lot*. :)

Kind regards,
Kore
Re: View Filter
On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
>> The proposal would exclude a document from *all* views in a particular
>> design doc. So you're only going to get a benefit from this if you have a
>> large number of documents (or a number of large documents) which are not
>> required to be indexed in any view in that design doc.
>
> Yep - and that is the point. Consider Jan's example, where it was
> filtering on doc['type']. If a database had (say) 10 potential values
> of 'type', then all filters that only care about a single type will only
> care about 1 in 10 of those documents.

Sure, as long as *none* of the views in that design document care about a
significant proportion of the documents.

It's unusual that people will have docs which are completely unindexed, so
I think this patch mainly helps in the case where the user has 10 separate
design documents, each of which is only interested in documents of one
type.

Of course, that's a perfectly legitimate way of using CouchDB, and I don't
oppose this change at all.

It might be possible to make the feature more general though. For example,
suppose each view had its own filter, and the erlang server took the
*union* of those filters to work out which documents to send. Then, when
sending a document, it sent a list of which views to process it with. This
could be used to simplify the view code by removing the doc.type test,
whilst getting the performance benefit automatically.

Example:

views:{
  view1:{
    filter:[{type:"foo"}],
    map:...
  }
  view2:{
    filter:[{type:"foo"},{type:"bar"}],
    map:...
  }
}

When a document of type foo is sent, it would be sent to the view engine
with a list ["view1","view2"] of the views to be invoked on it. A document
of type bar would have ["view2"]. A document of type baz would not be sent
at all.

But maybe this is too complicated, and going further down this route ends
up with an erlang view server anyway.

> Taking this to its extreme, we tested Jan's patch on a view which
> matches very few documents in a large database. Rebuilding that view
> with a filter was 18 times faster than without the filter. We put this
> down to the fact the filter managed to avoid the json encode/decode step
> for the vast majority of the docs in the database.

You also avoided sending the docs over the socket and waiting for the
response. So maybe latency is also part of the problem. Depends whether
the view server interface does any sort of pipelining of requests.

Regards,

Brian.
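
Brian's per-view filter idea above can be sketched in a few lines of JavaScript. This is only an illustration of the proposal, not anything that exists in CouchDB or in Jan's patch: the names `matchesAny` and `viewsFor` are hypothetical, and the matching rule (any `{field: value}` object matching a top-level field) is borrowed from the design-doc filter described in the original mail.

```javascript
// Hypothetical sketch of the per-view filter proposal; none of these
// names exist in CouchDB.
var views = {
  view1: { filter: [{ type: "foo" }] },
  view2: { filter: [{ type: "foo" }, { type: "bar" }] }
};

// True if any {field: value} object in the filter matches a top-level
// field of the doc (OR semantics, as in the design-doc filter).
function matchesAny(filter, doc) {
  return filter.some(function (condition) {
    return Object.keys(condition).every(function (field) {
      return doc[field] === condition[field];
    });
  });
}

// Names of the views whose filter accepts the doc. An empty list means
// the doc is outside the union of all filters and would not be sent.
function viewsFor(doc) {
  return Object.keys(views).filter(function (name) {
    return matchesAny(views[name].filter, doc);
  });
}

console.log(viewsFor({ type: "foo" })); // ["view1", "view2"]
console.log(viewsFor({ type: "bar" })); // ["view2"]
console.log(viewsFor({ type: "baz" })); // []
```

In the proposal this decision would be made on the Erlang side, before serialization; the JavaScript here only shows the bookkeeping.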
Re: View Filter
Drat... I may have just come from a place where knowing how to keep my doc
types in separate databases, and being able to speed up the map-reduce
churn of a reduce-with-group query with view filters, would have saved me
a TON of work!

Urgh... At worst, I'll put it in my blog... :^(

On Thu, May 14, 2009 at 8:25 PM, Mark Hammond wrote:
> On 15/05/2009 4:47 AM, Brian Candler wrote:
>>
>> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>>>
>>> (1) people who are storing large documents in CouchDB but not indexing
>>> them at all (I guess this is possible, e.g. if the doc ids are
>>> well-known or stored in other documents, but this isn't the most
>>> common way of working)
>>
>> The proposal would exclude a document from *all* views in a particular
>> design doc. So you're only going to get a benefit from this if you have a
>> large number of documents (or a number of large documents) which are not
>> required to be indexed in any view in that design doc.
>
> Yep - and that is the point. Consider Jan's example, where it was filtering
> on doc['type']. If a database had (say) 10 potential values of 'type', then
> all filters that only care about a single type will only care about 1 in 10
> of those documents.
>
> Taking this to its extreme, we tested Jan's patch on a view which matches
> very few documents in a large database. Rebuilding that view with a filter
> was 18 times faster than without the filter. We put this down to the fact
> the filter managed to avoid the json encode/decode step for the vast
> majority of the docs in the database. IOW, on my test database, 6 minutes
> is spent before the filters can actually do anything (ie, that is just the
> json processing), whereas using the filter to avoid that json step brings it
> down to 20 seconds.
>
> So while not everyone will be able to see such significant speedups, many
> may find it extremely useful.
>
>> And it's reasonable, given that (as I understand it) each document is
>> already only passed once to the view server, in order to be indexed by all
>> the views in that design document.
>
> I agree there is lots that can and should be done to speed up views that do
> indeed care about most of the docs - such views spend less time relatively
> in the json encode step and more time in the interpreter. As an experiment,
> I "ported" one of our views that does look at most of the docs from
> javascript to erlangview, and the performance increase was far more modest
> (20% maybe). I suspect the javascript interpreter is faster than erlang, so
> I suspect that there will be a level of view complexity where using
> javascript *increases* view performance over erlang, even when factoring in
> the json processing...
>
> Cheers,
>
> Mark
>
Re: View Filter
On 15/05/2009 4:47 AM, Brian Candler wrote:
> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>> (1) people who are storing large documents in CouchDB but not indexing
>> them at all (I guess this is possible, e.g. if the doc ids are
>> well-known or stored in other documents, but this isn't the most common
>> way of working)
>
> The proposal would exclude a document from *all* views in a particular
> design doc. So you're only going to get a benefit from this if you have a
> large number of documents (or a number of large documents) which are not
> required to be indexed in any view in that design doc.

Yep - and that is the point. Consider Jan's example, where it was
filtering on doc['type']. If a database had (say) 10 potential values of
'type', then all filters that only care about a single type will only care
about 1 in 10 of those documents.

Taking this to its extreme, we tested Jan's patch on a view which matches
very few documents in a large database. Rebuilding that view with a filter
was 18 times faster than without the filter. We put this down to the fact
the filter managed to avoid the json encode/decode step for the vast
majority of the docs in the database. IOW, on my test database, 6 minutes
is spent before the filters can actually do anything (ie, that is just the
json processing), whereas using the filter to avoid that json step brings
it down to 20 seconds.

So while not everyone will be able to see such significant speedups, many
may find it extremely useful.

> And it's reasonable, given that (as I understand it) each document is
> already only passed once to the view server, in order to be indexed by all
> the views in that design document.

I agree there is lots that can and should be done to speed up views that
do indeed care about most of the docs - such views spend less time
relatively in the json encode step and more time in the interpreter. As an
experiment, I "ported" one of our views that does look at most of the docs
from javascript to erlangview, and the performance increase was far more
modest (20% maybe). I suspect the javascript interpreter is faster than
erlang, so I suspect that there will be a level of view complexity where
using javascript *increases* view performance over erlang, even when
factoring in the json processing...

Cheers,

Mark
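
As a quick check of the figures in Mark's message, the "18 times faster" claim follows directly from the two timings he quotes. The snippet below is plain arithmetic, assuming the "6 minutes" is the rebuild time without the filter and the "20 seconds" is the time with it; the variable names are made up for the example.

```javascript
// Timings quoted in Mark's message (assumed: 6 min unfiltered, 20 s filtered).
var withoutFilterSeconds = 6 * 60;
var withFilterSeconds = 20;

// Ratio of the two rebuild times.
var speedup = withoutFilterSeconds / withFilterSeconds;
console.log(speedup); // 18

// Share of the unfiltered rebuild time that the filter removed.
var saved = 1 - withFilterSeconds / withoutFilterSeconds;
console.log((saved * 100).toFixed(1) + "%"); // "94.4%"
```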
Re: View Filter
Moreover, many of my attempts to have different types of docs in one database (for joins, etc) have ended up with my moving them into separate databases. It's been pretty easy (most of the time) to do that work in my Ruby code!
Re: View Filter
On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
> (1) people who are storing large documents in CouchDB but not indexing them
> at all (I guess this is possible, e.g. if the doc ids are well-known or
> stored in other documents, but this isn't the most common way of working)

The proposal would exclude a document from *all* views in a particular
design doc. So you're only going to get a benefit from this if you have a
large number of documents (or a number of large documents) which are not
required to be indexed in any view in that design doc.

> I do agree, though, that only being able to filter at the design doc
> level limits the utility of view filtering.

And it's reasonable, given that (as I understand it) each document is
already only passed once to the view server, in order to be indexed by all
the views in that design document.

> Given that a design doc is
> supposed to be an application's "view" of the database, would we want
> to encourage folks to make a different design doc for each type of
> data they store in the database? My gut says "one design doc per
> application", but I could be all mixed up!

I have been ending up with views which run across *all* the documents in a
database - for example, a generic "search" box which lets the user type in
a search term and hit any matching type of object. Having a single design
document holding all my views means that each document only needs to be
sent once to the view server.

Regards,

Brian.
Re: View Filter
Please reiterate, Brian. I'm not quite sure what you're getting at here:

(1) people who are storing large documents in CouchDB but not indexing
them at all (I guess this is possible, e.g. if the doc ids are well-known
or stored in other documents, but this isn't the most common way of
working)

I do agree, though, that only being able to filter at the design doc level
limits the utility of view filtering. Given that a design doc is supposed
to be an application's "view" of the database, would we want to encourage
folks to make a different design doc for each type of data they store in
the database? My gut says "one design doc per application", but I could be
all mixed up!

Cheers,
Zach

On Thu, May 14, 2009 at 2:43 AM, Brian Candler wrote:
> On Wed, May 13, 2009 at 09:41:29AM -0500, Zachary Zolton wrote:
>> So, this sounds like a big win for those who like to store many
>> document types in the same database with a "type discriminator" field.
>
> ... but only if all views in the same design doc are filtered by the same
> set of types. That is, you can only use it to exclude documents which are
> not used by *any* view. Therefore the benefit is for:
>
> (1) people who are storing large documents in CouchDB but not indexing them
> at all (I guess this is possible, e.g. if the doc ids are well-known or
> stored in other documents, but this isn't the most common way of working)
>
> (2) people who have a separate design document for each "type" of object.
> They would most likely get the same or better performance benefit by having
> a single design document with all their views.
>
> I also think there are other pinch-points in view generation which need
> working on, although perhaps they are not as quick wins as this one.
>
> For example, on my old Thinkpad X30 (mobile P3 1.2GHz), I can insert a set
> of 1300 documents in ~2 secs using _bulk_docs. However the first view
> request (generating ~6000 keys) takes around 35 seconds to respond.
>
> Regards,
>
> Brian.
>
Re: View Filter
On Wed, May 13, 2009 at 09:41:29AM -0500, Zachary Zolton wrote:
> So, this sounds like a big win for those who like to store many
> document types in the same database with a "type discriminator" field.

... but only if all views in the same design doc are filtered by the same
set of types. That is, you can only use it to exclude documents which are
not used by *any* view. Therefore the benefit is for:

(1) people who are storing large documents in CouchDB but not indexing
them at all (I guess this is possible, e.g. if the doc ids are well-known
or stored in other documents, but this isn't the most common way of
working)

(2) people who have a separate design document for each "type" of object.
They would most likely get the same or better performance benefit by
having a single design document with all their views.

I also think there are other pinch-points in view generation which need
working on, although perhaps they are not as quick wins as this one.

For example, on my old Thinkpad X30 (mobile P3 1.2GHz), I can insert a set
of 1300 documents in ~2 secs using _bulk_docs. However the first view
request (generating ~6000 keys) takes around 35 seconds to respond.

Regards,

Brian.
Re: View Filter
On Wed, May 13, 2009 at 5:26 PM, kowsik wrote:
> We had a thread a month or so ago about view server optimization where
> this came up. By having document 'classes', it's possible to have
> design views that only get applied to certain documents.
>
> While I like this, I can immediately see someone asking for AND
> instead of the OR (implemented below) in the filter. The next
> generalization will be not just top level attributes, but nested ones
> too.
>
> So the optimization really is that when there are multiple view
> functions in a single design document that all apply to the same
> document class, we want to reject early and not have to invoke the map
> function.
>
> One possible alternative is to have the filter implemented in the view
> server, but invoked once for each design document (potentially with
> multiple views). This allows the user to control what the filter
> predicate looks like and can implement any scheme that s/he likes.
>
> Thoughts?
>

My guess is that most of the speedup of filtering comes from avoiding the
serialization/deserialization dance, not the actual application of
JavaScript methods. I.e., once the view server is computing the filter,
there are probably little to no savings.

HTH,
Paul

> K.
>
> On Wed, May 13, 2009 at 7:41 AM, Zachary Zolton wrote:
>> So, this sounds like a big win for those who like to store many
>> document types in the same database with a "type discriminator" field.
>>
>> It often takes me a bit of thinking to decide whether or not to
>> store different documents in the same database. So, I suppose a feature
>> like this would alleviate a possible negative side effect of that
>> choice, by reducing the time it takes to filter out the different
>> document types. Shooting from the hip, I'd say I like this feature!
>>
>> –Zach
>>
>> On Wed, May 13, 2009 at 9:08 AM, Jan Lehnardt wrote:
>>> Hi,
>>>
>>> I made views faster! :)
>>>
>>> I wrote a patch* that introduces the concept of a view filter.
>>> A new design doc option acts as a document filter and
>>> prevents a doc from getting serialized and sent to the view
>>> server. This is useful to avoid unnecessary computation
>>> when using views that use the `if(doc.type == "foo") {…`
>>> pattern.
>>>
>>> * http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6f29984e10ac861c
>>>
>>> The filter works like this:
>>>
>>> {
>>>   _id:"_design/test_foo",
>>>   language: "javascript",
>>>   options: {
>>>     filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
>>>   },
>>>   views: {
>>>     all_docs: { // really, only all foo and bar docs
>>>       map: "function(doc) { emit(doc.integer, null); }"
>>>     }
>>>   }
>>> };
>>>
>>> If *any* of the `{field, value}`** objects match a *top level* field and
>>> value in a document, it gets sent to the view server.
>>>
>>> ** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
>>> this patch works for you.
>>>
>>> A few notes:
>>>
>>> - It would be nice if we could extend this so…
>>>   No. I don't like to add any more bells and whistles to this, as
>>>   eventually this will lead to pure Erlang views, which we want
>>>   to get anyway.
>>>
>>> - In the light of other view server improvements, this might prove
>>>   to gain only marginal speed.
>>>
>>> - Can we have a filter per view, not a filter per design doc? Not
>>>   without major reworking of the view server and losing other
>>>   optimisations.
>>>
>>> I don't think I should just go and commit this without discussion. In
>>> fact, I'd opt to only include this patch if there's demand. I'm happy
>>> to maintain the patch outside of CouchDB for those who need that
>>> speedup.
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>
>
Re: View Filter
We had a thread a month or so ago about view server optimization where
this came up. By having document 'classes', it's possible to have design
views that only get applied to certain documents.

While I like this, I can immediately see someone asking for AND instead of
the OR (implemented below) in the filter. The next generalization will be
not just top level attributes, but nested ones too.

So the optimization really is that when there are multiple view functions
in a single design document that all apply to the same document class, we
want to reject early and not have to invoke the map function.

One possible alternative is to have the filter implemented in the view
server, but invoked once for each design document (potentially with
multiple views). This allows the user to control what the filter predicate
looks like and can implement any scheme that s/he likes.

Thoughts?

K.

On Wed, May 13, 2009 at 7:41 AM, Zachary Zolton wrote:
> So, this sounds like a big win for those who like to store many
> document types in the same database with a "type discriminator" field.
>
> It often takes me a bit of thinking to decide whether or not to
> store different documents in the same database. So, I suppose a feature
> like this would alleviate a possible negative side effect of that
> choice, by reducing the time it takes to filter out the different
> document types. Shooting from the hip, I'd say I like this feature!
>
> –Zach
>
> On Wed, May 13, 2009 at 9:08 AM, Jan Lehnardt wrote:
>> Hi,
>>
>> I made views faster! :)
>>
>> I wrote a patch* that introduces the concept of a view filter.
>> A new design doc option acts as a document filter and
>> prevents a doc from getting serialized and sent to the view
>> server. This is useful to avoid unnecessary computation
>> when using views that use the `if(doc.type == "foo") {…`
>> pattern.
>>
>> * http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6f29984e10ac861c
>>
>> The filter works like this:
>>
>> {
>>   _id:"_design/test_foo",
>>   language: "javascript",
>>   options: {
>>     filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
>>   },
>>   views: {
>>     all_docs: { // really, only all foo and bar docs
>>       map: "function(doc) { emit(doc.integer, null); }"
>>     }
>>   }
>> };
>>
>> If *any* of the `{field, value}`** objects match a *top level* field and
>> value in a document, it gets sent to the view server.
>>
>> ** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
>> this patch works for you.
>>
>> A few notes:
>>
>> - It would be nice if we could extend this so…
>>   No. I don't like to add any more bells and whistles to this, as
>>   eventually this will lead to pure Erlang views, which we want
>>   to get anyway.
>>
>> - In the light of other view server improvements, this might prove
>>   to gain only marginal speed.
>>
>> - Can we have a filter per view, not a filter per design doc? Not
>>   without major reworking of the view server and losing other
>>   optimisations.
>>
>> I don't think I should just go and commit this without discussion. In
>> fact, I'd opt to only include this patch if there's demand. I'm happy
>> to maintain the patch outside of CouchDB for those who need that
>> speedup.
>>
>> Cheers
>> Jan
>> --
>>
>
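
The alternative kowsik describes above, a user-written predicate that runs in the view server once per design document, might look roughly like the sketch below. Everything in it is hypothetical: `designFilter`, `maps` and `indexDoc` are names invented for the illustration, and `emit` is passed as an argument only to keep the example self-contained (in CouchDB map functions it is provided by the view server). The predicate uses AND and a nested field, the two generalizations kowsik anticipates.

```javascript
// Hypothetical design-doc level predicate; the name and calling
// convention are assumptions made for illustration only.
function designFilter(doc) {
  // AND across fields, including a nested one. Neither is expressible
  // with the OR-of-top-level-fields filter from the patch.
  return doc.type === "foo" &&
    doc.meta !== undefined && doc.meta !== null &&
    doc.meta.status === "published";
}

// Map functions standing in for the views of one design doc.
var maps = {
  by_title: function (doc, emit) { emit(doc.title, null); },
  by_author: function (doc, emit) { emit(doc.author, null); }
};

// Run the predicate once, then every map function only for accepted docs.
function indexDoc(doc) {
  var rows = [];
  if (!designFilter(doc)) {
    return rows; // rejected early: no map function is invoked
  }
  Object.keys(maps).forEach(function (name) {
    maps[name](doc, function (key, value) {
      rows.push({ view: name, key: key, value: value });
    });
  });
  return rows;
}

var accepted = indexDoc({
  type: "foo", meta: { status: "published" }, title: "T", author: "A"
});
console.log(accepted.length); // 2, one row per view
console.log(indexDoc({ type: "foo", meta: { status: "draft" } }).length); // 0
```

As Paul's reply in this thread suggests, a predicate like this runs after the document has already been serialized and sent to the view server, so it would save map invocations but probably not the JSON encode/decode cost.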
Re: View Filter
So, this sounds like a big win for those who like to store many document
types in the same database with a "type discriminator" field.

It often takes me a bit of thinking to decide whether or not to store
different documents in the same database. So, I suppose a feature like
this would alleviate a possible negative side effect of that choice, by
reducing the time it takes to filter out the different document types.
Shooting from the hip, I'd say I like this feature!

–Zach

On Wed, May 13, 2009 at 9:08 AM, Jan Lehnardt wrote:
> Hi,
>
> I made views faster! :)
>
> I wrote a patch* that introduces the concept of a view filter.
> A new design doc option acts as a document filter and
> prevents a doc from getting serialized and sent to the view
> server. This is useful to avoid unnecessary computation
> when using views that use the `if(doc.type == "foo") {…`
> pattern.
>
> * http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6f29984e10ac861c
>
> The filter works like this:
>
> {
>   _id:"_design/test_foo",
>   language: "javascript",
>   options: {
>     filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
>   },
>   views: {
>     all_docs: { // really, only all foo and bar docs
>       map: "function(doc) { emit(doc.integer, null); }"
>     }
>   }
> };
>
> If *any* of the `{field, value}`** objects match a *top level* field and
> value in a document, it gets sent to the view server.
>
> ** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
> this patch works for you.
>
> A few notes:
>
> - It would be nice if we could extend this so…
>   No. I don't like to add any more bells and whistles to this, as
>   eventually this will lead to pure Erlang views, which we want
>   to get anyway.
>
> - In the light of other view server improvements, this might prove
>   to gain only marginal speed.
>
> - Can we have a filter per view, not a filter per design doc? Not
>   without major reworking of the view server and losing other
>   optimisations.
>
> I don't think I should just go and commit this without discussion. In
> fact, I'd opt to only include this patch if there's demand. I'm happy
> to maintain the patch outside of CouchDB for those who need that
> speedup.
>
> Cheers
> Jan
> --
>
>
View Filter
Hi,

I made views faster! :)

I wrote a patch* that introduces the concept of a view filter.
A new design doc option acts as a document filter and
prevents a doc from getting serialized and sent to the view
server. This is useful to avoid unnecessary computation
when using views that use the `if(doc.type == "foo") {…`
pattern.

* http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6f29984e10ac861c

The filter works like this:

{
  _id:"_design/test_foo",
  language: "javascript",
  options: {
    filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
  },
  views: {
    all_docs: { // really, only all foo and bar docs
      map: "function(doc) { emit(doc.integer, null); }"
    }
  }
};

If *any* of the `{field, value}`** objects match a *top level* field and
value in a document, it gets sent to the view server.

** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
this patch works for you.

A few notes:

- It would be nice if we could extend this so…
  No. I don't like to add any more bells and whistles to this, as
  eventually this will lead to pure Erlang views, which we want
  to get anyway.

- In the light of other view server improvements, this might prove
  to gain only marginal speed.

- Can we have a filter per view, not a filter per design doc? Not
  without major reworking of the view server and losing other
  optimisations.

I don't think I should just go and commit this without discussion. In
fact, I'd opt to only include this patch if there's demand. I'm happy
to maintain the patch outside of CouchDB for those who need that
speedup.

Cheers
Jan
--
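
The matching rule described in this mail (a doc is sent to the view server if *any* `{field, value}` object matches a *top level* field) can be illustrated with a small JavaScript sketch. The patch itself does this on the Erlang side before serialization; the helper below, `matchesFilter`, is a hypothetical name used only to show the semantics, and treating an object with several fields as "all must match" is an assumption, since the mail only shows single-field objects.

```javascript
// Hypothetical helper, not part of the patch: returns true when a doc
// would be passed on to the view server under the filter described above.
function matchesFilter(filter, doc) {
  // OR across the list; each object is compared against top-level fields only.
  return filter.some(function (condition) {
    return Object.keys(condition).every(function (field) {
      return Object.prototype.hasOwnProperty.call(doc, field) &&
        doc[field] === condition[field];
    });
  });
}

var filter = [{ type: "foo" }, { type: "bar" }];
var docs = [
  { _id: "a", type: "foo", integer: 1 },
  { _id: "b", type: "baz", integer: 2 },
  { _id: "c", type: "bar", integer: 3 },
  { _id: "d", meta: { type: "foo" }, integer: 4 } // nested field: not matched
];

// Ids of the docs that would reach the map function.
var sent = docs
  .filter(function (doc) { return matchesFilter(filter, doc); })
  .map(function (doc) { return doc._id; });
console.log(sent); // ["a", "c"]
```

With such a filter in place, the map function in the design doc above no longer needs its own `if(doc.type == "foo")` test, which is why it can simply emit `doc.integer`.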