Re: multiview on github
On Tue, Sep 21, 2010 at 18:32, Robert Newson wrote:
> "Becoming a committer is as easy as writing enough accepted patches that everyone gets tired of applying them for you."

== awesome

So. I think you're safe, Bob.

>
> So it's not because we're awesome? I'm crushed.
>
> B.
Re: multiview on github
"Becoming a committer is as easy as writing enough accepted patches that everyone gets tired of applying them for you."

So it's not because we're awesome? I'm crushed.

B.
Re: multiview on github
> 1) How do you get a row count with a view for a startkey and endkey that would solve one of my problems?

Looks like we don't have an API for it yet, but the basic idea is that you run a reduce with the given query parameters to get this info. In all views there's a built-in reduce function that does row counting, so it's just a matter of exposing an API to query this. There used to be an example in couch_db.erl that did this with just a startkey for enum_docs_since, but it appears to have changed to be more complicated for _changes.

> 2) How do you test for document id inclusion in the results of a view?

How do you mean? I'm proposing the bloom filter method, which is just a constant-space set data structure that can be used to test for existence of a key. The first draft implementation would just stream a query to build a bloom filter for each query.

> fti and spatial code is only called if the query asks for it, I will look into this.

I'm not sure how best to handle this, I just know that I really don't like seeing spatial/fti specific code in trunk when the spatial and fti code is not.

> ok, it is really unclear in couchdb when to use supervisor, gen_servers, I wrote multiview as a gen_server since I thought it similar to an EJB and encapsulated unit of work that I wanted to delegate tasks to and not hog the HTTP process.
>
> Saying that if couch_query_rings use gen_server delegates as you recommend below then that will achieve that goal.

It's a bit complicated and in the end comes down to just having the experience. Though it's important to remember that Erlang processes are extremely lightweight. Doing operations directly in the HTTP request processes is fine because each request has its own process (well, keep-alive requests re-use the process, but that's orthogonal).

Whether or not the ring uses a gen_server, the idea was just to abstract the different query nodes in the ring as a Pid, which should make the code cleaner and easier to understand as well as allow for the other query types to be added in dynamically.

> plugins would be good, but honestly it isn't hard to change local.ini, With the multiview I would rather see focus on external http_db_handlers such as FTI and getting them streaming the results rather than having to write a complete result on one stdio line.
>
> I would like this in trunk mainly because I want to hack on trunk and to do that I need to be a committer :-) Plugins work fine.

When I say plugins, I'm generally just referring to formalizing how external code should integrate with CouchDB, i.e., making use of default.d instead of editing default.ini or local.ini directly.

As to updating the external API, there was some talk at CouchCamp on changing the current system to allow a bit more flexibility by giving couchdb a reverse proxy system for externals instead of using a stdio protocol. If we did that, then multiview could just define a simple api that various external indexers could choose to support. And the same would work for internal indexers as well.

Becoming a committer is as easy as writing enough accepted patches that everyone gets tired of applying them for you. We're always looking for more help.

HTH,
Paul Davis
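
The bloom filter idea in the reply above (stream one query's ids into a fixed-size structure, then test other ids against it) can be sketched roughly as follows. This is an illustration only, written in Python rather than Erlang so it runs standalone; the class name, the bit and hash counts, and the document ids are all assumptions, not anything from multiview or CouchDB.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        # Sizes are arbitrary illustration values, not tuned recommendations.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def build_filter(doc_ids):
    """Stream the ids of one query result into a filter, one id at a time."""
    bloom = BloomFilter()
    for doc_id in doc_ids:
        bloom.add(doc_id)
    return bloom


# Hypothetical ids standing in for the rows of one view query.
view_ids = build_filter(iter(["doc-1", "doc-7", "doc-42"]))
print("doc-7" in view_ids)  # True: an id that was added is never reported missing
```

Memory use depends only on `num_bits`, not on how many ids are streamed in; the cost is that an id that was never added can occasionally test as present.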
Re: multiview on github
Paul,

fantastic, thanks for the feedback, and you aren't b*tching, this is what I wanted. Comments inline.

On Tue, Sep 21, 2010 at 9:37 AM, Paul Davis wrote:
> Norman,
>
> Sorry it's taken me so long to review this code. In its current form I would have to -1 adding the current implementation to trunk for a couple reasons. I'm roughly +0 on the general outline of the algorithm for future inclusion, but I'll discuss that below.
>
> The biggest issue that jumps out is that it's unbounded in its use of memory. If I'm reading this code correctly, each view/spatial/fti query grabs its entire list of document ids and creates a record that stores this list. Then you create a ring of processes that then copies these lists possibly multiple times, and in the worst way, as the larger the list, the more times it's copied. Then inside the ring the queries are being re-run for each test of an id being present, which confuses me because they could be using the list of ids that were calculated during the calls to multiview:query_view_count/3. Granted I could be reading this wrong, but it's a bit hard to follow in places. Also, at least for fti and views, you don't actually need to enumerate the entire thing to get a row count as they can both report a count efficiently. I'm not sure about spatial, but even if it can't yet, I would imagine it could be implemented.

Only with external is the entire list held in memory; otherwise the code streams the results, with at most one id in memory at any one time.

1) How do you get a row count with a view for a startkey and endkey that would solve one of my problems?

2) How do you test for document id inclusion in the results of a view?

> And now for a list of nits about mechanics:
>
> The source code for this patch is completely unlike anything else in CouchDB. There are lots of differences that add up to make this alone a reason to prevent it from entering trunk:
>
> The file headers in source files should be removed and replaced with ASF license headers.
>
> Source files must be less than eighty columns wide.
>
> You've accidentally committed local_dev.ini and etc/init/couchdb.

Consider that done, I will add this on the next commit.

> I'd like to see more tests in the futon tests.

ok

> If this ends up in trunk, it will not be able to depend on the spatial and fti handlers existing if they're not also in trunk. This might be solvable with an abstraction that can be dynamically added if they're present.

fti and spatial code is only called if the query asks for it, I will look into this.

> AFAICT, error reporting doesn't seem to exist, and it looks like there's a lot of new surface area for generating errors.
>
> The supervisor/gen_server pattern that's going on here doesn't appear to have a reason. As in, I can't see a reason the gen_server even needs to exist. Just make the multiview:query call from the HTTP process.

ok, it is really unclear in couchdb when to use supervisor, gen_servers. I wrote multiview as a gen_server since I thought it similar to an EJB and encapsulated unit of work that I wanted to delegate tasks to and not hog the HTTP process.

Saying that, if couch_query_rings use gen_server delegates as you recommend below then that will achieve that goal.

> In Erlang, the term Node generally refers to a remote VM. Using the variable Node in your query ring code confused me greatly until I realized it was just pids.

ok, I will change this

> You should generally avoid raw message passing in Erlang. Using a gen_server for each of the different ring members depending on query type would be more appropriate.

ok, I will mod this as well.

> If you are using gen_servers, you should fill out more of the callbacks to do meaningful things, i.e., logging and/or dying on unexpected messages. Silently ignoring that sort of thing can lead to very hard to track bugs.

ok

> Module names should be prefixed with couchdb_ if they're going into trunk as part of couchdb.
>
> I'm not sure I like the generous use of pmap and friends. I understand that it'd ideally reduce latency, but at the burden of reducing a node's ability to handle concurrency. Not sure on the best solution to this though.
>
> In the couple places that have the big case statements for handling each type of view query, I'd transform those into functions to make things easier to follow.

thanks, I am all for code clarity, thanks for the feedback.

> Support for view parameters is limited to startkey and endkey. At the very least, start_docid and end_docid should be supported. The other various parameters affecting collation should also probably be supported. limit and count would be nice. I'm sure there are probably others too, but there are also ones that probably don't need to be included.
>
> How should reduces be handled, if at all? I don't see them being handled now, but I can assure you that people will want some sort of support
Re: multiview on github
Norman,

Sorry it's taken me so long to review this code. In its current form I would have to -1 adding the current implementation to trunk for a couple reasons. I'm roughly +0 on the general outline of the algorithm for future inclusion, but I'll discuss that below.

The biggest issue that jumps out is that it's unbounded in its use of memory. If I'm reading this code correctly, each view/spatial/fti query grabs its entire list of document ids and creates a record that stores this list. Then you create a ring of processes that then copies these lists possibly multiple times, and in the worst way, as the larger the list, the more times it's copied. Then inside the ring the queries are being re-run for each test of an id being present, which confuses me because they could be using the list of ids that were calculated during the calls to multiview:query_view_count/3. Granted I could be reading this wrong, but it's a bit hard to follow in places. Also, at least for fti and views, you don't actually need to enumerate the entire thing to get a row count as they can both report a count efficiently. I'm not sure about spatial, but even if it can't yet, I would imagine it could be implemented.

And now for a list of nits about mechanics:

The source code for this patch is completely unlike anything else in CouchDB. There are lots of differences that add up to make this alone a reason to prevent it from entering trunk:

The file headers in source files should be removed and replaced with ASF license headers.

Source files must be less than eighty columns wide.

You've accidentally committed local_dev.ini and etc/init/couchdb.

I'd like to see more tests in the futon tests.

If this ends up in trunk, it will not be able to depend on the spatial and fti handlers existing if they're not also in trunk. This might be solvable with an abstraction that can be dynamically added if they're present.

AFAICT, error reporting doesn't seem to exist, and it looks like there's a lot of new surface area for generating errors.

The supervisor/gen_server pattern that's going on here doesn't appear to have a reason. As in, I can't see a reason the gen_server even needs to exist. Just make the multiview:query call from the HTTP process.

In Erlang, the term Node generally refers to a remote VM. Using the variable Node in your query ring code confused me greatly until I realized it was just pids.

You should generally avoid raw message passing in Erlang. Using a gen_server for each of the different ring members depending on query type would be more appropriate.

If you are using gen_servers, you should fill out more of the callbacks to do meaningful things, i.e., logging and/or dying on unexpected messages. Silently ignoring that sort of thing can lead to very hard to track bugs.

Module names should be prefixed with couchdb_ if they're going into trunk as part of couchdb.

I'm not sure I like the generous use of pmap and friends. I understand that it'd ideally reduce latency, but at the burden of reducing a node's ability to handle concurrency. Not sure on the best solution to this though.

In the couple places that have the big case statements for handling each type of view query, I'd transform those into functions to make things easier to follow.

Support for view parameters is limited to startkey and endkey. At the very least, start_docid and end_docid should be supported. The other various parameters affecting collation should also probably be supported. limit and count would be nice. I'm sure there are probably others too, but there are also ones that probably don't need to be included.

How should reduces be handled, if at all? I don't see them being handled now, but I can assure you that people will want some sort of support if this goes into trunk.

Passing the view groups between processes does not seem like a good idea. I'd have to look back at the view_group code to double check that though.

Now I'll stop bitching and tell you that there is actually some hope and I'm intrigued where this could go. The current algorithm structure you have is pretty interesting. I think with a couple improvements it would go a long way.

If I were to write this, I would start by cleaning up the row counts code to give a quicker response without iterating each query. Once you have the row counts, for every query except the largest, iterate over the output to generate a bloom filter of ids contained in that view query. Then to send data to the client you just iterate over the largest query, checking that the id is in each of the bloom filters.

For a NIF version of Bloom filters, check here: http://github.com/basho/ebloom

There's also a blog post by Jonathan Ellis from the Cassandra group that gives some pretty good details on Bloom filters: http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html

Also there's probably a wikipedia page.

This layout also gives the ability to perform future optimizations that would make things even more qui
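
The outline proposed in this review (get row counts, build a membership filter for every query except the largest, then stream the largest query and keep ids that pass every filter) might look roughly like the sketch below. It is a hedged illustration in Python, not multiview's code: the function name, the query names, and the data are hypothetical, and a plain `set` stands in for each Bloom filter to keep the example short.

```python
def intersect_by_largest(query_ids, row_counts):
    """Yield the ids that appear in every query result.

    query_ids:  dict mapping a query name to an iterable of document ids
    row_counts: dict mapping the same names to their (cheaply obtained) row counts
    """
    largest = max(row_counts, key=row_counts.get)

    # One membership structure per smaller query. A plain set stands in
    # for the Bloom filter here (assumption made for brevity).
    filters = [set(ids) for name, ids in query_ids.items() if name != largest]

    # Stream the largest result once, keeping ids found in every filter.
    for doc_id in query_ids[largest]:
        if all(doc_id in members for members in filters):
            yield doc_id


# Hypothetical data: ids returned by a time-range view and a value view.
query_ids = {
    "by_time": ["a", "b", "c", "d", "e"],
    "by_value": ["b", "d", "x"],
}
row_counts = {"by_time": 5, "by_value": 3}

matches = list(intersect_by_largest(query_ids, row_counts))
print(matches)  # ['b', 'd']
```

With real Bloom filters in place of the sets, memory stays bounded by the filter sizes, but a false positive could let through an id that is not actually in one of the smaller results, so some callers may want a final exact check.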
Re: multiview on github
Bob,

thanks, that is interesting. I will checkout your code and see if I can get it working, I wrote couchdb-clucene and am interested in a lightweight text search for couchdb. I also liked your work with ontylog, but I can't mix GPL with anything I am doing.

Norman
Re: multiview on github
Norman,

Actually ontylog is GPL, and I wouldn't wish that code on anyone just yet. Think of it as the contents of my /etc directory.

The indexer I'm chipping away at is just a proof of concept hacked up from Joe Armstrong's Erlang book (with his permission). Anyone is welcome to use it that as they see fit, though it does have restrictions from Armstrong press. It's been great for me to learn erlang and explore the couch internals. It's also nice to have something nice and light running in couch.

My thoughts about plugins have nothing to do with licenses. I'd like the fact that couchdb is simple and lean and more rock solid. I'm not sure multiview, geocouch, fti, or any other indexers belong in the core. With multiview I think there's perhaps something more general that might be part of core but I haven't given it a lot of thought yet.

Cheers,

Bob
Re: multiview on github
Bob,

I can see why plugins might work for you since your ontology / indexing code is GPL, however I am more than happy for the multiview to be apache licensed and would like to see it in trunk.

I like the concept of plugins as it creates a stable API for third parties, but I think a multiview is a core feature of CouchDB.

Norman
Re: multiview on github
I see, neat. I ask because you might treat disjunction and conjunction differently in terms of whether you run around the ring or broadcast to all the nodes. For conjunctions you need all to succeed so broadcast might fare better whereas for disjunctions only one need succeed. I suppose it would depend largely on the number of views and the amount of each computation. Anyway I guess I have mixed feelings about seeing this in core. I see a lot of folks already struggling to get their arms around working with map/reduce. It would make a good plugin for advanced users. Actually the ability to have plugins is almost there now. I have an indexer that only requires some ini file mods and getting the code on the classpath. I think all that's needed at this point is: 1. conventions for a plugins directory 2. way of specing gen_servers in order to supervise them 3. some apis around some of the internals. I'm oversimplifying it for sure, the devils in the details and it's the kind of thing programmers love to argue about ad nauseum but no one wants to do it (myself included :) Best, Bob On Sep 19, 2010, at 10:22 AM, Norman Barker wrote: > Bob, > > it is just checking that a given id participates in a view, if it > makes it around the ring then it wins and gets streamed to the client, > adding disjoints would be fairly simple. Currently the only way I can > check if an id is in a view is to loop over the results of each view, > hence each node in the ring is in its own process to keep things > moving. > > A use case is two views, one that emits datetime (numeric) and another > view that emits values, e.g. A, B, C ..., the query would then be to > find the all documents with value A between start time and end time. > > Norman > > On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne > wrote: >> I took another peek at this and I'm curious as to what it's doing. Is it >> just checking that a given id participates in a view? So if it makes it >> around the ring it wins? 
Or is it actually computing the result of passing >> the doc thru all the views? >> >> If the answer is the former then would disjunction also be something one >> might want? I'm just curious, I don't have a use case and I forget the >> original discussion around this. I sort of think of views as a functional >> mapping from the database to some subset. That's not entirely accurate given >> there's this reduce phase also. So I could imagine composing views in a >> functional way, but the same thing can be had with just a different map >> function that is the composition. >> >> Anyway if you have a brief description of this, with a use case, it would >> help. >> >> Cheers, >> >> Bob >> >> >> >> >> On Sep 17, 2010, at 11:32 PM, Norman Barker wrote: >> >>> Chris, James >>> >>> thanks for bumping this, we are using this internally at 'scale' >>> (million+ keys). I want this to work for couchdb as we want to give >>> back for such a great product and support this going forward, so any >>> suggestions welcomed and we will test and add them to the local github >>> account with the aim of getting this into trunk. >>> >>> Norman >>> >>> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton >>> wrote: I want to use it! I just haven't gotten around to it. I was going to try and test it out this weekend and if I am able, I will certainly report back what I find. James On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson wrote: > On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker > wrote: >> Bob, >> >> I can and have been testing the multiview at this scale, it is ok >> (fast enough), but I think being able to test inclusion of a document >> id in a view without having to loop would be a considerable speed >> improvement. If you have any ideas let me know. >> > > I just want to bump this thread, as I think this is a useful feature. > I don't expect to be able to test it in the coming weeks, but if I did > I would. Is anyone besides Norman using this? Has anyone used it at > scale? 
> > Cheers, > Chris > >> thanks, >> >> Norman >> >> On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson > wrote: >>> I'm sorry, I've had no time to play with this at scale. >>> >>> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker > wrote: Hi, are there any more comments on this, if not can you describe the process (in particular how to obtain a wiki and jira account for couchdb which I have been unable to do) and I will start documenting this so we can put this into the trunk. Bob, were you able to do any more testing with large views, are there any suggestions on how to speed up the document id inclusion test as described below? thanks, Norman On Mon, A
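The conjunction/disjunction distinction raised in the message above can be sketched with plain sets of document ids, one set per view. This is only an illustration in Python, not the multiview Erlang code; the function names and the sample ids are assumptions made up for the example. It shows why a conjunction can stop early once nothing survives, while a disjunction accepts an id as soon as any one view contains it.

```python
from typing import Iterable, Set


def conjunction(view_ids: Iterable[Set[str]]) -> Set[str]:
    """Ids present in every view (AND). Hypothetical helper, not a CouchDB API."""
    ordered = sorted(view_ids, key=len)  # smallest view first keeps the work low
    if not ordered:
        return set()
    result = set(ordered[0])
    for ids in ordered[1:]:
        result &= ids
        if not result:  # nothing can survive the remaining views
            break
    return result


def disjunction(view_ids: Iterable[Set[str]]) -> Set[str]:
    """Ids present in at least one view (OR). Hypothetical helper."""
    result: Set[str] = set()
    for ids in view_ids:
        result |= ids
    return result


# Made-up sample data: the ids returned by three view queries.
views = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
both = conjunction(views)
either = disjunction(views)
```

With the sample data, only "c" appears in all three views, while the disjunction contains every id seen in any of them.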
Re: multiview on github
Bob,

it is just checking that a given id participates in a view. If it makes it around the ring then it wins and gets streamed to the client; adding disjunctions would be fairly simple. Currently the only way I can check if an id is in a view is to loop over the results of each view, hence each node in the ring runs in its own process to keep things moving.

A use case is two views, one that emits datetime (numeric) and another that emits values, e.g. A, B, C ...; the query would then be to find all the documents with value A between start time and end time.

Norman

On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne wrote: > I took another peek at this and I'm curious as to what it's doing. Is it just > checking that a given id participates in a view? So if it makes it around the > ring it wins? Or is it actually computing the result of passing the doc thru > all the views? > > If the answer is the former then would disjunction also be something one > might want? I'm just curious, I don't have a use case and I forget the > original discussion around this. I sort of think of views as a functional > mapping from the database to some subset. That's not entirely accurate given > there's this reduce phase also. So I could imagine composing views in a > functional way, but the same thing can be had with just a different map > function that is the composition. > > Anyway if you have a brief description of this, with a use case, it would > help. > > Cheers, > > Bob > > > > > On Sep 17, 2010, at 11:32 PM, Norman Barker wrote: > >> Chris, James >> >> thanks for bumping this, we are using this internally at 'scale' >> (million+ keys). I want this to work for couchdb as we want to give >> back for such a great product and support this going forward, so any >> suggestions welcomed and we will test and add them to the local github >> account with the aim of getting this into trunk. >> >> Norman >> >> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton >> wrote: >>> I want to use it! 
I just haven't gotten around to it. I was going to try >>> and test it out this weekend and if I am able, I will certainly report back >>> what I find. >>> >>> James >>> >>> On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson wrote: >>> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker wrote: > Bob, > > I can and have been testing the multiview at this scale, it is ok > (fast enough), but I think being able to test inclusion of a document > id in a view without having to loop would be a considerable speed > improvement. If you have any ideas let me know. > I just want to bump this thread, as I think this is a useful feature. I don't expect to be able to test it in the coming weeks, but if I did I would. Is anyone besides Norman using this? Has anyone used it at scale? Cheers, Chris > thanks, > > Norman > > On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson wrote: >> I'm sorry, I've had no time to play with this at scale. >> >> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker wrote: >>> Hi, >>> >>> are there any more comments on this, if not can you describe the >>> process (in particular how to obtain a wiki and jira account for >>> couchdb which I have been unable to do) and I will start documenting >>> this so we can put this into the trunk. >>> >>> Bob, were you able to do any more testing with large views, are there >>> any suggestions on how to speed up the document id inclusion test as >>> described below? >>> >>> thanks, >>> >>> Norman >>> >>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker < norman.bar...@gmail.com> wrote: Bob, thanks for the feedback and for taking a look at the code. Guidelines on when to use a supervisor within couchdb with a gen_server would be appreciated, currently I have a supervisor and a gen_server, but if couchdb has a supervision process I could remove that layer. 
I think plugins are a great idea; however, intersection of views is such a common request that perhaps there needs to be a plugin system, and if a plugin is rated highly enough it goes into trunk as a core feature.

the four-line (or slightly more) summary is here

http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl

%
% send an id from the start list to the next node in the ring, if the id is in adjacent node then this node sends to the next ring node
% if the id gets all round the ring and back to the start node then it has intersected all queries and should be included. The nodes in the ring
% should be sorted in size from small to large for this to be effective
%
% In addition send the initial id list round in par
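The ring idea summarised in the quoted comment can be sketched as follows. This is a single-process Python illustration, not the couch_query_ring.erl implementation, which gives each ring node its own process; the function name and the sample ids are hypothetical. Each id from the smallest view is passed to the next node in turn and is dropped at the first node that does not contain it.

```python
from typing import List, Set


def ring_intersection(views: List[Set[str]]) -> List[str]:
    """Return the ids that make it all the way round the ring.

    Each element of `views` stands for the set of document ids one view
    query returned. Hypothetical sketch, not the multiview code itself.
    """
    if not views:
        return []
    ring = sorted(views, key=len)      # nodes ordered from small to large
    start, rest = ring[0], ring[1:]
    survivors = []
    for doc_id in sorted(start):       # ids in the start list, in a stable order
        # all() stops at the first node that lacks the id, which mirrors
        # an id being dropped partway round the ring
        if all(doc_id in node for node in rest):
            survivors.append(doc_id)
    return survivors


# Made-up example for the use case in the thread: ids whose datetime falls
# in the requested range, and ids whose emitted value is "A".
in_time_range = {"d1", "d2", "d3", "d4"}
has_value_a = {"d2", "d4"}
matches = ring_intersection([in_time_range, has_value_a])
```

Starting from the smallest view means fewer ids are sent round at all, which is the reason the comment asks for the nodes to be sorted from small to large.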
Re: multiview on github
I took another peek at this and I'm curious as to what it's doing. Is it just checking that a given id participates in a view? So if it makes it around the ring it wins? Or is it actually computing the result of passing the doc thru all the views? If the answer is the former then would disjunction also be something one might want? I'm just curious, I don't have a use case and I forget the original discussion around this. I sort of think of views as a functional mapping from the database to some subset. That's not entirely accurate given there's this reduce phase also. So I could imagine composing views in a functional way, but the same thing can be had with just a different map function that is the composition. Anyway if you have a brief description of this, with a use case, it would help. Cheers, Bob On Sep 17, 2010, at 11:32 PM, Norman Barker wrote: > Chris, James > > thanks for bumping this, we are using this internally at 'scale' > (million+ keys). I want this to work for couchdb as we want to give > back for such a great product and support this going forward, so any > suggestions welcomed and we will test and add them to the local github > account with the aim of getting this into trunk. > > Norman > > On Fri, Sep 17, 2010 at 7:00 PM, James Hayton > wrote: >> I want to use it! I just haven't gotten around to it. I was going to try >> and test it out this weekend and if I am able, I will certainly report back >> what I find. >> >> James >> >> On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson wrote: >> >>> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker >>> wrote: Bob, I can and have been testing the multiview at this scale, it is ok (fast enough), but I think being able to test inclusion of a document id in a view without having to loop would be a considerable speed improvement. If you have any ideas let me know. >>> >>> I just want to bump this thread, as I think this is a useful feature. 
>>> I don't expect to be able to test it in the coming weeks, but if I did >>> I would. Is anyone besides Norman using this? Has anyone used it at >>> scale? >>> >>> Cheers, >>> Chris >>> thanks, Norman On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson >>> wrote: > I'm sorry, I've had no time to play with this at scale. > > On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker >>> wrote: >> Hi, >> >> are there any more comments on this, if not can you describe the >> process (in particular how to obtain a wiki and jira account for >> couchdb which I have been unable to do) and I will start documenting >> this so we can put this into the trunk. >> >> Bob, were you able to do any more testing with large views, are there >> any suggestions on how to speed up the document id inclusion test as >> described below? >> >> thanks, >> >> Norman >> >> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker < >>> norman.bar...@gmail.com> wrote: >>> Bob, >>> >>> thanks for the feedback and for taking a look at the code. Guidelines >>> on when to use a supervisor within couchdb with a gen_server would be >>> appreciated, currently I have a supervisor and a gen_server, but if >>> couchdb has a supervision process I could remove that layer. >>> >>> I think plugins is a great idea, however intersection of views is such >>> as common request, perhaps there needs to plugin system and if a >>> plugin is rated enough it goes into trunk as a core feature. >>> >>> the four (or slightly more) summary is here >>> >>> >>> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl >>> >>> % >>> % send an id from the start list to the next node in the ring, if the >>> id is in adjacent node then the this node sends to the next ring node >>> >>> % if the id gets all round the ring and back to the start node then is >>> has intersected all queries and should be included. 
The nodes in the >>> ring >>> % should be sorted in size from small to large for this to be >>> effective >>> % >>> % In addition send the initial id list round in parallel >>> >>> it really needs some eyes from the core couchdb coders to see how to >>> speed up the inclusion testing, looping is bad even if it is done in >>> parallel. >>> >>> Multiview is usable, I am using it with some pretty big mega-views (as >>> per the raindrop) model, I am also available to add features to this >>> as this is core part of our work and we want to give it to couch as a >>> contribution. >>> >>> thanks, >>> >>> Norman >>> >>> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne >>> wrote: Hi Norman, I took a peek at multiview. I haven't followed this too closely on >>> the mailing list but this is *view intersection
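The point in the message above, that the same result can be had with a single map function that is the composition, can be sketched for the value-plus-time use case discussed in this thread. The Python below only emulates the idea of a compound key [value, time] kept in sorted order and queried by range; it does not use CouchDB, and the document fields `value` and `time` and both function names are assumptions for the example.

```python
import bisect
from typing import Dict, List, Tuple

Entry = Tuple[str, int, str]  # (value, time, doc id)


def build_index(docs: Dict[str, dict]) -> List[Entry]:
    """Emulate one map function that emits [value, time] as the key per doc."""
    return sorted((d["value"], d["time"], doc_id) for doc_id, d in docs.items())


def query(index: List[Entry], value: str, start: int, end: int) -> List[str]:
    """Range scan over the compound key, like a startkey/endkey query."""
    keys = [(v, t) for v, t, _ in index]
    lo = bisect.bisect_left(keys, (value, start))
    hi = bisect.bisect_right(keys, (value, end))
    return [doc_id for _, _, doc_id in index[lo:hi]]


# Made-up documents with a value and a numeric timestamp.
docs = {
    "d1": {"value": "A", "time": 10},
    "d2": {"value": "B", "time": 12},
    "d3": {"value": "A", "time": 20},
    "d4": {"value": "A", "time": 35},
}
index = build_index(docs)
hits = query(index, "A", 10, 20)
```

The trade-off is that the composed key has to be chosen when the view is written, whereas intersecting separate views lets the combination be decided at query time.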
Re: multiview on github
Chris, James

thanks for bumping this. We are using this internally at 'scale' (million+ keys). I want this to work for couchdb, as we want to give back to such a great product and support this going forward, so any suggestions are welcome; we will test them and add them to the local github account with the aim of getting this into trunk.

Norman

On Fri, Sep 17, 2010 at 7:00 PM, James Hayton wrote: > I want to use it! I just haven't gotten around to it. I was going to try > and test it out this weekend and if I am able, I will certainly report back > what I find. > > James > > On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson wrote: > >> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker >> wrote: >> > Bob, >> > >> > I can and have been testing the multiview at this scale, it is ok >> > (fast enough), but I think being able to test inclusion of a document >> > id in a view without having to loop would be a considerable speed >> > improvement. If you have any ideas let me know. >> > >> >> I just want to bump this thread, as I think this is a useful feature. >> I don't expect to be able to test it in the coming weeks, but if I did >> I would. Is anyone besides Norman using this? Has anyone used it at >> scale? >> >> Cheers, >> Chris >> >> > thanks, >> > >> > Norman >> > >> > On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson >> wrote: >> >> I'm sorry, I've had no time to play with this at scale. >> >> >> >> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker >> wrote: >> >>> Hi, >> >>> >> >>> are there any more comments on this, if not can you describe the >> >>> process (in particular how to obtain a wiki and jira account for >> >>> couchdb which I have been unable to do) and I will start documenting >> >>> this so we can put this into the trunk. >> >>> >> >>> Bob, were you able to do any more testing with large views, are there >> >>> any suggestions on how to speed up the document id inclusion test as >> >>> described below? 
>> >>> >> >>> thanks, >> >>> >> >>> Norman >> >>> >> >>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker < >> norman.bar...@gmail.com> wrote: >> Bob, >> >> thanks for the feedback and for taking a look at the code. Guidelines >> on when to use a supervisor within couchdb with a gen_server would be >> appreciated, currently I have a supervisor and a gen_server, but if >> couchdb has a supervision process I could remove that layer. >> >> I think plugins is a great idea, however intersection of views is such >> as common request, perhaps there needs to plugin system and if a >> plugin is rated enough it goes into trunk as a core feature. >> >> the four (or slightly more) summary is here >> >> >> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl >> >> % >> % send an id from the start list to the next node in the ring, if the >> id is in adjacent node then the this node sends to the next ring node >> >> % if the id gets all round the ring and back to the start node then is >> has intersected all queries and should be included. The nodes in the >> ring >> % should be sorted in size from small to large for this to be >> effective >> % >> % In addition send the initial id list round in parallel >> >> it really needs some eyes from the core couchdb coders to see how to >> speed up the inclusion testing, looping is bad even if it is done in >> parallel. >> >> Multiview is usable, I am using it with some pretty big mega-views (as >> per the raindrop) model, I am also available to add features to this >> as this is core part of our work and we want to give it to couch as a >> contribution. >> >> thanks, >> >> Norman >> >> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne >> wrote: >> > Hi Norman, >> > >> > I took a peek at multiview. I haven't followed this too closely on >> the mailing list but this is *view intersection*? Is there a 5 line summary >> of what this does somewhere? 
>> > >> > I'm curious as to why the daemon needs to be a supervisor, most if >> not all of the other daemons are gen_servers. OTP allows this but I think >> this is a good area where some CouchDB guidelines on plugins would apply. >> > >> > It strikes me that views, the use of map/reduce, etc. are one of the >> trickier aspects of using CouchDB, particularly for new users coming from >> the SQL world. People are also reporting issues with performance of views, I >> guess often because reduce functions go out of control. >> > >> > I think the project would be better served if features like this >> were available as plugins. I would put GeoCouch in the same category. Its >> very neat and timely (given everyone wants to know where everyone else is >> using their telephone but without talking other than asynchronously), but a >> server plugin architecture that woul
Re: multiview on github
I want to use it! I just haven't gotten around to it. I was going to try and test it out this weekend and if I am able, I will certainly report back what I find. James On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson wrote: > On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker > wrote: > > Bob, > > > > I can and have been testing the multiview at this scale, it is ok > > (fast enough), but I think being able to test inclusion of a document > > id in a view without having to loop would be a considerable speed > > improvement. If you have any ideas let me know. > > > > I just want to bump this thread, as I think this is a useful feature. > I don't expect to be able to test it in the coming weeks, but if I did > I would. Is anyone besides Norman using this? Has anyone used it at > scale? > > Cheers, > Chris > > > thanks, > > > > Norman > > > > On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson > wrote: > >> I'm sorry, I've had no time to play with this at scale. > >> > >> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker > wrote: > >>> Hi, > >>> > >>> are there any more comments on this, if not can you describe the > >>> process (in particular how to obtain a wiki and jira account for > >>> couchdb which I have been unable to do) and I will start documenting > >>> this so we can put this into the trunk. > >>> > >>> Bob, were you able to do any more testing with large views, are there > >>> any suggestions on how to speed up the document id inclusion test as > >>> described below? > >>> > >>> thanks, > >>> > >>> Norman > >>> > >>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker < > norman.bar...@gmail.com> wrote: > Bob, > > thanks for the feedback and for taking a look at the code. Guidelines > on when to use a supervisor within couchdb with a gen_server would be > appreciated, currently I have a supervisor and a gen_server, but if > couchdb has a supervision process I could remove that layer. 
> > I think plugins is a great idea, however intersection of views is such > as common request, perhaps there needs to plugin system and if a > plugin is rated enough it goes into trunk as a core feature. > > the four (or slightly more) summary is here > > > http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl > > % > % send an id from the start list to the next node in the ring, if the > id is in adjacent node then the this node sends to the next ring node > > % if the id gets all round the ring and back to the start node then is > has intersected all queries and should be included. The nodes in the > ring > % should be sorted in size from small to large for this to be > effective > % > % In addition send the initial id list round in parallel > > it really needs some eyes from the core couchdb coders to see how to > speed up the inclusion testing, looping is bad even if it is done in > parallel. > > Multiview is usable, I am using it with some pretty big mega-views (as > per the raindrop) model, I am also available to add features to this > as this is core part of our work and we want to give it to couch as a > contribution. > > thanks, > > Norman > > On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne > wrote: > > Hi Norman, > > > > I took a peek at multiview. I haven't followed this too closely on > the mailing list but this is *view intersection*? Is there a 5 line summary > of what this does somewhere? > > > > I'm curious as to why the daemon needs to be a supervisor, most if > not all of the other daemons are gen_servers. OTP allows this but I think > this is a good area where some CouchDB guidelines on plugins would apply. > > > > It strikes me that views, the use of map/reduce, etc. are one of the > trickier aspects of using CouchDB, particularly for new users coming from > the SQL world. People are also reporting issues with performance of views, I > guess often because reduce functions go out of control. 
> > > > I think the project would be better served if features like this > were available as plugins. I would put GeoCouch in the same category. Its > very neat and timely (given everyone wants to know where everyone else is > using their telephone but without talking other than asynchronously), but a > server plugin architecture that would allow this to be done cleanly should > come first. > > > > This is just my opinion. I'd love to see some of the project > founders and committers weigh in on this and set some direction. > > > > Best regards, > > > > Bob > > > > > > > > > > > > On Aug 22, 2010, at 5:45 PM, Norman Barker wrote: > > > >> I would like to take this multiview code and have it added to trunk > if > >> possible, what are the next steps? > >> > >> thanks, > >> > >>>
Re: multiview on github
On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker wrote: > Bob, > > I can and have been testing the multiview at this scale, it is ok > (fast enough), but I think being able to test inclusion of a document > id in a view without having to loop would be a considerable speed > improvement. If you have any ideas let me know. > I just want to bump this thread, as I think this is a useful feature. I don't expect to be able to test it in the coming weeks, but if I did I would. Is anyone besides Norman using this? Has anyone used it at scale? Cheers, Chris > thanks, > > Norman > > On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson > wrote: >> I'm sorry, I've had no time to play with this at scale. >> >> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker >> wrote: >>> Hi, >>> >>> are there any more comments on this, if not can you describe the >>> process (in particular how to obtain a wiki and jira account for >>> couchdb which I have been unable to do) and I will start documenting >>> this so we can put this into the trunk. >>> >>> Bob, were you able to do any more testing with large views, are there >>> any suggestions on how to speed up the document id inclusion test as >>> described below? >>> >>> thanks, >>> >>> Norman >>> >>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker >>> wrote: Bob, thanks for the feedback and for taking a look at the code. Guidelines on when to use a supervisor within couchdb with a gen_server would be appreciated, currently I have a supervisor and a gen_server, but if couchdb has a supervision process I could remove that layer. I think plugins is a great idea, however intersection of views is such as common request, perhaps there needs to plugin system and if a plugin is rated enough it goes into trunk as a core feature. 
the four (or slightly more) summary is here http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl % % send an id from the start list to the next node in the ring, if the id is in adjacent node then the this node sends to the next ring node % if the id gets all round the ring and back to the start node then is has intersected all queries and should be included. The nodes in the ring % should be sorted in size from small to large for this to be effective % % In addition send the initial id list round in parallel it really needs some eyes from the core couchdb coders to see how to speed up the inclusion testing, looping is bad even if it is done in parallel. Multiview is usable, I am using it with some pretty big mega-views (as per the raindrop) model, I am also available to add features to this as this is core part of our work and we want to give it to couch as a contribution. thanks, Norman On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne wrote: > Hi Norman, > > I took a peek at multiview. I haven't followed this too closely on the > mailing list but this is *view intersection*? Is there a 5 line summary > of what this does somewhere? > > I'm curious as to why the daemon needs to be a supervisor, most if not > all of the other daemons are gen_servers. OTP allows this but I think > this is a good area where some CouchDB guidelines on plugins would apply. > > It strikes me that views, the use of map/reduce, etc. are one of the > trickier aspects of using CouchDB, particularly for new users coming from > the SQL world. People are also reporting issues with performance of > views, I guess often because reduce functions go out of control. > > I think the project would be better served if features like this were > available as plugins. I would put GeoCouch in the same category. 
Its very > neat and timely (given everyone wants to know where everyone else is > using their telephone but without talking other than asynchronously), but > a server plugin architecture that would allow this to be done cleanly > should come first. > > This is just my opinion. I'd love to see some of the project founders > and committers weigh in on this and set some direction. > > Best regards, > > Bob > > > > > > On Aug 22, 2010, at 5:45 PM, Norman Barker wrote: > >> I would like to take this multiview code and have it added to trunk if >> possible, what are the next steps? >> >> thanks, >> >> Norman >> >> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker >> wrote: >>> I have made >>> >>> http://github.com/normanb/couchdb >>> >>> which is a fork of the latest couchdb trunk with the multiview code >>> and tests added. >>> >>> If geocouch is available then it can still be used. >>> >>> There are a couple of questions about the multiview on th
Re: multiview on github
Bob, I can and have been testing the multiview at this scale, it is ok (fast enough), but I think being able to test inclusion of a document id in a view without having to loop would be a considerable speed improvement. If you have any ideas let me know. thanks, Norman On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson wrote: > I'm sorry, I've had no time to play with this at scale. > > On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker > wrote: >> Hi, >> >> are there any more comments on this, if not can you describe the >> process (in particular how to obtain a wiki and jira account for >> couchdb which I have been unable to do) and I will start documenting >> this so we can put this into the trunk. >> >> Bob, were you able to do any more testing with large views, are there >> any suggestions on how to speed up the document id inclusion test as >> described below? >> >> thanks, >> >> Norman >> >> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker >> wrote: >>> Bob, >>> >>> thanks for the feedback and for taking a look at the code. Guidelines >>> on when to use a supervisor within couchdb with a gen_server would be >>> appreciated, currently I have a supervisor and a gen_server, but if >>> couchdb has a supervision process I could remove that layer. >>> >>> I think plugins is a great idea, however intersection of views is such >>> as common request, perhaps there needs to plugin system and if a >>> plugin is rated enough it goes into trunk as a core feature. >>> >>> the four (or slightly more) summary is here >>> >>> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl >>> >>> % >>> % send an id from the start list to the next node in the ring, if the >>> id is in adjacent node then the this node sends to the next ring node >>> >>> % if the id gets all round the ring and back to the start node then is >>> has intersected all queries and should be included. 
The nodes in the >>> ring >>> % should be sorted in size from small to large for this to be effective >>> % >>> % In addition send the initial id list round in parallel >>> >>> it really needs some eyes from the core couchdb coders to see how to >>> speed up the inclusion testing, looping is bad even if it is done in >>> parallel. >>> >>> Multiview is usable, I am using it with some pretty big mega-views (as >>> per the raindrop) model, I am also available to add features to this >>> as this is core part of our work and we want to give it to couch as a >>> contribution. >>> >>> thanks, >>> >>> Norman >>> >>> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne >>> wrote: Hi Norman, I took a peek at multiview. I haven't followed this too closely on the mailing list but this is *view intersection*? Is there a 5 line summary of what this does somewhere? I'm curious as to why the daemon needs to be a supervisor, most if not all of the other daemons are gen_servers. OTP allows this but I think this is a good area where some CouchDB guidelines on plugins would apply. It strikes me that views, the use of map/reduce, etc. are one of the trickier aspects of using CouchDB, particularly for new users coming from the SQL world. People are also reporting issues with performance of views, I guess often because reduce functions go out of control. I think the project would be better served if features like this were available as plugins. I would put GeoCouch in the same category. Its very neat and timely (given everyone wants to know where everyone else is using their telephone but without talking other than asynchronously), but a server plugin architecture that would allow this to be done cleanly should come first. This is just my opinion. I'd love to see some of the project founders and committers weigh in on this and set some direction. 
Best regards, Bob On Aug 22, 2010, at 5:45 PM, Norman Barker wrote: > I would like to take this multiview code and have it added to trunk if > possible, what are the next steps? > > thanks, > > Norman > > On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker > wrote: >> I have made >> >> http://github.com/normanb/couchdb >> >> which is a fork of the latest couchdb trunk with the multiview code >> and tests added. >> >> If geocouch is available then it can still be used. >> >> There are a couple of questions about the multiview on the user /dev >> list so I will be adding some more test cases during today. >> >> thanks, >> >> Norman >> >> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker >> wrote: >>> this is possible, I forked geocouch since I use it, but I have already >>> separated the geocouch dependencies from the trunk. >>> >>> I can do this tomorrow, certainly be interested in any feedback. >>> >
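On testing whether a document id is in a view without looping over the results each time: a Bloom filter, which is suggested elsewhere in this thread, is one way to do it. The Python below is a minimal sketch under assumed parameters (8192 bits, 4 hash positions derived from SHA-256); the class is hypothetical and not part of CouchDB. The filter is built once by streaming the view's ids through `add`, after which each membership test takes constant time. It can report false positives but never false negatives.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Constant-space set for membership tests. Illustrative sketch only."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Build the filter once from the ids a view query streams back (made-up ids).
view_filter = BloomFilter()
for n in range(100):
    view_filter.add(f"doc{n}")

maybe_present = "doc42" in view_filter
```

Because a positive answer is only probable, a caller that needs exact results would still confirm hits against the view; the filter's value is in cheaply rejecting ids that are definitely absent.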
Re: multiview on github
I'm sorry, I've had no time to play with this at scale. On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker wrote: > Hi, > > are there any more comments on this, if not can you describe the > process (in particular how to obtain a wiki and jira account for > couchdb which I have been unable to do) and I will start documenting > this so we can put this into the trunk. > > Bob, were you able to do any more testing with large views, are there > any suggestions on how to speed up the document id inclusion test as > described below? > > thanks, > > Norman > > On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker > wrote: >> Bob, >> >> thanks for the feedback and for taking a look at the code. Guidelines >> on when to use a supervisor within couchdb with a gen_server would be >> appreciated, currently I have a supervisor and a gen_server, but if >> couchdb has a supervision process I could remove that layer. >> >> I think plugins is a great idea, however intersection of views is such >> as common request, perhaps there needs to plugin system and if a >> plugin is rated enough it goes into trunk as a core feature. >> >> the four (or slightly more) summary is here >> >> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl >> >> % >> % send an id from the start list to the next node in the ring, if the >> id is in adjacent node then the this node sends to the next ring node >> >> % if the id gets all round the ring and back to the start node then is >> has intersected all queries and should be included. The nodes in the >> ring >> % should be sorted in size from small to large for this to be effective >> % >> % In addition send the initial id list round in parallel >> >> it really needs some eyes from the core couchdb coders to see how to >> speed up the inclusion testing, looping is bad even if it is done in >> parallel. 
>> >> Multiview is usable, I am using it with some pretty big mega-views (as >> per the raindrop) model, I am also available to add features to this >> as this is core part of our work and we want to give it to couch as a >> contribution. >> >> thanks, >> >> Norman >> >> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne >> wrote: >>> Hi Norman, >>> >>> I took a peek at multiview. I haven't followed this too closely on the >>> mailing list but this is *view intersection*? Is there a 5 line summary of >>> what this does somewhere? >>> >>> I'm curious as to why the daemon needs to be a supervisor, most if not all >>> of the other daemons are gen_servers. OTP allows this but I think this is a >>> good area where some CouchDB guidelines on plugins would apply. >>> >>> It strikes me that views, the use of map/reduce, etc. are one of the >>> trickier aspects of using CouchDB, particularly for new users coming from >>> the SQL world. People are also reporting issues with performance of views, >>> I guess often because reduce functions go out of control. >>> >>> I think the project would be better served if features like this were >>> available as plugins. I would put GeoCouch in the same category. Its very >>> neat and timely (given everyone wants to know where everyone else is using >>> their telephone but without talking other than asynchronously), but a >>> server plugin architecture that would allow this to be done cleanly should >>> come first. >>> >>> This is just my opinion. I'd love to see some of the project founders and >>> committers weigh in on this and set some direction. >>> >>> Best regards, >>> >>> Bob >>> >>> >>> >>> >>> >>> On Aug 22, 2010, at 5:45 PM, Norman Barker wrote: >>> I would like to take this multiview code and have it added to trunk if possible, what are the next steps? 
thanks, Norman On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker wrote: > I have made > > http://github.com/normanb/couchdb > > which is a fork of the latest couchdb trunk with the multiview code > and tests added. > > If geocouch is available then it can still be used. > > There are a couple of questions about the multiview on the user /dev > list so I will be adding some more test cases during today. > > thanks, > > Norman > > On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker > wrote: >> this is possible, I forked geocouch since I use it, but I have already >> separated the geocouch dependencies from the trunk. >> >> I can do this tomorrow, certainly be interested in any feedback. >> >> thanks, >> >> Norman >> >> >> >> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische >> wrote: >>> On 08/18/2010 03:26 AM, J Chris Anderson wrote: On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: > Hi, > > I have made the changes as recommended, adding a test case > multiview.js and also adding the userCtx to open the db. > > I have also forked geoco
Re: multiview on github
Hi, are there any more comments on this, if not can you describe the process (in particular how to obtain a wiki and jira account for couchdb which I have been unable to do) and I will start documenting this so we can put this into the trunk. Bob, were you able to do any more testing with large views, are there any suggestions on how to speed up the document id inclusion test as described below? thanks, Norman On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker wrote: > Bob, > > thanks for the feedback and for taking a look at the code. Guidelines > on when to use a supervisor within couchdb with a gen_server would be > appreciated, currently I have a supervisor and a gen_server, but if > couchdb has a supervision process I could remove that layer. > > I think plugins is a great idea, however intersection of views is such > as common request, perhaps there needs to plugin system and if a > plugin is rated enough it goes into trunk as a core feature. > > the four (or slightly more) summary is here > > http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl > > % > % send an id from the start list to the next node in the ring, if the > id is in adjacent node then the this node sends to the next ring node > > % if the id gets all round the ring and back to the start node then is > has intersected all queries and should be included. The nodes in the > ring > % should be sorted in size from small to large for this to be effective > % > % In addition send the initial id list round in parallel > > it really needs some eyes from the core couchdb coders to see how to > speed up the inclusion testing, looping is bad even if it is done in > parallel. > > Multiview is usable, I am using it with some pretty big mega-views (as > per the raindrop) model, I am also available to add features to this > as this is core part of our work and we want to give it to couch as a > contribution. 
> > thanks, > > Norman > > On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne > wrote: >> Hi Norman, >> >> I took a peek at multiview. I haven't followed this too closely on the >> mailing list but this is *view intersection*? Is there a 5 line summary of >> what this does somewhere? >> >> I'm curious as to why the daemon needs to be a supervisor, most if not all >> of the other daemons are gen_servers. OTP allows this but I think this is a >> good area where some CouchDB guidelines on plugins would apply. >> >> It strikes me that views, the use of map/reduce, etc. are one of the >> trickier aspects of using CouchDB, particularly for new users coming from >> the SQL world. People are also reporting issues with performance of views, I >> guess often because reduce functions go out of control. >> >> I think the project would be better served if features like this were >> available as plugins. I would put GeoCouch in the same category. Its very >> neat and timely (given everyone wants to know where everyone else is using >> their telephone but without talking other than asynchronously), but a server >> plugin architecture that would allow this to be done cleanly should come >> first. >> >> This is just my opinion. I'd love to see some of the project founders and >> committers weigh in on this and set some direction. >> >> Best regards, >> >> Bob >> >> >> >> >> >> On Aug 22, 2010, at 5:45 PM, Norman Barker wrote: >> >>> I would like to take this multiview code and have it added to trunk if >>> possible, what are the next steps? >>> >>> thanks, >>> >>> Norman >>> >>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker >>> wrote: I have made http://github.com/normanb/couchdb which is a fork of the latest couchdb trunk with the multiview code and tests added. If geocouch is available then it can still be used. There are a couple of questions about the multiview on the user /dev list so I will be adding some more test cases during today. 
thanks, Norman On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker wrote: > this is possible, I forked geocouch since I use it, but I have already > separated the geocouch dependencies from the trunk. > > I can do this tomorrow, certainly be interested in any feedback. > > thanks, > > Norman > > > > On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische > wrote: >> On 08/18/2010 03:26 AM, J Chris Anderson wrote: >>> >>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: >>> Hi, I have made the changes as recommended, adding a test case multiview.js and also adding the userCtx to open the db. I have also forked geocouch and this is available here >>> >>> this patch seems important (especially as people are already asking for >>> help using it on user@) >>> >>> to get it committed, it either must remove the dependency on GeoCouch, >>> or >>> b
Re: multiview on github
Bob, thanks for the feedback and for taking a look at the code. Guidelines on when to use a supervisor within couchdb with a gen_server would be appreciated. Currently I have a supervisor and a gen_server, but if couchdb has a supervision process I could remove that layer. I think plugins are a great idea. However, intersection of views is such a common request that perhaps there needs to be a plugin system, and if a plugin is rated highly enough it goes into trunk as a core feature. The four (or slightly more) line summary is here http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl % % send an id from the start list to the next node in the ring, if the id is in the adjacent node then this node sends it to the next ring node % if the id gets all round the ring and back to the start node then it has intersected all queries and should be included. The nodes in the ring % should be sorted in size from small to large for this to be effective % % In addition send the initial id list round in parallel It really needs some eyes from the core couchdb coders to see how to speed up the inclusion testing, looping is bad even if it is done in parallel. Multiview is usable, I am using it with some pretty big mega-views (as per the raindrop model). I am also available to add features to this, as this is a core part of our work and we want to give it to couch as a contribution. thanks, Norman On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne wrote: > Hi Norman, > > I took a peek at multiview. I haven't followed this too closely on the > mailing list but this is *view intersection*? Is there a 5 line summary of > what this does somewhere? > > I'm curious as to why the daemon needs to be a supervisor, most if not all > of the other daemons are gen_servers. OTP allows this but I think this is a > good area where some CouchDB guidelines on plugins would apply. > > It strikes me that views, the use of map/reduce, etc.
are one of the > trickier aspects of using CouchDB, particularly for new users coming from the > SQL world. People are also reporting issues with performance of views, I > guess often because reduce functions go out of control. > > I think the project would be better served if features like this were > available as plugins. I would put GeoCouch in the same category. Its very > neat and timely (given everyone wants to know where everyone else is using > their telephone but without talking other than asynchronously), but a server > plugin architecture that would allow this to be done cleanly should come > first. > > This is just my opinion. I'd love to see some of the project founders and > committers weigh in on this and set some direction. > > Best regards, > > Bob > > > > > > On Aug 22, 2010, at 5:45 PM, Norman Barker wrote: > >> I would like to take this multiview code and have it added to trunk if >> possible, what are the next steps? >> >> thanks, >> >> Norman >> >> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker >> wrote: >>> I have made >>> >>> http://github.com/normanb/couchdb >>> >>> which is a fork of the latest couchdb trunk with the multiview code >>> and tests added. >>> >>> If geocouch is available then it can still be used. >>> >>> There are a couple of questions about the multiview on the user /dev >>> list so I will be adding some more test cases during today. >>> >>> thanks, >>> >>> Norman >>> >>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker >>> wrote: this is possible, I forked geocouch since I use it, but I have already separated the geocouch dependencies from the trunk. I can do this tomorrow, certainly be interested in any feedback. thanks, Norman On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische wrote: > On 08/18/2010 03:26 AM, J Chris Anderson wrote: >> >> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: >> >>> Hi, >>> >>> I have made the changes as recommended, adding a test case >>> multiview.js and also adding the userCtx to open the db. 
>>> >>> I have also forked geocouch and this is available here >>> >> >> this patch seems important (especially as people are already asking for >> help using it on user@) >> >> to get it committed, it either must remove the dependency on GeoCouch, or >> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. >> >> Is it possible / useful to make a version that doesn't use GeoCouch? And >> then to make the GeoCouch capabilities part GeoCouch for now? >> >> Chris >> > > Hi Norman, > > if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to > GeoCouch itself (as GeoCouch isn't ready for trunk yet). > > Lately I haven't been that responsive when it comes to GeoCouch, but that > will change (in about a month) after holidays and FOSS4G. > > Cheers, > Volker > >>> > >
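The ring scheme summarised in the message above can be sketched roughly as follows. This is a minimal illustration in Python, not the code in couch_query_ring.erl: the function name `ring_intersection` and the sample ids are made up for the example, each "node" is modelled as an in-memory set (an assumption, the message describes iterating), and the parallel dispatch of the start list is left out.

```python
from typing import Hashable, Iterable, List, Sequence


def ring_intersection(results: Sequence[Iterable[Hashable]]) -> List[Hashable]:
    """Intersect several id lists by passing each candidate id round a ring."""
    if not results:
        return []
    # Sort the nodes from smallest to largest result, so the start list
    # (the only one that is walked in full) is as short as possible.
    nodes = sorted((list(r) for r in results), key=len)
    start = nodes[0]
    # Assumption: each remaining node answers membership from a set.
    others = [set(node) for node in nodes[1:]]

    included = []
    # dict.fromkeys drops repeated ids while keeping their original order.
    for doc_id in dict.fromkeys(start):
        # Hop from node to node; all() stops at the first node lacking the id.
        if all(doc_id in node for node in others):
            # The id travelled all the way round the ring back to the start.
            included.append(doc_id)
    return included


# Hypothetical view results, given as lists of document ids.
views = [
    ["doc1", "doc2", "doc3", "doc4", "doc5"],
    ["doc2", "doc4", "doc5"],
    ["doc5", "doc2", "doc9", "doc7"],
]
print(ring_intersection(views))
```

With set-backed nodes the work is proportional to the size of the smallest result times the number of views, which is why the ordering from small to large matters.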
Re: multiview on github
Hi Norman, I took a peek at multiview. I haven't followed this too closely on the mailing list but this is *view intersection*? Is there a 5 line summary of what this does somewhere? I'm curious as to why the daemon needs to be a supervisor, most if not all of the other daemons are gen_servers. OTP allows this but I think this is a good area where some CouchDB guidelines on plugins would apply. It strikes me that views, the use of map/reduce, etc. are one of the trickier aspects of using CouchDB, particularly for new users coming from the SQL world. People are also reporting issues with performance of views, I guess often because reduce functions go out of control. I think the project would be better served if features like this were available as plugins. I would put GeoCouch in the same category. Its very neat and timely (given everyone wants to know where everyone else is using their telephone but without talking other than asynchronously), but a server plugin architecture that would allow this to be done cleanly should come first. This is just my opinion. I'd love to see some of the project founders and committers weigh in on this and set some direction. Best regards, Bob On Aug 22, 2010, at 5:45 PM, Norman Barker wrote: > I would like to take this multiview code and have it added to trunk if > possible, what are the next steps? > > thanks, > > Norman > > On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker > wrote: >> I have made >> >> http://github.com/normanb/couchdb >> >> which is a fork of the latest couchdb trunk with the multiview code >> and tests added. >> >> If geocouch is available then it can still be used. >> >> There are a couple of questions about the multiview on the user /dev >> list so I will be adding some more test cases during today. 
>> >> thanks, >> >> Norman >> >> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker >> wrote: >>> this is possible, I forked geocouch since I use it, but I have already >>> separated the geocouch dependencies from the trunk. >>> >>> I can do this tomorrow, certainly be interested in any feedback. >>> >>> thanks, >>> >>> Norman >>> >>> >>> >>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische >>> wrote: On 08/18/2010 03:26 AM, J Chris Anderson wrote: > > On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: > >> Hi, >> >> I have made the changes as recommended, adding a test case >> multiview.js and also adding the userCtx to open the db. >> >> I have also forked geocouch and this is available here >> > > this patch seems important (especially as people are already asking for > help using it on user@) > > to get it committed, it either must remove the dependency on GeoCouch, or > become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. > > Is it possible / useful to make a version that doesn't use GeoCouch? And > then to make the GeoCouch capabilities part GeoCouch for now? > > Chris > Hi Norman, if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to GeoCouch itself (as GeoCouch isn't ready for trunk yet). Lately I haven't been that responsive when it comes to GeoCouch, but that will change (in about a month) after holidays and FOSS4G. Cheers, Volker >>> >>
Re: multiview on github
that should be 'couchdb should not be in version control', sorry not used to git. On Sun, Aug 22, 2010 at 9:22 PM, Norman Barker wrote: > Bob, > > I am testing on 1+ documents, I appreciate that we need to > establish when a multi-process as opposed to a tbd (suggestions > welcome) approach is required. The startkey / endkey is an issue > though, is there a better way to test inclusion? > > The speed of the multiview is directly linked to the size of the > smallest view result though, so total documents isn't a factor. > > I am still thinking about fti, I am testing with clucene, but the > external handler problem is the same, how to make it stream in order. > > I will fix the local_dev.ini problem tomorrow, couchdb should be in > version control. > > Any hints on how to test inclusion are appreciated, it will greatly > speed up collation. > > thanks, > > Norman > > > > On Sun, Aug 22, 2010 at 4:15 PM, Robert Newson > wrote: >> I'm concerned about the performance of this on non-trivial databases, >> given the iteration of all items between startkey and endkey. I don't >> have time to test it this week but I'd be interested to hear the time >> it took to do a multiview on two views of, say, a million rows each >> (especially as compared to the two normal view calls). >> >> I was also intrigued to see the code handles fti too, a problem I have >> spent some time thinking about without finding a satisfactorily >> performant solution too. I note that, as written, it doesn't appear to >> work because the fti call (I'm assuming couchdb-lucene) will only >> return the top N matching hits, so at best you can filter those >> against another view (perhaps that's useful?). The trick to merging a >> view and an fti result together would be to get the results from both >> in the same order and step through the rows, filtering as you go. >> Sorting in Lucene has a large memory hit so I gave up on that >> solution. 
>> >> Finally, your patch appears to add two generated files (local_dev.ini >> and etc/init.d/couchdb) to the branch which should be fixed (add your >> settings to default.init.tpl.in instead). >> >> I should end by saying that if the problems above can be solved then >> this would be a very useful addition to CouchDB and one that is >> frequently requested. It might also be a model for multi-machine >> views. >> >> B. >> >> On Sun, Aug 22, 2010 at 10:45 PM, Norman Barker >> wrote: >>> I would like to take this multiview code and have it added to trunk if >>> possible, what are the next steps? >>> >>> thanks, >>> >>> Norman >>> >>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker >>> wrote: I have made http://github.com/normanb/couchdb which is a fork of the latest couchdb trunk with the multiview code and tests added. If geocouch is available then it can still be used. There are a couple of questions about the multiview on the user /dev list so I will be adding some more test cases during today. thanks, Norman On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker wrote: > this is possible, I forked geocouch since I use it, but I have already > separated the geocouch dependencies from the trunk. > > I can do this tomorrow, certainly be interested in any feedback. > > thanks, > > Norman > > > > On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische > wrote: >> On 08/18/2010 03:26 AM, J Chris Anderson wrote: >>> >>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: >>> Hi, I have made the changes as recommended, adding a test case multiview.js and also adding the userCtx to open the db. I have also forked geocouch and this is available here >>> >>> this patch seems important (especially as people are already asking for >>> help using it on user@) >>> >>> to get it committed, it either must remove the dependency on GeoCouch, >>> or >>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. >>> >>> Is it possible / useful to make a version that doesn't use GeoCouch? 
And >>> then to make the GeoCouch capabilities part GeoCouch for now? >>> >>> Chris >>> >> >> Hi Norman, >> >> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits >> to >> GeoCouch itself (as GeoCouch isn't ready for trunk yet). >> >> Lately I haven't been that responsive when it comes to GeoCouch, but that >> will change (in about a month) after holidays and FOSS4G. >> >> Cheers, >> Volker >> > >>> >> >
Re: multiview on github
Bob, I am testing on 1+ documents, I appreciate that we need to establish when a multi-process as opposed to a tbd (suggestions welcome) approach is required. The startkey / endkey is an issue though, is there a better way to test inclusion? The speed of the multiview is directly linked to the size of the smallest view result though, so total documents isn't a factor. I am still thinking about fti, I am testing with clucene, but the external handler problem is the same, how to make it stream in order. I will fix the local_dev.ini problem tomorrow, couchdb should be in version control. Any hints on how to test inclusion are appreciated, it will greatly speed up collation. thanks, Norman On Sun, Aug 22, 2010 at 4:15 PM, Robert Newson wrote: > I'm concerned about the performance of this on non-trivial databases, > given the iteration of all items between startkey and endkey. I don't > have time to test it this week but I'd be interested to hear the time > it took to do a multiview on two views of, say, a million rows each > (especially as compared to the two normal view calls). > > I was also intrigued to see the code handles fti too, a problem I have > spent some time thinking about without finding a satisfactorily > performant solution too. I note that, as written, it doesn't appear to > work because the fti call (I'm assuming couchdb-lucene) will only > return the top N matching hits, so at best you can filter those > against another view (perhaps that's useful?). The trick to merging a > view and an fti result together would be to get the results from both > in the same order and step through the rows, filtering as you go. > Sorting in Lucene has a large memory hit so I gave up on that > solution. > > Finally, your patch appears to add two generated files (local_dev.ini > and etc/init.d/couchdb) to the branch which should be fixed (add your > settings to default.init.tpl.in instead). 
> > I should end by saying that if the problems above can be solved then > this would be a very useful addition to CouchDB and one that is > frequently requested. It might also be a model for multi-machine > views. > > B. > > On Sun, Aug 22, 2010 at 10:45 PM, Norman Barker > wrote: >> I would like to take this multiview code and have it added to trunk if >> possible, what are the next steps? >> >> thanks, >> >> Norman >> >> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker >> wrote: >>> I have made >>> >>> http://github.com/normanb/couchdb >>> >>> which is a fork of the latest couchdb trunk with the multiview code >>> and tests added. >>> >>> If geocouch is available then it can still be used. >>> >>> There are a couple of questions about the multiview on the user /dev >>> list so I will be adding some more test cases during today. >>> >>> thanks, >>> >>> Norman >>> >>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker >>> wrote: this is possible, I forked geocouch since I use it, but I have already separated the geocouch dependencies from the trunk. I can do this tomorrow, certainly be interested in any feedback. thanks, Norman On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische wrote: > On 08/18/2010 03:26 AM, J Chris Anderson wrote: >> >> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: >> >>> Hi, >>> >>> I have made the changes as recommended, adding a test case >>> multiview.js and also adding the userCtx to open the db. >>> >>> I have also forked geocouch and this is available here >>> >> >> this patch seems important (especially as people are already asking for >> help using it on user@) >> >> to get it committed, it either must remove the dependency on GeoCouch, or >> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. >> >> Is it possible / useful to make a version that doesn't use GeoCouch? And >> then to make the GeoCouch capabilities part GeoCouch for now? 
>> >> Chris >> > > Hi Norman, > > if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to > GeoCouch itself (as GeoCouch isn't ready for trunk yet). > > Lately I haven't been that responsive when it comes to GeoCouch, but that > will change (in about a month) after holidays and FOSS4G. > > Cheers, > Volker > >>> >> >
Re: multiview on github
I'm concerned about the performance of this on non-trivial databases, given the iteration of all items between startkey and endkey. I don't have time to test it this week but I'd be interested to hear the time it took to do a multiview on two views of, say, a million rows each (especially as compared to the two normal view calls). I was also intrigued to see the code handles fti too, a problem I have spent some time thinking about without finding a satisfactorily performant solution. I note that, as written, it doesn't appear to work because the fti call (I'm assuming couchdb-lucene) will only return the top N matching hits, so at best you can filter those against another view (perhaps that's useful?). The trick to merging a view and an fti result together would be to get the results from both in the same order and step through the rows, filtering as you go. Sorting in Lucene has a large memory hit so I gave up on that solution. Finally, your patch appears to add two generated files (local_dev.ini and etc/init.d/couchdb) to the branch which should be fixed (add your settings to default.ini.tpl.in instead). I should end by saying that if the problems above can be solved then this would be a very useful addition to CouchDB and one that is frequently requested. It might also be a model for multi-machine views. B. On Sun, Aug 22, 2010 at 10:45 PM, Norman Barker wrote: > I would like to take this multiview code and have it added to trunk if > possible, what are the next steps? > > thanks, > > Norman > > On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker > wrote: >> I have made >> >> http://github.com/normanb/couchdb >> >> which is a fork of the latest couchdb trunk with the multiview code >> and tests added. >> >> If geocouch is available then it can still be used. >> >> There are a couple of questions about the multiview on the user /dev >> list so I will be adding some more test cases during today.
>> >> thanks, >> >> Norman >> >> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker >> wrote: >>> this is possible, I forked geocouch since I use it, but I have already >>> separated the geocouch dependencies from the trunk. >>> >>> I can do this tomorrow, certainly be interested in any feedback. >>> >>> thanks, >>> >>> Norman >>> >>> >>> >>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische >>> wrote: On 08/18/2010 03:26 AM, J Chris Anderson wrote: > > On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: > >> Hi, >> >> I have made the changes as recommended, adding a test case >> multiview.js and also adding the userCtx to open the db. >> >> I have also forked geocouch and this is available here >> > > this patch seems important (especially as people are already asking for > help using it on user@) > > to get it committed, it either must remove the dependency on GeoCouch, or > become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. > > Is it possible / useful to make a version that doesn't use GeoCouch? And > then to make the GeoCouch capabilities part GeoCouch for now? > > Chris > Hi Norman, if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to GeoCouch itself (as GeoCouch isn't ready for trunk yet). Lately I haven't been that responsive when it comes to GeoCouch, but that will change (in about a month) after holidays and FOSS4G. Cheers, Volker >>> >> >
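The merge idea in the message above (get both results in the same order, then step through the rows filtering as you go) might look something like the sketch below. It is only an illustration in Python under stated assumptions: `merge_intersect` is a hypothetical name, both inputs are assumed to be already sorted ascending on the same key with no repeated keys, and nothing here is CouchDB or couchdb-lucene API.

```python
from typing import Iterable, Iterator, TypeVar

K = TypeVar("K")


def merge_intersect(left: Iterable[K], right: Iterable[K]) -> Iterator[K]:
    """Yield keys present in both inputs; both must be sorted ascending."""
    left_it, right_it = iter(left), iter(right)
    done = object()  # marker meaning "this stream is exhausted"
    a, b = next(left_it, done), next(right_it, done)
    while a is not done and b is not done:
        if a == b:
            # Same key on both sides: keep it and advance both streams.
            yield a
            a, b = next(left_it, done), next(right_it, done)
        elif a < b:
            # The left key cannot appear later on the right, so skip it.
            a = next(left_it, done)
        else:
            b = next(right_it, done)


# Hypothetical sorted keys from a view and from a full-text result.
view_keys = [1, 3, 5, 7, 9]
fti_keys = [3, 4, 5, 6, 9, 10]
print(list(merge_intersect(view_keys, fti_keys)))
```

Because it only ever holds one row from each side, this streams in constant memory, but it depends entirely on both sources being able to return rows in the same order, which is the part described above as expensive in Lucene.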
Re: multiview on github
I would like to take this multiview code and have it added to trunk if possible, what are the next steps? thanks, Norman On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker wrote: > I have made > > http://github.com/normanb/couchdb > > which is a fork of the latest couchdb trunk with the multiview code > and tests added. > > If geocouch is available then it can still be used. > > There are a couple of questions about the multiview on the user /dev > list so I will be adding some more test cases during today. > > thanks, > > Norman > > On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker > wrote: >> this is possible, I forked geocouch since I use it, but I have already >> separated the geocouch dependencies from the trunk. >> >> I can do this tomorrow, certainly be interested in any feedback. >> >> thanks, >> >> Norman >> >> >> >> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische >> wrote: >>> On 08/18/2010 03:26 AM, J Chris Anderson wrote: On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: > Hi, > > I have made the changes as recommended, adding a test case > multiview.js and also adding the userCtx to open the db. > > I have also forked geocouch and this is available here > this patch seems important (especially as people are already asking for help using it on user@) to get it committed, it either must remove the dependency on GeoCouch, or become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. Is it possible / useful to make a version that doesn't use GeoCouch? And then to make the GeoCouch capabilities part GeoCouch for now? Chris >>> >>> Hi Norman, >>> >>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to >>> GeoCouch itself (as GeoCouch isn't ready for trunk yet). >>> >>> Lately I haven't been that responsive when it comes to GeoCouch, but that >>> will change (in about a month) after holidays and FOSS4G. >>> >>> Cheers, >>> Volker >>> >> >
Re: multiview on github
I have made http://github.com/normanb/couchdb which is a fork of the latest couchdb trunk with the multiview code and tests added. If geocouch is available then it can still be used. There are a couple of questions about the multiview on the user /dev list so I will be adding some more test cases during today. thanks, Norman On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker wrote: > this is possible, I forked geocouch since I use it, but I have already > separated the geocouch dependencies from the trunk. > > I can do this tomorrow, certainly be interested in any feedback. > > thanks, > > Norman > > > > On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische > wrote: >> On 08/18/2010 03:26 AM, J Chris Anderson wrote: >>> >>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: >>> Hi, I have made the changes as recommended, adding a test case multiview.js and also adding the userCtx to open the db. I have also forked geocouch and this is available here >>> >>> this patch seems important (especially as people are already asking for >>> help using it on user@) >>> >>> to get it committed, it either must remove the dependency on GeoCouch, or >>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. >>> >>> Is it possible / useful to make a version that doesn't use GeoCouch? And >>> then to make the GeoCouch capabilities part GeoCouch for now? >>> >>> Chris >>> >> >> Hi Norman, >> >> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to >> GeoCouch itself (as GeoCouch isn't ready for trunk yet). >> >> Lately I haven't been that responsive when it comes to GeoCouch, but that >> will change (in about a month) after holidays and FOSS4G. >> >> Cheers, >> Volker >> >
Re: multiview on github
this is possible, I forked geocouch since I use it, but I have already separated the geocouch dependencies from the trunk. I can do this tomorrow, certainly be interested in any feedback. thanks, Norman On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische wrote: > On 08/18/2010 03:26 AM, J Chris Anderson wrote: >> >> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote: >> >>> Hi, >>> >>> I have made the changes as recommended, adding a test case >>> multiview.js and also adding the userCtx to open the db. >>> >>> I have also forked geocouch and this is available here >>> >> >> this patch seems important (especially as people are already asking for >> help using it on user@) >> >> to get it committed, it either must remove the dependency on GeoCouch, or >> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB. >> >> Is it possible / useful to make a version that doesn't use GeoCouch? And >> then to make the GeoCouch capabilities part GeoCouch for now? >> >> Chris >> > > Hi Norman, > > if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to > GeoCouch itself (as GeoCouch isn't ready for trunk yet). > > Lately I haven't been that responsive when it comes to GeoCouch, but that > will change (in about a month) after holidays and FOSS4G. > > Cheers, > Volker >
Re: multiview on github
On 08/18/2010 03:26 AM, J Chris Anderson wrote:
> this patch seems important (especially as people are already asking for
> help using it on user@)
>
> to get it committed, it either must remove the dependency on GeoCouch, or
> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>
> Is it possible / useful to make a version that doesn't use GeoCouch? And
> then to make the GeoCouch capabilities part GeoCouch for now?
>
> Chris

Hi Norman,

if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to GeoCouch itself (as GeoCouch isn't ready for trunk yet).

Lately I haven't been that responsive when it comes to GeoCouch, but that will change (in about a month) after holidays and FOSS4G.

Cheers,
Volker
Re: multiview on github
On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:

> Hi,
>
> I have made the changes as recommended, adding a test case
> multiview.js and also adding the userCtx to open the db.
>
> I have also forked geocouch and this is available here
>

this patch seems important (especially as people are already asking for help using it on user@)

to get it committed, it either must remove the dependency on GeoCouch, or become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.

Is it possible / useful to make a version that doesn't use GeoCouch? And then to make the GeoCouch capabilities part of GeoCouch for now?

Chris

> http://github.com/normanb/couchdb
>
> ./bootstrap
> ./configure
> make dev
> utils/run
>
> should do it and then the simple test case is available in Futon.
>
> The test case multiview.js takes two views which emit docs which run
> from 0 .. 100, view 1 emits those documents with ids which are
> multiples of 3, view 2 emits those which are multiples of 4. The
> _multiview request is the intersection of view 1 and view 2 resulting
> those documents whose ids are multiples of 12.
>
> Any comments appreciated, particular concerning the following;
>
> 1) in the module multiview, is there a quicker way to find the counts
> from startkey to endkey rather than iterating?
> 2) In the module couch_query_ring is there a quicker way to test for
> inclusion rather than iterating?
>
> Many thanks,
>
> Norman
Re: multiview on github
Hi,

I have made the changes as recommended, adding a test case multiview.js and also adding the userCtx to open the db.

I have also forked geocouch and this is available here

http://github.com/normanb/couchdb

./bootstrap
./configure
make dev
utils/run

should do it and then the simple test case is available in Futon.

The test case multiview.js takes two views which emit docs which run from 0 .. 100: view 1 emits those documents with ids which are multiples of 3, view 2 emits those which are multiples of 4. The _multiview request is the intersection of view 1 and view 2, resulting in those documents whose ids are multiples of 12.

Any comments appreciated, particularly concerning the following:

1) in the module multiview, is there a quicker way to find the counts from startkey to endkey rather than iterating?
2) In the module couch_query_ring is there a quicker way to test for inclusion rather than iterating?

Many thanks,

Norman

On Fri, Aug 6, 2010 at 10:16 AM, Norman Barker wrote:
> Chris,
>
> I will make those changes, it might be a couple of days as I am on travel.
>
> I will clone geocouch as a starting point and add javascript tests as
> you suggest.
>
> I am bench marking with around 1 docs and a couple of views
> (including geocouch), the main issue with the folding over every
> document to both find the number of docs in a view slice (between
> startkey and endkey) and then again to test inclusion between views.
>
> I am interested in taking this forward and appreciate any code feedback.
>
> thanks,
>
> Norman
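The expected result of the multiview.js test case described in this message can be checked by hand. The following is only a sketch in Python, not the test itself: the two views are modelled as plain sets of integer ids, the range is assumed to include 100, and the variable names are made up for illustration.

```python
# Hypothetical stand-ins for the two views in the test case.
doc_ids = range(0, 101)  # assumption: ids run from 0 to 100 inclusive

view1 = {i for i in doc_ids if i % 3 == 0}  # ids that are multiples of 3
view2 = {i for i in doc_ids if i % 4 == 0}  # ids that are multiples of 4

# What a _multiview request over both views is expected to return.
intersection = sorted(view1 & view2)

# An id divisible by both 3 and 4 is divisible by 12.
expected = [i for i in doc_ids if i % 12 == 0]
```

Since 3 and 4 share no common factor, their common multiples are exactly the multiples of 12, which is why the test expects that set.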
Re: multiview on github
Chris,

I will make those changes, it might be a couple of days as I am on travel.

I will clone geocouch as a starting point and add javascript tests as you suggest.

I am benchmarking with around 1 docs and a couple of views (including geocouch). The main issue is the folding over every document, both to find the number of docs in a view slice (between startkey and endkey) and then again to test inclusion between views.

I am interested in taking this forward and appreciate any code feedback.

thanks,

Norman

On Thu, Aug 5, 2010 at 6:26 PM, J Chris Anderson wrote:
> The code looks clean (but could use better comments about where in the flow
> each module comes into play). I don't think we can guess about performance,
> instead we should benchmark to make sure the ring approach is right.
>
> In CouchDB currently, it is possible to isolate requests against a single db.
> So you use the security settings to prevent access to databases, etc. For
> this, using the userCtx and switching away from couch_db:open_int() would
> make a big difference.
>
> This way people can query across dbs if they have read access to all of them.
>
> I think if you package this as a CouchDB fork on Github and add a few
> JavaScript tests, it will be really useful for some folks. I like that it has
> geo support. Maybe we can target it for inclusion in trunk just after
> GeoCouch goes in trunk (if Volker wants to put it in.)
>
> Also, for realtime hacking on this, you might find that the #couchdb IRC
> channel on Freenode is a good place to solicit feedback. There are a lot of
> people on there doing Geo things that would benefit from this. (They really
> wanna be able to intersect a Geo query with a Map Reduce query, etc.)
>
> Chris
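On the cost of folding over every document to count a view slice: if the keys are available in sorted order, the number of rows between startkey and endkey can be found with two binary searches instead of a full scan. The sketch below is Python and uses an in-memory sorted list as a stand-in for a view index; `count_in_range` is a hypothetical helper, not a CouchDB API, and both ends of the range are assumed to be inclusive.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_keys, startkey, endkey):
    """Count keys k with startkey <= k <= endkey in a sorted list.

    Two binary searches locate the slice boundaries, so the cost grows
    with log(len(sorted_keys)) rather than with the size of the slice.
    """
    lo = bisect_left(sorted_keys, startkey)   # first index with key >= startkey
    hi = bisect_right(sorted_keys, endkey)    # first index with key > endkey
    return max(hi - lo, 0)                    # empty when startkey > endkey


# Example: keys emitted by a view of multiples of 3.
keys = list(range(0, 101, 3))
rows = count_in_range(keys, 10, 30)
```

A tree-structured index that stores row counts in its inner nodes can answer the same question in a similar way, which is the idea behind using a counting reduce for this.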
Re: multiview on github
Hi Norman,

wow, I didn't know it had GeoCouch support. Sounds great! I need to have a closer look. Not just now (sorry for that).

Cheers,
Volker

On 08/06/2010 02:26 AM, J Chris Anderson wrote:
> The code looks clean (but could use better comments about where in the flow
> each module comes into play). I don't think we can guess about performance,
> instead we should benchmark to make sure the ring approach is right.
>
> In CouchDB currently, it is possible to isolate requests against a single db.
> So you use the security settings to prevent access to databases, etc. For
> this, using the userCtx and switching away from couch_db:open_int() would
> make a big difference.
>
> This way people can query across dbs if they have read access to all of them.
>
> I think if you package this as a CouchDB fork on Github and add a few
> JavaScript tests, it will be really useful for some folks. I like that it has
> geo support. Maybe we can target it for inclusion in trunk just after
> GeoCouch goes in trunk (if Volker wants to put it in.)
>
> Also, for realtime hacking on this, you might find that the #couchdb IRC
> channel on Freenode is a good place to solicit feedback. There are a lot of
> people on there doing Geo things that would benefit from this. (They really
> wanna be able to intersect a Geo query with a Map Reduce query, etc.)
>
> Chris
Re: multiview on github
On Aug 5, 2010, at 4:32 PM, Jan Lehnardt wrote:

> Hi Norman,
>
> I still plan to look at your code, I know the others here
> are fairly busy too, sorry for the review delay :)
>

The code looks clean (but could use better comments about where in the flow each module comes into play). I don't think we can guess about performance, instead we should benchmark to make sure the ring approach is right.

In CouchDB currently, it is possible to isolate requests against a single db. So you use the security settings to prevent access to databases, etc. For this, using the userCtx and switching away from couch_db:open_int() would make a big difference.

This way people can query across dbs if they have read access to all of them.

I think if you package this as a CouchDB fork on Github and add a few JavaScript tests, it will be really useful for some folks. I like that it has geo support. Maybe we can target it for inclusion in trunk just after GeoCouch goes in trunk (if Volker wants to put it in.)

Also, for realtime hacking on this, you might find that the #couchdb IRC channel on Freenode is a good place to solicit feedback. There are a lot of people on there doing Geo things that would benefit from this. (They really wanna be able to intersect a Geo query with a Map Reduce query, etc.)

Chris

> Cheers
> Jan
Re: multiview on github
Hi Norman,

I still plan to look at your code, I know the others here are fairly busy too, sorry for the review delay :)

Cheers
Jan
--

On 5 Aug 2010, at 18:12, Norman Barker wrote:
> Hi,
>
> is there any interest in the multiview, I have fixed (3) below, but am
> still interested in approaches for (1) and (2).
>
> thanks,
>
> Norman
Re: multiview on github
Hi,

is there any interest in the multiview? I have fixed (3) below, but am still interested in approaches for (1) and (2).

thanks,

Norman

On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker wrote:
> Hi,
>
> a very initial version of the multiview is at
> http://github.com/normanb/couchdb-multiview for discussion.
>
> The views are intersected by using a ring of processes where each node
> in the ring represents a view as follows;
>
> % send an id from the start list to the next node in the ring, if the
> id is in adjacent node then this node sends to the next ring node
> % if the id gets all round the ring and back to the start node then it
> has intersected all queries and should be included. The nodes in the
> ring
> % should be sorted in size from small to large for this to be effective
> %
> % In addition send the initial id list round in parallel
>
> this is implemented in the couch_query_ring module.
>
> I have a couple of questions
>
> 1) in the module multiview, is there a quicker way to find the counts
> from startkey to endkey rather than iterating?
> 2) In the module couch_query_ring is there a quicker way to test for
> inclusion rather than iterating?
> 3) Finally, if I hit this concurrently I get an exception,
>
> [error] [<0.201.0>] Uncaught error in HTTP request: {exit,
> {noproc,
> {gen_server,call,
>
> (so ignore my previous email, I am able to trap the msg)
>
> I am going to look into (3) but if you have seen this before.
>
> I am developing on windows, but also test on linux I will work on
> getting a linux makefile, but the Makefile.win should be a start.
>
> Any help and comments appreciated.
>
> Norman
multiview on github
Hi,

a very initial version of the multiview is at http://github.com/normanb/couchdb-multiview for discussion.

The views are intersected by using a ring of processes where each node in the ring represents a view, as follows:

% send an id from the start list to the next node in the ring, if the id is in adjacent node then this node sends to the next ring node
% if the id gets all round the ring and back to the start node then it has intersected all queries and should be included. The nodes in the ring
% should be sorted in size from small to large for this to be effective
%
% In addition send the initial id list round in parallel

this is implemented in the couch_query_ring module.

I have a few questions:

1) in the module multiview, is there a quicker way to find the counts from startkey to endkey rather than iterating?
2) In the module couch_query_ring is there a quicker way to test for inclusion rather than iterating?
3) Finally, if I hit this concurrently I get an exception:

[error] [<0.201.0>] Uncaught error in HTTP request: {exit,
    {noproc,
        {gen_server,call,

(so ignore my previous email, I am able to trap the msg)

I am going to look into (3), but let me know if you have seen this before.

I am developing on Windows, but also test on Linux. I will work on getting a Linux makefile, but the Makefile.win should be a start.

Any help and comments appreciated.

Norman
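The ring described above can be illustrated outside Erlang. The following is a minimal single-threaded sketch in Python, not the couch_query_ring code: `ring_intersect` is a hypothetical name, each view is modelled as a plain collection of document ids, and the parallel message passing between processes is replaced by an ordinary loop.

```python
def ring_intersect(views):
    """Intersect several views, each given as a collection of document ids.

    Follows the ring idea: ids from the smallest view travel round the
    remaining views and survive only if every view contains them.
    """
    if not views:
        return []

    # Sort the ring from small to large so that most ids drop out early.
    ring = sorted((set(view) for view in views), key=len)
    start, rest = ring[0], ring[1:]

    survivors = []
    for doc_id in sorted(start):
        # Pass the id round the ring; all() stops at the first node
        # that does not contain it.
        if all(doc_id in node for node in rest):
            survivors.append(doc_id)
    return survivors


# Example: three views over ids 0..100.
result = ring_intersect([range(0, 101, 2), range(0, 101, 3), range(0, 101, 5)])
```

Holding each view's ids in a set makes the inclusion test at each node a hash lookup rather than an iteration, at the cost of keeping the ids in memory. Whether that trade-off suits large views is something only a benchmark can settle.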