Re: multiview on github

2010-09-21 Thread Randall Leeds
On Tue, Sep 21, 2010 at 18:32, Robert Newson  wrote:
> "Becoming a committer is as easy as writing enough accepted patches
> that everyone gets tired of applying them for you."

==

awesome

So. I think you're safe, Bob.

>
> So it's not because we're awesome? I'm crushed.
>
> B.


Re: multiview on github

2010-09-21 Thread Robert Newson
"Becoming a committer is as easy as writing enough accepted patches
that everyone gets tired of applying them for you."

So it's not because we're awesome? I'm crushed.

B.

On Tue, Sep 21, 2010 at 5:19 PM, Paul Davis  wrote:
>> 1) How do you get a row count with a view for a startkey and endkey
>> that would solve one of my problems?
>
> Looks like we don't have an API for it yet, but the basic idea is that
> you run a reduce with the given query parameters to get this info. In
> all views there's a built-in reduce function that does row counting,
> so it's just a matter of exposing an API to query this. There used to be
> an example in couch_db.erl that did this with just a startkey for
> enum_docs_since but it appears to have changed to be more complicated
> for _changes.
>
>> 2) How do you test for document id inclusion in the results of a view?
>
> How do you mean? I'm proposing the bloom filter method which is just a
> constant-space set data-structure that can be used to test for
> existence of a key. The first draft implementation would just stream a
> query to build a bloom filter for each query.
>
>
>> 
>> fti and spatial code is only called if the query asks for it, I will
>> look into this.
>
> I'm not sure on how best to handle this, I just know that I really
> don't like seeing spatial/fti specific code in trunk when the spatial
> and fti code is not.
>
>> 
>> ok, it is really unclear in couchdb when to use supervisor,
>> gen_servers, I wrote multiview as a gen_server since I thought it
>> similar to an EJB and encapsulated unit of work that I wanted to
>> delegate tasks to and not hog the HTTP process.
>>
>> Saying that if couch_query_rings use gen_server delegates as you
>> recommend below then that will achieve that goal.
>
> It's a bit complicated and in the end comes down to just having the
> experience. Though it's important to remember that Erlang processes are
> extremely lightweight. Doing operations directly in the HTTP request
> processes is fine because each request has its own process (well,
> keep-alive requests re-use the process, but that's orthogonal).
>
> Whether or not the ring uses a gen_server the idea was just to
> abstract the different query nodes in the ring as a Pid which should
> make the code cleaner and easier to understand as well as allow for
> the other query types to be added in dynamically.
>
>> 
>> plugins would be good, but honestly it isn't hard to change local.ini,
>> With the multiview I would rather see focus on external
>> http_db_handlers such as FTI and getting them streaming the results
>> rather than having to write a complete result on one stdio line.
>>
>> I would like this in trunk mainly because I want to hack on trunk and
>> to do that I need to be a committer :-) Plugins work fine.
>
> When I say plugins, I'm generally just referring to formalizing how
> external code should integrate with CouchDB. I.e., making use of
> default.d instead of editing default.ini or local.ini directly.
>
> As to updating the external API, there was some talk at CouchCamp on
> changing the current system to allow a bit more flexibility to this by
> giving couchdb a reverse proxy system for externals instead of using a
> stdio protocol. If we did that, then multiview could just define a
> simple api that various external indexers could choose to support. And
> the same would work for internal indexers as well.
>
> Becoming a committer is as easy as writing enough accepted patches
> that everyone gets tired of applying them for you. We're always
> looking for more help.
>
> HTH,
> Paul Davis
>


Re: multiview on github

2010-09-21 Thread Paul Davis
> 1) How do you get a row count with a view for a startkey and endkey
> that would solve one of my problems?

Looks like we don't have an API for it yet, but the basic idea is that
you run a reduce with the given query parameters to get this info. In
all views there's a built-in reduce function that does row counting,
so it's just a matter of exposing an API to query this. There used to be
an example in couch_db.erl that did this with just a startkey for
enum_docs_since but it appears to have changed to be more complicated
for _changes.
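
To illustrate the idea of answering a range count from stored
reductions, here is a minimal Python sketch. It is not CouchDB code:
the class name CountedLeaves, the leaf_size parameter and the flat
list of leaves are all assumptions made for illustration. Each leaf
keeps a precomputed row count, so a leaf that lies fully inside
[startkey, endkey] contributes its stored count and only the boundary
leaves have their keys inspected.

```python
from bisect import bisect_left, bisect_right


class CountedLeaves:
    """Sorted keys split into leaves; each leaf stores its row count.

    Hypothetical stand-in for a tree whose nodes carry a row-count
    reduction. Not CouchDB's actual data structure.
    """

    def __init__(self, keys, leaf_size=4):
        keys = sorted(keys)
        self.leaves = [keys[i:i + leaf_size]
                       for i in range(0, len(keys), leaf_size)]
        # The stored "reduction" for each leaf: its number of rows.
        self.counts = [len(leaf) for leaf in self.leaves]

    def count_range(self, startkey, endkey):
        """Count rows with startkey <= key <= endkey."""
        total = 0
        for leaf, count in zip(self.leaves, self.counts):
            if leaf[-1] < startkey or leaf[0] > endkey:
                continue  # leaf lies entirely outside the range
            if startkey <= leaf[0] and leaf[-1] <= endkey:
                total += count  # fully covered: reuse the stored count
            else:
                # Boundary leaf: look only at this leaf's keys.
                total += (bisect_right(leaf, endkey)
                          - bisect_left(leaf, startkey))
        return total


index = CountedLeaves(range(0, 100, 5))
print(index.count_range(10, 50))
```

The sketch still walks the list of leaves linearly; a real tree would
descend from the root and skip whole subtrees the same way.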

> 2) How do you test for document id inclusion in the results of a view?

How do you mean? I'm proposing the bloom filter method which is just a
constant-space set data-structure that can be used to test for
existence of a key. The first draft implementation would just stream a
query to build a bloom filter for each query.
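
For readers who have not met the structure, a minimal Python sketch
of a Bloom filter follows. It is illustrative only: the class name,
the default sizes and the use of salted SHA-256 digests as hash
functions are assumptions, not anything taken from multiview or
CouchDB. The bit array has a fixed size however many ids are added;
a lookup can return a false positive but never a false negative.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set (illustrative sketch)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


seen = BloomFilter()
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    seen.add(doc_id)
print("doc-2" in seen)
```

Streaming a query's ids through add() builds the filter without ever
holding the full id list in memory.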


> 
> fti and spatial code is only called if the query asks for it, I will
> look into this.

I'm not sure on how best to handle this, I just know that I really
don't like seeing spatial/fti specific code in trunk when the spatial
and fti code is not.

> 
> ok, it is really unclear in couchdb when to use supervisor,
> gen_servers, I wrote multiview as a gen_server since I thought it
> similar to an EJB and encapsulated unit of work that I wanted to
> delegate tasks to and not hog the HTTP process.
>
> Saying that if couch_query_rings use gen_server delegates as you
> recommend below then that will achieve that goal.

It's a bit complicated and in the end comes down to just having the
experience. Though it's important to remember that Erlang processes are
extremely lightweight. Doing operations directly in the HTTP request
processes is fine because each request has its own process (well,
keep-alive requests re-use the process, but that's orthogonal).

Whether or not the ring uses a gen_server the idea was just to
abstract the different query nodes in the ring as a Pid which should
make the code cleaner and easier to understand as well as allow for
the other query types to be added in dynamically.

> 
> plugins would be good, but honestly it isn't hard to change local.ini,
> With the multiview I would rather see focus on external
> http_db_handlers such as FTI and getting them streaming the results
> rather than having to write a complete result on one stdio line.
>
> I would like this in trunk mainly because I want to hack on trunk and
> to do that I need to be a committer :-) Plugins work fine.

When I say plugins, I'm generally just referring to formalizing how
external code should integrate with CouchDB. I.e., making use of
default.d instead of editing default.ini or local.ini directly.

As to updating the external API, there was some talk at CouchCamp on
changing the current system to allow a bit more flexibility to this by
giving couchdb a reverse proxy system for externals instead of using a
stdio protocol. If we did that, then multiview could just define a
simple api that various external indexers could choose to support. And
the same would work for internal indexers as well.

Becoming a committer is as easy as writing enough accepted patches
that everyone gets tired of applying them for you. We're always
looking for more help.

HTH,
Paul Davis


Re: multiview on github

2010-09-21 Thread Norman Barker
Paul,

fantastic, thanks for the feedback, and you aren't b*tching; this is
what I wanted. Comments inline.

On Tue, Sep 21, 2010 at 9:37 AM, Paul Davis  wrote:
> Norman,
>
> Sorry it's taken me so long to review this code. In its current form I
> would have to -1 adding the current implementation to trunk for a
> couple reasons. I'm roughly +0 on the general outline of the algorithm
> for future inclusion, but I'll discuss that below.
>
> The biggest issue that jumps out is that it's unbounded in its use of
> memory. If I'm reading this code correctly, each view/spatial/fti
> query grabs its entire list of document id's and creates a record that
> stores this list. Then you create a ring of processes that then copies
> these lists possibly multiple times and in the worst way as the larger
> the list, the more times it's copied. Then inside the ring the queries
> are being re-run for each test of an id being present, which
> confuses me because they could be using the list of id's that were
> calculated during the calls to multiview:query_view_count/3. Granted I
> could be reading this wrong, but it's a bit hard to follow in places.
> Also, at least for fti and views, you don't actually need to enumerate
> the entire thing to get a row count as they can both report a count
> efficiently. I'm not sure about spatial, but even if it can't yet, I
> would imagine it could be implemented.
>

Only with external is the entire list held in memory. Otherwise the code
streams the results, with at most one id in memory at any one time.

1) How do you get a row count with a view for a startkey and endkey
that would solve one of my problems?

2) How do you test for document id inclusion in the results of a view?


> And now for a list of nits about mechanics:
>
> The source code for this patch is completely unlike anything else in
> CouchDB. There are lots of differences that add up to make this alone
> reason to prevent it from entering into trunk:
>
> The file headers in source files should be removed and replaced with
> ASF license headers.
> Source files must be less than eighty columns wide.
> You've accidentally committed local_dev.ini and etc/init/couchdb.

Consider that done, I will add this on the next commit.

> I'd like to see more tests in the futon tests.

ok

> If this ends up in trunk, it will not be able to depend on the spatial
> and fti handlers existing if they're not also in trunk. This might be
> solvable with an abstraction that can be dynamically added if they're
> present.

fti and spatial code is only called if the query asks for it, I will
look into this.

> AFAICT, error reporting doesn't seem to exist, and it looks like
> there's a lot of new surface area for generating errors.
> The supervisor/gen_server pattern that's going on here doesn't appear
> to have a reason. As in, I can't see a reason the gen_server even
> needs to exist. Just make the multiview:query call from the HTTP
> process.

ok, it is really unclear in couchdb when to use supervisors and
gen_servers. I wrote multiview as a gen_server since I thought it
similar to an EJB: an encapsulated unit of work that I wanted to
delegate tasks to so as not to hog the HTTP process.

That said, if couch_query_rings use gen_server delegates as you
recommend below, then that will achieve that goal.


> In Erlang, the term Node generally refers to a remote VM. Using the
> variable Node in your query ring code confused me greatly until I
> realized it was just pid's.

ok, I will change this

> You should generally avoid raw message passing in Erlang. Using a
> gen_server for each of the different ring members depending on query
> type would be more appropriate.

ok, I will mod this as well.

> If you are using gen_servers, you should fill out more of the
> callbacks to do meaningful things, ie, logging and/or dying on
> unexpected messages. Silently ignoring that sort of thing can lead to
> very hard to track bugs.

ok

> Module names should be prefixed with couchdb_ if they're going into
> trunk as part of couchdb.
> I'm not sure I like the generous use of pmap and friends. I understand
> that it'd ideally reduce latency, but at the burden of reducing a
> node's ability to handle concurrency. Not sure on the best solution to
> this though.
> In the couple places that have the big case statements for handling
> each type of view query, I'd transform those into functions to make
> things easier to follow.

thanks, I am all for code clarity, thanks for the feedback.

> Support for view parameters is limited to startkey and endkey. At the
> very least, start_docid and end_docid should be supported. The other
> various parameters affecting collation should also probably be
> supported. limit and count would be nice. I'm sure there are probably
> others too, but there are also ones that probably don't need to be
> included.
> How should reduces be handled, if at all? I don't see them being
> handled now, but I can assure you that people will want some sort of
> support 

Re: multiview on github

2010-09-21 Thread Paul Davis
Norman,

Sorry it's taken me so long to review this code. In its current form I
would have to -1 adding the current implementation to trunk for a
couple reasons. I'm roughly +0 on the general outline of the algorithm
for future inclusion, but I'll discuss that below.

The biggest issue that jumps out is that it's unbounded in its use of
memory. If I'm reading this code correctly, each view/spatial/fti
query grabs its entire list of document id's and creates a record that
stores this list. Then you create a ring of processes that then copies
these lists possibly multiple times and in the worst way as the larger
the list, the more times it's copied. Then inside the ring the queries
are being re-run for each test of an id being present, which
confuses me because they could be using the list of id's that were
calculated during the calls to multiview:query_view_count/3. Granted I
could be reading this wrong, but it's a bit hard to follow in places.
Also, at least for fti and views, you don't actually need to enumerate
the entire thing to get a row count as they can both report a count
efficiently. I'm not sure about spatial, but even if it can't yet, I
would imagine it could be implemented.

And now for a list of nits about mechanics:

The source code for this patch is completely unlike anything else in
CouchDB. There are lots of differences that add up to make this alone
reason to prevent it from entering into trunk:

The file headers in source files should be removed and replaced with
ASF license headers.
Source files must be less than eighty columns wide.
You've accidentally committed local_dev.ini and etc/init/couchdb.
I'd like to see more tests in the futon tests.
If this ends up in trunk, it will not be able to depend on the spatial
and fti handlers existing if they're not also in trunk. This might be
solvable with an abstraction that can be dynamically added if they're
present.
AFAICT, error reporting doesn't seem to exist, and it looks like
there's a lot of new surface area for generating errors.
The supervisor/gen_server pattern that's going on here doesn't appear
to have a reason. As in, I can't see a reason the gen_server even
needs to exist. Just make the multiview:query call from the HTTP
process.
In Erlang, the term Node generally refers to a remote VM. Using the
variable Node in your query ring code confused me greatly until I
realized it was just pid's.
You should generally avoid raw message passing in Erlang. Using a
gen_server for each of the different ring members depending on query
type would be more appropriate.
If you are using gen_servers, you should fill out more of the
callbacks to do meaningful things, ie, logging and/or dying on
unexpected messages. Silently ignoring that sort of thing can lead to
very hard to track bugs.
Module names should be prefixed with couchdb_ if they're going into
trunk as part of couchdb.
I'm not sure I like the generous use of pmap and friends. I understand
that it'd ideally reduce latency, but at the burden of reducing a
node's ability to handle concurrency. Not sure on the best solution to
this though.
In the couple places that have the big case statements for handling
each type of view query, I'd transform those into functions to make
things easier to follow.
Support for view parameters is limited to startkey and endkey. At the
very least, start_docid and end_docid should be supported. The other
various parameters affecting collation should also probably be
supported. limit and count would be nice. I'm sure there are probably
others too, but there are also ones that probably don't need to be
included.
How should reduces be handled, if at all? I don't see them being
handled now, but I can assure you that people will want some sort of
support if this goes into trunk.
Passing the view groups between processes does not seem like a good
idea. I'd have to look back at the view_group code to double check
that though.

Now I'll stop bitching and tell you that there is actually some hope
and I'm intrigued where this could go.

The current algorithm structure you have is pretty interesting. I
think with a couple improvements it would go a long way.

If I were to write this, I would start by cleaning up the row counts
code to give a quicker response without iterating each query. Once you
have the row counts, for every query except the largest, iterate over
the output to generate a bloom filter of id's contained in that view
query. Then to send data to the client you just iterate over the
largest query checking that the id is in each of the bloom filters.
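
The outline above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the multiview code: the
function name intersect_queries, the make_filter parameter and the
dict-of-id-lists input are hypothetical, len() stands in for a cheap
row count, and a plain set stands in for the Bloom filter (any object
with add() and "in" support can be passed as make_filter).

```python
def intersect_queries(queries, make_filter=set):
    """Yield doc ids that appear in every query's results.

    queries: dict mapping a query name to an iterable-with-len of doc
    ids (hypothetical input shape, for illustration only).
    make_filter: factory for a membership structure with add() and
    `in`; a Bloom filter class could be swapped in for set.
    """
    if not queries:
        return
    # Step 1: use the row counts to pick the largest query.
    largest = max(queries, key=lambda name: len(queries[name]))
    # Step 2: build one membership filter per query except the largest.
    filters = []
    for name, ids in queries.items():
        if name == largest:
            continue
        members = make_filter()
        for doc_id in ids:
            members.add(doc_id)
        filters.append(members)
    # Step 3: stream the largest query, keeping ids found in every filter.
    for doc_id in queries[largest]:
        if all(doc_id in members for members in filters):
            yield doc_id


results = {
    "by_time": ["a", "b", "c", "d", "e"],
    "by_value": ["b", "d", "x"],
    "by_tag": ["d", "b", "q"],
}
print(list(intersect_queries(results)))  # prints ['b', 'd']
```

Only the smaller result sets are summarised in memory, and the
largest one is never materialised beyond the id currently being
tested. With a real Bloom filter in place of set, an occasional false
positive could let through an id that is not in every query.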

For a NIF version of Bloom filters, check here:
http://github.com/basho/ebloom There's also a blog post by Jonathan
Ellis from the Cassandra group that gives some pretty good details on
Bloom filters: 
http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html
Also there's probably a wikipedia page.

This layout also gives the ability to perform future optimizations
that would make things even more qui

Re: multiview on github

2010-09-20 Thread Norman Barker
Bob,

thanks, that is interesting. I will check out your code and see if I
can get it working, I wrote couchdb-clucene and am interested in a
lightweight text search for couchdb. I also liked your work with
ontylog, but I can't mix GPL with anything I am doing.

Norman

On Mon, Sep 20, 2010 at 7:22 PM, Robert Dionne
 wrote:
> Norman,
>
>  Actually ontylog is GPL, and I wouldn't wish that code on anyone just yet. 
> Think of it as the contents of my /etc directory.
>
>  The indexer I'm chipping away at is just a proof of concept hacked up from 
> Joe Armstrong's Erlang book (with his permission). Anyone is welcome to use 
> it as they see fit, though it does have restrictions from Armstrong
> press. It's been great for me to learn erlang and explore the couch 
> internals. It's also nice to have something nice and light running in couch.
>
>  My thoughts about plugins have nothing to do with licenses. I like the
> fact that couchdb is simple and lean and more rock solid. I'm not sure 
> multiview, geocouch, fti, or any other indexers belong in the core. With 
> multiview I think there's perhaps something more general that might be part 
> of core but I haven't given it a lot of thought yet.
>
> Cheers,
>
> Bob
>
>
>
>
> On Sep 20, 2010, at 7:02 PM, Norman Barker wrote:
>
>> Bob,
>>
>> I can see why plugins might work for you since your ontology /
>> indexing code is GPL, however I am more than happy for the multiview
>> to be apache licensed and would like to see it in trunk.
>>
>> I like the concept of plugins as it creates a stable API for third
>> parties, but I think a multiview is a core feature of CouchDB.
>>
>> Norman
>>
>> On Mon, Sep 20, 2010 at 4:19 AM, Robert Dionne
>>  wrote:
>>> I see, neat.
>>>
>>> I ask because you might treat disjunction and conjunction  differently in 
>>> terms of whether you run around the ring or broadcast to all the nodes. For 
>>> conjunctions you need all to succeed so broadcast might fare better whereas 
>>> for disjunctions only one need succeed. I suppose it would depend largely 
>>> on the number of views and the amount of each computation.
>>>
>>> Anyway I guess I have mixed feelings about seeing this in core. I see a lot 
>>> of folks already struggling to get their arms around working with 
>>> map/reduce. It would make a good plugin for advanced users. Actually the 
>>> ability to have plugins is almost there now. I have an indexer that only 
>>> requires some ini file mods and getting the code on the classpath. I think 
>>> all that's needed at this point is:
>>>
>>> 1. conventions for a plugins directory
>>>
>>> 2. way of specing gen_servers in order to supervise them
>>>
>>> 3. some apis around some of the internals.
>>>
>>> I'm oversimplifying it for sure, the devil's in the details and it's the
>>> kind of thing programmers love to argue about ad nauseam but no one wants
>>> to do it (myself included :)
>>>
>>> Best,
>>>
>>> Bob
>>>
>>>
>>>
>>> On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:
>>>
>>>> Bob,
>>>>
>>>> it is just checking that a given id participates in a view, if it
>>>> makes it around the ring then it wins and gets streamed to the client,
>>>> adding disjoints would be fairly simple. Currently the only way I can
>>>> check if an id is in a view is to loop over the results of each view,
>>>> hence each node in the ring is in its own process to keep things
>>>> moving.
>>>>
>>>> A use case is two views, one that emits datetime (numeric) and another
>>>> view that emits values, e.g. A, B, C ..., the query would then be to
>>>> find the all documents with value A between start time and end time.
>>>>
>>>> Norman
>>>>
>>>> On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
>>>>  wrote:
>>>>> I took another peek at this and I'm curious as to what it's doing. Is it
>>>>> just checking that a given id participates in a view? So if it makes it
>>>>> around the ring it wins? Or is it actually computing the result of
>>>>> passing the doc thru all the views?
>>>>>
>>>>> If the answer is the former then would disjunction also be something one
>>>>> might want? I'm just curious, I don't have a use case and I forget the
>>>>> original discussion around this. I sort of think of views as a functional
>>>>> mapping from the database to some subset. That's not entirely accurate
>>>>> given there's this reduce phase also. So I could imagine composing views
>>>>> in a functional way, but the same thing can be had with just a different
>>>>> map function that is the composition.
>>>>>
>>>>> Anyway if you have a brief description of this, with a use case,  it
>>>>> would help.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Bob
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
>>>>>
>>>>>> Chris, James
>>>>>>
>>>>>> thanks for bumping this, we are using this internally at 'scale'
>>>>>> (million+ keys). I want this to work for couchdb as we want to give
>>>>>> back for such a great product and support this going forward, so any

Re: multiview on github

2010-09-20 Thread Robert Dionne
Norman,

  Actually ontylog is GPL, and I wouldn't wish that code on anyone just yet. 
Think of it as the contents of my /etc directory.

  The indexer I'm chipping away at is just a proof of concept hacked up from 
Joe Armstrong's Erlang book (with his permission). Anyone is welcome to use it 
as they see fit, though it does have restrictions from Armstrong press.
It's been great for me to learn erlang and explore the couch internals. It's 
also nice to have something nice and light running in couch.

  My thoughts about plugins have nothing to do with licenses. I like the fact
that couchdb is simple and lean and more rock solid. I'm not sure multiview, 
geocouch, fti, or any other indexers belong in the core. With multiview I think 
there's perhaps something more general that might be part of core but I haven't 
given it a lot of thought yet.

Cheers,

Bob




On Sep 20, 2010, at 7:02 PM, Norman Barker wrote:

> Bob,
> 
> I can see why plugins might work for you since your ontology /
> indexing code is GPL, however I am more than happy for the multiview
> to be apache licensed and would like to see it in trunk.
> 
> I like the concept of plugins as it creates a stable API for third
> parties, but I think a multiview is a core feature of CouchDB.
> 
> Norman
> 
> On Mon, Sep 20, 2010 at 4:19 AM, Robert Dionne
>  wrote:
>> I see, neat.
>> 
>> I ask because you might treat disjunction and conjunction  differently in 
>> terms of whether you run around the ring or broadcast to all the nodes. For 
>> conjunctions you need all to succeed so broadcast might fare better whereas 
>> for disjunctions only one need succeed. I suppose it would depend largely on 
>> the number of views and the amount of each computation.
>> 
>> Anyway I guess I have mixed feelings about seeing this in core. I see a lot 
>> of folks already struggling to get their arms around working with 
>> map/reduce. It would make a good plugin for advanced users. Actually the 
>> ability to have plugins is almost there now. I have an indexer that only 
>> requires some ini file mods and getting the code on the classpath. I think 
>> all that's needed at this point is:
>> 
>> 1. conventions for a plugins directory
>> 
>> 2. way of specing gen_servers in order to supervise them
>> 
>> 3. some apis around some of the internals.
>> 
>> I'm oversimplifying it for sure, the devil's in the details and it's the kind
>> of thing programmers love to argue about ad nauseam but no one wants to do
>> it (myself included :)
>> 
>> Best,
>> 
>> Bob
>> 
>> 
>> 
>> On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:
>> 
>>> Bob,
>>> 
>>> it is just checking that a given id participates in a view, if it
>>> makes it around the ring then it wins and gets streamed to the client,
>>> adding disjoints would be fairly simple. Currently the only way I can
>>> check if an id is in a view is to loop over the results of each view,
>>> hence each node in the ring is in its own process to keep things
>>> moving.
>>> 
>>> A use case is two views, one that emits datetime (numeric) and another
>>> view that emits values, e.g. A, B, C ..., the query would then be to
>>> find the all documents with value A between start time and end time.
>>> 
>>> Norman
>>> 
>>> On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
>>>  wrote:
>>>> I took another peek at this and I'm curious as to what it's doing. Is it
>>>> just checking that a given id participates in a view? So if it makes it
>>>> around the ring it wins? Or is it actually computing the result of passing
>>>> the doc thru all the views?
>>>>
>>>> If the answer is the former then would disjunction also be something one
>>>> might want? I'm just curious, I don't have a use case and I forget the
>>>> original discussion around this. I sort of think of views as a functional
>>>> mapping from the database to some subset. That's not entirely accurate
>>>> given there's this reduce phase also. So I could imagine composing views
>>>> in a functional way, but the same thing can be had with just a different
>>>> map function that is the composition.
>>>>
>>>> Anyway if you have a brief description of this, with a use case,  it would
>>>> help.
>>>>
>>>> Cheers,
>>>>
>>>> Bob
>>>>
>>>>
>>>>
>>>>
>>>> On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
>>>>
>>>>> Chris, James
>>>>>
>>>>> thanks for bumping this, we are using this internally at 'scale'
>>>>> (million+ keys). I want this to work for couchdb as we want to give
>>>>> back for such a great product and support this going forward, so any
>>>>> suggestions welcomed and we will test and add them to the local github
>>>>> account with the aim of getting this into trunk.
>>>>>
>>>>> Norman
>>>>>
>>>>> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton
>>>>> wrote:
>>>>>> I want to use it!  I just haven't gotten around to it.  I was going to
>>>>>> try
>>>>>> and test it out this weekend and if I am able, I will certainly report
>>>>>> back
>>>>>> what I find.

Re: multiview on github

2010-09-20 Thread Norman Barker
Bob,

I can see why plugins might work for you since your ontology /
indexing code is GPL, however I am more than happy for the multiview
to be apache licensed and would like to see it in trunk.

I like the concept of plugins as it creates a stable API for third
parties, but I think a multiview is a core feature of CouchDB.

Norman

On Mon, Sep 20, 2010 at 4:19 AM, Robert Dionne
 wrote:
> I see, neat.
>
> I ask because you might treat disjunction and conjunction  differently in 
> terms of whether you run around the ring or broadcast to all the nodes. For 
> conjunctions you need all to succeed so broadcast might fare better whereas 
> for disjunctions only one need succeed. I suppose it would depend largely on 
> the number of views and the amount of each computation.
>
> Anyway I guess I have mixed feelings about seeing this in core. I see a lot 
> of folks already struggling to get their arms around working with map/reduce. 
> It would make a good plugin for advanced users. Actually the ability to have 
> plugins is almost there now. I have an indexer that only requires some ini 
> file mods and getting the code on the classpath. I think all that's needed at 
> this point is:
>
> 1. conventions for a plugins directory
>
> 2. way of specing gen_servers in order to supervise them
>
> 3. some apis around some of the internals.
>
> I'm oversimplifying it for sure, the devil's in the details and it's the kind
> of thing programmers love to argue about ad nauseam but no one wants to do it
> (myself included :)
>
> Best,
>
> Bob
>
>
>
> On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:
>
>> Bob,
>>
>> it is just checking that a given id participates in a view, if it
>> makes it around the ring then it wins and gets streamed to the client,
>> adding disjoints would be fairly simple. Currently the only way I can
>> check if an id is in a view is to loop over the results of each view,
>> hence each node in the ring is in its own process to keep things
>> moving.
>>
>> A use case is two views, one that emits datetime (numeric) and another
>> view that emits values, e.g. A, B, C ..., the query would then be to
>> find the all documents with value A between start time and end time.
>>
>> Norman
>>
>> On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
>>  wrote:
>>> I took another peek at this and I'm curious as to what it's doing. Is it 
>>> just checking that a given id participates in a view? So if it makes it 
>>> around the ring it wins? Or is it actually computing the result of passing 
>>> the doc thru all the views?
>>>
>>> If the answer is the former then would disjunction also be something one 
>>> might want? I'm just curious, I don't have a use case and I forget the 
>>> original discussion around this. I sort of think of views as a functional 
>>> mapping from the database to some subset. That's not entirely accurate 
>>> given there's this reduce phase also. So I could imagine composing views in 
>>> a functional way, but the same thing can be had with just a different map 
>>> function that is the composition.
>>>
>>> Anyway if you have a brief description of this, with a use case,  it would 
>>> help.
>>>
>>> Cheers,
>>>
>>> Bob
>>>
>>>
>>>
>>>
>>> On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
>>>
>>>> Chris, James
>>>>
>>>> thanks for bumping this, we are using this internally at 'scale'
>>>> (million+ keys). I want this to work for couchdb as we want to give
>>>> back for such a great product and support this going forward, so any
>>>> suggestions welcomed and we will test and add them to the local github
>>>> account with the aim of getting this into trunk.
>>>>
>>>> Norman
>>>>
>>>> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton
>>>> wrote:
>>>>> I want to use it!  I just haven't gotten around to it.  I was going to try
>>>>> and test it out this weekend and if I am able, I will certainly report
>>>>> back
>>>>> what I find.
>>>>>
>>>>> James
>>>>>
>>>>> On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson  wrote:
>>>>>
>>>>>> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker
>>>>>> wrote:
>>>>>>> Bob,
>>>>>>>
>>>>>>> I can and have been testing the multiview at this scale, it is ok
>>>>>>> (fast enough), but I think being able to test inclusion of a document
>>>>>>> id in a view without having to loop would be a considerable speed
>>>>>>> improvement. If you have any ideas let me know.
>>>>>>>
>>>>>>
>>>>>> I just want to bump this thread, as I think this is a useful feature.
>>>>>> I don't expect to be able to test it in the coming weeks, but if I did
>>>>>> I would. Is anyone besides Norman using this? Has anyone used it at
>>>>>> scale?
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Norman
>>>>>>>
>>>>>>> On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson
>>>>>>>
>>>>>> wrote:
>>>>>>>> I'm sorry, I've had no time to play with this at scale.
>>>>>>>>
>>>>>>>> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker
>>>>>>>>
>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> are th

Re: multiview on github

2010-09-20 Thread Robert Dionne
I see, neat. 

I ask because you might treat disjunction and conjunction differently in terms 
of whether you run around the ring or broadcast to all the nodes. For 
conjunctions you need all to succeed so broadcast might fare better, whereas for 
disjunctions only one need succeed. I suppose it would depend largely on the 
number of views and the cost of each computation.
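
The two strategies contrasted above can be sketched roughly as follows. This is an illustration only, not the multiview code: each view is modelled as a plain Python set of document ids, and the names `ring_check` and `broadcast_check` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def ring_check(doc_id, views):
    """Conjunction by walking the ring: stop at the first view that lacks the id."""
    for members in views:
        if doc_id not in members:
            return False
    return True


def broadcast_check(doc_id, views, combine=all):
    """Ask every view at once; combine=all gives AND, combine=any gives OR."""
    with ThreadPoolExecutor(max_workers=len(views)) as pool:
        answers = list(pool.map(lambda members: doc_id in members, views))
    return combine(answers)


# Assumed sample data: the ids each view would return for some query.
by_time = {"doc1", "doc2", "doc3"}
by_value = {"doc2", "doc3", "doc4"}
views = [by_time, by_value]

in_both = ring_check("doc2", views)               # conjunction via the ring
in_either = broadcast_check("doc1", views, any)   # disjunction via broadcast
```

The ring can give up on an id as soon as one view rejects it, while the broadcast always pays for every view but does so concurrently. Which one wins would depend on the number of views and the cost of each lookup, as noted above.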

Anyway I guess I have mixed feelings about seeing this in core. I see a lot of 
folks already struggling to get their arms around working with map/reduce. It 
would make a good plugin for advanced users. Actually the ability to have 
plugins is almost there now. I have an indexer that only requires some ini file 
mods and getting the code on the Erlang code path. I think all that's needed at 
this point is:

1. conventions for a plugins directory

2. a way of specifying gen_servers so that they can be supervised

3. some APIs around some of the internals.

I'm oversimplifying it for sure; the devil's in the details, and it's the kind of 
thing programmers love to argue about ad nauseam but no one wants to do 
(myself included :)

Best,

Bob



On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:

> Bob,
> 
> it is just checking that a given id participates in a view; if it
> makes it around the ring then it wins and gets streamed to the client.
> Adding disjunctions would be fairly simple. Currently the only way I can
> check if an id is in a view is to loop over the results of each view,
> hence each node in the ring is in its own process to keep things
> moving.
> 
> A use case is two views, one that emits datetime (numeric) and another
> view that emits values, e.g. A, B, C ...; the query would then be to
> find all the documents with value A between start time and end time.
> 
> Norman

Re: multiview on github

2010-09-19 Thread Norman Barker
Bob,

it is just checking that a given id participates in a view; if it
makes it around the ring then it wins and gets streamed to the client.
Adding disjunctions would be fairly simple. Currently the only way I can
check if an id is in a view is to loop over the results of each view,
hence each node in the ring is in its own process to keep things
moving.

A use case is two views, one that emits datetime (numeric) and another
view that emits values, e.g. A, B, C ...; the query would then be to
find all the documents with value A between start time and end time.
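
As a rough sketch of that use case (not the couch_query_ring.erl implementation), the intersection could look like the following. View rows are modelled as `(key, doc_id)` tuples, and `query_view` and `intersect_ring` are hypothetical names. A Python set stands in for the per-view inclusion test, which in the current multiview code requires looping over the view results.

```python
def query_view(rows, startkey, endkey):
    """Return the set of doc ids whose key falls within [startkey, endkey]."""
    return {doc_id for key, doc_id in rows if startkey <= key <= endkey}


def intersect_ring(id_sets):
    """Yield ids that make it all the way around the ring of result sets."""
    ring = sorted(id_sets, key=len)  # smallest set first, so misses are found early
    if not ring:
        return
    start, rest = ring[0], ring[1:]
    for doc_id in sorted(start):
        if all(doc_id in members for members in rest):
            yield doc_id


# Assumed sample rows for the two views in the use case.
by_time = [(100, "d1"), (150, "d2"), (200, "d3"), (900, "d4")]
by_value = [("A", "d1"), ("A", "d3"), ("A", "d4"), ("B", "d2")]

in_window = query_view(by_time, 100, 250)   # docs between start and end time
has_a = query_view(by_value, "A", "A")      # docs with value A
result = list(intersect_ring([in_window, has_a]))  # ["d1", "d3"]
```

Starting from the smallest result set keeps the number of ids sent around the ring as low as possible, which is why the source comment asks for the nodes to be sorted from small to large.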

Norman

On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
 wrote:
> I took another peek at this and I'm curious as to what it's doing. Is it just 
> checking that a given id participates in a view? So if it makes it around the 
> ring it wins? Or is it actually computing the result of passing the doc thru 
> all the views?
>
> If the answer is the former then would disjunction also be something one 
> might want? I'm just curious, I don't have a use case and I forget the 
> original discussion around this. I sort of think of views as a functional 
> mapping from the database to some subset. That's not entirely accurate given 
> there's this reduce phase also. So I could imagine composing views in a 
> functional way, but the same thing can be had with just a different map 
> function that is the composition.
>
> Anyway if you have a brief description of this, with a use case,  it would 
> help.
>
> Cheers,
>
> Bob

Re: multiview on github

2010-09-19 Thread Robert Dionne
I took another peek at this and I'm curious as to what it's doing. Is it just 
checking that a given id participates in a view? So if it makes it around the 
ring it wins? Or is it actually computing the result of passing the doc thru 
all the views?

If the answer is the former then would disjunction also be something one might 
want? I'm just curious, I don't have a use case and I forget the original 
discussion around this. I sort of think of views as a functional mapping from 
the database to some subset. That's not entirely accurate given there's this 
reduce phase also. So I could imagine composing views in a functional way, but 
the same thing can be had with just a different map function that is the 
composition.
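
To make the "different map function that is the composition" idea concrete, here is a minimal sketch. It assumes documents with hypothetical `value` and `time` fields and simulates a view index in Python rather than using a real CouchDB map function. The composed map emits a compound `(value, time)` key, so a single range query answers "value A between start and end time".

```python
import bisect


def composed_map(doc):
    """Hypothetical composed map function: emit a compound (value, time) key."""
    if "value" in doc and "time" in doc:
        yield ((doc["value"], doc["time"]), doc["_id"])


def build_index(docs):
    """Collect all emitted rows and sort them by key, as a view index would."""
    rows = [row for doc in docs for row in composed_map(doc)]
    rows.sort()
    return rows


def range_query(index, startkey, endkey):
    """Return doc ids whose key lies in [startkey, endkey]."""
    keys = [key for key, _ in index]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return [doc_id for _, doc_id in index[lo:hi]]


docs = [
    {"_id": "d1", "value": "A", "time": 100},
    {"_id": "d2", "value": "B", "time": 150},
    {"_id": "d3", "value": "A", "time": 200},
    {"_id": "d4", "value": "A", "time": 900},
]
index = build_index(docs)
matches = range_query(index, ("A", 100), ("A", 250))  # ["d1", "d3"]
```

This works when one condition is an equality and the other a range, because compound keys sort field by field. Combining two independent ranges does not reduce to a single key range in the same way, which is one case where intersecting separate views may still be useful.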

Anyway, if you have a brief description of this, with a use case, it would help.

Cheers,

Bob




On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:

> Chris, James
> 
> thanks for bumping this, we are using this internally at 'scale'
> (million+ keys). I want this to work for couchdb as we want to give
> back for such a great product and support this going forward, so any
> suggestions are welcome and we will test them and add them to our github
> fork with the aim of getting this into trunk.
> 
> Norman

Re: multiview on github

2010-09-17 Thread Norman Barker
Chris, James

thanks for bumping this, we are using this internally at 'scale'
(million+ keys). I want this to work for couchdb as we want to give
back for such a great product and support this going forward, so any
suggestions are welcome and we will test them and add them to our github
fork with the aim of getting this into trunk.

Norman

On Fri, Sep 17, 2010 at 7:00 PM, James Hayton  wrote:
> I want to use it!  I just haven't gotten around to it.  I was going to try
> and test it out this weekend and if I am able, I will certainly report back
> what I find.
>
> James

Re: multiview on github

2010-09-17 Thread James Hayton
I want to use it!  I just haven't gotten around to it.  I was going to try
and test it out this weekend and if I am able, I will certainly report back
what I find.

James

On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson  wrote:

> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker 
> wrote:
> > Bob,
> >
> > I can and have been testing the multiview at this scale, it is ok
> > (fast enough), but I think being able to test inclusion of a document
> > id in a view without having to loop would be a considerable speed
> > improvement. If you have any ideas let me know.
> >
>
> I just want to bump this thread, as I think this is a useful feature.
> I don't expect to be able to test it in the coming weeks, but if I did
> I would. Is anyone besides Norman using this? Has anyone used it at
> scale?
>
> Cheers,
> Chris

Re: multiview on github

2010-09-17 Thread Chris Anderson
On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker  wrote:
> Bob,
>
> I can and have been testing the multiview at this scale, it is ok
> (fast enough), but I think being able to test inclusion of a document
> id in a view without having to loop would be a considerable speed
> improvement. If you have any ideas let me know.
>

I just want to bump this thread, as I think this is a useful feature.
I don't expect to be able to test it in the coming weeks, but if I did
I would. Is anyone besides Norman using this? Has anyone used it at
scale?

Cheers,
Chris

> thanks,
>
> Norman

Re: multiview on github

2010-08-30 Thread Norman Barker
Bob,

I can and have been testing the multiview at this scale, it is ok
(fast enough), but I think being able to test inclusion of a document
id in a view without having to loop would be a considerable speed
improvement. If you have any ideas let me know.
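
One idea raised elsewhere in this thread (by Paul Davis) is a bloom filter: a constant-space set structure that can test for the existence of a key without looping over the view results. The sketch below is only an illustration of that data structure. The class name, bit-array size, number of hashes and the use of SHA-256 are all assumptions, not anything taken from CouchDB or the multiview code.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Build the filter once by streaming the doc ids of a view query (assumed data).
view_ids = ["d1", "d3", "d4"]
bloom = BloomFilter()
for doc_id in view_ids:
    bloom.add(doc_id)

maybe_present = "d1" in bloom  # True: ids that were added are always reported
```

A negative answer is definitive, so ids that cannot be in a view are rejected in constant time. A positive answer may be a false positive and would still need confirming against the view itself.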

thanks,

Norman

On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson  wrote:
> I'm sorry, I've had no time to play with this at scale.
>
> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker  
> wrote:
>> Hi,
>>
>> are there any more comments on this, if not can you describe the
>> process (in particular how to obtain a wiki and jira account for
>> couchdb which I have been unable to do) and I will start documenting
>> this so we can put this into the trunk.
>>
>> Bob, were you able to do any more testing with large views, are there
>> any suggestions on how to speed up the document id inclusion test as
>> described below?
>>
>> thanks,
>>
>> Norman
>>
>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker  
>> wrote:
>>> Bob,
>>>
>>> thanks for the feedback and for taking a look at the code. Guidelines
>>> on when to use a supervisor within couchdb with a gen_server would be
>>> appreciated, currently I have a supervisor and a gen_server, but if
>>> couchdb has a supervision process I could remove that layer.
>>>
>>> I think plugins are a great idea, however intersection of views is such
>>> a common request that perhaps there needs to be a plugin system, and if a
>>> plugin is rated highly enough it goes into trunk as a core feature.
>>>
>>> the four-line (or slightly longer) summary is here
>>>
>>> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl
>>>
>>> %
>>> % send an id from the start list to the next node in the ring, if the
>>> % id is in the adjacent node then this node sends it to the next ring node
>>> %
>>> % if the id gets all round the ring and back to the start node then it
>>> % has intersected all queries and should be included. The nodes in the
>>> % ring should be sorted in size from small to large for this to be effective
>>> %
>>> % In addition send the initial id list round in parallel
>>>
>>> it really needs some eyes from the core couchdb coders to see how to
>>> speed up the inclusion testing, looping is bad even if it is done in
>>> parallel.
>>>
>>> Multiview is usable, I am using it with some pretty big mega-views (as
>>> per the raindrop model). I am also available to add features to this
>>> as this is a core part of our work and we want to give it to couch as a
>>> contribution.
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne
>>>  wrote:
>>>> Hi Norman,
>>>>
>>>>  I took a peek at multiview. I haven't followed this too closely on the
>>>> mailing list but this is *view intersection*? Is there a 5 line summary of
>>>> what this does somewhere?
>>>>
>>>>  I'm curious as to why the daemon needs to be a supervisor, most if not
>>>> all of the other daemons are gen_servers. OTP allows this but I think this
>>>> is a good area where some CouchDB guidelines on plugins would apply.
>>>>
>>>>  It strikes me that views, the use of map/reduce, etc. are one of the
>>>> trickier aspects of using CouchDB, particularly for new users coming from
>>>> the SQL world. People are also reporting issues with performance of views,
>>>> I guess often because reduce functions go out of control.
>>>>
>>>>  I think the project would be better served if features like this were
>>>> available as plugins. I would put GeoCouch in the same category. It's very
>>>> neat and timely (given everyone wants to know where everyone else is using
>>>> their telephone but without talking other than asynchronously), but a
>>>> server plugin architecture that would allow this to be done cleanly should
>>>> come first.
>>>>
>>>>  This is just my opinion. I'd love to see some of the project founders and
>>>> committers weigh in on this and set some direction.
>>>>
>>>> Best regards,
>>>>
>>>> Bob
>>>>
>>>>
>>>> On Aug 22, 2010, at 5:45 PM, Norman Barker wrote:
>>>>
>>>>> I would like to take this multiview code and have it added to trunk if
>>>>> possible, what are the next steps?
>>>>>
>>>>> thanks,
>>>>>
>>>>> Norman
>>>>>
>>>>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker
>>>>> wrote:
>>>>>> I have made
>>>>>>
>>>>>> http://github.com/normanb/couchdb
>>>>>>
>>>>>> which is a fork of the latest couchdb trunk with the multiview code
>>>>>> and tests added.
>>>>>>
>>>>>> If geocouch is available then it can still be used.
>>>>>>
>>>>>> There are a couple of questions about the multiview on the user /dev
>>>>>> list so I will be adding some more test cases during today.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Norman
>>>>>>
>>>>>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker
>>>>>> wrote:
>>>>>>> this is possible, I forked geocouch since I use it, but I have already
>>>>>>> separated the geocouch dependencies from the trunk.
>>>>>>>
>>>>>>> I can do this tomorrow, certainly be interested in any feedback.
>>>>>>>
>

Re: multiview on github

2010-08-30 Thread Robert Newson
I'm sorry, I've had no time to play with this at scale.

On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker  wrote:
> Hi,
>
> are there any more comments on this, if not can you describe the
> process (in particular how to obtain a wiki and jira account for
> couchdb which I have been unable to do) and I will start documenting
> this so we can put this into the trunk.
>
> Bob, were you able to do any more testing with large views, are there
> any suggestions on how to speed up the document id inclusion test as
> described below?
>
> thanks,
>
> Norman

Re: multiview on github

2010-08-30 Thread Norman Barker
Hi,

are there any more comments on this, if not can you describe the
process (in particular how to obtain a wiki and jira account for
couchdb which I have been unable to do) and I will start documenting
this so we can put this into the trunk.

Bob, were you able to do any more testing with large views, are there
any suggestions on how to speed up the document id inclusion test as
described below?

thanks,

Norman

On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker  wrote:
> Bob,
>
> thanks for the feedback and for taking a look at the code. Guidelines
> on when to use a supervisor within couchdb with a gen_server would be
> appreciated, currently I have a supervisor and a gen_server, but if
> couchdb has a supervision process I could remove that layer.
>
> I think plugins are a great idea, however intersection of views is such
> a common request that perhaps there needs to be a plugin system, and if a
> plugin is rated highly enough it goes into trunk as a core feature.
>
> the four (or slightly more) line summary is here
>
> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl
>
> %
> % send an id from the start list to the next node in the ring, if the
> % id is in the adjacent node then this node sends it to the next ring node
> %
> % if the id gets all round the ring and back to the start node then it
> % has intersected all queries and should be included. The nodes in the
> % ring should be sorted in size from small to large for this to be effective
> %
> % In addition send the initial id list round in parallel
>
> it really needs some eyes from the core couchdb coders to see how to
> speed up the inclusion testing, looping is bad even if it is done in
> parallel.
>
> Multiview is usable, I am using it with some pretty big mega-views (as
> per the raindrop model). I am also available to add features to this
> as this is a core part of our work and we want to give it to couch as a
> contribution.
>
> thanks,
>
> Norman

Re: multiview on github

2010-08-23 Thread Norman Barker
Bob,

thanks for the feedback and for taking a look at the code. Guidelines
on when to use a supervisor within couchdb with a gen_server would be
appreciated, currently I have a supervisor and a gen_server, but if
couchdb has a supervision process I could remove that layer.

I think plugins are a great idea, however intersection of views is such
a common request that perhaps there needs to be a plugin system, and if a
plugin is rated highly enough it goes into trunk as a core feature.

the four (or slightly more) line summary is here

http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl

%
% send an id from the start list to the next node in the ring, if the
% id is in the adjacent node then this node sends it to the next ring node
%
% if the id gets all round the ring and back to the start node then it
% has intersected all queries and should be included. The nodes in the
% ring should be sorted in size from small to large for this to be effective
%
% In addition send the initial id list round in parallel
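
As a rough illustration of the ring idea in the comment above, here is a
minimal sequential sketch in Python. It is not the code in
couch_query_ring.erl, which is Erlang and passes ids between processes in
parallel. The function name ring_intersection and the use of in-memory sets
for each ring node are assumptions made for the example.

```python
from typing import Hashable, Iterable, List, Sequence, Set


def ring_intersection(results: Sequence[Iterable[Hashable]]) -> List[Hashable]:
    """Return the ids present in every query result, ring style.

    Each element of `results` stands in for the document ids returned by
    one view query (hypothetical input shape, chosen for the example).
    """
    if not results:
        return []

    # Sort the ring from the smallest result to the largest, so that most
    # ids are rejected before they reach the large, expensive nodes.
    ring: List[Set[Hashable]] = sorted((set(r) for r in results), key=len)
    start, rest = ring[0], ring[1:]

    included = []
    for doc_id in start:
        # Pass the id from node to node. all() stops at the first node
        # that does not contain it, which is where the id drops out.
        if all(doc_id in node for node in rest):
            # The id travelled all the way round the ring, so it has
            # intersected every query and is included.
            included.append(doc_id)
    return included
```

Calling ring_intersection([["a", "b", "c", "d"], ["b", "c"], ["c", "b", "x"]])
returns "b" and "c" (in no particular order). A parallel version would send
several ids from the start list round the ring at the same time.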

it really needs some eyes from the core couchdb coders to see how to
speed up the inclusion testing, looping is bad even if it is done in
parallel.

Multiview is usable, I am using it with some pretty big mega-views (as
per the raindrop model). I am also available to add features to this
as this is a core part of our work and we want to give it to couch as a
contribution.

thanks,

Norman

On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne
 wrote:
> Hi Norman,
>
>  I took a peek at multiview. I haven't followed this too closely on the 
> mailing list but this is *view intersection*? Is there a 5 line summary of 
> what this does somewhere?
>
>  I'm curious as to why the daemon needs to be a supervisor, most if not all 
> of the other daemons are gen_servers. OTP allows this but I think this is a 
> good area where some CouchDB guidelines on plugins would apply.
>
>  It strikes me that views, the use of map/reduce, etc. are one of the 
> trickier aspects of using CouchDB, particularly for new users coming from the 
> SQL world. People are also reporting issues with performance of views, I 
> guess often because reduce functions go out of control.
>
>  I think the project would be better served if features like this were 
> available as plugins. I would put GeoCouch in the same category. It's very
> neat and timely (given everyone wants to know where everyone else is using 
> their telephone but without talking other than asynchronously), but a server 
> plugin architecture that would allow this to be done cleanly should come 
> first.
>
>  This is just my opinion. I'd love to see some of the project founders and 
> committers weigh in on this and set some direction.
>
> Best regards,
>
> Bob
>
>
>
>
>
> On Aug 22, 2010, at 5:45 PM, Norman Barker wrote:
>
>> I would like to take this multiview code and have it added to trunk if
>> possible, what are the next steps?
>>
>> thanks,
>>
>> Norman
>>
>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker  
>> wrote:
>>> I have made
>>>
>>> http://github.com/normanb/couchdb
>>>
>>> which is a fork of the latest couchdb trunk with the multiview code
>>> and tests added.
>>>
>>> If geocouch is available then it can still be used.
>>>
>>> There are a couple of questions about the multiview on the user /dev
>>> list so I will be adding some more test cases during today.
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker  
>>> wrote:
>>>> this is possible, I forked geocouch since I use it, but I have already
>>>> separated the geocouch dependencies from the trunk.
>>>>
>>>> I can do this tomorrow, certainly be interested in any feedback.
>>>>
>>>> thanks,
>>>>
>>>> Norman
>>>>
>>>>
>>>>
>>>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische
>>>> wrote:
>>>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>>>
>>>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have made the changes as recommended, adding a test case
>>>>>>> multiview.js and also adding the userCtx to open the db.
>>>>>>>
>>>>>>> I have also forked geocouch and this is available here
>>>>>>>
>>>>>>
>>>>>> this patch seems important (especially as people are already asking for
>>>>>> help using it on user@)
>>>>>>
>>>>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>>>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>>>>
>>>>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>>>>> then to make the GeoCouch capabilities part GeoCouch for now?
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>
>>>>> Hi Norman,
>>>>>
>>>>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>>>>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>>>
>>>>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>>>>> will change (in about a month) after holidays and FOSS4G.
>>>>>
>>>>> Cheers,
>>>>>  Volker
>>>>>
>>>>
>>>
>
>


Re: multiview on github

2010-08-23 Thread Robert Dionne
Hi Norman,

  I took a peek at multiview. I haven't followed this too closely on the 
mailing list but this is *view intersection*? Is there a 5 line summary of what 
this does somewhere? 

  I'm curious as to why the daemon needs to be a supervisor, most if not all of 
the other daemons are gen_servers. OTP allows this but I think this is a good 
area where some CouchDB guidelines on plugins would apply.

  It strikes me that views, the use of map/reduce, etc. are one of the trickier 
aspects of using CouchDB, particularly for new users coming from the SQL world. 
People are also reporting issues with performance of views, I guess often 
because reduce functions go out of control.

  I think the project would be better served if features like this were 
available as plugins. I would put GeoCouch in the same category. It's very neat
and timely (given everyone wants to know where everyone else is using their 
telephone but without talking other than asynchronously), but a server plugin 
architecture that would allow this to be done cleanly should come first.

  This is just my opinion. I'd love to see some of the project founders and 
committers weigh in on this and set some direction.

Best regards,

Bob


 


On Aug 22, 2010, at 5:45 PM, Norman Barker wrote:

> I would like to take this multiview code and have it added to trunk if
> possible, what are the next steps?
> 
> thanks,
> 
> Norman
> 
> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker  
> wrote:
>> I have made
>> 
>> http://github.com/normanb/couchdb
>> 
>> which is a fork of the latest couchdb trunk with the multiview code
>> and tests added.
>> 
>> If geocouch is available then it can still be used.
>> 
>> There are a couple of questions about the multiview on the user /dev
>> list so I will be adding some more test cases during today.
>> 
>> thanks,
>> 
>> Norman
>> 
>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker  
>> wrote:
>>> this is possible, I forked geocouch since I use it, but I have already
>>> separated the geocouch dependencies from the trunk.
>>> 
>>> I can do this tomorrow, certainly be interested in any feedback.
>>> 
>>> thanks,
>>> 
>>> Norman
>>> 
>>> 
>>> 
>>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische  
>>> wrote:
>>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>>
>>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have made the changes as recommended, adding a test case
>>>>>> multiview.js and also adding the userCtx to open the db.
>>>>>>
>>>>>> I have also forked geocouch and this is available here
>>>>>>
>>>>>
>>>>> this patch seems important (especially as people are already asking for
>>>>> help using it on user@)
>>>>>
>>>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>>>
>>>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>>>> then to make the GeoCouch capabilities part GeoCouch for now?
>>>>>
>>>>> Chris
>>>>>
>>>>
>>>> Hi Norman,
>>>>
>>>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>>>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>>
>>>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>>>> will change (in about a month) after holidays and FOSS4G.
>>>>
>>>> Cheers,
>>>>  Volker
>>>>
>>> 
>> 



Re: multiview on github

2010-08-22 Thread Norman Barker
that should be 'couchdb should not be in version control', sorry not
used to git.

On Sun, Aug 22, 2010 at 9:22 PM, Norman Barker  wrote:
> Bob,
>
> I am testing on 1+ documents, I appreciate that we need to
> establish when a multi-process as opposed to a tbd (suggestions
> welcome) approach is required. The startkey / endkey is an issue
> though, is there a better way to test inclusion?
>
> The speed of the multiview is directly linked to the size of the
> smallest view result though, so total documents isn't a factor.
>
> I am still thinking about fti, I am testing with clucene, but the
> external handler problem is the same, how to make it stream in order.
>
> I will fix the local_dev.ini problem tomorrow, couchdb should be in
> version control.
>
> Any hints on how to test inclusion are appreciated, it will greatly
> speed up collation.
>
> thanks,
>
> Norman
>
>
>
> On Sun, Aug 22, 2010 at 4:15 PM, Robert Newson  
> wrote:
>> I'm concerned about the performance of this on non-trivial databases,
>> given the iteration of all items between startkey and endkey. I don't
>> have time to test it this week but I'd be interested to hear the time
>> it took to do a multiview on two views of, say, a million rows each
>> (especially as compared to the two normal view calls).
>>
>> I was also intrigued to see the code handles fti too, a problem I have
>> spent some time thinking about without finding a satisfactorily
>> performant solution. I note that, as written, it doesn't appear to
>> work because the fti call (I'm assuming couchdb-lucene) will only
>> return the top N matching hits, so at best you can filter those
>> against another view (perhaps that's useful?). The trick to merging a
>> view and an fti result together would be to get the results from both
>> in the same order and step through the rows, filtering as you go.
>> Sorting in Lucene has a large memory hit so I gave up on that
>> solution.
>>
>> Finally, your patch appears to add two generated files (local_dev.ini
>> and etc/init.d/couchdb) to the branch which should be fixed (add your
>> settings to default.ini.tpl.in instead).
>>
>> I should end by saying that if the problems above can be solved then
>> this would be a very useful addition to CouchDB and one that is
>> frequently requested. It might also be a model for multi-machine
>> views.
>>
>> B.
>>
>> On Sun, Aug 22, 2010 at 10:45 PM, Norman Barker  
>> wrote:
>>> I would like to take this multiview code and have it added to trunk if
>>> possible, what are the next steps?
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker  
>>> wrote:
>>>> I have made
>>>>
>>>> http://github.com/normanb/couchdb
>>>>
>>>> which is a fork of the latest couchdb trunk with the multiview code
>>>> and tests added.
>>>>
>>>> If geocouch is available then it can still be used.
>>>>
>>>> There are a couple of questions about the multiview on the user /dev
>>>> list so I will be adding some more test cases during today.
>>>>
>>>> thanks,
>>>>
>>>> Norman
>>>>
>>>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker
>>>> wrote:
>>>>> this is possible, I forked geocouch since I use it, but I have already
>>>>> separated the geocouch dependencies from the trunk.
>>>>>
>>>>> I can do this tomorrow, certainly be interested in any feedback.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Norman
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische
>>>>> wrote:
>>>>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>>>>
>>>>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have made the changes as recommended, adding a test case
>>>>>>>> multiview.js and also adding the userCtx to open the db.
>>>>>>>>
>>>>>>>> I have also forked geocouch and this is available here
>>>>>>>>
>>>>>>>
>>>>>>> this patch seems important (especially as people are already asking for
>>>>>>> help using it on user@)
>>>>>>>
>>>>>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>>>>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>>>>>
>>>>>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>>>>>> then to make the GeoCouch capabilities part GeoCouch for now?
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>
>>>>>> Hi Norman,
>>>>>>
>>>>>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>>>>>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>>>>
>>>>>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>>>>>> will change (in about a month) after holidays and FOSS4G.
>>>>>>
>>>>>> Cheers,
>>>>>>  Volker
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: multiview on github

2010-08-22 Thread Norman Barker
Bob,

I am testing on 1+ documents, I appreciate that we need to
establish when a multi-process as opposed to a tbd (suggestions
welcome) approach is required. The startkey / endkey is an issue
though, is there a better way to test inclusion?

The speed of the multiview is directly linked to the size of the
smallest view result though, so total documents isn't a factor.
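
To make the claim above concrete, here is a small sketch that counts
membership probes when the intersection is driven from the smallest result.
It assumes each view result is already available as an in-memory hash set, so
a single probe is cheap; if a probe has to iterate the rows between startkey
and endkey, as in the concern raised earlier in the thread, the cost per probe
can still grow with the size of the view. The function name
intersect_counting_probes is hypothetical.

```python
from typing import Hashable, List, Set, Tuple


def intersect_counting_probes(
    views: List[Set[Hashable]],
) -> Tuple[Set[Hashable], int]:
    """Intersect id sets, looping only over the smallest one.

    Returns the intersection and the number of membership probes made.
    """
    if not views:
        return set(), 0

    ordered = sorted(views, key=len)
    smallest, others = ordered[0], ordered[1:]

    probes = 0
    hits: Set[Hashable] = set()
    for doc_id in smallest:
        for other in others:
            probes += 1
            if doc_id not in other:
                break  # rejected, no need to ask the larger views
        else:
            hits.add(doc_id)  # every other view contained the id
    return hits, probes
```

The number of probes is at most len(smallest) * (len(views) - 1), whatever
the sizes of the larger views are.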

I am still thinking about fti, I am testing with clucene, but the
external handler problem is the same, how to make it stream in order.

I will fix the local_dev.ini problem tomorrow, couchdb should be in
version control.

Any hints on how to test inclusion are appreciated, it will greatly
speed up collation.

thanks,

Norman



On Sun, Aug 22, 2010 at 4:15 PM, Robert Newson  wrote:
> I'm concerned about the performance of this on non-trivial databases,
> given the iteration of all items between startkey and endkey. I don't
> have time to test it this week but I'd be interested to hear the time
> it took to do a multiview on two views of, say, a million rows each
> (especially as compared to the two normal view calls).
>
> I was also intrigued to see the code handles fti too, a problem I have
> spent some time thinking about without finding a satisfactorily
> performant solution. I note that, as written, it doesn't appear to
> work because the fti call (I'm assuming couchdb-lucene) will only
> return the top N matching hits, so at best you can filter those
> against another view (perhaps that's useful?). The trick to merging a
> view and an fti result together would be to get the results from both
> in the same order and step through the rows, filtering as you go.
> Sorting in Lucene has a large memory hit so I gave up on that
> solution.
>
> Finally, your patch appears to add two generated files (local_dev.ini
> and etc/init.d/couchdb) to the branch which should be fixed (add your
> settings to default.ini.tpl.in instead).
>
> I should end by saying that if the problems above can be solved then
> this would be a very useful addition to CouchDB and one that is
> frequently requested. It might also be a model for multi-machine
> views.
>
> B.
>
> On Sun, Aug 22, 2010 at 10:45 PM, Norman Barker  
> wrote:
>> I would like to take this multiview code and have it added to trunk if
>> possible, what are the next steps?
>>
>> thanks,
>>
>> Norman
>>
>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker  
>> wrote:
>>> I have made
>>>
>>> http://github.com/normanb/couchdb
>>>
>>> which is a fork of the latest couchdb trunk with the multiview code
>>> and tests added.
>>>
>>> If geocouch is available then it can still be used.
>>>
>>> There are a couple of questions about the multiview on the user /dev
>>> list so I will be adding some more test cases during today.
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker  
>>> wrote:
>>>> this is possible, I forked geocouch since I use it, but I have already
>>>> separated the geocouch dependencies from the trunk.
>>>>
>>>> I can do this tomorrow, certainly be interested in any feedback.
>>>>
>>>> thanks,
>>>>
>>>> Norman
>>>>
>>>>
>>>>
>>>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische
>>>> wrote:
>>>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>>>
>>>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have made the changes as recommended, adding a test case
>>>>>>> multiview.js and also adding the userCtx to open the db.
>>>>>>>
>>>>>>> I have also forked geocouch and this is available here
>>>>>>>
>>>>>>
>>>>>> this patch seems important (especially as people are already asking for
>>>>>> help using it on user@)
>>>>>>
>>>>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>>>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>>>>
>>>>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>>>>> then to make the GeoCouch capabilities part GeoCouch for now?
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>
>>>>> Hi Norman,
>>>>>
>>>>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>>>>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>>>
>>>>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>>>>> will change (in about a month) after holidays and FOSS4G.
>>>>>
>>>>> Cheers,
>>>>>  Volker
>>>>>
>>>>
>>>
>>
>


Re: multiview on github

2010-08-22 Thread Robert Newson
I'm concerned about the performance of this on non-trivial databases,
given the iteration of all items between startkey and endkey. I don't
have time to test it this week but I'd be interested to hear the time
it took to do a multiview on two views of, say, a million rows each
(especially as compared to the two normal view calls).

I was also intrigued to see the code handles fti too, a problem I have
spent some time thinking about without finding a satisfactorily
performant solution. I note that, as written, it doesn't appear to
work because the fti call (I'm assuming couchdb-lucene) will only
return the top N matching hits, so at best you can filter those
against another view (perhaps that's useful?). The trick to merging a
view and an fti result together would be to get the results from both
in the same order and step through the rows, filtering as you go.
Sorting in Lucene has a large memory hit so I gave up on that
solution.
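
The "same order, step through the rows" idea in the paragraph above can be
sketched as a streaming merge. This is only an illustration under a strong
assumption: both inputs already arrive sorted by the same key (for example
document id), which is exactly the part that is hard to get from a view and
from Lucene at the same time. The name merge_intersect is hypothetical.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def merge_intersect(left: Iterable[T], right: Iterable[T]) -> Iterator[T]:
    """Yield the items found in both inputs.

    Both inputs must be sorted ascending by the same ordering. Only one
    item from each side is held in memory at a time.
    """
    left_it, right_it = iter(left), iter(right)
    done = object()  # sentinel marking an exhausted stream
    a, b = next(left_it, done), next(right_it, done)
    while a is not done and b is not done:
        if a == b:
            yield a  # present on both sides, keep it
            a, b = next(left_it, done), next(right_it, done)
        elif a < b:
            a = next(left_it, done)  # left is behind, advance it
        else:
            b = next(right_it, done)  # right is behind, advance it
```

For example, list(merge_intersect(["a", "c", "d", "f"], ["b", "c", "f", "g"]))
gives ["c", "f"]. Because nothing is buffered, the rows can be filtered as
they stream out, but the sorting cost is moved to whoever produces the inputs.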

Finally, your patch appears to add two generated files (local_dev.ini
and etc/init.d/couchdb) to the branch which should be fixed (add your
settings to default.ini.tpl.in instead).

I should end by saying that if the problems above can be solved then
this would be a very useful addition to CouchDB and one that is
frequently requested. It might also be a model for multi-machine
views.

B.

On Sun, Aug 22, 2010 at 10:45 PM, Norman Barker  wrote:
> I would like to take this multiview code and have it added to trunk if
> possible, what are the next steps?
>
> thanks,
>
> Norman
>
> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker  
> wrote:
>> I have made
>>
>> http://github.com/normanb/couchdb
>>
>> which is a fork of the latest couchdb trunk with the multiview code
>> and tests added.
>>
>> If geocouch is available then it can still be used.
>>
>> There are a couple of questions about the multiview on the user /dev
>> list so I will be adding some more test cases during today.
>>
>> thanks,
>>
>> Norman
>>
>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker  
>> wrote:
>>> this is possible, I forked geocouch since I use it, but I have already
>>> separated the geocouch dependencies from the trunk.
>>>
>>> I can do this tomorrow, certainly be interested in any feedback.
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>>
>>>
>>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische  
>>> wrote:
>>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>>
>>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have made the changes as recommended, adding a test case
>>>>>> multiview.js and also adding the userCtx to open the db.
>>>>>>
>>>>>> I have also forked geocouch and this is available here
>>>>>>
>>>>>
>>>>> this patch seems important (especially as people are already asking for
>>>>> help using it on user@)
>>>>>
>>>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>>>
>>>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>>>> then to make the GeoCouch capabilities part GeoCouch for now?
>>>>>
>>>>> Chris
>>>>>
>>>>
>>>> Hi Norman,
>>>>
>>>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>>>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>>
>>>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>>>> will change (in about a month) after holidays and FOSS4G.
>>>>
>>>> Cheers,
>>>>  Volker
>>>>
>>>
>>
>


Re: multiview on github

2010-08-22 Thread Norman Barker
I would like to take this multiview code and have it added to trunk if
possible, what are the next steps?

thanks,

Norman

On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker  wrote:
> I have made
>
> http://github.com/normanb/couchdb
>
> which is a fork of the latest couchdb trunk with the multiview code
> and tests added.
>
> If geocouch is available then it can still be used.
>
> There are a couple of questions about the multiview on the user /dev
> list so I will be adding some more test cases during today.
>
> thanks,
>
> Norman
>
> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker  
> wrote:
>> this is possible, I forked geocouch since I use it, but I have already
>> separated the geocouch dependencies from the trunk.
>>
>> I can do this tomorrow, certainly be interested in any feedback.
>>
>> thanks,
>>
>> Norman
>>
>>
>>
>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische  
>> wrote:
>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>
>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have made the changes as recommended, adding a test case
>>>>> multiview.js and also adding the userCtx to open the db.
>>>>>
>>>>> I have also forked geocouch and this is available here
>>>>>
>>>>
>>>> this patch seems important (especially as people are already asking for
>>>> help using it on user@)
>>>>
>>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>>
>>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>>> then to make the GeoCouch capabilities part GeoCouch for now?
>>>>
>>>> Chris
>>>>
>>>
>>> Hi Norman,
>>>
>>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>
>>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>>> will change (in about a month) after holidays and FOSS4G.
>>>
>>> Cheers,
>>>  Volker
>>>
>>
>


Re: multiview on github

2010-08-18 Thread Norman Barker
I have made

http://github.com/normanb/couchdb

which is a fork of the latest couchdb trunk with the multiview code
and tests added.

If geocouch is available then it can still be used.

There are a couple of questions about the multiview on the user/dev
lists, so I will be adding some more test cases today.

thanks,

Norman

On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker  wrote:
> this is possible, I forked geocouch since I use it, but I have already
> separated the geocouch dependencies from the trunk.
>
> I can do this tomorrow, certainly be interested in any feedback.
>
> thanks,
>
> Norman
>
>
>
> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische  
> wrote:
>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>
>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>
>>>> Hi,
>>>>
>>>> I have made the changes as recommended, adding a test case
>>>> multiview.js and also adding the userCtx to open the db.
>>>>
>>>> I have also forked geocouch and this is available here
>>>>
>>>
>>> this patch seems important (especially as people are already asking for
>>> help using it on user@)
>>>
>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>
>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>> then to make the GeoCouch capabilities part GeoCouch for now?
>>>
>>> Chris
>>>
>>
>> Hi Norman,
>>
>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>
>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>> will change (in about a month) after holidays and FOSS4G.
>>
>> Cheers,
>>  Volker
>>
>


Re: multiview on github

2010-08-17 Thread Norman Barker
This is possible. I forked geocouch since I use it, but I have already
separated the geocouch dependencies from the trunk.

I can do this tomorrow, and would certainly be interested in any feedback.

thanks,

Norman



On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische  wrote:
> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>
>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>
>>> Hi,
>>>
>>> I have made the changes as recommended, adding a test case
>>> multiview.js and also adding the userCtx to open the db.
>>>
>>> I have also forked geocouch and this is available here
>>>
>>
>> this patch seems important (especially as people are already asking for
>> help using it on user@)
>>
>> to get it committed, it either must remove the dependency on GeoCouch, or
>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>
>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>> then to make the GeoCouch capabilities part GeoCouch for now?
>>
>> Chris
>>
>
> Hi Norman,
>
> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>
> Lately I haven't been that responsive when it comes to GeoCouch, but that
> will change (in about a month) after holidays and FOSS4G.
>
> Cheers,
>  Volker
>


Re: multiview on github

2010-08-17 Thread Volker Mische

On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>
> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>
>> Hi,
>>
>> I have made the changes as recommended, adding a test case
>> multiview.js and also adding the userCtx to open the db.
>>
>> I have also forked geocouch and this is available here
>>
>
> this patch seems important (especially as people are already asking for help
> using it on user@)
>
> to get it committed, it either must remove the dependency on GeoCouch, or
> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>
> Is it possible / useful to make a version that doesn't use GeoCouch? And then
> to make the GeoCouch capabilities part GeoCouch for now?
>
> Chris
>


Hi Norman,

if the patch is ready for trunk, I'd be happy to move the GeoCouch bits 
to GeoCouch itself (as GeoCouch isn't ready for trunk yet).


Lately I haven't been that responsive when it comes to GeoCouch, but 
that will change (in about a month) after holidays and FOSS4G.


Cheers,
  Volker


Re: multiview on github

2010-08-17 Thread J Chris Anderson

On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:

> Hi,
> 
> I have made the changes as recommended, adding a test case
> multiview.js and also adding the userCtx to open the db.
> 
> I have also forked geocouch and this is available here
> 

this patch seems important (especially as people are already asking for help 
using it on user@)

to get it committed, it either must remove the dependency on GeoCouch, or 
become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.

Is it possible / useful to make a version that doesn't use GeoCouch? And then 
to make the GeoCouch capabilities part of GeoCouch for now?

Chris

> http://github.com/normanb/couchdb
> 
> ./bootstrap
> ./configure
> make dev
> utils/run
> 
> should do it and then the simple test case is available in Futon.
> 
> The test case multiview.js takes two views which emit docs which run
> from 0 .. 100, view 1 emits those documents with ids which are
> multiples of 3, view 2 emits those which are multiples of 4. The
> _multiview request is the intersection of view 1 and view 2 resulting
> those documents whose ids are multiples of 12.
> 
> Any comments appreciated, particular concerning the following;
> 
> 1) in the module multiview, is there a quicker way to find the counts
> from startkey to endkey rather than iterating?
> 2) In the module couch_query_ring is there a quicker way to test for
> inclusion rather than iterating?
> 
> Many thanks,
> 
> Norman
> 
> On Fri, Aug 6, 2010 at 10:16 AM, Norman Barker  
> wrote:
>> Chris,
>> 
>> I will make those changes, it might be a couple of days as I am on travel.
>> 
>> I will clone geocouch as a starting point and add javascript tests as
>> you suggest.
>> 
>> I am bench marking with around 1 docs and a couple of views
>> (including geocouch), the main issue with the folding over every
>> document to both find the number of docs in a view slice (between
>> startkey and endkey) and then again to test inclusion between views.
>> 
>> I am interested in taking this forward and appreciate any code feedback.
>> 
>> thanks,
>> 
>> Norman
>> 
>> On Thu, Aug 5, 2010 at 6:26 PM, J Chris Anderson  wrote:
>>> 
>>> On Aug 5, 2010, at 4:32 PM, Jan Lehnardt wrote:
>>> 
>>>> Hi Norman,
>>>>
>>>> I still plan to look at your code, I know the others here
>>>> are fairly busy too, sorry for the review delay :)
>>>>
>>> 
>>> The code looks clean (but could use better comments about where in the flow 
>>> each module comes into play). I don't think we can guess about performance, 
>>> instead we should benchmark to make sure the ring approach is right.
>>> 
>>> In CouchDB currently, it is possible to isolate requests against a single 
>>> db. So you use the security settings to prevent access to databases, etc. 
>>> For this, using the userCtx and switching away from couch_db:open_int() 
>>> would make a big difference.
>>> 
>>> This way people can query across dbs if they have read access to all of 
>>> them.
>>> 
>>> I think if you package this as a CouchDB fork on Github and add a few 
>>> JavaScript tests, it will be really useful for some folks. I like that it 
>>> has geo support. Maybe we can target it for inclusion in trunk just after 
>>> GeoCouch goes in trunk (if Volker wants to put it in.)
>>> 
>>> Also, for realtime hacking on this, you might find that the #couchdb IRC 
>>> channel on Freenode is a good place to solicit feedback. There are a lot of 
>>> people on there doing Geo things that would benefit from this. (They really 
>>> wanna be able to intersect a Geo query with a Map Reduce query, etc.)
>>> 
>>> Chris
>>> 
>>>> Cheers
>>>> Jan
>>>> --
>>>>
>>>>
>>>> On 5 Aug 2010, at 18:12, Norman Barker wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> is there any interest in the multiview, I have fixed (3) below, but am
>>>>> still interested in approaches for (1) and (2).
>>>>>
>>>>> thanks,
>>>>>
>>>>> Norman
>>>>>
>>>>> On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> a very initial version of the multiview is at
>>>>>> http://github.com/normanb/couchdb-multiview for discussion.
>>>>>>
>>>>>> The views are intersected by using a ring of processes where each node
>>>>>> in the ring represents a view as follows;
>>>>>>
>>>>>> % send an id from the start list to the next node in the ring, if the
>>>>>> id is in adjacent node then this node sends to the next ring node
>>>>>> % if the id gets all round the ring and back to the start node then it
>>>>>> has intersected all queries and should be included. The nodes in the
>>>>>> ring
>>>>>> % should be sorted in size from small to large for this to be effective
>>>>>> %
>>>>>> % In addition send the initial id list round in parallel
>>>>>>
>>>>>> this is implemented in the couch_query_ring module.
>>>>>>
>>>>>> I have a couple of questions
>>>>>>
>>>>>> 1) in the module multiview, is there a quicker way to find the counts
>>>>>> from startkey to endkey rather than iterating?
>>>>>> 2) In the module couch_q

Re: multiview on github

2010-08-16 Thread Norman Barker
Hi,

I have made the changes as recommended, adding a test case
multiview.js and also adding the userCtx to open the db.

I have also forked geocouch and this is available here

http://github.com/normanb/couchdb

./bootstrap
./configure
make dev
utils/run

should do it and then the simple test case is available in Futon.

The test case multiview.js takes two views over docs which run
from 0 .. 100: view 1 emits those documents with ids which are
multiples of 3, and view 2 emits those which are multiples of 4. The
_multiview request is the intersection of view 1 and view 2, resulting
in those documents whose ids are multiples of 12.
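
The expected result of that test can be checked with a small Python sketch.
It assumes integer ids from 0 to 100 inclusive and models each view as a
plain list of ids; the helper name view_multiples is made up for
illustration and is not part of multiview.js.

```python
def view_multiples(ids, factor):
    """Model a view that emits only ids divisible by `factor`."""
    return [i for i in ids if i % factor == 0]


ids = range(0, 101)  # assumption: ids 0 .. 100 inclusive
view1 = view_multiples(ids, 3)
view2 = view_multiples(ids, 4)

# Intersection of the two views, kept in id order.
in_view2 = set(view2)
intersection = [i for i in view1 if i in in_view2]

print(intersection)
```

Every id that survives is divisible by both 3 and 4, hence by 12.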

Any comments appreciated, particularly concerning the following:

 1) in the module multiview, is there a quicker way to find the counts
 from startkey to endkey rather than iterating?
 2) In the module couch_query_ring is there a quicker way to test for
 inclusion rather than iterating?
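
As one possible direction for both questions, here is a Python sketch (not
CouchDB code). It assumes the view keys are already available as a sorted
list, so a slice count becomes two binary searches, and it assumes the ids
of a view fit in a hash set, so inclusion is a single lookup. The function
names count_in_range and build_id_index are hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_keys, startkey, endkey):
    """Count keys k with startkey <= k <= endkey without scanning them."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_right(sorted_keys, endkey)
    # An inverted range (startkey > endkey) counts as empty.
    return max(0, hi - lo)


def build_id_index(doc_ids):
    """Build a set once so later inclusion tests are O(1) on average."""
    return set(doc_ids)


keys = [0, 3, 6, 9, 12, 15, 18]
print(count_in_range(keys, 3, 12))  # counts 3, 6, 9, 12

index = build_id_index(["a", "b", "c"])
print("b" in index, "z" in index)
```

Whether either idea maps onto the view btree is a separate question; the
sketch only shows that both operations can avoid a full iteration once the
data is held in a suitable structure.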

Many thanks,

Norman

On Fri, Aug 6, 2010 at 10:16 AM, Norman Barker  wrote:
> Chris,
>
> I will make those changes, it might be a couple of days as I am on travel.
>
> I will clone geocouch as a starting point and add javascript tests as
> you suggest.
>
> I am bench marking with around 1 docs and a couple of views
> (including geocouch), the main issue with the folding over every
> document to both find the number of docs in a view slice (between
> startkey and endkey) and then again to test inclusion between views.
>
> I am interested in taking this forward and appreciate any code feedback.
>
> thanks,
>
> Norman
>
> On Thu, Aug 5, 2010 at 6:26 PM, J Chris Anderson  wrote:
>>
>> On Aug 5, 2010, at 4:32 PM, Jan Lehnardt wrote:
>>
>>> Hi Norman,
>>>
>>> I still plan to look at your code, I know the others here
>>> are fairly busy too, sorry for the review delay :)
>>>
>>
>> The code looks clean (but could use better comments about where in the flow 
>> each module comes into play). I don't think we can guess about performance, 
>> instead we should benchmark to make sure the ring approach is right.
>>
>> In CouchDB currently, it is possible to isolate requests against a single 
>> db. So you use the security settings to prevent access to databases, etc. 
>> For this, using the userCtx and switching away from couch_db:open_int() 
>> would make a big difference.
>>
>> This way people can query across dbs if they have read access to all of them.
>>
>> I think if you package this as a CouchDB fork on Github and add a few 
>> JavaScript tests, it will be really useful for some folks. I like that it 
>> has geo support. Maybe we can target it for inclusion in trunk just after 
>> GeoCouch goes in trunk (if Volker wants to put it in.)
>>
>> Also, for realtime hacking on this, you might find that the #couchdb IRC 
>> channel on Freenode is a good place to solicit feedback. There are a lot of 
>> people on there doing Geo things that would benefit from this. (They really 
>> wanna be able to intersect a Geo query with a Map Reduce query, etc.)
>>
>> Chris
>>
>>> Cheers
>>> Jan
>>> --
>>>
>>>
>>> On 5 Aug 2010, at 18:12, Norman Barker wrote:
>>>
>>>> Hi,
>>>>
>>>> is there any interest in the multiview, I have fixed (3) below, but am
>>>> still interested in approaches for (1) and (2).
>>>>
>>>> thanks,
>>>>
>>>> Norman
>>>>
>>>> On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> a very initial version of the multiview is at
>>>>> http://github.com/normanb/couchdb-multiview for discussion.
>>>>>
>>>>> The views are intersected by using a ring of processes where each node
>>>>> in the ring represents a view as follows;
>>>>>
>>>>> % send an id from the start list to the next node in the ring, if the
>>>>> id is in adjacent node then this node sends to the next ring node
>>>>> % if the id gets all round the ring and back to the start node then it
>>>>> has intersected all queries and should be included. The nodes in the
>>>>> ring
>>>>> % should be sorted in size from small to large for this to be effective
>>>>> %
>>>>> % In addition send the initial id list round in parallel
>>>>>
>>>>> this is implemented in the couch_query_ring module.
>>>>>
>>>>> I have a couple of questions
>>>>>
>>>>> 1) in the module multiview, is there a quicker way to find the counts
>>>>> from startkey to endkey rather than iterating?
>>>>> 2) In the module couch_query_ring is there a quicker way to test for
>>>>> inclusion rather than iterating?
>>>>> 3) Finally, if I hit this concurrently I get an exception,
>>>>>
>>>>> [error] [<0.201.0>] Uncaught error in HTTP request: {exit,
>>>>>                                {noproc,
>>>>>                                 {gen_server,call,
>>>>>
>>>>> (so ignore my previous email, I am able to trap the msg)
>>>>>
>>>>> I am going to look into (3) but if you have seen this before.
>>>>>
>>>>> I am developing on windows, but also test on linux I will work on
>>>>> getting a linux makefile, but the Makefile.win should be a start.
>>>>>
>>>>> Any help an

Re: multiview on github

2010-08-06 Thread Norman Barker
Chris,

I will make those changes; it might take a couple of days as I am traveling.

I will clone geocouch as a starting point and add javascript tests as
you suggest.

I am benchmarking with around 1 docs and a couple of views
(including geocouch). The main issue is the folding over every
document, both to find the number of docs in a view slice (between
startkey and endkey) and then again to test inclusion between views.

I am interested in taking this forward and appreciate any code feedback.

thanks,

Norman

On Thu, Aug 5, 2010 at 6:26 PM, J Chris Anderson  wrote:
>
> On Aug 5, 2010, at 4:32 PM, Jan Lehnardt wrote:
>
>> Hi Norman,
>>
>> I still plan to look at your code, I know the others here
>> are fairly busy too, sorry for the review delay :)
>>
>
> The code looks clean (but could use better comments about where in the flow 
> each module comes into play). I don't think we can guess about performance, 
> instead we should benchmark to make sure the ring approach is right.
>
> In CouchDB currently, it is possible to isolate requests against a single db. 
> So you use the security settings to prevent access to databases, etc. For 
> this, using the userCtx and switching away from couch_db:open_int() would 
> make a big difference.
>
> This way people can query across dbs if they have read access to all of them.
>
> I think if you package this as a CouchDB fork on Github and add a few 
> JavaScript tests, it will be really useful for some folks. I like that it has 
> geo support. Maybe we can target it for inclusion in trunk just after 
> GeoCouch goes in trunk (if Volker wants to put it in.)
>
> Also, for realtime hacking on this, you might find that the #couchdb IRC 
> channel on Freenode is a good place to solicit feedback. There are a lot of 
> people on there doing Geo things that would benefit from this. (They really 
> wanna be able to intersect a Geo query with a Map Reduce query, etc.)
>
> Chris
>
>> Cheers
>> Jan
>> --
>>
>>
>> On 5 Aug 2010, at 18:12, Norman Barker wrote:
>>
>>> Hi,
>>>
>>> is there any interest in the multiview, I have fixed (3) below, but am
>>> still interested in approaches for (1) and (2).
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker  
>>> wrote:
>>>> Hi,
>>>>
>>>> a very initial version of the multiview is at
>>>> http://github.com/normanb/couchdb-multiview for discussion.
>>>>
>>>> The views are intersected by using a ring of processes where each node
>>>> in the ring represents a view as follows;
>>>>
>>>> % send an id from the start list to the next node in the ring, if the
>>>> id is in adjacent node then this node sends to the next ring node
>>>> % if the id gets all round the ring and back to the start node then it
>>>> has intersected all queries and should be included. The nodes in the
>>>> ring
>>>> % should be sorted in size from small to large for this to be effective
>>>> %
>>>> % In addition send the initial id list round in parallel
>>>>
>>>> this is implemented in the couch_query_ring module.
>>>>
>>>> I have a couple of questions
>>>>
>>>> 1) in the module multiview, is there a quicker way to find the counts
>>>> from startkey to endkey rather than iterating?
>>>> 2) In the module couch_query_ring is there a quicker way to test for
>>>> inclusion rather than iterating?
>>>> 3) Finally, if I hit this concurrently I get an exception,
>>>>
>>>> [error] [<0.201.0>] Uncaught error in HTTP request: {exit,
>>>>                                {noproc,
>>>>                                 {gen_server,call,
>>>>
>>>> (so ignore my previous email, I am able to trap the msg)
>>>>
>>>> I am going to look into (3) but if you have seen this before.
>>>>
>>>> I am developing on windows, but also test on linux I will work on
>>>> getting a linux makefile, but the Makefile.win should be a start.
>>>>
>>>> Any help and comments appreciated.
>>>>
>>>> Norman
>>>>
>>
>
>


Re: multiview on github

2010-08-06 Thread Volker Mische

Hi Norman,

wow, I didn't know it had GeoCouch support. Sounds great! I need to have 
a closer look. Not just now (sorry for that).


Cheers,
  Volker

On 08/06/2010 02:26 AM, J Chris Anderson wrote:


The code looks clean (but could use better comments about where in the flow 
each module comes into play). I don't think we can guess about performance, 
instead we should benchmark to make sure the ring approach is right.

In CouchDB currently, it is possible to isolate requests against a single db. 
So you use the security settings to prevent access to databases, etc. For this, 
using the userCtx and switching away from couch_db:open_int() would make a big 
difference.

This way people can query across dbs if they have read access to all of them.

I think if you package this as a CouchDB fork on Github and add a few 
JavaScript tests, it will be really useful for some folks. I like that it has 
geo support. Maybe we can target it for inclusion in trunk just after GeoCouch 
goes in trunk (if Volker wants to put it in.)

Also, for realtime hacking on this, you might find that the #couchdb IRC 
channel on Freenode is a good place to solicit feedback. There are a lot of 
people on there doing Geo things that would benefit from this. (They really 
wanna be able to intersect a Geo query with a Map Reduce query, etc.)

Chris



On 5 Aug 2010, at 18:12, Norman Barker wrote:


Hi,

is there any interest in the multiview, I have fixed (3) below, but am
still interested in approaches for (1) and (2).

thanks,

Norman

On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker  wrote:

Hi,

a very initial version of the multiview is at
http://github.com/normanb/couchdb-multiview for discussion.

The views are intersected by using a ring of processes where each node
in the ring represents a view as follows;

% send an id from the start list to the next node in the ring, if the
id is in adjacent node then this node sends to the next ring node 
% if the id gets all round the ring and back to the start node then it
has intersected all queries and should be included. The nodes in the
ring
% should be sorted in size from small to large for this to be effective
%
% In addition send the initial id list round in parallel

this is implemented in the couch_query_ring module.

I have a couple of questions

1) in the module multiview, is there a quicker way to find the counts
from startkey to endkey rather than iterating?
2) In the module couch_query_ring is there a quicker way to test for
inclusion rather than iterating?
3) Finally, if I hit this concurrently I get an exception,

[error] [<0.201.0>] Uncaught error in HTTP request: {exit,
{noproc,
 {gen_server,call,

(so ignore my previous email, I am able to trap the msg)

I am going to look into (3) but if you have seen this before.

I am developing on windows, but also test on linux I will work on
getting a linux makefile, but the Makefile.win should be a start.

Any help and comments appreciated.

Norman









Re: multiview on github

2010-08-05 Thread J Chris Anderson

On Aug 5, 2010, at 4:32 PM, Jan Lehnardt wrote:

> Hi Norman,
> 
> I still plan to look at your code, I know the others here
> are fairly busy too, sorry for the review delay :)
> 

The code looks clean (but could use better comments about where in the flow 
each module comes into play). I don't think we can guess about performance; 
instead we should benchmark to make sure the ring approach is right.

In CouchDB currently, it is possible to isolate requests against a single db. 
So you use the security settings to prevent access to databases, etc. For this, 
using the userCtx and switching away from couch_db:open_int() would make a big 
difference.

This way people can query across dbs if they have read access to all of them.

I think if you package this as a CouchDB fork on Github and add a few 
JavaScript tests, it will be really useful for some folks. I like that it has 
geo support. Maybe we can target it for inclusion in trunk just after GeoCouch 
goes in trunk (if Volker wants to put it in.)

Also, for realtime hacking on this, you might find that the #couchdb IRC 
channel on Freenode is a good place to solicit feedback. There are a lot of 
people on there doing Geo things that would benefit from this. (They really 
wanna be able to intersect a Geo query with a Map Reduce query, etc.)

Chris

> Cheers
> Jan
> -- 
> 
> 
> On 5 Aug 2010, at 18:12, Norman Barker wrote:
> 
>> Hi,
>> 
>> is there any interest in the multiview, I have fixed (3) below, but am
>> still interested in approaches for (1) and (2).
>> 
>> thanks,
>> 
>> Norman
>> 
>> On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker  
>> wrote:
>>> Hi,
>>> 
>>> a very initial version of the multiview is at
>>> http://github.com/normanb/couchdb-multiview for discussion.
>>> 
>>> The views are intersected by using a ring of processes where each node
>>> in the ring represents a view as follows;
>>> 
>>> % send an id from the start list to the next node in the ring, if the
>>> id is in adjacent node then this node sends to the next ring node 
>>> % if the id gets all round the ring and back to the start node then it
>>> has intersected all queries and should be included. The nodes in the
>>> ring
>>> % should be sorted in size from small to large for this to be effective
>>> %
>>> % In addition send the initial id list round in parallel
>>> 
>>> this is implemented in the couch_query_ring module.
>>> 
>>> I have a couple of questions
>>> 
>>> 1) in the module multiview, is there a quicker way to find the counts
>>> from startkey to endkey rather than iterating?
>>> 2) In the module couch_query_ring is there a quicker way to test for
>>> inclusion rather than iterating?
>>> 3) Finally, if I hit this concurrently I get an exception,
>>> 
>>> [error] [<0.201.0>] Uncaught error in HTTP request: {exit,
>>>{noproc,
>>> {gen_server,call,
>>> 
>>> (so ignore my previous email, I am able to trap the msg)
>>> 
>>> I am going to look into (3) but if you have seen this before.
>>> 
>>> I am developing on windows, but also test on linux I will work on
>>> getting a linux makefile, but the Makefile.win should be a start.
>>> 
>>> Any help and comments appreciated.
>>> 
>>> Norman
>>> 
> 



Re: multiview on github

2010-08-05 Thread Jan Lehnardt
Hi Norman,

I still plan to look at your code, I know the others here
are fairly busy too, sorry for the review delay :)

Cheers
Jan
-- 


On 5 Aug 2010, at 18:12, Norman Barker wrote:

> Hi,
> 
> is there any interest in the multiview, I have fixed (3) below, but am
> still interested in approaches for (1) and (2).
> 
> thanks,
> 
> Norman
> 
> On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker  
> wrote:
>> Hi,
>> 
>> a very initial version of the multiview is at
>> http://github.com/normanb/couchdb-multiview for discussion.
>> 
>> The views are intersected by using a ring of processes where each node
>> in the ring represents a view as follows;
>> 
>> % send an id from the start list to the next node in the ring, if the
>> id is in adjacent node then this node sends to the next ring node 
>> % if the id gets all round the ring and back to the start node then it
>> has intersected all queries and should be included. The nodes in the
>> ring
>> % should be sorted in size from small to large for this to be effective
>> %
>> % In addition send the initial id list round in parallel
>> 
>> this is implemented in the couch_query_ring module.
>> 
>> I have a couple of questions
>> 
>> 1) in the module multiview, is there a quicker way to find the counts
>> from startkey to endkey rather than iterating?
>> 2) In the module couch_query_ring is there a quicker way to test for
>> inclusion rather than iterating?
>> 3) Finally, if I hit this concurrently I get an exception,
>> 
>> [error] [<0.201.0>] Uncaught error in HTTP request: {exit,
>> {noproc,
>>  {gen_server,call,
>> 
>> (so ignore my previous email, I am able to trap the msg)
>> 
>> I am going to look into (3) but if you have seen this before.
>> 
>> I am developing on windows, but also test on linux I will work on
>> getting a linux makefile, but the Makefile.win should be a start.
>> 
>> Any help and comments appreciated.
>> 
>> Norman
>> 



Re: multiview on github

2010-08-05 Thread Norman Barker
Hi,

Is there any interest in the multiview? I have fixed (3) below, but am
still interested in approaches for (1) and (2).

thanks,

Norman

On Fri, Jul 30, 2010 at 3:39 PM, Norman Barker  wrote:
> Hi,
>
> a very initial version of the multiview is at
> http://github.com/normanb/couchdb-multiview for discussion.
>
> The views are intersected by using a ring of processes where each node
> in the ring represents a view as follows;
>
> % send an id from the start list to the next node in the ring, if the
> id is in adjacent node then this node sends to the next ring node 
> % if the id gets all round the ring and back to the start node then it
> has intersected all queries and should be included. The nodes in the
> ring
> % should be sorted in size from small to large for this to be effective
> %
> % In addition send the initial id list round in parallel
>
> this is implemented in the couch_query_ring module.
>
> I have a couple of questions
>
> 1) in the module multiview, is there a quicker way to find the counts
> from startkey to endkey rather than iterating?
> 2) In the module couch_query_ring is there a quicker way to test for
> inclusion rather than iterating?
> 3) Finally, if I hit this concurrently I get an exception,
>
> [error] [<0.201.0>] Uncaught error in HTTP request: {exit,
>                                 {noproc,
>                                  {gen_server,call,
>
> (so ignore my previous email, I am able to trap the msg)
>
> I am going to look into (3) but if you have seen this before.
>
> I am developing on windows, but also test on linux I will work on
> getting a linux makefile, but the Makefile.win should be a start.
>
> Any help and comments appreciated.
>
> Norman
>


multiview on github

2010-07-30 Thread Norman Barker
Hi,

a very initial version of the multiview is at
http://github.com/normanb/couchdb-multiview for discussion.

The views are intersected by using a ring of processes where each node
in the ring represents a view as follows;

% send an id from the start list to the next node in the ring, if the
% id is in adjacent node then this node sends to the next ring node
% if the id gets all round the ring and back to the start node then it
% has intersected all queries and should be included. The nodes in the
% ring should be sorted in size from small to large for this to be
% effective
%
% In addition send the initial id list round in parallel

this is implemented in the couch_query_ring module.
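
For readers following along, the ring idea described above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions, not the code in couch_query_ring: each view is modelled as an
in-memory set of document ids, and the function name ring_intersect is
hypothetical. The real module uses Erlang processes and message passing;
here the "ring" is simply a loop over the views sorted from smallest to
largest.

```python
def ring_intersect(views):
    """Intersect several views, each given as a set of document ids.

    Hypothetical sketch: the smallest view supplies the start list and
    every id is passed around the remaining views in order of size.
    """
    if not views:
        return []
    # Sort the ring from small to large so that ids drop out early.
    ring = sorted(views, key=len)
    start, rest = ring[0], ring[1:]
    survivors = []
    for doc_id in sorted(start):
        # all() stops at the first node that does not hold the id;
        # an id accepted by every node has gone all the way round.
        if all(doc_id in node for node in rest):
            survivors.append(doc_id)
    return survivors


view_a = {"doc1", "doc2", "doc3", "doc4"}
view_b = {"doc2", "doc4", "doc5"}
view_c = {"doc4", "doc2", "doc9", "doc7", "doc8"}
print(ring_intersect([view_a, view_b, view_c]))
```

Starting from the smallest view bounds the number of ids that ever enter
the ring, which is the reason given above for sorting the nodes by size.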

I have a couple of questions

1) in the module multiview, is there a quicker way to find the counts
from startkey to endkey rather than iterating?
2) In the module couch_query_ring is there a quicker way to test for
inclusion rather than iterating?
3) Finally, if I hit this concurrently I get an exception,

[error] [<0.201.0>] Uncaught error in HTTP request: {exit,
 {noproc,
  {gen_server,call,

(so ignore my previous email, I am able to trap the msg)

I am going to look into (3), but let me know if you have seen this before.

I am developing on Windows, but also test on Linux. I will work on
getting a Linux makefile, but the Makefile.win should be a start.

Any help and comments appreciated.

Norman