Re: [POC] Mango Catch All Selector

2016-01-04 Thread Tony Sun
Hi all,

Hope everyone enjoyed the holidays!

This is the most common Mango experience for new users:

1) Syntax issues when creating an index.
2) Running into the "no index found" error because his or her query
(with or without sort) doesn't match the index correctly.
3) We explain how views work and also suggest our all_docs hack.
4) Then the user complains that their query is slow (due to all_docs or
a large result set), and again we try to either optimize the index or
suggest using text indexes (the newly open-sourced feature).

A lot of users are turned off by the usability issues encountered in 1)
and 2). I agree that we should make it as easy as possible for first-time
users, so I am okay with removing the need to create an index first.
However, we need to somehow explicitly let the user know about all_docs so
they don't abuse this capability. Also, like MongoDB, we could internally
check whether the current index is an all_docs index and throw a
timeout/size error for a particular query?
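
A minimal sketch of such a guard, in Python for illustration only. The
names guarded_scan and ScanLimitExceeded and both limits are hypothetical;
nothing here is an existing Mango or MongoDB API. It filters documents
without an index and aborts once a document-count or time budget is spent.

```python
import time


class ScanLimitExceeded(Exception):
    """Raised when an unindexed scan examines too many docs or runs too long."""


def guarded_scan(docs, predicate, max_docs_examined=10_000, max_seconds=5.0):
    """Filter docs with predicate, aborting once the scan budget is exhausted."""
    started = time.monotonic()
    matches = []
    for examined, doc in enumerate(docs, start=1):
        # Size guard: count documents examined, not documents returned.
        if examined > max_docs_examined:
            raise ScanLimitExceeded(
                f"examined more than {max_docs_examined} docs without an index"
            )
        # Time guard: checked once per document.
        if time.monotonic() - started > max_seconds:
            raise ScanLimitExceeded(f"scan ran longer than {max_seconds} seconds")
        if predicate(doc):
            matches.append(doc)
    return matches
```

Counting documents examined rather than documents returned means a
selective query over a huge database still trips the limit, which is the
case the all_docs fallback makes easy to miss in development.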


Thanks,


Tony


Re: [POC] Mango Catch All Selector

2016-01-04 Thread Sebastian Rothbucher
Hi Robert,

I'm with you that the easier we can make it for someone to get started,
the better.
And I think falling back to a full table scan with a log message written
is a good and easy way to go. I'd even set the log level to info or even
warning to make it clear that there's a problem with huge data sets. And
hopefully, people run some load tests before going into production ;-)

The only other idea I had (a button "use default index" in Fauxton that
modifies the selector) looks daft on second thought.

- I do like your idea though

Best
Sebastian



Re: [POC] Mango Catch All Selector

2016-01-04 Thread Paul Davis
Hey all,

I meant to reply to the ticket on pouchdb-find but got distracted by
the holidays.

I wanted to note that the original motivation for rejecting a selector
that doesn't have an index was to avoid the specific situation where a
user has a query that appears to run quite quickly in testing/dev but
fails or results in timeouts in production due to a different data
set. This was definitely a deviation from the MongoDB approach. The
last time I read their docs on this, they mentioned in a couple of
places that while an index is not required, there are limits on result
set sizes and (I think?) query time. I made the choice to fail quickly
rather than fail eventually, and hopefully to be descriptive about why
the query failed. For instance, there should be a note in the error
response when no index is available that describes which fields could
be indexed to satisfy the query.

On the other hand, once we had users actually playing with this
feature there were quite a few instances of, "I just want to try this
query without waiting for an index to build." and I made the clever
suggestion that just adding the {"$and": [Query, {"_id": {"$gt":
null}}]} wrapper would cause a full table scan. That's obviously a
hack and I was fine with that because it seemed like an obvious hack
that would motivate users to create the appropriate index before
moving to production.
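
For illustration, the wrapper can be built mechanically. This is a small
Python sketch; the helper name wrap_for_full_scan and the example
selector are made up, and only the {"$and": [Query, {"_id": {"$gt":
null}}]} shape comes from the paragraph above.

```python
import json


def wrap_for_full_scan(selector):
    """Wrap a Mango selector in the $and / _id > null hack."""
    # None serializes to JSON null, matching {"_id": {"$gt": null}}.
    return {"$and": [selector, {"_id": {"$gt": None}}]}


# Request body for _find using the wrapped selector.
body = {"selector": wrap_for_full_scan({"year": {"$gt": 2010}})}
print(json.dumps(body))
```

The original query is passed through untouched as the first $and clause,
so removing the hack later only means unwrapping it again.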

On the flip side it seems like for some people the hack is a hurdle
to learning the query capabilities, as well as adding to the overhead
of learning CouchDB in general. And this particular feature was aimed
directly at providing an easier on-ramp to CouchDB for people coming
from other databases. Given what I've read here and elsewhere, perhaps
what might be easiest would be to add a feature along the lines of
"developing": "true" to the _find request body that would enable the
_all_docs fold. This would provide two benefits in that internally we
could throw different errors in specific cases. For instance, some
selectors fail because they can't run against a map/reduce index (e.g.,
$or) and that won't change no matter what map/reduce indexes are
added. If we just wrap the query in the _all_docs hack, this changes the
behavior, which would probably surprise new users.
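
A rough sketch of how that decision could look, in Python purely for
illustration. The "developing" flag is only the proposal above and is not
an existing _find parameter; plan_find, NoUsableIndex and the notion of
one single-field index per name are simplifying assumptions, and only
top-level selector fields are considered.

```python
class NoUsableIndex(Exception):
    """Raised when a selector cannot be served and no scan was requested."""


def plan_find(body, indexed_fields):
    """Return "index:<field>" or "all_docs", or raise NoUsableIndex."""
    selector = body["selector"]
    # Accept both a JSON boolean and the string form quoted in the thread.
    developing = body.get("developing") in (True, "true")

    if "$or" in selector:
        # No additional map/reduce index would make this selector indexable.
        reason = "$or cannot be served from a map/reduce index"
    else:
        fields = [name for name in selector if not name.startswith("$")]
        usable = [name for name in fields if name in indexed_fields]
        if usable:
            return "index:" + usable[0]
        reason = "no index found; fields that could be indexed: " + ", ".join(fields)

    if developing:
        return "all_docs"
    raise NoUsableIndex(reason)
```

Keeping the two failure reasons separate is the point: a missing index
can be fixed by creating one, while the $or case cannot, and the error
text can say so.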

On the other hand, indexes can be operationally quite costly and
require planning to handle capacity, so I would definitely avoid
automatically creating them from the _find endpoint. Perhaps we could
add a feature for the _index endpoint that accepts a selector and
figures out the index to create, which I think is along the lines of
what Dale mentioned but with a slightly more deliberate interaction
from the user.
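
As a sketch of what "figures out the index to create" might mean, here is
a naive version in Python. The helper names are hypothetical and the
field-collection rule is an assumption: it simply gathers every field
name the selector mentions, descending into $and, $or, $nor and $not, and
ignores nested sub-fields. The output follows the JSON body shape that
_index accepts today as I understand it ("index" with "fields", plus
"type").

```python
def index_fields_for(selector):
    """Collect the field names a selector constrains, in first-seen order."""
    fields = []

    def visit(node):
        for key, value in node.items():
            if key in ("$and", "$or", "$nor"):
                # Combination operators take a list of sub-selectors.
                for sub in value:
                    visit(sub)
            elif key == "$not":
                # $not takes a single sub-selector.
                visit(value)
            elif not key.startswith("$") and key not in fields:
                fields.append(key)

    visit(selector)
    return fields


def index_request_for(selector):
    """Build a candidate _index request body for the given selector."""
    return {"index": {"fields": index_fields_for(selector)}, "type": "json"}
```

The user would still have to send the resulting body to _index on
purpose, so the cost of building the index stays an explicit choice.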

Paul


Re: [POC] Mango Catch All Selector

2016-01-04 Thread Garren Smith
Hi Robert,

This is cool. I think it links in with this
https://issues.apache.org/jira/browse/COUCHDB-2928
and this
https://github.com/nolanlawson/pouchdb-find/issues/138

Cheers
Garren




Re: [POC] Mango Catch All Selector

2016-01-04 Thread Dale Harvey
I haven't yet started looking into the implementation details, but when
using pouchdb-find I have very much always expected that at some point we
would analyse the queries and automatically produce an index for them. This
seems like a great step in between.



[POC] Mango Catch All Selector

2016-01-04 Thread Robert Kowalski
Hi list,

I hope you had awesome holidays!

Over the whole holidays I thought about an idea I had, and today I
implemented a prototype which still has some bugs and isn't complete
yet.

I want to find out if there is general interest and if it would be
worth spending more time on.

The problem I am trying to solve is that I usually have a hard time
explaining to people how views work. Now we have Mango and I can just
say: we use a syntax similar to MongoDB's query language _but you have
to create an index before you can use it_.

At this point I usually look into sad, big eyes because no one
understands why they have to create an index first, and I feel there is
another entry barrier for newcomers. If they try anyway, having
decided on CouchDB, they get an error back: "no index available
for this selector".

The idea of this patch is to just fall back on the "give me all docs
and I filter afterwards" trick that people usually use (if they know
it) when they just want to test something, without creating an index,
which can take time to build and requires further knowledge.
Additionally the user is warned that they can create an index to make
the queries faster.
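
To make the trick concrete, here is a tiny in-memory sketch in Python. It
is not the code in the PR: the function names, the warning text and the
result shape are made up for illustration, and the matcher understands
only equality and {"$gt": value} conditions.

```python
def matches(doc, selector):
    """Check a doc against a tiny Mango subset: equality and $gt per field."""
    for field, condition in selector.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            if set(condition) != {"$gt"}:
                raise ValueError(f"unsupported condition: {condition!r}")
            if not value > condition["$gt"]:
                return False
        elif value != condition:
            return False
    return True


def find_with_fallback(all_docs, selector):
    """Scan every doc, filter afterwards, and attach a warning to the result."""
    docs = [doc for doc in all_docs if matches(doc, selector)]
    warning = "no index used; create an index to make this query faster"
    return {"docs": docs, "warning": warning}
```

Every document is visited on every query, which is why the warning is
part of the result rather than something buried in the documentation.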

What do you think? Is that something worth working on further? The PR
is at https://github.com/apache/couchdb-mango/pull/27

You can test it with basic queries on a database which does not have
indexes for the fields you want to query created yet.


Best,
Robert :)