Re: [POC] Mango Catch All Selector

Robert Kowalski Tue, 05 Jan 2016 03:53:05 -0800

Hi there!

Thanks for the detailed responses Garren, Paul, Sebastian, Dale and
Tony! It looks like we have consensus that it should be easier to
query data with Mango. :)


I like the idea of automatic index creation, probably only if
the user opts in, as I also think that index creation needs more
planning. I am afraid automatic index creation is a whole topic of
its own, so I would like to talk about it separately.

Regarding querying I would really like to keep it very simple for
newcomers and implement the "fallback" as transparently as possible.

A newcomer usually doesn't have much data in their databases when they
start to try out a new database system and play around. I agree with
Paul that the current behaviour is different from Mongo and also from
other successful databases (e.g. MySQL and Postgres). They all let you
query without indexes, but you can add them later for performance
optimisation. I feel unsure about "developing: true". Adding
"developing: true" or other parameters deviates further from those
systems, which are very friendly to beginners, and I would like to
avoid that.

I want to add that for more advanced production use cases other
DBMS usually offer performance measuring tools that help the user
to detect slow queries and to optimise them. Mongo, for example:
https://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/
and MySQL:
http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html
Rails (is it still called Active Record?) will also explain and warn
on slow queries without an index:
http://weblog.rubyonrails.org/2011/12/6/what-s-new-in-edge-rails-explain/

Maybe the problem for the people who complain about slow
responses because they use the sometimes advertised _id > null hack is
that they don't have enough tooling / insight to notice those
problems? Then this would probably be another task, something along
the lines of "improve query optimisation capabilities for users".
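
For reference, a minimal sketch of that hack in Python. The helper
name wrap_with_all_docs_hack is made up for illustration; the selector
shape is the one Paul describes further down in the thread:

```python
import json


def wrap_with_all_docs_hack(selector):
    """Wrap a Mango selector with the `_id > null` condition.

    Every document has an _id, so the extra condition matches all
    documents and the query ends up as a full scan that is filtered
    by the original selector.
    """
    return {"$and": [selector, {"_id": {"$gt": None}}]}


# Example request body for _find; the "type" field is just sample data.
body = {"selector": wrap_with_all_docs_hack({"type": "user"})}
print(json.dumps(body))
```

The point is only to show what users currently have to type by hand;
the fallback would make this wrapper unnecessary.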

The idea I have for now is to log slow queries in the logfile. From
there, monitoring tools etc. could pick them up and inform the user in
some way if they don't run Couch on their own machine. The
documentation (right now Mango has no docs at all for CouchDB) should
basically explain that it runs out of the box, but that people who go
into production should create indexes (see also
https://docs.mongodb.org/manual/core/indexes-introduction/ which does
that quite nicely).
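
A rough sketch of what I mean by logging slow queries, written in
Python just to show the idea (Mango itself is Erlang). The logger
name, the threshold and the function name are all assumptions, nothing
like this exists yet:

```python
import logging
import time

logger = logging.getLogger("mango.slow_query")  # hypothetical logger name
SLOW_QUERY_THRESHOLD_SEC = 1.0  # hypothetical default, not an existing setting


def run_with_slow_query_log(selector, run_query, index_name=None,
                            threshold=SLOW_QUERY_THRESHOLD_SEC,
                            clock=time.monotonic):
    """Run a query and log a warning if it took at least `threshold` sec.

    `run_query` is any callable that takes the selector and returns the
    result. Returns a tuple (result, was_slow).
    """
    start = clock()
    result = run_query(selector)
    elapsed = clock() - start
    was_slow = elapsed >= threshold
    if was_slow:
        logger.warning(
            "slow query: %.3f sec, index=%s, selector=%r",
            elapsed, index_name or "none (_all_docs fallback)", selector)
    return result, was_slow
```

A monitoring tool would then only have to watch the logfile for these
warnings.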

I think the timeout for queries is a good idea. It could also be
configurable from the HTTP client. When the timeout is hit we could
inform the user that we think an index would be a better fit and maybe
even give them the curl command to create the index in the error
message, and also tell them that they can raise the current timeout of
X sec using ?timeout=[newtimeout]. I have to confess that right now I
am just making assumptions about how our users would use it and
whether they need a timeout. I am also not sure whether the timeout
solves the root cause of the problem or only the symptoms. Do they
want a query optimisation tool or something else that helps to explain
why their queries are slow?
(see also "improve query optimisation capabilities for users" above)
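
To make the error message idea a bit more concrete, here is a small
Python sketch that builds such a hint. The function name, the wording
and the ?timeout parameter are assumptions (the parameter is only
proposed above); the request body is the usual one for creating a JSON
index via the _index endpoint:

```python
import json


def timeout_hint(db_url, fields, timeout_sec):
    """Build a hypothetical error message for a query that timed out."""
    index_body = json.dumps({"index": {"fields": fields}, "type": "json"})
    curl = (
        f"curl -X POST {db_url}/_index "
        f"-H 'Content-Type: application/json' -d '{index_body}'"
    )
    return (
        f"Query timed out after {timeout_sec} sec. "
        f"An index on {', '.join(fields)} would probably be a better fit. "
        f"You can create it with:\n  {curl}\n"
        "or raise the timeout using ?timeout=[newtimeout]."
    )


# The URL and field name are sample values.
print(timeout_hint("http://localhost:5984/mydb", ["type"], 5))
```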


Summary of my open questions:

- do you agree that users don't have enough tooling / insight to
notice those problems? Should we handle query performance optimisation
as a separate problem from using _all_docs as it can happen in more
cases?

- how can we document Mango better, or at all? Any volunteers?

- is a timeout solving the root cause or the symptoms? Could it be a
temporary or additional step in conjunction with query optimisation
tooling?


On Mon, Jan 4, 2016 at 9:55 PM, Tony Sun <tony.sun...@gmail.com> wrote:
> Hi all,
>
>     Hope everyone enjoyed the holidays!
>
>     This is the most common mango experience for new users:
>
>     1) Syntax issues to create an index.
>     2) Running into the "no index found" error because his or her query
> (with and w/o sort) doesn't match the index correctly.
>     3) We explain how views work and also suggest our all_docs hack.
>     4) Then the user complains that their query is slow (due to all_docs or
> large result set), and again we try to either optimize the index or suggest
> using text indexes (the new open-sourced feature).
>
>     A lot of users are turned off by the usability issues encountered in 1)
> and 2). I agree that we should make it as easy as possible for first time
> users, so I am okay with removing the need to create an index first.
> However, we need to somehow explicitly let the user know about all_docs so
> they don't abuse this capability. Also, like mongo, we could internally
> check if the current index is an all_docs index and throw a timeout/size
> error for a particular query?
>
>
> Thanks,
>
>
> Tony
>
> On Mon, Jan 4, 2016 at 11:49 AM, Sebastian Rothbucher <
> sebastianrothbuc...@googlemail.com> wrote:
>
>> Hi Robert,
>>
>> I'm with you that the easier we can make it for s/o to get started the
>> better.
>> And I think falling back to a full table scan with a log written is a good
>> and easy way to go. I'd even set the log level to info or even warning to
>> make it clear that there's a problem with huge data sets. And hopefully,
>> people run some load test before going into production ;-)
>>
>> The only other idea I had (a button "use default index" in Fauxton that
>> modifies the selector) looks daft on second thought
>>
>> - I do like your idea though
>>
>> Best
>>     Sebastian
>>
>>
>> On Mon, Jan 4, 2016 at 8:04 PM, Paul Davis <paul.joseph.da...@gmail.com>
>> wrote:
>>
>> > Hey all,
>> >
>> > I meant to reply to the ticket on pouchdb-find but got distracted by
>> > the holidays.
>> >
>> > I wanted to note that the original motivation for rejecting a selector
>> > that doesn't have an index was to avoid the specific situation where a
>> > user has a query that appears to run quite quickly in testing/dev but
>> > fails or results in timeouts in production due to a different data
>> > set. This was definitely a deviation from the MongoDB approach. The
>> > last I read their docs on this they mentioned in a couple places that
>> > while an index is not required there are limits on result set sizes
>> > and (I think?) query time. I made the choice that rather than fail
>> > eventually to fail quickly and hopefully be descriptive of why the
>> > query failed. For instance, there should be a note in the error
>> > response when no index is available that describes which fields could
>> > be indexed to satisfy the query.
>> >
>> > On the other hand, once we had users actually playing with this
>> > feature there were quite a few instances of, "I just want to try this
>> > query without waiting for an index to build." and I made the clever
>> > suggestion that just adding the {"$and": [Query, {"_id": {"$gt":
>> > null}}]} wrapper would cause a full table scan. That's obviously a
>> > hack and I was fine with that because it seemed like an obvious hack
>> > that would motivate users to create the appropriate index before
>> > moving to production.
>> >
>> > On the flip side it seems like for some people the hack is a hurdle
>> > into learning the query capabilities as well as adding to the overhead
>> > of learning CouchDB in general. And this particular feature was aimed
>> > directly at providing an easier on-ramp to CouchDB for people coming
>> > from other databases. Given what I've read here and elsewhere perhaps
>> > what might be easiest would be to add a feature along the lines of
>> > "developing": "true" to the _find request body that would enable the
>> > _all_docs fold. This would provide two benefits in that internally we
>> > could throw different errors in specific cases. For instance, some
>> > selectors fail because they can't run against a map/reduce index (ie,
>> > $or) and that won't change no matter what map/reduce indexes are
>> > added. If we just wrap the _all_docs hack this changes the
>> > behavior which would probably surprise new users.
>> >
>> > On the other hand, indexes can be operationally quite costly and
>> > require planning to handle capacity so I would definitely avoid
>> > automatically creating them from the _find endpoint. Perhaps we could
>> > add a feature for the _index endpoint that accepts a selector and
>> > figures out the index to create. Which I think is along the lines of
>> > what Dale mentioned but with a slightly more on purpose interaction
>> > from the user.
>> >
>> > Paul
>> >
>> > On Mon, Jan 4, 2016 at 8:05 AM, Garren Smith <gar...@apache.org> wrote:
>> > > Hi Robert,
>> > >
>> > > This is cool. I think it links in with this
>> > https://issues.apache.org/jira/browse/COUCHDB-2928 <
>> > https://issues.apache.org/jira/browse/COUCHDB-2928> and this
>> > https://github.com/nolanlawson/pouchdb-find/issues/138 <
>> > https://github.com/nolanlawson/pouchdb-find/issues/138>
>> > >
>> > > Cheers
>> > > Garren
>> > >
>> > >> On 04 Jan 2016, at 2:33 PM, Dale Harvey <d...@arandomurl.com> wrote:
>> > >>
>> > >> I haven't yet started looking into the implementation details, but when
>> > >> using pouchdb-find I have very much always expected that at some point
>> > we
>> > >> would analyse the queries and automatically produce an index for them.
>> > This
>> > >> seems like a great step in between.
>> > >>
>> > >> On 4 January 2016 at 13:27, Robert Kowalski <r...@kowalski.gd> wrote:
>> > >>
>> > >>> Hi list,
>> > >>>
>> > >>> I hope you had awesome holidays!
>> > >>>
>> > >>> The whole holidays I thought about an idea I had and today I
>> > >>> implemented a prototype which still has some bugs and isn't complete
>> > >>> yet.
>> > >>>
>> > >>> I want to find out if there is general interest and if it would be
>> > >>> worth to spend more time.
>> > >>>
>> > >>> The problem I am trying to solve is that I usually have a hard time
>> > >>> explaining people how views work. Now we got Mango and I can just
>> say:
>> > >>> we use a syntax similar to MongoDB's query language _but you have to
>> > >>> create an index before you can use it_.
>> > >>>
>> > >>> At this point I usually look into sad, big eyes because no one
>> > >>> understands why they have to create an index first and I feel there
>> is
>> > >>> another entry barrier for newcomers. If trying anyway given they have
>> > >>> decided for CouchDB the user gets an error back: "no index available
>> > >>> for this selector".
>> > >>>
>> > >>> The idea of this patch is to just fallback on the "give me all docs
>> > >>> and i filter afterwards"-trick that people usually use (if they know
>> > >>> it) when they just want to test something, without creating an index
>> > >>> which can take time for creation and requires further knowledge.
>> > >>> Additionally the user is warned that they can create an index to make
>> > >>> the queries faster.
>> > >>>
>> > >>> What do you think? Is that something worth to work on further? The PR
>> > >>> is at https://github.com/apache/couchdb-mango/pull/27
>> > >>>
>> > >>> You can test it with basic queries on a database which does not have
>> > >>> indexes for the fields you want to query created yet.
>> > >>>
>> > >>>
>> > >>> Best,
>> > >>> Robert :)
>> > >>>
>> > >
>> >
>>
