Re: [POC] Mango Catch All Selector

Garren Smith Tue, 12 Jan 2016 00:04:05 -0800

Hi All,

I really like the idea of returning an error message about it being slow along 
with a helpful url. I don’t really like the idea of a `slow` or `developer` 
flag for the actual query; I think that will add some confusion.
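
As a rough illustration of that idea, here is a minimal Python sketch of an
error response that explains the slowness and links to documentation, with no
extra query flag. Everything in it is an assumption made for illustration: the
function name, the error fields, the 400 status and the documentation URL are
hypothetical and do not describe an existing CouchDB API.

```python
import json

# Hypothetical documentation URL; the thread does not name a real one.
DOCS_URL = "https://example.org/docs/mango-indexes"


def find_response(selector, has_index):
    """Return a (status, body) pair for a _find-style request.

    Hypothetical behaviour: when no usable index exists, answer with an
    error that says the query would be slow and points at the docs.
    """
    if has_index:
        # Placeholder for a real result set.
        return 200, {"docs": []}
    return 400, {
        "error": "no_usable_index",  # illustrative error name
        "reason": (
            "This query would be slow without an index. "
            "Create an index for the fields in your selector."
        ),
        "url": DOCS_URL,
        "selector": selector,
    }


status, body = find_response({"name": "garren"}, has_index=False)
print(status)
print(json.dumps(body, indent=2))
```

The point of the sketch is that the guidance travels with the error itself, so
the user does not need to discover a separate flag first.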


Cheers
Garren

> On 11 Jan 2016, at 8:55 PM, Tony Sun <tony.sun...@gmail.com> wrote:
> 
> Hi Robert,
> 
>    Building upon what others have stated above, what do you think about
> the following:
> 
>    1) Let the user query without creating an index
>    2) Return an error message with a new url that has
> "slow/no_index/developer":true appended at the end. The message clearly
> explains that this query will be slow, and that creating an index will be
> more efficient. However, he or she can continue. The error message will
> then have a link to point to our documentation.
>    3) In Fauxton, there is a checkbox or button that also appends the
> "slow/no_index/developer":true to the _find url. If the user clicks it,
> then the same message pops up to notify the user.
> 
> 
> 
> Tony
> 
> 
> 
> On Mon, Jan 11, 2016 at 9:45 AM, Eli Stevens (Gmail) <wickedg...@gmail.com>
> wrote:
> 
>> Just wanted to chime in here as a user - I've run into similar
>> behavior from CouchDB with the reduce-not-reducing-enough heuristic,
>> where stuff I was working on went smoothly in dev, but stopped once
>> real load was pushed through it (thankfully for me, that was in
>> testing, rather than released to customers).
>> 
>> It's a frustrating experience, and I don't think that a reputation for
>> "works until you cross a threshold, and then it doesn't, but only in
>> production" is a good thing to move towards.
>> 
>> Perhaps something like adding a key to the returned data along the
>> lines of "_slow_warning": "This query is going to be slow on large
>> data sets. See http://..." in addition to the ?slow_warning=true query
>> param (note that I'm calling it "slow_warning" in both places only to
>> increase discoverability; without the url param, the no-index query
>> wouldn't work at all). Bikeshed the name as needed.
>> 
>> I'd like to see a lot more URLs in CouchDB error messages in general,
>> actually - I would find it very useful when trying to determine what's
>> going wrong to have a URL right there in the logs that I can get more
>> information from.
>> 
>> On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet <woh...@apache.org> wrote:
>>> Hi Robert,
>>> 
>>> I've been thinking about this one for the past week or so, and I have a
>>> simple suggestion:
>>> 
>>>  Add the query parameter slow=true to enable this behaviour.
>>> 
>>> This meets all the original requirements:
>>> 
>>> 1. It is not default behaviour
>>> 2. You can grep the log files for the word 'slow' and find evidence
>>> 3. There is a shorthand, simple way to enable the behaviour
>>> 4. Any self-respecting developer will try to remove slow=true, find
>>>   a break, and be forced to learn about indexes
>>> 5. It's a bit cheeky, which I think is kind of fun :D
>>> 
>>> All the best,
>>> Joan
>>> 
>>> ----- Original Message -----
>>>> From: "William Edney" <bed...@technicalpursuit.com>
>>>> To: dev@couchdb.apache.org
>>>> Sent: Friday, January 8, 2016 10:27:29 AM
>>>> Subject: Re: [POC] Mango Catch All Selector
>>>> 
>>>> Hi Robert -
>>>> 
>>>> As a builder of UI, API and library code who has also done developer
>>>> training on a variety of technologies, one simple fix might be to go
>>>> ahead and not require indexes to be built, but then to put a big NOTE at
>>>> the beginning of the "Mango Getting Started" guide (I would assume there
>>>> is such a piece of documentation) that states: "Note that the examples in
>>>> this document do not require you to build an index, but for performance
>>>> reasons we HIGHLY RECOMMEND that you do so. *Click here* for more
>>>> information about how to do that" (or some such verbiage).
>>>> 
>>>> My 2 cents.
>>>> 
>>>> Cheers,
>>>> 
>>>> - Bill
>>>> 
>>>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski <r...@kowalski.gd>
>>>> wrote:
>>>> 
>>>>> Hi list,
>>>>> 
>>>>> At the end of the mail I would like to invite the other folks from the
>>>>> mailing list that build interfaces for humans (APIs, CLIs or even UIs)
>>>>> to chime in again with their opinions. So to all people on the ML: the
>>>>> mail is not just a response to Paul, feedback is welcome :)
>>>>> 
>>>>> Hi Paul, I agree with the timeout. It could lead to very unpleasant
>>>>> errors which are hard to debug and support.
>>>>> 
>>>>> I added some thoughts to the other points you made:
>>>>> 
>>>>>> a) know that the slow queries logs exist,
>>>>> 
>>>>> Hmm... If I take a look at the 1.x logging, it was very
>>>>> straightforward. As a developer you would spin up a CouchDB and you
>>>>> get all the log messages into your terminal. It was quite handy in
>>>>> general for all kinds of debugging. That the logs are not displayed
>>>>> directly on stdout/stderr is in my opinion a general 2.x problem. The
>>>>> problem occurs with all kinds of log messages we produce in CouchDB
>>>>> for 2.x and is not specific to the slow-query logging.
>>>>> 
>>>>> 
>>>>>> Ie, "You can try queries with testing:true, when you're ready to move
>>>>>> to production you can POST your selector to _index to create the index
>>>>>> which allows you to remove testing:true".
>>>>> 
>>>>> I really like the migration path you mentioned here with the API to
>>>>> create indexes. I am worried about setting too high an entry barrier
>>>>> for absolute newcomers, people who want to play around before they are
>>>>> ready to think about indexes, e.g. by coupling the index topic to
>>>>> querying from the very beginning.
>>>>> 
>>>>> When I throw too many things to learn at people (who may not have
>>>>> used a database before), most of them get discouraged and do not take
>>>>> a look. The usual things they feel or say are: "too complicated", "I
>>>>> do not have enough time", "product XY is easier to use".
>>>>> 
>>>>> I would argue that newcomers to a database will not launch a
>>>>> high-traffic, multi-gigabyte product with the database from day one.
>>>>> Day one is the day when they learn how to query the data and put data
>>>>> into the database. Even where people have a running high-traffic
>>>>> system and have used other databases at a medium to large scale, I
>>>>> would expect that, if they migrate to Couch, they run both systems in
>>>>> parallel at first in order to fix the issues that occur during a
>>>>> migration.
>>>>> 
>>>>> I think we share the same goal (getting beginners started quickly)
>>>>> and the cool thing about your suggestion is that everyone gets the
>>>>> required knowledge to run a production system right from the very
>>>>> start. My suggestion leaves some parts out, but reduces the cognitive
>>>>> load required to get the very first basic results, e.g. in a
>>>>> university class setting - or junior developers on their "casual
>>>>> friday 20% time". My big hope is that, once those folks build
>>>>> high-traffic systems, they remember how easy CouchDB was to use and
>>>>> start to learn more about CouchDB in order to run it in a system with
>>>>> more than a few thousand documents.
>>>>> 
>>>>> 
>>>>> For us both I think the "what" is clear, but the "how" is a bit
>>>>> different. I also think this discussion is still making progress, but
>>>>> I am afraid it could stall. I see that we both have very good
>>>>> rudiments and I would like to invite the other folks from the mailing
>>>>> list that build interfaces for humans (APIs, CLIs or even UIs) to
>>>>> chime in again with their opinions - of course I'm also looking
>>>>> forward to your answer :)
>>>>> 
>>>>> Best,
>>>>> Robert :)
>>>>> 
>>>>> On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis
>>>>> <paul.joseph.da...@gmail.com>
>>>>> wrote:
>>>>>>>> - is a timeout solving the root cause or the symptoms? Could it be a
>>>>>>>> temporary or additional step in conjunction with query optimisation
>>>>>>>> tooling?
>>>>>>> 
>>>>>>> It really depends. From my CouchDB admin and user perspective, this
>>>>>>> doesn't seem so important to me right now. However, I recognize that
>>>>>>> there are different usage scenarios with different requirements
>>>>>>> (e.g. the ones at Cloudant).
>>>>>> 
>>>>>> I don't think there's anything special about Cloudant in this
>>>>>> discussion. It's just a question of how we allow new users to easily
>>>>>> test and learn the selector/query API while also preventing them from
>>>>>> going too far without creating indexes for their queries. The slow
>>>>>> queries messages are fine, but just as with any other database they
>>>>>> don't really prompt the developer to make the correct change. Ie, the
>>>>>> developer has to be savvy enough to a) know that the slow queries
>>>>>> logs exist, b) understand that creating an index would speed things
>>>>>> up, and then c) know which index to create based on the logged query.
>>>>>> 
>>>>>> In my experience, the group of users that we're concerned about in
>>>>>> this discussion most likely doesn't know about any of those three
>>>>>> things, which is why the current API is designed to force them to
>>>>>> learn about and understand indexes as part of learning the API.
>>>>>> Granted, the `_id > null` trick muddies that learning process. I would
>>>>>> think that replacing the _id trick with `"testing": true` or similar
>>>>>> would be an obvious indication to users that this is a dev/debug type
>>>>>> feature and when they went to production they would still be pushed
>>>>>> to using an index. If we add the "create index from selector" API then
>>>>>> I think this would be a relatively straightforward method of
>>>>>> on-ramping to both the query and index sides of the API. Ie, "You can
>>>>>> try queries with testing:true, when you're ready to move to production
>>>>>> you can POST your selector to _index to create the index which allows
>>>>>> you to remove testing:true".
>>>>>> 
>>>>>> That's also why I don't particularly care for the timeout approach.
>>>>>> It's a binary threshold that a user would (maybe) meet after some
>>>>>> unknown amount of time after they falsely believe their app is working
>>>>>> correctly. The feedback is "Everything is fine until it isn't".
>>>>>> Consider an app that's been working for a week or a month or more that
>>>>>> suddenly starts throwing timeouts for a query. From the user's
>>>>>> perspective the database broke because the query that used to work
>>>>>> fine no longer does. And then there's the follow-on question of how
>>>>>> that timeout might instruct the user that they need an index, and that
>>>>>> the fix may be as easy as POSTing their selector to the _index
>>>>>> endpoint. Sure, Google would most likely have the answer if our docs
>>>>>> are good enough, but by that point the developer is probably already
>>>>>> experiencing downtime if their app is live, which means they're
>>>>>> frantically trying to fix the thing. From my point of view, a few road
>>>>>> blocks that guide developers towards the correct usage early on would
>>>>>> be better than letting them get to the adrenaline-fueled expletive
>>>>>> fountain of downtime.
>>>>> 
>>>> 
>>
