> On 2. Apr 2019, at 15:10, Adam Kocoloski <[email protected]> wrote:
>
>
>> On Apr 2, 2019, at 8:10 AM, Jan Lehnardt <[email protected]> wrote:
>>
>>> On 28. Mar 2019, at 12:01, Garren Smith <[email protected]> wrote:
>>>
>>> In terms of keeping mango indexes up to date, we should be able to update
>>> all existing indexes in the same transaction in which a document is
>>> updated or created; this means we shouldn’t need any background
>>> process keeping mango indexes updated.
>>
>>
>> Is this a specific design goal, and if so, why? I don’t mind having this
>> as an option, but I feel it might be easier to go with the lazy/batch-update
>> process in a first pass, as that should both keep the implementation simpler
>> and leave room in doc update transactions.
>>
>> It’s a rather nice selling point when discussing CouchDB that you can have
>> as many indexes as you like and incur no write penalty, whereas in other
>> databases you are usually encouraged to have exactly as many indexes as
>> needed to avoid one.
>>
>> Best
>> Jan
>
> Fair point. There are two drivers for me:
>
> - Space efficiency
> - Avoiding operational complexities with indexes falling behind
>
> When I was looking into this design I realized that, for Mango, the async
> approach produces indexes that are ~2x as large as the “index-on-write”
> approach. The extra docid -> view_key mapping needed for async indexing is
> the same size as the view_key -> docid one that the user actually wants.
> Indexing on write allows us to reuse the document data space to retrieve the
> old docid -> view_key mapping just before removing that data from the
> datastore.
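
For illustration, the index-on-write flow described above could be sketched
roughly like this; the names and the toy index definition are made up for the
example, not the actual implementation or API:

```python
# Rough sketch of "index-on-write" (illustrative names, not CouchDB's API).
# Because the old document is still readable inside the same transaction,
# its old view keys can be recomputed and cleared on the spot, so no
# separate docid -> view_key reverse mapping ever has to be stored.

docs = {}         # docid -> document body
mango_index = {}  # (view_key, docid) -> None; the view_key -> docid index

def view_keys(doc):
    # A trivial stand-in for a mango index definition: index the "type" field.
    return [doc["type"]] if "type" in doc else []

def update_doc(docid, new_doc):
    # Conceptually a single atomic transaction:
    old_doc = docs.get(docid)
    if old_doc is not None:
        for key in view_keys(old_doc):       # recompute the old entries
            mango_index.pop((key, docid), None)
    for key in view_keys(new_doc):           # write the new entries
        mango_index[(key, docid)] = None
    docs[docid] = new_doc

update_doc("user-1", {"type": "user"})
update_doc("user-1", {"type": "admin"})  # old "user" entry is replaced
```

The async alternative would instead have to persist that docid -> view_key
reverse mapping alongside the index, which is where the ~2x figure comes from.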
>
> I also know that we’ve had plenty of instances over the years where we needed
> to do careful tuning to keep critical query response times low in the face of
> a write-intensive application. We’d end up pushing the latency back onto the
> writers indirectly through queue tuning, or the user would have to resort to
> stale=ok indexes.
>
> As a bit of an aside, I also expect that the efficiencies we achieved through
> batching in our current implementation will be less pronounced when working
> with FoundationDB. Currently the by_seq index contains pointers directly to
> the document data on disk and we could skip the entire btree overhead while
> streaming that data to the indexers, whereas in FDB it’ll just have a bunch
> of docid/rev pairs and the actual document data will be dispersed elsewhere.
>
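To make the aside concrete, the difference could be sketched like this; the
key layout is purely illustrative, not the actual FDB schema:

```python
# Illustrative sketch of why batching should help less on FoundationDB:
# the changes index only holds (docid, rev) pairs, so streaming it to an
# indexer costs an extra lookup per document for the body, instead of the
# by_seq entry pointing directly at the document data. Layout is made up.

by_seq = {}     # seq -> (docid, rev)
doc_store = {}  # (docid, rev) -> document body, dispersed elsewhere

doc_store[("a", "1-abc")] = {"value": 1}
doc_store[("b", "1-def")] = {"value": 2}
by_seq[1] = ("a", "1-abc")
by_seq[2] = ("b", "1-def")

def stream_changes(since=0):
    for seq in sorted(s for s in by_seq if s > since):
        docid, rev = by_seq[seq]
        body = doc_store[(docid, rev)]  # second lookup per document
        yield seq, docid, body

changes = list(stream_changes())
```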
Thanks Adam, this all makes a lot of sense.

I don’t feel strongly about whether we should do this right away, but since we
need the async behaviour anyway, it might be a worthwhile trade-off to defer
this optimisation to a later point.

Best
Jan
--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/