> On 2. Apr 2019, at 15:10, Adam Kocoloski <[email protected]> wrote:
> 
> 
>> On Apr 2, 2019, at 8:10 AM, Jan Lehnardt <[email protected]> wrote:
>> 
>>> On 28. Mar 2019, at 12:01, Garren Smith <[email protected]> wrote:
>>> 
>>> In terms of keeping Mango indexes up to date, we should be able to update
>>> all existing indexes in the same transaction as a document is
>>> updated/created; this means we shouldn’t need any background
>>> process to keep Mango indexes updated.
>> 
>> 
>> Is this a specific design goal, and if so, why? I don’t mind having this
>> as an option, but I feel it might be easier to go with the lazy/batch-update
>> process in a first pass, as that should both keep the implementation simpler
>> and leave room in doc update transactions.
>> 
>> It’s rather a nice point when discussing CouchDB that you can have as many
>> indexes as you like and incur no write penalty, whereas in other databases
>> you are usually encouraged to have exactly as many indexes as needed in
>> order to avoid one.
>> 
>> Best
>> Jan
> 
> Fair point. There are two drivers for me:
> 
> - Space efficiency
> - Avoiding operational complexities with indexes falling behind
> 
> When I was looking into this design I realized that, for Mango, the async
> approach produces indexes that are ~2x as large as the “index-on-write”
> approach. The extra docid -> view_key mapping needed for async indexing is
> the same size as the view_key -> docid one that the user actually wants.
> Indexing on write allows us to reuse the document data space to retrieve the
> old docid -> view_key mapping just before removing that data from the
> datastore.
> 
> I also know that we’ve had plenty of instances over the years where we needed 
> to do careful tuning to keep critical query response times low in the face of 
> a write-intensive application. We’d end up pushing the latency back onto the
> writers indirectly through queue tuning, or the user would have to resort to
> stale=ok queries.
> 
> As a bit of an aside, I also expect that the efficiencies we achieved through 
> batching in our current implementation will be less pronounced when working 
> with FoundationDB. Currently the by_seq index contains pointers directly to 
> the document data on disk and we could skip the entire btree overhead while 
> streaming that data to the indexers, whereas in FDB it’ll just have a bunch 
> of docid/rev pairs and the actual document data will be dispersed elsewhere.
> 

Thanks Adam, this all makes a lot of sense.

I don’t feel strongly about whether we should do this right away, but since we
need the async behaviour anyway, it might be a worthwhile trade-off to defer
this optimisation to a later point.
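
For anyone following along, here is a rough sketch of the space argument from
Adam’s mail. It is plain Python, not actual CouchDB or FDB code, and the
`emit_keys` function is a hypothetical stand-in for Mango’s key extraction.
The point it illustrates: index-on-write only stores the forward
view_key -> docid mapping, because the old document body is still readable in
the update transaction, while an async indexer must also persist a reverse
docid -> view_key mapping of roughly the same size.

```python
def emit_keys(doc):
    """Hypothetical Mango-style key extraction: index the 'type' field."""
    return [("type", doc["type"])] if "type" in doc else []

class IndexOnWrite:
    def __init__(self):
        self.docs = {}   # docid -> doc body
        self.index = {}  # (view_key, docid) -> None; forward mapping only

    def update(self, docid, new_doc):
        # One "transaction": the old body is still here, so the stale
        # view keys can be recomputed and removed on the spot.
        old_doc = self.docs.get(docid, {})
        for key in emit_keys(old_doc):
            self.index.pop((key, docid), None)
        for key in emit_keys(new_doc):
            self.index[(key, docid)] = None
        self.docs[docid] = new_doc

class AsyncIndex:
    def __init__(self):
        self.docs = {}
        self.index = {}  # (view_key, docid) -> None; forward mapping
        self.back = {}   # docid -> [view_key]; extra reverse mapping

    def update(self, docid, new_doc):
        # The write itself does no index work; a background indexer
        # catches up later, by which time the old body may be gone.
        self.docs[docid] = new_doc

    def background_index(self, docid):
        # Without the old body, the persisted reverse mapping is the
        # only way to find and delete the stale index entries.
        for key in self.back.get(docid, []):
            self.index.pop((key, docid), None)
        keys = emit_keys(self.docs[docid])
        for key in keys:
            self.index[(key, docid)] = None
        self.back[docid] = keys  # roughly as large as the forward mapping
```

Both variants end up with the same forward index, but the async one carries
the `back` table on top of it, which is where the ~2x figure comes from.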

Best
Jan
-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
