> On Apr 2, 2019, at 8:10 AM, Jan Lehnardt <[email protected]> wrote:
> 
>> On 28. Mar 2019, at 12:01, Garren Smith <[email protected]> wrote:
>> 
>> In terms of keeping mango indexes up to date, we should be able to update
>> all existing indexes in the same transaction in which a document is
>> updated/created, which means we shouldn’t need any background
>> process to keep mango indexes updated.
> 
> 
> Is this a specific design goal and if yes why? I don’t mind having this
> as an option, but I feel it might be easier to go with the lazy/batch-update
> process in a first pass, as that should both keep the implementation simpler
> and keep room in doc update transactions.
> 
> It’s rather a nice point when discussing CouchDB that you can have as many
> indexes as you like and incur no write penalties, whereas in other dbs you
> usually are encouraged to have exactly as many indexes as needed to avoid
> a write penalty.
> 
> Best
> Jan

Fair point. There are two drivers for me:

- Space efficiency
- Avoiding operational complexities with indexes falling behind

When I was looking into this design I realized that, for Mango, the async 
approach produces indexes that are ~2x as large as the “index-on-write” 
approach. The extra docid -> view_key mapping needed for async indexing is the 
same size as the view_key -> docid one that the user actually wants. Indexing 
on write allows us to reuse the document data space to retrieve the old docid 
-> view_key mapping just before removing that data from the datastore.
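To make the space argument concrete, here is a minimal sketch of the 
index-on-write idea using plain dicts as stand-ins for the datastore and the 
view index. The names here (emit_keys, update_doc, the key layouts) are 
hypothetical illustrations, not the actual CouchDB/FDB schema:

```python
docs = {}        # docid -> document body (the primary datastore)
view_index = {}  # (view_key, docid) -> None (the index users actually want)

def emit_keys(doc):
    # Stand-in for a Mango index definition: index the "type" field.
    return [doc["type"]] if "type" in doc else []

def update_doc(docid, new_doc):
    # Because the old document body is still available inside the same
    # transaction, we can recompute its old view keys and delete those
    # index entries directly -- no separate docid -> view_key back-index
    # (the thing that roughly doubles the size of an async index).
    old_doc = docs.get(docid)
    if old_doc is not None:
        for key in emit_keys(old_doc):
            view_index.pop((key, docid), None)
    docs[docid] = new_doc
    for key in emit_keys(new_doc):
        view_index[(key, docid)] = None

update_doc("a", {"type": "user"})
update_doc("a", {"type": "admin"})
# The stale ("user", "a") entry is gone without any reverse mapping.
```

An async indexer, by contrast, only sees the update after the old document is 
gone, so it has to persist its own docid -> view_key mapping to find the stale 
entries, which is the extra space cost described above.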

I also know that we’ve had plenty of instances over the years where we needed 
to do careful tuning to keep critical query response times low in the face of a 
write-intensive application. We’d end up pushing the latency back onto the 
writers indirectly through queue tuning, or the user would have to resort to 
stale=ok indexes.

As a bit of an aside, I also expect that the efficiencies we achieved through 
batching in our current implementation will be less pronounced when working 
with FoundationDB. Currently the by_seq index contains pointers directly to the 
document data on disk and we could skip the entire btree overhead while 
streaming that data to the indexers, whereas in FDB it’ll just have a bunch of 
docid/rev pairs and the actual document data will be dispersed elsewhere.

Adam
