Re: [DISCUSS] Implementing Mango Indexes for FoundationDB

Adam Kocoloski Mon, 01 Apr 2019 18:14:45 -0700

Hi Will, great comments. I have replies to a couple of them.

> On Apr 1, 2019, at 5:21 AM, Will Holley <[email protected]> wrote:
> 
> 2. Does the ICU sort key have a bounded length? Mostly I'm wondering
> whether we can guarantee that the generated keys will fit within the
> maximum FDB key length or if there needs to be some thought as to the
> failure mode / workaround. As Adam mentioned, it seems fine to store an
> encoded key given Mango (currently) always fetches the associated document
> / fields from the primary index to filter on anyway. It might even be
> beneficial to have an additional layer of indirection and allow multiple
> docs to be associated with each row so that we can maintain compact keys.


Interesting thought on that layer of indirection; it reminds me of an 
optimization applied in the Record Layer’s text indexes. Would have to compare 
whether the extra reads needed to maintain the index that way are an acceptable 
tradeoff.

Good point on the sort key sizes. I’ve not seen any way to place a reliably 
safe upper bound on the size of one that might be generated. The ICU folks have 
some hand-wavy guidance at 
http://userguide.icu-project.org/collation/architecture#TOC-Sort-key-size, but 
it seems like we might be able to dig a little deeper.

I personally haven’t given much thought to a workaround where a user-defined 
index key exceeds 10 KB. We’ll definitely need to handle that failure mode 
safely even without the sort key complication — people try crazy things :)
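
To make that failure mode concrete, here is a minimal sketch of a guard that 
rejects an oversized index key before it is written. Everything in it is 
hypothetical (the function name, the exception, and the prefix + value + docid 
key layout are assumptions for illustration, not CouchDB or FoundationDB APIs); 
the only outside fact it relies on is FoundationDB's documented 10,000-byte key 
limit.

```python
# Hypothetical guard: names and key layout are illustrative assumptions,
# not part of CouchDB or the FoundationDB client API.
FDB_MAX_KEY_BYTES = 10_000  # FoundationDB's documented maximum key size


class IndexKeyTooLarge(Exception):
    """Raised when an encoded index key would not fit in a single FDB key."""


def check_index_key(prefix: bytes, encoded_value: bytes, doc_id: bytes) -> bytes:
    """Build an index key and fail loudly if it exceeds the FDB key limit.

    `encoded_value` stands in for whatever encoding is chosen for the
    indexed field (for example an ICU sort key).
    """
    key = prefix + encoded_value + doc_id
    if len(key) > FDB_MAX_KEY_BYTES:
        raise IndexKeyTooLarge(
            f"index key is {len(key)} bytes; limit is {FDB_MAX_KEY_BYTES}"
        )
    return key
```

Whether the right response is to reject the write, skip indexing that document, 
or fall back to a truncated or hashed key is exactly the open question here; 
the sketch only shows where such a check could sit.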

> 3. I don't immediately see how you clear previous values from the index
> when a doc is updated, but I could easily be missing something obvious :)

Ah yeah, this part wasn’t explicit, was it?

I think the idea is that these are simple indexes on specific fields of a 
document, and we have a data model where those fields are already stored as 
their own keys in FDB, so there’s no need (in the case of Mango) to maintain a 
separate docid -> {viewid, [keys]} mapping like we do today in each view group. 
Rather, the flow would go something like

1) Check which fields are supposed to be indexed
2) Retrieve values for those fields in the ?DOCUMENTS space for the parent 
revision
3) Compare the parent values with the ones supplied in this transaction; if any 
indexed values change, clear the old ones and insert the new ones

with some additional caveats around checking that the supplied edit is actually 
going to be the winning (and therefore indexed) version after the commit succeeds.
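
The three steps above can be sketched roughly as follows. This is an 
illustration under stated assumptions, not the proposed implementation: a plain 
dict stands in for the FDB index subspace, the (field, value, docid) key layout 
and all names are hypothetical, and the "winning revision" caveat is reduced to 
a boolean supplied by the caller.

```python
# Illustrative only: a dict stands in for the FDB index subspace, and the
# (field, value, doc_id) key layout is an assumption made for this sketch.
MISSING = object()  # marks a field that is absent from a revision


def update_index(index, indexed_fields, doc_id, parent_values, new_values, is_winner):
    """Apply the index changes for one document edit.

    Returns a list of ("clear" | "set", field, value) operations performed.
    """
    # Caveat from the text: only the winning revision is indexed. Deciding
    # this for real is more involved; here the caller just tells us.
    if not is_winner:
        return []

    changes = []
    for field in indexed_fields:                     # 1) fields to index
        old = parent_values.get(field, MISSING)      # 2) parent revision value
        new = new_values.get(field, MISSING)         #    value in this edit
        if old == new:                               # 3) compare; skip if same
            continue
        if old is not MISSING:
            index.pop((field, old, doc_id), None)    # clear the old entry
            changes.append(("clear", field, old))
        if new is not MISSING:
            index[(field, new, doc_id)] = b""        # insert the new entry
            changes.append(("set", field, new))
    return changes


index = {("name", "alice", "doc1"): b""}
ops = update_index(
    index,
    indexed_fields=["name", "age"],
    doc_id="doc1",
    parent_values={"name": "alice", "age": 30},
    new_values={"name": "bob", "age": 30},
    is_winner=True,
)
```

In this example only `name` changes, so the old `name` entry is cleared and a 
new one is written, while `age` is left untouched.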

> 4. Regarding "Index on write" behaviour, is there something in the existing
> design (Mango overlaying mrview / lucene) that would prevent this? I can
> see some benefit for certain workloads (and headaches for others) but I
> don't see that it's necessarily coupled to the Mango design given
> background indexing of new/changed indexes needs to be supported anyway.

I’m not sure I understand your question. In my mind the reason “index on write” 
is more applicable for Mango JSON than for generalized views is because in the 
view case batching is currently quite important to achieve good throughput to 
the JS system. You’re of course correct that we need to be able to re-generate 
Mango JSON indexes in the background as well.

Adam
