On Tue, Apr 2, 2019 at 3:14 AM Adam Kocoloski <kocol...@apache.org> wrote:

> Hi Will, great comments, I have replies to a couple of them.
>
> > On Apr 1, 2019, at 5:21 AM, Will Holley <willhol...@gmail.com> wrote:
> >
> > 2. Does the ICU sort key have a bounded length? Mostly I'm wondering
> > whether we can guarantee that the generated keys will fit within the
> > maximum FDB key length or if there needs to be some thought as to the
> > failure mode / workaround. As Adam mentioned, it seems fine to store an
> > encoded key given Mango (currently) always fetches the associated
> > document / fields from the primary index to filter on anyway. It might
> > even be
> > beneficial to have an additional layer of indirection and allow multiple
> > docs to be associated with each row so that we can maintain compact keys.
>
> Interesting thought on that layer of indirection; it reminds me of an
> optimization applied in the Record Layer’s text indexes. Would have to
> compare whether the extra reads needed to maintain the index that way are
> an acceptable tradeoff.
>
> Good point on the sort key sizes, I’ve not seen any way to place a
> reliably safe upper bound on the size of one that might be generated. The
> ICU folks have some hand-wavy guidance at
> http://userguide.icu-project.org/collation/architecture#TOC-Sort-key-size,
> but it seems like we might be able to dig a little deeper.
>
> I personally haven’t given much thought to a workaround where a
> user-defined index key exceeds 10 KB. We’ll definitely need to handle that
> failure mode safely even without the sort key complication — people try
> crazy things :)
>

For index keys over 10 KB, I think we should just return an error. For
comparison, MongoDB has a 1024-byte limit on index keys:
https://docs.mongodb.com/manual/reference/limits/#Index-Key-Limit
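
A minimal sketch of what rejecting the write could look like, in Python purely
for illustration. The 10,000-byte constant is FoundationDB's documented key
size limit; `KeyTooLargeError`, `check_index_key` and the `prefix` argument are
hypothetical names, not anything that exists in CouchDB today.

```python
FDB_MAX_KEY_BYTES = 10_000  # FoundationDB's documented key size limit


class KeyTooLargeError(ValueError):
    """Hypothetical error returned to the client for an oversized index key."""


def check_index_key(prefix: bytes, encoded_key: bytes) -> bytes:
    """Return the full key, or raise if it would exceed the FDB limit.

    `prefix` stands in for whatever subspace bytes precede the encoded
    value; it counts against the limit as well.
    """
    full_key = prefix + encoded_key
    if len(full_key) > FDB_MAX_KEY_BYTES:
        raise KeyTooLargeError(
            f"index key is {len(full_key)} bytes; limit is {FDB_MAX_KEY_BYTES}"
        )
    return full_key
```

The check has to run on the encoded key (for example the ICU sort key), not on
the raw field value, since the encoded form is what ends up in FDB.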


> > 3. I don't immediately see how you clear previous values from the index
> > when a doc is updated, but I could easily be missing something obvious :)
>
> Ah yeah, this part wasn’t explicit, was it?
>
> I think the idea is that these are simple indexes on specific fields of a
> document, and we have a data model where those fields are already stored as
> their own keys in FDB, so there’s no need (in the case of Mango) to
> maintain a separate docid -> {viewid, [keys]} mapping like we do today in
> each view group. Rather, the flow would go something like
>
> 1) Check which fields are supposed to be indexed
> 2) Retrieve values for those fields in the ?DOCUMENTS space for the parent
> revision
> 3) Compare the parent values with the ones supplied in this transaction;
> if any indexed values change, clear the old ones and insert the new ones
>
> with some additional caveats around checking that the supplied edit is
> actually going to be the winning (and therefore indexed) version after the
> commit succeeds.
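
Steps 1-3 above could be sketched roughly as follows, in Python purely for
illustration. A plain dict stands in for the index subspace in FDB, and
`update_index`, `is_winner` and the `(field, value, doc_id)` key layout are
assumptions made for the example, not the proposed design. Field values are
assumed to be scalars.

```python
from typing import Any, Dict, List, Tuple

# In-memory stand-in for the index subspace: (field, value, doc_id) -> b""
Index = Dict[Tuple[str, Any, str], bytes]

_MISSING = object()  # marks a field that is absent from a revision


def update_index(
    index: Index,
    indexed_fields: List[str],
    doc_id: str,
    parent_doc: Dict[str, Any],
    new_doc: Dict[str, Any],
    is_winner: bool = True,
) -> None:
    """Apply steps 1-3 for a single document edit."""
    if not is_winner:
        # Caveat from the thread: only the winning revision is indexed.
        return
    for field in indexed_fields:                # 1) fields to index
        old = parent_doc.get(field, _MISSING)   # 2) value in parent revision
        new = new_doc.get(field, _MISSING)
        if old == new:                          # 3) unchanged: nothing to do
            continue
        if old is not _MISSING:
            index.pop((field, old, doc_id), None)   # clear the old entry
        if new is not _MISSING:
            index[(field, new, doc_id)] = b""       # insert the new entry
```

In the real thing all of this would happen inside the same FDB transaction as
the document write, so the index never lags the document.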
>
> > 4. Regarding "Index on write" behaviour, is there something in the
> > existing design (Mango overlaying mrview / lucene) that would prevent
> > this? I can
> > see some benefit for certain workloads (and headaches for others) but I
> > don't see that it's necessarily coupled to the Mango design given
> > background indexing of new/changed indexes needs to be supported anyway.
>
> I’m not sure I understand your question. In my mind the reason “index on
> write” is more applicable for Mango JSON than for generalized views is
> because in the view case batching is currently quite important to achieve
> good throughput to the JS system. You’re of course correct that we need to
> be able to re-generate Mango JSON indexes in the background as well.
>
> Adam
>
>
>
