On Tue, Apr 2, 2019 at 3:14 AM Adam Kocoloski <kocol...@apache.org> wrote:
> Hi Will, great comments, I have replies to a couple of them.
>
> > On Apr 1, 2019, at 5:21 AM, Will Holley <willhol...@gmail.com> wrote:
> >
> > 2. Does the ICU sort key have a bounded length? Mostly I'm wondering
> > whether we can guarantee that the generated keys will fit within the
> > maximum FDB key length or if there needs to be some thought as to the
> > failure mode / workaround. As Adam mentioned, it seems fine to store an
> > encoded key given Mango (currently) always fetches the associated
> > document / fields from the primary index to filter on anyway. It might
> > even be beneficial to have an additional layer of indirection and allow
> > multiple docs to be associated with each row so that we can maintain
> > compact keys.
>
> Interesting thought on that layer of indirection; it reminds me of an
> optimization applied in the Record Layer’s text indexes. Would have to
> compare whether the extra reads needed to maintain the index that way are
> an acceptable tradeoff.
>
> Good point on the sort key sizes, I’ve not seen any way to place a
> reliably safe upper bound on the size of one that might be generated. The
> ICU folks have some hand-wavey guidance at
> http://userguide.icu-project.org/collation/architecture#TOC-Sort-key-size,
> but it seems like we might be able to dig a little deeper.
>
> I personally haven’t given much thought to a workaround where a
> user-defined index key exceeds 10 KB. We’ll definitely need to handle that
> failure mode safely even without the sort key complication - people try
> crazy things :)
>

For the 10 KB error, I think we should just return an error. As a
comparison, MongoDB has a 1024-byte limit:
https://docs.mongodb.com/manual/reference/limits/#Index-Key-Limit

> > 3. I don't immediately see how you clear previous values from the index
> > when a doc is updated, but I could easily be missing something obvious :)
>
> Ah yeah, this part wasn’t explicit, was it?
>
> I think the idea is that these are simple indexes on specific fields of a
> document, and we have a data model where those fields are already stored as
> their own keys in FDB, so there’s no need (in the case of Mango) to
> maintain a separate docid -> {viewid, [keys]} mapping like we do today in
> each view group. Rather, the flow would go something like
>
> 1) Check which fields are supposed to be indexed
> 2) Retrieve values for those fields in the ?DOCUMENTS space for the parent
> revision
> 3) Compare the parent values with the ones supplied in this transaction;
> if any indexed values change, clear the old ones and insert the new ones
>
> with some additional caveats around checking that the supplied edit is
> actually going to be the winning (and therefore indexed) version after the
> commit succeeds.
>
> > 4. Regarding "Index on write" behaviour, is there something in the
> > existing design (Mango overlaying mrview / lucene) that would prevent
> > this? I can see some benefit for certain workloads (and headaches for
> > others) but I don't see that it's necessarily coupled to the Mango
> > design given background indexing of new/changed indexes needs to be
> > supported anyway.
>
> I’m not sure I understand your question. In my mind the reason “index on
> write” is more applicable for Mango JSON than for generalized views is
> because in the view case batching is currently quite important to achieve
> good throughput to the JS system. You’re of course correct that we need to
> be able to re-generate Mango JSON indexes in the background as well.
>
> Adam
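
To check my understanding of the update flow for point 3, here is a rough
Python sketch of steps 1-3, combined with the "just return an error" idea
for oversized keys. It is not CouchDB code: the dict standing in for the FDB
keyspace, the names (index_key, update_index, IndexKeyTooLarge), the
repr()-based key encoding, and the 10,000-byte constant are all assumptions
for illustration. A real implementation would use a tuple encoding plus ICU
sort keys, and would also need the winning-revision check Adam mentions,
which is left out here.

```python
MAX_KEY_BYTES = 10_000  # assumed limit, taken from the "10 KB" figure in the thread
_MISSING = object()     # marks "field not present in the document"


class IndexKeyTooLarge(ValueError):
    """Raised instead of writing an index key that exceeds the limit."""


def index_key(field, value, doc_id):
    # Hypothetical encoding: field, value and doc id joined with NUL bytes.
    # repr() is only a placeholder for a real sort-key encoding.
    parts = (field, repr(value), doc_id)
    key = b"\x00".join(part.encode("utf-8") for part in parts)
    if len(key) > MAX_KEY_BYTES:
        raise IndexKeyTooLarge(
            f"index key for field {field!r} is {len(key)} bytes"
        )
    return key


def update_index(index, indexed_fields, doc_id, parent_doc, new_doc):
    """Apply one document edit to `index`, a dict standing in for FDB."""
    parent_doc = parent_doc or {}
    to_clear, to_set = [], []
    for field in indexed_fields:                   # 1) fields to index
        old = parent_doc.get(field, _MISSING)      # 2) parent revision value
        new = new_doc.get(field, _MISSING)
        if old == new:                             # 3) unchanged: nothing to do
            continue
        if old is not _MISSING:
            to_clear.append(index_key(field, old, doc_id))
        if new is not _MISSING:
            to_set.append(index_key(field, new, doc_id))
    # Mutate only after every key has been validated, so an oversized key
    # leaves the index untouched (a real transaction would simply abort).
    for key in to_clear:
        index.pop(key, None)
    for key in to_set:
        index[key] = b""
```

In this sketch an oversized value fails the whole edit before anything is
written, which is the behaviour I would expect from returning an error to
the client rather than silently skipping the index entry.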