Hi Garren,

Cool, this is a good start. On the ICU side of things, Russell pointed out that sort keys are a one-way trip; i.e., there’s no way to recover the original string from a sort key. For the initial pass at Mango I think that’s OK, as we’re reading the indexed documents anyway. When we get to views I guess the design will need to store the original string in the value so that we can return it as the “key” field in the response.
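
To make the one-way point concrete, here is a minimal sketch in Python. Note that `toy_sort_key` is a made-up stand-in (NFKD normalization plus case folding), not ICU's actual sort key algorithm, and the dict is only a placeholder for an index subspace. It just illustrates why a view would have to keep the original string in the value if it wants to return it as the “key” field:

```python
import unicodedata


def toy_sort_key(s: str) -> bytes:
    """Hypothetical stand-in for an ICU sort key (NOT the real algorithm).

    It is lossy: "Apple" and "apple" produce the same bytes, so the
    original string cannot be recovered from the key alone.
    """
    return unicodedata.normalize("NFKD", s).casefold().encode("utf-8")


def index_entry(original: str, doc_id: str):
    """Build one (key, value) pair; the value carries the original string."""
    key = (toy_sort_key(original), doc_id)
    return key, original


# A dict stands in for the index subspace (assumption for illustration).
index = dict(index_entry(s, d) for s, d in [("Apple", "doc-1"), ("apple", "doc-2")])

# Rows come back in key order; the "key" field is read from the value.
rows = [{"key": value, "id": key[1]} for key, value in sorted(index.items())]
```

Both entries share the same sort key bytes and are told apart only by the doc id, yet each row can still report its original spelling because it was stored in the value.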
Adam

> On Mar 28, 2019, at 7:01 AM, Garren Smith <gar...@apache.org> wrote:
>
> Hi everyone,
>
> I want to start a discussion, with the aim of an RFC, around implementing
> Mango JSON indexes for FoundationDB. Currently Mango indexes are a layer
> above CouchDB map/reduce indexes, but with FoundationDB we can make them
> separate indexes in FoundationDB. This gives us the possibility of being
> able to update the indexes in the same transaction that a document is being
> saved in. Later we can look at adding specific mango like covering indexes.
>
> Let’s dive into the data model. Currently a user defines an index like this:
>
> {
>   name: ‘view-name’ - optional, will be auto-generated
>   index: {
>     fields: [‘fieldA’, ‘fieldB’]
>   },
>   partial_filter_selector: {} - optional
> }
>
> For query planning we need to be able to access the list of available
> indexes. So we would have an index_definitions subspace with the following
> content:
>
> (<fieldname1>, …<rest of fields>) = (<index_name>, <partial_filter_selector>)
>
> Otherwise we could just store the index definitions as:
>
> (index_name) = ((fields), partial_filter_selector).
>
> At this stage, I can’t think of a fancy way of storing the index
> definitions so that when we need to select an index for a query there would
> be a fast way to only fetch a subset of the indexes. I think the best is to
> rather fetch them all like we currently do and process them. However, we
> can look at caching these index definitions in the application layer, and
> using FoundationDB watches[0] to notify us when a definition has changed so
> we can update the cached definitions.
>
> Then each index definition will have its own dedicated subspace for the
> actual built index key/values.
> Keys in this subspace would be the fields
> defined in the index with the doc id at the end of the tuple, e.g., for an
> index with fields name and age, it would be:
>
> (“john”, 40, “doc-id-1”) = null
> (“mary”, 30, “doc-id-2”) = null
>
> This follows the same key format that document layer[1] does for its
> indexes. One point to make here is that the doc id is kept in the key part
> so that we can avoid duplicate keys.
>
> Then in terms of sorting the keys, current CouchDB uses ICU to sort all
> secondary indexes. We would need to use ICU to sort the indexes for FDB but
> we would have to do it differently. We will not be able to use ICU
> collation operations directly; instead, we are going to have to look at
> using ICU’s sort key[2] to generate a sort key ahead of time. At the same
> time we need to look at creating a binary encoding to capture the way that
> CouchDB currently sorts objects, arrays and numbers. This would most likely
> be a sort of key prefix that we add to each key field along with the sort
> key generated from ICU.
>
> In terms of keeping mango indexes up to date, we should be able to update
> all existing indexes in the same transaction as a document is
> updated/created; this means we shouldn’t have to have any background
> process keeping mango indexes updated. Though I imagine we are going to have to
> look at a background process that does update and build new indexes on an
> existing index. We will have to do some decent performance testing around
> this to determine the best solution, but looking at document layer they
> seem to recommend updating the indexes in the transaction rather than in a
> background process.
>
> In the future, we could look at using the value space to store covering
> indexes or materialized views. That way we would not need to always read
> from the by_id when querying with Mango, which would be a nice performance
> improvement.
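
For what it's worth, a rough sketch of the "type prefix plus sort key" layout described above could look like the following. Everything here is an assumption for illustration: the tag byte values, the `toy_sort_key` stand-in (plain NFC-normalized UTF-8, not ICU), and the simple NUL terminator are invented, and arrays and objects are left out. Numbers are encoded as 64-bit floats, so very large integers would lose precision.

```python
import struct
import unicodedata

# Hypothetical tag bytes, chosen so that byte order matches CouchDB's
# collation order for scalars: null < false < true < numbers < strings.
TAG_NULL, TAG_FALSE, TAG_TRUE = b"\x01", b"\x02", b"\x03"
TAG_NUMBER, TAG_STRING = b"\x04", b"\x05"


def toy_sort_key(s: str) -> bytes:
    """Stand-in for an ICU sort key (assumption: plain NFC UTF-8 bytes)."""
    return unicodedata.normalize("NFC", s).encode("utf-8")


def encode_number(x) -> bytes:
    """Encode a number as 8 bytes whose byte order matches numeric order."""
    bits = struct.unpack(">Q", struct.pack(">d", float(x)))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit
    else:
        bits ^= 1 << 63  # non-negative: flip only the sign bit
    return struct.pack(">Q", bits)


def encode_field(value) -> bytes:
    """Encode one indexed field as tag byte + order-preserving payload."""
    if value is None:
        return TAG_NULL
    if value is False:
        return TAG_FALSE
    if value is True:
        return TAG_TRUE
    if isinstance(value, (int, float)):
        return TAG_NUMBER + encode_number(value)
    if isinstance(value, str):
        key = toy_sort_key(value)
        if b"\x00" in key:
            raise ValueError("NUL bytes are not handled in this sketch")
        return TAG_STRING + key + b"\x00"  # terminator keeps prefixes ordered
    raise TypeError("arrays and objects are omitted from this sketch")


def index_key(fields, doc_id: str) -> bytes:
    """Concatenate the encoded fields and append the doc id for uniqueness."""
    return b"".join(encode_field(f) for f in fields) + doc_id.encode("utf-8")


keys = sorted([
    index_key(["mary", 30], "doc-id-2"),
    index_key(["john", 40], "doc-id-1"),
])
```

Sorting the raw bytes puts the john entry before the mary entry, and two documents with identical field values still get distinct keys because the doc id is part of the key.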
>
> Please let me know any thoughts, improvements, suggestions or questions
> around this.
>
> [0] https://apple.github.io/foundationdb/features.html#watches
> [1] https://github.com/FoundationDB/fdb-document-layer
> [2] http://userguide.icu-project.org/collation/api#TOC-Sort-Key-Features
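
As a rough illustration of the in-transaction index update discussed in the proposal, here is a small Python sketch. A plain dict stands in for a FoundationDB transaction, and the `("by_id", ...)` / `("idx", ...)` key prefixes and the `save_doc` helper are hypothetical names. It also assumes a document is only indexed when it contains every field of the index.

```python
def index_keys_for(doc, index_defs):
    """Yield (index_name, field values..., doc_id) for each applicable index."""
    for name, fields in index_defs.items():
        # Assumption: skip documents that lack any of the indexed fields.
        if all(f in doc for f in fields):
            yield (name, *[doc[f] for f in fields], doc["_id"])


def save_doc(kv, index_defs, doc):
    """Write the document and refresh its index entries in one step.

    `kv` is a dict standing in for a single transaction, so the document
    write and the index writes are applied together.
    """
    old = kv.get(("by_id", doc["_id"]))
    if old is not None:
        # Clear entries derived from the previous revision of the document.
        for key in index_keys_for(old, index_defs):
            kv.pop(("idx",) + key, None)
    kv[("by_id", doc["_id"])] = dict(doc)
    for key in index_keys_for(doc, index_defs):
        kv[("idx",) + key] = None  # index values are empty, as in the proposal


kv = {}
index_defs = {"name-age": ["name", "age"]}
save_doc(kv, index_defs, {"_id": "doc-id-1", "name": "john", "age": 40})
save_doc(kv, index_defs, {"_id": "doc-id-1", "name": "john", "age": 41})
```

After the second call only the entry for age 41 remains, which is the behaviour a background indexer would otherwise have to catch up on.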