[GitHub] [couchdb-documentation] sansato commented on a change in pull request #407: RFC for Mango on FDB

GitBox Fri, 26 Apr 2019 10:53:47 -0700

sansato commented on a change in pull request #407: RFC for Mango on FDB
URL: 
https://github.com/apache/couchdb-documentation/pull/407#discussion_r279046353

##########
File path: rfcs/006-mango-fdb.md
##########
@@ -0,0 +1,154 @@
+# Mango RFC
+
+- - - -
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: ‘Mango JSON indexes in FoundationDB’
+labels: rfc, discussion
+assignees: ‘’
+
+- - - -
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes the data model and index management for Mango JSON
indexes in FoundationDB.
+
+## Abstract
+
+This document details the data model for storing Mango indexes. The basic
model is that we would have a namespace for storing defined indexes and then a
dedicated namespace per index for the key/values for a given index. Indexes
will be updated in the transaction that a document is written to FoundationDB.
When an index is created on an existing database, a background task will build
the index up to the Sequence that the index was created at.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
+“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`Sequence`: a 13-byte value formed by combining the current `Incarnation` of
the database and the `Versionstamp` of the transaction. Sequences are
monotonically increasing even when a database is relocated across FoundationDB
clusters. See [RFC002](LINK TBD) for a full explanation.
+- - - -
+
+# Detailed Description
+
+Mango is a declarative JSON querying syntax that allows a user to retrieve
documents based on a given selector. It supports defining indexes for queries,
which improves query performance. In CouchDB 2.x, Mango is a query layer
built on top of Map/Reduce indexes. Each Mango query follows a two-step
process. First, a subset of the selector is converted into a map query that is
run against a predefined index, falling back to _all_docs if no indexes are
available. Second, each document retrieved from the index is matched against
the query selector.
+
+In a future release of CouchDB with FoundationDB, the external behaviour of
Mango will remain the same, but internally Mango will have its own indexes and
index management. This will allow Mango indexes to be updated in the same
transaction as the write request (index on write). Later we can also
look at adding Mango-specific functionality.
+
+## Data Model
+
+### Index Definitions
+
+A Mango index is defined as:
+
+```json
+{
+ "name": "view-name" - optional, will be auto-generated
+ "index": {
+ "fields": ["fieldA", "fieldB"] - fields to be indexed
+ },
+ "partial_filter_selector": {} - optional filter to process documents before
adding to the index
+}
+```
+
+The above index definition would be stored in FoundationDB as:
+
+`(?DATABASE, ?INDEX_DEFINITIONS, <fieldname1>, …<rest of fields>) =
(<index_name>, <partial_filter_selector>, build_status, sequence)`
+
+`build_status` has two possible values: `active`, which indicates the index is
ready to service queries, and `building`, which indicates the index is still
being built. `sequence` is the sequence at which the index was created. Nested
fields defined in the index would be stored as packed tuples.
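
A minimal sketch of how an index definition might map onto that key/value layout. It uses plain Python tuples rather than a FoundationDB client. The constants `DATABASE` and `INDEX_DEFINITIONS`, the helper `definition_entry`, and the 13-byte zero sequence are hypothetical stand-ins for illustration and are not part of the RFC.

```python
# Hypothetical stand-ins for the ?DATABASE and ?INDEX_DEFINITIONS prefixes.
DATABASE = "db"
INDEX_DEFINITIONS = "index_definitions"

VALID_STATUSES = ("building", "active")


def definition_entry(definition, sequence, build_status="building"):
    """Return the (key, value) tuples for one index definition."""
    if build_status not in VALID_STATUSES:
        raise ValueError(f"unknown build_status: {build_status!r}")
    fields = tuple(definition["index"]["fields"])
    # Key: prefixes followed by every indexed field name.
    key = (DATABASE, INDEX_DEFINITIONS) + fields
    # Value: name, filter, build status and creation sequence.
    value = (
        definition.get("name"),
        definition.get("partial_filter_selector", {}),
        build_status,
        sequence,
    )
    return key, value


definition = {
    "name": "view-name",
    "index": {"fields": ["fieldA", "fieldB"]},
    "partial_filter_selector": {},
}
# bytes(13) is a placeholder for a real 13-byte Sequence.
key, value = definition_entry(definition, bytes(13))
```

In this sketch a new definition starts as `building`; when it would flip to `active` is left to the background build task described in the abstract.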
+
+### Indexes
+
+Each index defined in the Index Definition would have an index key space where
the database’s documents are stored and sorted via the keys defined in the
index’s definition. The data model for each defined index would be:
+
+`(?DATABASE, ?INDEXES, ?INDEX_NAME, <indexed_field>, …<other indexed fields>,
_id) = null`
+
+The `_id` is kept to avoid duplicate keys and to be used to retrieve the full
document for a Mango query.
+For now, the value will be null. Later we can look at storing covering
indexes, aggregate values or materialised views.
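
As a rough illustration of the index key layout above, the following sketch builds the entry for a single document using plain Python tuples. `DATABASE`, `INDEXES` and `index_entry` are hypothetical names, and the sketch assumes every indexed field is present at the top level of the document.

```python
# Hypothetical stand-ins for the ?DATABASE and ?INDEXES prefixes.
DATABASE = "db"
INDEXES = "indexes"


def index_entry(index_name, fields, doc):
    """Return the (key, value) pair one document contributes to an index."""
    # Assumes each indexed field exists at the top level of the document.
    values = tuple(doc[field] for field in fields)
    # The trailing _id keeps keys unique and identifies the document to fetch.
    key = (DATABASE, INDEXES, index_name) + values + (doc["_id"],)
    return key, None  # the value is null for now


doc_a = {"_id": "doc-1", "fieldA": "x", "fieldB": 3}
doc_b = {"_id": "doc-2", "fieldA": "x", "fieldB": 3}
key_a, value_a = index_entry("view-name", ["fieldA", "fieldB"], doc_a)
key_b, value_b = index_entry("view-name", ["fieldA", "fieldB"], doc_b)
```

The two documents share the same indexed values, yet `key_a` and `key_b` differ in their last element, which is why the `_id` is kept in the key.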
+
+### Key sorting
+
+In CouchDB 2.x, ICU collation is used to sort string keys when they are added
to the index’s b-tree. The current way of using ICU string collation won’t work
with FoundationDB, which orders keys by their bytes. To resolve this, strings
will be converted to an ICU sort string before being stored in FDB. This is an
extra performance overhead but will only be done once, when writing a key into
the index.
+
+CouchDB has a defined [index collation
specification](http://docs.couchdb.org/en/stable/ddocs/views/collation.html#collation-specification)
that the new Mango design must adhere to. Each key added to a Mango index will
be converted into a composite key or tuple, with the first value in the tuple
representing the type of the key so that it is sorted correctly. Below
is an example of the type keys to be used:
+
+\x00 NULL
+\x26 False
+\x27 True
+\x30 Numbers
+\x40 Text converted into a sort string
+\x50 Array
+\x60 Objects
+
+An example for a number key would be (\x30, 1). Note that Null and
Boolean values won’t need to be composite keys, as the type key is the value.
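
The type keys above can be sketched as follows, to show how a leading type tag makes mixed-type keys sort in the order of the table. This is an illustration only: `encode_key` and `sort_string` are hypothetical helpers, and `sort_string` uses `str.casefold()` as a simple placeholder where a real implementation would produce an ICU sort key. Objects are left out for brevity.

```python
# Type tags copied from the table above.
TAG_NULL = b"\x00"
TAG_FALSE = b"\x26"
TAG_TRUE = b"\x27"
TAG_NUMBER = b"\x30"
TAG_TEXT = b"\x40"
TAG_ARRAY = b"\x50"


def sort_string(text):
    # Placeholder only: a real implementation would return an ICU sort key.
    return text.casefold().encode("utf-8")


def encode_key(value):
    """Return a tuple whose first element is the type tag of the value."""
    if value is None:
        return (TAG_NULL,)
    # Booleans are checked before numbers because bool is a subclass of int.
    if value is False:
        return (TAG_FALSE,)
    if value is True:
        return (TAG_TRUE,)
    if isinstance(value, (int, float)):
        return (TAG_NUMBER, value)
    if isinstance(value, str):
        return (TAG_TEXT, sort_string(value))
    if isinstance(value, list):
        return (TAG_ARRAY, tuple(encode_key(item) for item in value))
    raise TypeError(f"unsupported key type: {type(value).__name__}")


values = ["b", 2, None, True, [1], "A", False, 1.5]
ordered = sorted(values, key=encode_key)
```

Because tuples compare element by element, the tag decides the order between types and the remaining elements decide the order within a type.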
+
+### Index Limits
+
+This design has certain defined limits for it to work correctly:
+
+* The index definition (name, fields and partial_filter_selector) cannot
exceed the 100 KB FDB value limit
+* The sorted keys for an index cannot exceed the 10 KB key limit

Review comment:
"sorted keys" - is this the same as the keys emitted for indexing records?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
