janl commented on code in PR #5792: URL: https://github.com/apache/couchdb/pull/5792#discussion_r2635383920
########## src/docs/rfcs/018-declarative-vdu.md: ########## @@ -0,0 +1,2249 @@ +--- +name: Formal RFC +about: Submit a formal Request For Comments for consideration by the team. +title: '' +labels: rfc, discussion +assignees: '' + +--- + +[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ ) + +# Introduction + +## Abstract + +[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. ) +[NOTE]: # ( Describe what problem you are solving, and the general approach. ) + +## Requirements Language + +[NOTE]: # ( Do not alter the section below. Follow its instructions. ) + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", +"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this +document are to be interpreted as described in +[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt). + +## Terminology + +[TIP]: # ( Provide a list of any unique terms or acronyms, and their definitions here.) + +--- + +# Detailed Description + +This document specifies a system of declarative document validation and +authorization for CouchDB. It lets users write rules for validating document +updates using expressions that can be evaluated inside the main CouchDB process, +instead of having to invoke JavaScript code and incurring the overhead of +round-tripping documents to the external JavaScript engine. 
+ + +## Design documents + +Users encode validation rules by storing a design document with the following +fields: + +- `language`: must have the value `"query"` +- `validate_doc_update`: contains an _Extended Mango_ expression that encodes + the desired validation rules +- `defs`: (optional) a set of named Extended Mango expressions that may be + referenced from the `validate_doc_update` expression, as a way to encode + reusable functions + + +## Handling write requests + +When a document is updated, the `validate_doc_update` (VDU) fields of all the +design docs in the database are evaluated, and all of them must return a +successful response in order for the update to be accepted. For existing VDUs +written in JavaScript, those continue to be evaluated using the JavaScript +engine. For VDUs in docs with the field `"language": "query"`, the VDU is +evaluated using the functions in the `mango` application, particularly the +`mango_selector` module. This module will need additional functionality to +handle Extended Mango expressions as described below. + + +### Input to declarative VDUs + +JavaScript-based VDUs are functions that accept four arguments: the old and new +versions of the document, the user context, and the database security object. 
+The input to a declarative VDU is a virtual JSON document with the following +top-level properties: + +- `$newDoc`: the body of the new version of the document that the client is + attempting to write +- `$oldDoc`: the body of the previous leaf revision of the document that the new + revision would follow on from +- `$userCtx`: the user context containing the current user's `name` and an array + of `roles` the user possesses +- `$secObj`: the database security object, containing arrays of users and roles + with admin access, and users and roles with member access + +Example of a user context: + + { + "db": "movies", + "name": "Alice", + "roles": ["_admin"] + } + +Example of a security object: + + { + "admins": { + "names": ["Bob"], + "roles": [] + }, + "members": { + "names": ["Mike", "Alice"], + "roles": [] + } + } + +To evaluate a declarative VDU against this virtual JSON document, evaluate the +Extended Mango expression in the `validate_doc_update` field. This will produce +a list of _failures_ that describe the ways in which the input does not match +the selector. Evaluation is considered successful if this list is empty. + +If the expression produces a non-empty list, then no further expressions from +other design docs are evaluated, the write is rejected, and a representation of +the failures is returned to the client. If the expression produces an empty +list, then expressions from other design docs are evaluated. If none of the +`validate_doc_update` fields in any design doc produces a failure, the write is +accepted. + + +### Responses to write requests + +If the selector expression in the `validate_doc_update` field returns an empty +list of failures, then the write is accepted and proceeds as normal, leading to +a 201 or 202 response. + +If any of the selectors fails, then either its list of failures or a custom +error response is returned to the caller, with a 401 or 403 status code as +indicated by the selector itself. 
For example, imagine a design doc contains the +following: + + "validate_doc_update": { + "$all": [ + { + "$userCtx.roles": { "$all": ["_admin"] }, + "$error": "unauthorized" + }, + { + "$newDoc.type": { "$in": ["movie", "director"] }, + "$error": "forbidden" + } + ] + } + +To evaluate this expression, the following steps are performed: + +- Check whether the user context's `roles` array contains the value `"_admin"`. + If it does not, return a 401 response to the client. +- Check whether the new doc's `type` field has the value `"movie"` or + `"director"`. If it does not, return a 403 response to the client. +- Otherwise, accept the write and return a 201 or 202. + +The body of the response contains two fields: + +- `error`: this is either `"unauthorized"` or `"forbidden"` +- `reason`: this contains either a custom error message, or the list of failures + generated by the first non-matching selector. + +If no custom `$reason` is set, then the `reason` field contains a list of +failures like so: + + { + "error": "forbidden", + "reason": { + "failures": [ + { + "path": ["$newDoc", "type"], + "type": "in", + "params": ["movie", "director"] + } + ] + } + } + +This is consistent with the current working of JavaScript VDUs. Such functions +can call `throw({ forbidden: obj })` where `obj` is an object, and it will be +passed back to the client as JSON, i.e. it is already possible for user-defined +VDUs to generate responses like that above. + +A custom error response can be generated by adding extra information to the +selector expressions; see "`$error` and `$reason`" below. + +The intent of this interface is that each individual selector expression +produces a complete list of _all_ the ways in which the input did not match the +selector expression, so that the client can show all the validation errors to +the user in one go. + + +## Extended Mango + +Declarative VDU functions are expressed in an extended variant of Mango. 
It +includes all the selector operators previously designed for use in queries and +filters, and a few additions that are particularly suited to the task of +defining VDUs. Some of these features _only_ make sense for VDUs and should only +be allowed in this context. + + +### Return values + +Currently, the evaluation of a Mango selector by the `mango_selector:match` +function returns a boolean value to indicate whether or not the input value +matched the selector. When evaluating Extended Mango for VDUs, `match` should +instead return a list of _failures_, which are records that describe the ways in +which the input did not match. A failure has the following properties: + +- `path`: an array of keys that give the path to the non-matching value, from + the root of the input document, such that a client could locate the + non-matching value by evaluating `path.reduce((val, key) => val[key], doc)` +- `type`: the name of the matching operator that failed, e.g. `"eq"`, `"in"`, + `"type"`, etc. +- `params`: an array of any other values that the operator used to determine the + match result. For most operators this would just be the expected value. + +An example of a failure object: + + { + "path": ["$secObj", "members", "names", 1], + "type": "eq", + "params": ["Alice"] + } + +A client should be able to construct useful user-facing error messages from the +information in these failure objects such that the user could correct any +mistakes in their input. + +An Extended Mango expression is considered to match a given input if its +evaluation returns an empty list. + +To produce the `path` field, the `match` function will need to track the path it +used to reach the current value. This can be done by adding a "match context" to +its list of parameters that tracks this information along with other things. +This will also be needed to handle negation and relative `$data` references. 
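The failure-list return value and the "match context" described above can be illustrated with a minimal sketch. This is not the `mango_selector` implementation: the function `match`, the `ctx.path` field and the helper `locate` are hypothetical names chosen for illustration, only `$eq`, `$in` and plain (optionally dotted) field names are handled, and array-index path segments are left as strings.

```javascript
// Hypothetical sketch: a matcher that returns failure records instead of a
// boolean. The `ctx` argument plays the role of the "match context" and
// tracks the path walked so far.
function isPlainObject(x) {
  return x !== null && typeof x === "object" && !Array.isArray(x);
}

// Structural equality via JSON serialisation; adequate for this sketch only.
function sameJson(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

function match(selector, value, ctx = { path: [] }) {
  const failures = [];
  for (const [key, operand] of Object.entries(selector)) {
    if (key === "$eq") {
      if (!sameJson(value, operand)) {
        failures.push({ path: ctx.path, type: "eq", params: [operand] });
      }
    } else if (key === "$in") {
      if (!operand.some((x) => sameJson(x, value))) {
        failures.push({ path: ctx.path, type: "in", params: operand });
      }
    } else {
      // Any other key is treated as a field name, possibly dotted.
      const segments = key.split(".");
      let child = value;
      for (const seg of segments) {
        child = child == null ? undefined : child[seg];
      }
      // A bare literal is shorthand for an equality check.
      const sub = isPlainObject(operand) ? operand : { $eq: operand };
      // Every field is checked; nothing short-circuits.
      failures.push(...match(sub, child, { path: [...ctx.path, ...segments] }));
    }
  }
  return failures;
}

// A client can locate the offending value from a failure's `path`.
const locate = (path, doc) => path.reduce((val, key) => val[key], doc);

const input = { $newDoc: { type: "actor" }, $userCtx: { name: "Alice" } };
const selector = {
  "$newDoc.type": { $in: ["movie", "director"] },
  "$userCtx.name": "Alice",
};
const failures = match(selector, input);
console.log(JSON.stringify(failures));
console.log(locate(failures[0].path, input)); // "actor"
```

In this sketch an empty array means the input matched, and the single failure produced for the example input has the same shape as the `failures` entries shown in the response body earlier.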
+ + +### General evaluation + +Since `match` currently just returns `true` or `false`, the `mango_selector` +module can implement certain operations using "short-circuit" semantics. For +example, `$and` can be implemented by checking each of its sub-selectors, and +returning `false` as soon as a single one of them returns `false`. Likewise +`$or` can return `true` as soon as a single sub-selector returns `true`. + +For VDUs, we want to return a complete list of match failures to the client, so +some compound operators must evaluate all their inputs completely, without +short-circuiting. Specifically: + +- `$and` should evaluate all its sub-selectors and return the combined list of + failures produced by any of them. +- `$or` may return an empty list as soon as any of its sub-selectors returns an + empty list. If all its sub-selectors return non-empty failure lists, it should + return a combined list of all failures. +- `$nor` should be translated as described under "Negation" below, before the + expression is evaluated. +- `$allMatch` should evaluate all list items against the sub-selector and return + a list of all failures produced by any of the items. It must only return an + empty list if all items produce an empty list. +- `$elemMatch` may return an empty list as soon as an item is found that + produces an empty list for the sub-selector. If no item does so, the combined + list of failures from all items should be returned. + +For normal object matchers, `{ "a": X, "b": Y }` should have the same behaviour +as `{ "$and": [{ "a": X }, { "b": Y }] }`. That is, all the fields in the object +should be checked, rather than returning `false` as soon as a single field does +not match, and all failures from all fields should be returned to the caller. + + +### `$if`/`$then`/`$else` + +To produce better error messages for dependent validation rules, a new set of +conditional operators is added. 
The general form is: + + { "$if": A, "$then": B, "$else": C } + +`A`, `B` and `C` are sub-selector expressions. Both the `$then` and `$else` +fields are optional. `$then` defaults to `NONE($then)`, a selector which always +fails with the message that `$then` is required. `$else` defaults to `ANY`, a +selector which always succeeds. (This definition may seem odd but is necessary +for these expressions to be automatically negated; see "Negation" below.) + +To evaluate this operator for input `Doc`, perform these steps: + +- If `match(A, Doc)` returns an empty list, return the result of `match(B, Doc)` +- Otherwise, return the result of `match(C, Doc)` + +If these operators appear in a selector alongside other operators, the effect is +the same as if any other combination of operators was used. That is, the +`$if`/`$then`/`$else` operators must succeed, and all other operators in the +selector must succeed, in order for the match to be considered successful. For +example: + + { + "$gt": 0, + "$if": { "$gt": 10 }, + "$then": { "$mod": [5, 0] } + } + +This matches inputs that are greater than 0, and if they are greater than 10 +they must also be a multiple of 5. + +These operators may be evaluated in normal query/filter Mango contexts by +translating `{ "$if": A, "$then": B, "$else": C }` to: + + { + "$or": [ + { "$and": [A, B] }, + { "$and": [{ "$not": A }, C] } + ] + } + +This translation should be applied before negations are normalised. + + +### `$data` + +Some rules, especially those concerned with authorization, will need to compare +different fields within the input, particularly comparing the user context to +the security object and the new document. To enable this, Extended Mango +provides a way to reference data from elsewhere in the input. + +The `$data` operator produces the value at the location indicated by its +argument. 
For example, to require that one of the user's roles must be in the +database's admins list, one would write: + + { + "$userCtx.roles": { + "$elemMatch": { + "$in": { "$data": "$secObj.admins.roles" } + } + } + } + +The `$data` operator finds the value located at `$secObj.admins.roles` within +the input document, and provides it as the operand to the `$in` operator. + +The path given to `$data` may begin with one or more dots to indicate a relative +path. If no dots appear at the start of the path, then the path is absolute, and +is resolved from the root of the input document. If one or more dots are +present, the path is resolved by walking that many levels "up" from the current +value before proceeding. + +For example, to indicate that a field must contain an array of objects, where +each object has a `max` field that is greater than its `min` field, one can +write: + + { + "$allMatch": { + "max": { "$gt": { "$data": ".min" } } + } + } + +The dot at the start of the path `.min` indicates that the path should be +resolved by starting at the object containing the `max` field currently being +checked, rather than starting at the root of the input. A path starting with two +dots would indicate starting with the current object's parent, three dots +indicates starting at the grandparent, and so on. + +The rest of the path should contain one or more field identifiers separated by +single dots. Field identifiers can be non-empty strings that identify a property +in an object, or non-negative integers that identify elements in an array. The +path should be evaluated by performing an operation equivalent to the JavaScript +expression: + + path.split('.').reduce((val, field) => val[field], input) + +`$data` references must only be allowed in positions where a literal value would +otherwise be expected. If they were allowed in places that expect selector +expressions, this would allow an input document to inject its own validation +logic. 
Specifically, `$data` is only allowed in these positions: + +- As the operand to `$eq`, `$ne`, `$lt`, `$lte`, `$gt`, `$gte` +- As the entire array operand, or as an individual array element, for `$in`, + `$nin`, `$all` or `$mod` + +When used with a normal object field, it should be interpreted as an exact +equality constraint, i.e. `{ "a": { "$data": "b" } }` means `{ "a": { "$eq": { +"$data": "b" } } }`, otherwise this would allow operator injection. + + +### `$cat` + +The `$cat` operator takes an array whose elements are either literal strings, or +`$data` references, and produces the result of concatenating them after +resolving the `$data` references. For example: + + { + "_id": { "$cat": ["org.couchdb.user:", { "$data": "name" }] } + } + +This means that the `_id` field must equal the `name` field with +`org.couchdb.user:` as a prefix. + +`$cat` has the same restrictions on its use as the `$data` field, i.e. it may +only appear where a literal value is expected, and must not appear anywhere a +sub-selector expression is expected. When it appears as the matcher for an +object property, it should be understood as a strict equality constraint, i.e. +the above expression means the same as: + + { + "_id": { + "$eq": { + "$cat": ["org.couchdb.user:", { "$data": "name" }] + } + } + } + + +### `$ref` + +Declarative VDUs can make use of reusable expressions when defining their logic. +Definitions of such expressions are placed in the `defs` field of the design +document, and then referenced within `validate_doc_update` expressions via the +`$ref` operator. 
+ +For example, to define an expression for matching even numbers, this structure +is placed in the root of the design document: + + "defs": { + "even-number": { + "$type": "number", + "$mod": [2, 0] + } + } + +Then, a field can be validated as an even number using this syntax: + + "some_field": { "$ref": "defs.even-number" } + +The `$ref` operator takes a path with the same syntax as `$data`, but it is +resolved relative to the root of the design document. + +The effect of using `$ref` and other operators in the same selector should be +the same as if the `$ref` expression were combined with the rest of the +selector via `$and`. For example, `{ "$ref": "defs.even-number", "$gt": 20 }` +should have the same effect as: + + { + "$and": [ + { "$ref": "defs.even-number" }, + { "$gt": 20 } + ] + } + +If the implementation wishes to inline `$ref` expressions before evaluation, it +should first translate any uses of `$ref` into the above form to avoid key +collisions when merging objects. + +This operator may be used to define recursive structures. For example, if we had +a way of representing a tree of HTML nodes, where each node has a tag name, a +set of attributes, and a list of child nodes, we could validate it using this +definition: + + "defs": { + "html-tree": { + "tagName": { "$type": "string" }, + "attributes": { "$type": "object" }, + "children": { + "$type": "array", + "$allMatch": { "$ref": "defs.html-tree" } + } + } + } + +**To be decided**: we may wish to prohibit such recursion, as doing so would let +us inline any `$ref` references before evaluating an expression. This would +require detecting any sets of mutually recursive definitions in a design +document, and the complexity of doing this may not be worth it. + + +### Negation + +To produce good failure messages, negation needs to be pushed to the "leaves" of +the expression tree. 
For example, when evaluating `{ "$not": { "$eq": 42 } }` +against the input `42`, the `$eq` produces an empty list, and `$not` then has to +"invent" a failure record to return to the caller. It can't meaningfully do this +without understanding the meaning of its sub-selector. If the selector is +instead translated to `{ "$ne": 42 }` then it can directly produce a meaningful +failure message. + +Many Mango expressions can be translated in a way that pushes a `$not` operator +from the root of the expression towards the leaves. Specifically: + +- `{ "$not": { "field": X } }` becomes `{ "field": { "$ne": X } }` +- `{ "$not": { "$eq": X } }` becomes `{ "$ne": X }` and vice versa +- `{ "$not": { "$in": X } }` becomes `{ "$nin": X }` and vice versa +- `{ "$not": { "$lt": X } }` becomes `{ "$gte": X }` and vice versa +- `{ "$not": { "$gt": X } }` becomes `{ "$lte": X }` and vice versa +- `{ "$not": { "$exists": X } }` becomes `{ "$exists": !X }` +- `{ "$not": { "$not": X } }` becomes `X` + +- `{ "$not": { "$and": [A, B] } }` becomes `{ "$or": [{ "$not": A }, { "$not": B + }] }` + +- `{ "$not": { "$or": [A, B] } }` becomes `{ "$and": [{ "$not": A }, { "$not": B + }] }` + +- `{ "$nor": [A, B] }` becomes `{ "$and": [{ "$not": A }, { "$not": B }] }` + +- `{ "$not": { "$nor": [A, B] } }` becomes `{ "$or": [A, B] }` + +- `{ "$not": { "$allMatch": A } }` becomes `{ "$elemMatch": { "$not": A } }` + +- `{ "$not": { "$elemMatch": A } }` becomes `{ "$allMatch": { "$not": A } }` + +- `{ "$not": { "$if": A, "$then": B, "$else": C } }` becomes `{ "$if": A, + "$then": { "$not": B }, "$else": { "$not": C } }` + +- `{ "k": { "$not": NONE(k) } }` becomes `{ "k": ANY }` + +- `{ "k": { "$not": ANY } }` becomes `{ "k": NONE(k) }` + +Most of these translations are already implemented by the `norm_negations()` +function because this helps with identifying indexes that may be used for +queries. 
This function may prove useful for evaluating VDUs, but there are some +operators that do not have a built-in negation: + +- `$type` +- `$size` +- `$mod` +- `$regex` +- `$beginsWith` +- `$all` + +To make these give good failure messages, the `match` operation would need to +dynamically keep track of whether the expression it's currently evaluating has +been negated. This could be tracked via the "match context" which we'd need to +add to support the `$data` operator and the `path` failure property. + +Note that in principle, `{ "$not": { "$all": [X, Y, Z] } }` could be translated +to: + + { + "$or": [ + { "$allMatch": { "$ne": X } }, + { "$allMatch": { "$ne": Y } }, + { "$allMatch": { "$ne": Z } } + ] + } + +However, this is only possible if the required values are specified literally. +If a `$data` operator is used to dynamically find the list of required values, +the translator cannot "see through" that and `$data` may resolve to a different +value each time it's encountered. So the negation still needs to be tracked +during the `match` operation, rather than being completely handled by +translation steps that happen before `match` starts. + +Use of `$ref` also precludes full normalisation of expressions before evaluating +them, since it allows the creation of recursive matching procedures. These +cannot be translated by just inlining any referenced expressions, since any loop +of mutually recursive expressions would prevent this from completing. + + +### `$error` and `$reason` + +A failing VDU can cause a document write request to return either a `401 +Unauthorized` or `403 Forbidden` response. In JavaScript VDUs, which one to +return is indicated by throwing either `{ unauthorized: reason }` or `{ +forbidden: reason }`. + +Declarative VDUs can indicate which type of response each check produces using Review Comment: @ricellis suggested similar and I think that would be a fine simplification. 
The design brief for the RFC was to allow expression of all built-in and documented VDU functions.
