Hi all,

in the midst of handling the security stuff I had a moment of clarity about how the often-requested per-document permissions could be implemented. We had discussed a potential approach extensively at the February Boston Developer Summit (notes here: https://lists.apache.org/thread.html/09a5686bca8049010b82796cc0fe99ef27aed4983a3f02fd6956259f@%3Cdev.couchdb.apache.org%3E).
What was so alluring about this proposal was that it solves per-doc access control and per-user-db in one go. E.g. it would make it possible to share a single database among multiple mutually distrusting users, allow them to have their own set of views, and even let them independently use their share of a single database as a replication endpoint without interfering with any of the other users on that database.

I gave it a shot. Essentially, we need to build two new indexes, by-access-id and by-access-seq, to make all that work. I’m just focussing on the core of this, trying to re-use the existing couch_mrview/couch_index machinery as much as possible.

Strictly, for replication only by-access-seq would be required, but by-access-id is a little easier to do, so I’ve done that first, and I believe the results are encouraging.

I’ve put a diff against master into a gist for your perusal: https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc

The core bits are: https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc#file-by-access-id-diff-L189-L215 and https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc#file-by-access-id-diff-L189-L215

Here’s an example doc:

    {
      "_id": "1fb94bf8c3d5a73745f3cc4f5f000a8d",
      "_rev": "4-bcbc975e61bdb80f3de1b87f6cad6a76",
      "_access": ["b"]
    }

It shows up for user b:

    > curl b:b@127.0.0.1:15984/a/_all_docs
    {"total_rows":2,"offset":0,"rows":[
    {"id":"1fb94bf8c3d5a73745f3cc4f5f000a8d","key":["b","1fb94bf8c3d5a73745f3cc4f5f000a8d"],"value":"4-bcbc975e61bdb80f3de1b87f6cad6a76"}
    ]}

But not for user c:

    > curl c:c@127.0.0.1:15984/a/_all_docs
    {"total_rows":2,"offset":2,"rows":[
    ]}

* * *

I’d like to get some general design feedback on this approach to find out if it is worth pursuing further. See “Next Steps” way below for my thinking on how to get by-access-seq going.
The rest of this email consists of my notes from reading the source. They try to explain my thinking and to guide folks who might not be very familiar with the CouchDB sources through what is happening. I’d especially like to get some feedback about this from some of the folks here who don’t spend their days in the main Erlang codebase :)

Let me know what you think.

Thanks!
Jan

* * *

CouchDB Access Notes

Background: https://lists.apache.org/thread.html/09a5686bca8049010b82796cc0fe99ef27aed4983a3f02fd6956259f@%3Cdev.couchdb.apache.org%3E

# Overview

To solve the problems with the db-per-user pattern, we want to introduce document-level access control. The result should be a single CouchDB database that can be used by multiple mutually untrusting users while retaining CouchDB’s full semantics.

// TODO: link to appendix: problems with db-per-user

We decided on an approach that defines access control in documents with a new property `_access`, which is specified as an array of strings and arrays. Strings represent usernames and roles, sub-arrays are used as logical AND, and elements in the top-level array are used as logical OR.

For example, an `_access` field with the value [["management", "senior"], "ceo-jane"] would give access to that doc to everyone with the roles "management" AND "senior", OR to the user "ceo-jane", but not e.g. to users with the roles "development" and "senior", nor to the user "vp-jenn".

To retain the main CouchDB semantics, we need to introduce new behaviour for the _all_docs and _changes endpoints. The plan is to special-case this based on the authenticated user context (userCtx, i.e. username and associated roles, after authentication).

The existing by-id and by-seq indexes are not equipped to efficiently return results per user, so we are introducing two new indexes (either can be optionally configured, depending on the use-case and performance and storage needs): by-access-id and by-access-seq.
In contrast with by-id and by-seq, these indexes are not stored in the main database file, but in a separate file, ideally managed by the existing couch_index infrastructure.

# Development considerations

This first spike is only concerned with getting by-access-id to work with minimal effort. To get started, let’s look at how _all_docs works today using the by-id index.

## The Anatomy of a Clustered _all_docs Request

CouchDB’s clustering layer consists of three main modules: chttpd, fabric and rexi.

chttpd’s job is to handle everything HTTP and route requests to the right place in the rest of the code. It’s an HTTP router, mapping URLs, request methods and options to handler functions that do the work the requests are specified to fulfil.

fabric’s job is to distribute a single request from the outside to multiple nodes of the cluster. Some requests require only talking to the local node, but that’s less important for the moment. fabric includes fabric_rpc, a module that turns a request to the cluster into one or more requests to other nodes in the cluster.

rexi’s job is to know about the cluster state: which nodes are in the cluster, which of them are active/reachable/failed, which shards live on which nodes. fabric uses rexi to know which nodes to contact for which shards.

After a bit of indirection, we find ourselves at the first _all_docs-specific function in chttpd_db.erl: all_docs_view/4:

```
all_docs_view(Req, Db, Keys, OP) ->
    Args0 = couch_mrview_http:parse_params(Req, Keys),
    Args1 = Args0#mrargs{view_type=map},
    Args2 = couch_mrview_util:validate_args(Args1),
    Args3 = set_namespace(OP, Args2),
    Options = [{user_ctx, Req#httpd.user_ctx}],
    Max = chttpd:chunked_response_buffer_size(),
    VAcc = #vacc{db=Db, req=Req, threshold=Max},
    {ok, Resp} = fabric:all_docs(Db, Options, fun couch_mrview_http:view_cb/2, VAcc, Args3),
    {ok, Resp#vacc.resp}.
```

The first five lines of the function body handle query options and request parameters or arguments.
The next three lines are the bulk of the job: start a response and call fabric:all_docs/5 with a callback to handle rows. The last line returns the response held in the accumulator that fabric:all_docs/5 hands back.

fabric:all_docs/5 is a thin wrapper around fabric_view_all_docs:go/5. Before we jump down, we notice that there is also a fabric_view_changes.erl, which we should remember for the next iteration when we implement by-access-seq.

go/5 comes in two variants and we’ll ignore the second here for the moment, because it is a performance optimisation. The main work for go/5 is in the top third of the function. First it gets all shards for the current database from mem3, then it starts a fabric_rpc worker for each shard, and then it waits for the results to come back by calling go/6 with all workers. The bottom two thirds are timeout and error handling.

go/6 registers the handle_message/3 function as the callback for rexi_utils’ recv/6 (read “receive”) function. handle_message/3 comes in a number of variants to handle rexi errors, receiving metadata, receiving result rows and a “complete” notification once all rows have been sent.

Our next level down is looking into fabric_rpc and how it handles all_docs requests. fabric_rpc:all_docs/3 is again a short wrapper, this time around couch_mrview:query_all_docs/4, which is the node-local function that handles querying.

couch_mrview includes a bunch of functions for map/reduce views. It seems like a natural place to make our distinction between a normal by-id request and a by-access-id request.

I’m skipping a step here, but with a little printf-debugging, I’ve found out that the `Db` variable we get passed in includes the authenticated userCtx, including username and any roles.
We can use couch_db:is_admin/1 to get a boolean back for the distinction we are going to have to make:

```
query_all_docs(Db, Args0, Callback, Acc) ->
    case couch_db:is_admin(Db) of
        true -> query_all_docs_admin(Db, Args0, Callback, Acc);
        false -> query_all_docs_access(Db, Args0, Callback, Acc)
    end.
```

query_all_docs_admin/4 is the existing query_all_docs/4 function and we’re introducing query_all_docs_access/4, which we now have to fill out with querying our view. Before we can do that, we need to understand how views work.

## The Anatomy of a View Request

Querying a view has three stages:

1. define the view
2. build the view index
3. query the view index

A view definition always lives in a design document. It can be one or more JavaScript map/reduce functions, Erlang map/reduce functions, or a mango index definition.

// TODO: link all these view definition options.

Building the view index is an implicit step in CouchDB. View indexes are refreshed at query time, but only if there were any changes in the database since the last query. If no refresh is needed, the view result is returned from the index directly.

// TODO: explain query_server

Querying indexes follows a similar path through chttpd, fabric, rexi, fabric_rpc down to the per-node handlers in couch_mrview.

Just a few lines below couch_mrview:query_all_docs/4 we find query_view/5, which decides between map and reduce requests. We care about map-only for now. query_view/5 is preceded by query_view/6, which includes a call to couch_mrview_util:get_view/4. That looks like where we want to look next, as the map_fold/5 called by query_view/5 is about looping over rows. We hope we can re-use all that logic, and maybe get_view/4 lets us find out how we can have it return our new view.

get_view/4 calls get_view_index_state/4, which in turn calls get_view_index_pid/4, which finally calls into couch_index_server:get_index/4, which looks like it returns the index for our request. Let’s have a look.
get_index/4 will dive into get_index/2 eventually, and that indeed looks like where we need to look. In there, the view index is looked up in an ETS table (an in-memory database), and if it can’t be found there, a new one is started. Either way, a view index is returned. The lookup is by DbName and Sig(nature), an md5 hash over the `views` property in a design doc that also corresponds to the *.view filename of the view index.

## Faking the index

So how would we get this to return the index we want to query? We need to create an index definition that matches the design doc `views` hash. Hm.

It is relatively easy to produce a map function that behaves like we want:

    function (doc) {
      var _access = doc._access
      if (!_access) { return }
      if (!isArray(_access) || _access.length === 0) { return }
      _access.forEach(function (user_or_role) {
        emit([user_or_role, doc._id], doc._rev)
      })
    }

At query time, we’d have to match the requesting username and roles against the first element in the key-array and return the results, while replacing the key-array with the second element (the doc _id).

All this doesn’t sound too hard. Good.

One snag though: if we think ahead and try to see how we could implement by-access-seq (for _changes), we get stuck: a view does not include rows for deleted documents while _changes does. In addition, the update sequence for a document is not available in a map function. So a regular view cannot be used here.

The filtering of deleted docs from a view index happens in couch_mrview:map_fold/3. So if we could augment that for our internal view requests, that could get us a long way towards reusing the rest of the couch_mrview/couch_index machinery.

Note to self: make sure view compaction doesn’t remove deleted docs. A cursory glance at couch_mrview_compactor:compact_view_btree/5 suggests it does no such thing, but we need to validate this, and if it doesn’t hold, change view compaction to keep deleted entries.
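To make the query-time step described above more concrete (match the user name and roles against the first key element, then swap the key-array for the doc _id), here is a rough, self-contained JavaScript sketch. It is not the code from the gist: `mapAccess`, `buildIndex` and `queryAllDocs` are hypothetical names, the "index" is just a sorted in-memory array, plain string comparison stands in for CouchDB’s view collation, and `_access` is assumed to contain only flat strings (no AND sub-arrays).

```javascript
// Hypothetical stand-in for the map function: one row per _access entry.
function mapAccess(doc) {
  const access = doc._access;
  if (!Array.isArray(access) || access.length === 0) return [];
  return access.map((userOrRole) => ({
    id: doc._id,
    key: [userOrRole, doc._id],
    value: doc._rev,
  }));
}

// Build a sorted in-memory "index". A real view index is a b-tree on disk;
// plain string comparison is a simplification of view collation.
function buildIndex(docs) {
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return docs
    .flatMap(mapAccess)
    .sort((a, b) => cmp(a.key[0], b.key[0]) || cmp(a.key[1], b.key[1]));
}

// Query-time step: keep rows whose key prefix is the user name or one of the
// user's roles, drop duplicates, and replace [name, id] with the plain doc id.
function queryAllDocs(index, userCtx) {
  const names = new Set([userCtx.name, ...userCtx.roles]);
  const seen = new Set();
  const rows = [];
  for (const row of index) {
    const [name, id] = row.key;
    if (!names.has(name) || seen.has(id)) continue;
    seen.add(id);
    rows.push({ id: id, key: id, value: row.value });
  }
  // Merging several prefixes can leave rows out of id order, so re-sort.
  rows.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return rows;
}

const index = buildIndex([
  { _id: "doc1", _rev: "1-a", _access: ["b"] },
  { _id: "doc2", _rev: "1-b", _access: ["c", "team"] },
  { _id: "doc3", _rev: "1-c" }, // no _access: never emitted
]);

console.log(queryAllDocs(index, { name: "b", roles: [] }));
console.log(queryAllDocs(index, { name: "d", roles: ["team"] }));
```

The de-duplication matters once roles are involved: a doc that lists both a user and one of that user’s roles is emitted twice by the map function, but should show up only once in _all_docs.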
* * *

We’ll start giving this a try by forking things off in couch_mrview:query_all_docs/4 and pretending to call a view with a mocked ddoc:

    {
      "_id": "_design/_access",
      "language": "_access",
      "views": {} // if needed
    }

// TODO see which other fields it needs

We’ll try this road to see if we get to the point where we get a “view index not found” error, because we don’t actually have a view index yet. We’ll then try and see if we can produce one.

We could try the other way around too, building the index first and then trying to query, but the approach doesn’t make much of a difference.

First demo working: https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc

Next Steps:

- make sure the startkey/endkey/descending argument handling is all correct and complete
- add key un-munging, so the user/role prefix gets filtered out on reads
- handle roles:
  - instead of querying the _access view once, we need to issue a multi-query, probably via #mrargs.multi_get, read up on how that is used
- then we could start thinking about by-access-seq:
  - we need access to the update-seq in couch_access_native_proc:map_doc, which might require a view protocol upgrade, or we have a post-process function that tags on the update-seq, we’ll see.
  - the admin/access split we’re doing in query_all_docs should probably happen in couch_db:changes_since/5

# More specification details

Documents in databases with _access enabled are private/admin-only by default, and can be made public with the special role _public.

TODO: shared id space or auto-prefix ids
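For reference, the `_access` semantics from the Overview (top-level OR, sub-arrays as AND) together with the private-by-default and `_public` rules above could be checked per document roughly like this. This is only a sketch: `canAccess` is a hypothetical helper, not something in the diff, and it assumes that admins are recognised by the `_admin` role and that a userCtx looks like `{name, roles}`.

```javascript
// Hypothetical helper: does userCtx satisfy a doc's _access field?
function canAccess(doc, userCtx) {
  const roles = userCtx.roles || [];
  // Assumption: admins carry the "_admin" role and can see every doc.
  if (roles.includes("_admin")) return true;

  const access = doc._access;
  // No (or empty) _access: private/admin-only by default.
  if (!Array.isArray(access) || access.length === 0) return false;

  const names = new Set([userCtx.name, ...roles]);
  // "_public" matches everybody; anything else must be the user or a role.
  const matches = (entry) => entry === "_public" || names.has(entry);

  // Top level is a logical OR; a sub-array is a logical AND of its entries.
  return access.some((entry) =>
    Array.isArray(entry)
      ? entry.length > 0 && entry.every(matches)
      : matches(entry)
  );
}

const doc = { _id: "report", _access: [["management", "senior"], "ceo-jane"] };

console.log(canAccess(doc, { name: "bob", roles: ["management", "senior"] }));
console.log(canAccess(doc, { name: "vp-jenn", roles: [] }));
```

Note that the by-access-id map function shown earlier only emits one row per top-level entry, so AND sub-arrays would still need separate handling in the index or at query time.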