> On 16. Nov 2017, at 03:09, Adam Kocoloski <kocol...@apache.org> wrote:
> 
> Oh also, meant to say - nice work on Spiegel :)

+1, nice work Geoff,

Adam summed it all up nicely.

Best
Jan
--

> 
>> On Nov 15, 2017, at 9:00 PM, Adam Kocoloski <kocol...@apache.org> wrote:
>> 
>> Hiya Geoff,
>> 
>> You’re right, there is a non-trivial overhead in calculating view responses 
>> that need to pull from every shard. On the other hand, maintaining a unique 
>> database file for every user is quite problematic at scale. It works OK up 
>> to several thousand users, but eventually you start running into a lot of 
>> operational headaches. Running a million databases in a cluster is possible 
>> but painful.
>> 
>> I have some detailed thoughts about how we can improve the efficiency of 
>> queries scoped to a single user in a large sharded database, but that’s a 
>> topic for another thread :)
>> 
>> Jan - wow, look at that! I’ll take a close look over the next couple of 
>> hours but a quick scan is encouraging.
>> 
>> Adam
>> 
>>> On Nov 15, 2017, at 5:30 PM, Geoffrey Cox <redge...@gmail.com> wrote:
>>> 
>>> Hey Jan,
>>> 
>>> I've been trying to solve a similar problem from a different angle using
>>> efficient and scalable replication via spiegel
>>> <https://github.com/redgeoff/spiegel>. I'm super excited that you are
>>> drafting this level of access, but my major concern is on performance. From
>>> what I gather, if you combine all the db-per-user docs into a single DB
>>> then you'll have a massive DB. I know CouchDB is good at sharding, but
>>> isn't there a significant performance implication when a user's docs are
>>> being pulled from multiple shards on different servers? What about the
>>> added overhead of calculating cross-server views, etc...
>>> 
>>> When I think about how big companies, e.g. Facebook, solve these types of
>>> problems, I imagine that they create a denormalized DB per user. Among
>>> other things, this design allows the set of data that a user needs to be
>>> relatively small and live on fewer servers per user. Doesn't this lead to
>>> better performance?
>>> 
>>> Even if this new level of access doesn't solve the db-per-user case
>>> entirely, it will still be a useful addition as it would allow for more
>>> data to be shared and reduce the need for a DB-per-role setup. So, I'm all
>>> for it!
>>> 
>>> I'll take a closer look at these notes when I have some time, but I just
>>> wanted to get you my high-level thoughts now. I'm sorry if any of this has
>>> been based on some wild assumptions :)
>>> 
>>> Exciting stuff!
>>> 
>>> Geoff
>>> 
>>>> On Wed, Nov 15, 2017 at 1:35 PM Jan Lehnardt <j...@apache.org> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> in the midst of handling the security stuff I had a moment of clarity about how
>>>> the often-requested per-document permissions could be implemented. We had
>>>> discussed a potential approach extensively at the February Boston Developer
>>>> Summit (notes here:
>>>> https://lists.apache.org/thread.html/09a5686bca8049010b82796cc0fe99ef27aed4983a3f02fd6956259f@%3Cdev.couchdb.apache.org%3E
>>>> )
>>>> 
>>>> What was so alluring about this proposal was that it solves per doc access
>>>> control and per-user-db in one go. E.g. it would make it possible to share a
>>>> single database among multiple mutually distrusting users, allow them to have their own set
>>>> of views, and even independently use their share of a single database as a
>>>> replication endpoint without interfering with any of the other users on
>>>> that database.
>>>> 
>>>> I gave it a shot. Essentially, we need to build new indexes: by-access-id
>>>> and by-access-seq to make all that work. I’m just focussing on the core of
>>>> this, trying to re-use the existing couch_mrview/couch_index machinery as
>>>> much as possible. Strictly, for replication only by-access-seq would be
>>>> required, but by-access-id is a little easier to do, so I’ve done that
>>>> first, and I believe the results are encouraging.
>>>> 
>>>> I’ve put a diff against master into a gist for your perusal:
>>>> 
>>>> https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc
>>>> 
>>>> 
>>>> The core bits are:
>>>> 
>>>> 
>>>> https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc#file-by-access-id-diff-L189-L215
>>>> 
>>>> and
>>>> 
>>>> 
>>>> https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc#file-by-access-id-diff-L189-L215
>>>> 
>>>> Here’s an example Doc:
>>>> 
>>>> {
>>>> "_id":"1fb94bf8c3d5a73745f3cc4f5f000a8d",
>>>> "_rev":"4-bcbc975e61bdb80f3de1b87f6cad6a76",
>>>> "_access":["b"]
>>>> }
>>>> 
>>>> It shows up for user b:
>>>> 
>>>> 
>>>> curl b:b@127.0.0.1:15984/a/_all_docs
>>>> 
>>>> {"total_rows":2,"offset":0,"rows":[
>>>> 
>>>> {"id":"1fb94bf8c3d5a73745f3cc4f5f000a8d","key":["b","1fb94bf8c3d5a73745f3cc4f5f000a8d"],"value":"4-bcbc975e61bdb80f3de1b87f6cad6a76"}
>>>> ]}
>>>> 
>>>> But not for user c:
>>>> 
>>>> 
>>>> curl c:c@127.0.0.1:15984/a/_all_docs
>>>> 
>>>> {"total_rows":2,"offset":2,"rows":[
>>>> 
>>>> ]}
>>>> 
>>>> 
>>>> * * *
>>>> 
>>>> 
>>>> I’d like to get some general design feedback on this approach to find out
>>>> if it is worth pursuing further. See “Next Steps” way below for my thinking
>>>> on how to get by-access-seq going.
>>>> 
>>>> The rest of this email is my notes from reading the source, trying to
>>>> explain my thinking as well as guide folks who might not be very familiar
>>>> with the CouchDB sources to follow along with what is happening.
>>>> 
>>>> I’d especially like to get some feedback about this from some of the folks
>>>> here who don’t spend their days in the main Erlang codebase :)
>>>> 
>>>> Let me know what you think.
>>>> 
>>>> Thanks!
>>>> Jan
>>>> 
>>>> * * *
>>>> 
>>>> CouchDB Access Notes
>>>> 
>>>> Background:
>>>> https://lists.apache.org/thread.html/09a5686bca8049010b82796cc0fe99ef27aed4983a3f02fd6956259f@%3Cdev.couchdb.apache.org%3E
>>>> 
>>>> # Overview
>>>> 
>>>> To solve the problems with the db-per-user pattern, we want to introduce
>>>> document level access control. The result should be a single CouchDB
>>>> database that can be used by multiple mutually untrusting users while
>>>> retaining CouchDB’s full semantics.
>>>> 
>>>> // TODO: link to appendix: problems with db-per-user
>>>> 
>>>> We decided on an approach to define access control in documents with a new
>>>> property `_access` which is specified as an array of strings and arrays.
>>>> Strings represent usernames and roles, sub-arrays are used as logical AND,
>>>> elements in the top-level array are used as logical OR. For example, an
>>>> _access field with the value [['management', 'senior'], 'ceo-jane'] would
>>>> allow everyone with the roles 'management' AND 'senior', OR the user
>>>> 'ceo-jane', access to that doc, but not e.g. users with roles 'development'
>>>> and 'senior', nor the user 'vp-jenn'.
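
The OR/AND rule above can be sketched in a few lines of JavaScript. This is only an illustration of the semantics, not code from the diff: the function name `canAccess` and the `{ name, roles }` shape of the user context are assumptions, as is the choice to let an empty sub-array match nobody.

```javascript
// Hypothetical helper: does this user context satisfy an _access list?
// userCtx is assumed to look like { name: 'bob', roles: ['management'] }.
function canAccess(access, userCtx) {
  if (!Array.isArray(access)) return false;
  // Usernames and roles share one namespace in _access, so merge them.
  const identities = new Set([userCtx.name, ...userCtx.roles]);
  // Top-level entries are OR-ed together; a sub-array is an AND group.
  return access.some((entry) =>
    Array.isArray(entry)
      ? entry.length > 0 && entry.every((id) => identities.has(id)) // assumption: [] matches nobody
      : identities.has(entry)
  );
}

const access = [['management', 'senior'], 'ceo-jane'];
console.log(canAccess(access, { name: 'bob', roles: ['management', 'senior'] }));  // true
console.log(canAccess(access, { name: 'ceo-jane', roles: [] }));                   // true
console.log(canAccess(access, { name: 'dev', roles: ['development', 'senior'] })); // false
console.log(canAccess(access, { name: 'vp-jenn', roles: [] }));                    // false
```
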
>>>> 
>>>> To retain the main CouchDB semantics, we need to introduce new behaviour for
>>>> the _all_docs and _changes endpoints. The plan is to special-case this
>>>> based on the authenticated user context (userCtx, i.e. username and
>>>> associated roles, after authentication).
>>>> 
>>>> The existing by-id and by-seq indexes are not equipped to efficiently
>>>> return results per user, so we are introducing two new indexes (either can
>>>> be optionally configured, depending on the use-case and performance and
>>>> storage needs): by-access-id and by-access-seq. In contrast with by-id and
>>>> by-seq, these indexes are not stored in the main database file, but in a
>>>> separate file, ideally managed by the existing couch_index infrastructure.
>>>> 
>>>> 
>>>> # Development considerations
>>>> 
>>>> This first spike is only concerned with getting by-access-id to work with
>>>> minimal effort.
>>>> 
>>>> To get started, let’s look at how _all_docs works today using the by-id
>>>> index.
>>>> 
>>>> ## The Anatomy of a Clustered _all_docs Request
>>>> 
>>>> CouchDB’s clustering layer consists of three main modules: chttpd, fabric
>>>> and rexi. chttpd’s job is to handle everything HTTP and route requests to
>>>> the right place in the rest of the code. It’s an HTTP router, mapping URLs,
>>>> request methods and options to handler functions that do the work the
>>>> requests are specified to fulfil.
>>>> 
>>>> fabric’s job is to distribute a single request from the outside to
>>>> multiple nodes of the cluster. Some requests require only talking to the
>>>> local node, but that’s less important for the moment. fabric includes
>>>> fabric_rpc, a module that turns a request to the cluster into one or more
>>>> requests to other nodes in the cluster.
>>>> 
>>>> rexi’s job is to know about the cluster state: which nodes are in the
>>>> cluster, which of them are active/reachable/failed, which shards live on
>>>> which nodes. fabric uses rexi to know which nodes to contact for which
>>>> shards.
>>>> 
>>>> After a bit of indirection, we find ourselves at the first
>>>> _all_docs-specific function in chttpd_db.erl: all_docs_view/4:
>>>> 
>>>> ```
>>>> all_docs_view(Req, Db, Keys, OP) ->
>>>>  Args0 = couch_mrview_http:parse_params(Req, Keys),
>>>>  Args1 = Args0#mrargs{view_type=map},
>>>>  Args2 = couch_mrview_util:validate_args(Args1),
>>>>  Args3 = set_namespace(OP, Args2),
>>>>  Options = [{user_ctx, Req#httpd.user_ctx}],
>>>>  Max = chttpd:chunked_response_buffer_size(),
>>>>  VAcc = #vacc{db=Db, req=Req, threshold=Max},
>>>>  {ok, Resp} = fabric:all_docs(Db, Options,
>>>>      fun couch_mrview_http:view_cb/2, VAcc, Args3),
>>>>  {ok, Resp#vacc.resp}.
>>>> ```
>>>> 
>>>> The first five lines handle query options and request parameters or
>>>> arguments. The next three lines are the bulk of the job: start a response,
>>>> call fabric:all_docs/5 with a callback to handle rows. The last line
>>>> returns the accumulator that is returned by fabric:all_docs/5.
>>>> 
>>>> fabric:all_docs/5 is a thin wrapper around fabric_view_all_docs:go/5.
>>>> Before we jump down, we notice that there is also a
>>>> fabric_view_changes.erl, which we should remember for the next iteration
>>>> when we implement by-access-seq.
>>>> 
>>>> go/5 comes in two variants and we’ll ignore the second here for the
>>>> moment, because it is a performance optimisation. The main work for go/5 is
>>>> in the top third of the function. First it gets all shards for the current
>>>> database from mem3, then it starts a fabric_rpc worker for each shard, and
>>>> then waits for the results to come back by calling go/6 with all workers.
>>>> The bottom two thirds are timeout and error handling.
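
To make the scatter/gather shape of go/5 concrete, here is a toy JavaScript sketch. Everything in it is made up for illustration (the shard names, `queryShard`, `mergeShardRows`, `allDocs`). The real code streams rows through rexi and merges them incrementally with timeout handling, which this sketch skips by simply waiting for every shard and sorting.

```javascript
// Hypothetical in-memory shards: each holds its own sorted slice of the by-id index.
const shards = [
  { name: 'shard-a', rows: [{ id: 'a' }, { id: 'd' }] },
  { name: 'shard-b', rows: [{ id: 'b' }, { id: 'c' }] },
];

// Stand-in for a per-shard worker: answers asynchronously for one shard.
function queryShard(shard) {
  return new Promise((resolve) => setTimeout(() => resolve(shard.rows), 0));
}

// Combine the per-shard results into one list in global id order.
function mergeShardRows(perShard) {
  return perShard.flat().sort((x, y) => (x.id < y.id ? -1 : x.id > y.id ? 1 : 0));
}

// Coordinator: start one worker per shard, wait for all of them, then merge.
async function allDocs(shardList) {
  const perShard = await Promise.all(shardList.map(queryShard));
  return mergeShardRows(perShard);
}

allDocs(shards).then((rows) => console.log(rows.map((r) => r.id)));
```
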
>>>> 
>>>> go/6 registers the handle_message/3 function as the callback for
>>>> rexi_utils’ recv/6 (read “receive”) function.
>>>> 
>>>> handle_message/3 comes in a number of variants to handle rexi errors,
>>>> receiving metadata, receiving result rows and a notification “complete”
>>>> about all rows having been sent.
>>>> 
>>>> Our next level down is looking into fabric_rpc and how it handles all_docs
>>>> requests. fabric_rpc:all_docs/3 is again a short wrapper, this time around
>>>> couch_mrview:query_all_docs/4, which is the node-local function that handles
>>>> querying.
>>>> 
>>>> couch_mrview includes a bunch of functions for map/reduce views. It seems like
>>>> a natural place to make our distinction between a normal by-id request and a
>>>> by-access-id request.
>>>> 
>>>> I’m skipping a step here, but with a little printf-debugging, I’ve found
>>>> out that the `Db` variable we get passed in includes the authenticated
>>>> userCtx, including username and any roles. We can use couch_db:is_admin/1
>>>> to get a boolean back for the distinction we are going to have to make:
>>>> 
>>>> ```
>>>> query_all_docs(Db, Args0, Callback, Acc) ->
>>>>  case couch_db:is_admin(Db) of
>>>>      true -> query_all_docs_admin(Db, Args0, Callback, Acc);
>>>>      false -> query_all_docs_access(Db, Args0, Callback, Acc)
>>>>  end.
>>>> ```
>>>> 
>>>> query_all_docs_admin/4 is the existing query_all_docs/4 function and we’re
>>>> introducing query_all_docs_access/4, that we now have to fill out with
>>>> querying our view.
>>>> 
>>>> Before we can do that, we need to understand how views work.
>>>> 
>>>> ## The Anatomy of a View Request
>>>> 
>>>> Querying a view has three stages:
>>>> 
>>>> 1. define the view
>>>> 2. build the view index
>>>> 3. query the view index
>>>> 
>>>> A view definition is always in a design document. It can be one or more
>>>> JavaScript map/reduce functions, Erlang map/reduce functions, or a mango
>>>> index definition.
>>>> 
>>>> // TODO: link all these view definition options.
>>>> 
>>>> Building the view index is an implicit step in CouchDB. View indexes are
>>>> refreshed at query time, but only if there were any changes in the database
>>>> since the last query. If no refresh is needed, the view result is returned
>>>> from the index directly.
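
The "refresh at query time, but only if something changed" behaviour can be modelled with a toy index that remembers the last update sequence it has seen. This is a sketch under stated assumptions: `ToyDb` and `ToyViewIndex` are invented names, and the toy never removes rows for updated or deleted docs and keeps rows in an unsorted array, whereas a real view index handles both in a B-tree.

```javascript
// Toy database: an append-only change list with an increasing update sequence.
class ToyDb {
  constructor() {
    this.updateSeq = 0;
    this.changes = [];
  }
  put(doc) {
    this.updateSeq += 1;
    this.changes.push({ seq: this.updateSeq, doc });
  }
  changesSince(seq) {
    return this.changes.filter((c) => c.seq > seq);
  }
}

// Toy view index: only re-runs the map function for changes it has not seen yet.
class ToyViewIndex {
  constructor(db, mapFun) {
    this.db = db;
    this.mapFun = mapFun;
    this.indexedSeq = 0;
    this.rows = [];
    this.refreshes = 0; // counts how often we actually had to do work
  }
  refreshIfStale() {
    if (this.indexedSeq === this.db.updateSeq) return; // up to date, serve as-is
    for (const { doc } of this.db.changesSince(this.indexedSeq)) {
      this.mapFun(doc, (key, value) => this.rows.push({ id: doc._id, key, value }));
    }
    this.indexedSeq = this.db.updateSeq;
    this.refreshes += 1;
  }
  query() {
    this.refreshIfStale();
    return this.rows;
  }
}

const db = new ToyDb();
const index = new ToyViewIndex(db, (doc, emit) => emit(doc.type, 1));
db.put({ _id: '1', type: 'a' });
index.query(); // first query builds the index
index.query(); // nothing changed, no refresh
db.put({ _id: '2', type: 'b' });
console.log(index.query().length, index.refreshes); // 2 2
```
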
>>>> 
>>>> // TODO: explain query_server
>>>> 
>>>> Querying indexes follows a similar path through chttpd, fabric, rexi,
>>>> fabric_rpc down to the per-node handlers in couch_mrview. Just a few lines
>>>> below couch_mrview:query_all_docs/4 we find query_view/5 which decides
>>>> between map and reduce requests. We care about map-only for now.
>>>> query_view/5 is preceded by query_view/6 which includes a call to
>>>> couch_mrview_util:get_view/4 which looks like it is where we want to look
>>>> next, as the map_fold/5 called by query_view/5 is about looping over rows.
>>>> We hope we can re-use all that logic, and maybe get_view/4 lets us find out
>>>> how we can have it return our new view.
>>>> 
>>>> get_view/4 calls get_view_index_state/4 which in turn calls
>>>> get_view_index_pid/4 that finally calls into couch_index_server:get_index/4
>>>> which looks like it returns the index for our request. Let’s have a look.
>>>> 
>>>> get_index/4 will dive into get_index/2 eventually and that looks indeed
>>>> like where we need to look. In there, we look up the view index in an ETS table
>>>> (an in-memory database), and if we can’t find it there, we start a new one.
>>>> Either way, a view index is returned. The lookup is by DbName and
>>>> Sig(nature), an md5 hash over the `views` property in a design doc, that
>>>> also corresponds to the *.view filename of the view index.
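
A rough model of that lookup, for readers outside the Erlang codebase: a map keyed by database name plus signature, where a miss opens a new index. The names `viewSignature`, `getIndex` and `openIndexes` are hypothetical, and hashing a JSON serialisation is an assumption made for the sketch. CouchDB hashes an Erlang term, so real signatures will not match these, and JSON property order would matter here.

```javascript
const crypto = require('crypto');

// Hypothetical signature: md5 over a JSON serialisation of the `views` property.
function viewSignature(ddoc) {
  return crypto.createHash('md5').update(JSON.stringify(ddoc.views)).digest('hex');
}

// Stands in for the ETS table of currently open indexes.
const openIndexes = new Map();

// Look up the index by (dbName, signature); open a new one on a miss.
function getIndex(dbName, ddoc) {
  const sig = viewSignature(ddoc);
  const key = `${dbName}/${sig}`;
  if (!openIndexes.has(key)) {
    openIndexes.set(key, { dbName, sig, file: `${sig}.view`, rows: [] });
  }
  return openIndexes.get(key);
}

const ddoc = { _id: '_design/example', views: { by_type: { map: 'function (doc) { emit(doc.type) }' } } };
const first = getIndex('a', ddoc);
const second = getIndex('a', ddoc);
console.log(first === second); // true, the second call hits the cache
```

Two design docs with identical `views` share one index in this model, because only the `views` property feeds the signature.
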
>>>> 
>>>> 
>>>> ## Faking the index
>>>> 
>>>> So how would we get this to return the index we want to query? We need to
>>>> create an index definition that matches the design doc `views` hash. Hm.
>>>> 
>>>> It is relatively easy to produce a map function that behaves like we want:
>>>> 
>>>> function (doc) {
>>>>  var _access = doc._access
>>>>  // skip docs without a non-empty _access array
>>>>  if (!_access) { return }
>>>>  if (!isArray(_access) || _access.length === 0) { return }
>>>>  // one row per user or role, keyed by [user_or_role, doc_id]
>>>>  _access.forEach(function (user_or_role) {
>>>>   emit([user_or_role, doc._id], doc._rev)
>>>>  })
>>>> }
>>>> 
>>>> At query time, we’d have to match the requesting username and roles
>>>> against the first element in the key-array and return the results, while
>>>> replacing the key-array with the second element (the doc _id). All this
>>>> doesn’t sound too hard. Good.
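
A sketch of that query-time step, assuming rows shaped like the ones the map function emits. The function name `allDocsFor` and the sample rows are invented; the linear scan is for clarity only (a real implementation would do range reads on the index), and AND groups in `_access` are ignored here.

```javascript
// Rows as the map function would emit them: key = [user_or_role, doc_id].
const indexRows = [
  { id: 'doc1', key: ['b', 'doc1'], value: '4-bcbc' },
  { id: 'doc2', key: ['management', 'doc2'], value: '1-aaaa' },
  { id: 'doc1', key: ['management', 'doc1'], value: '4-bcbc' },
  { id: 'doc3', key: ['c', 'doc3'], value: '2-cccc' },
];

// Keep rows whose prefix matches the user's name or one of their roles,
// then replace the [prefix, id] key with the plain doc id.
function allDocsFor(userCtx, rows) {
  const identities = new Set([userCtx.name, ...userCtx.roles]);
  const seen = new Set(); // a doc can match via several identities; return it once
  const result = [];
  for (const row of rows) {
    const [who, docId] = row.key;
    if (!identities.has(who) || seen.has(docId)) continue;
    seen.add(docId);
    result.push({ id: docId, key: docId, value: row.value });
  }
  // Matches found under different prefixes interleave, so restore id order.
  return result.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
}

console.log(allDocsFor({ name: 'b', roles: ['management'] }, indexRows).map((r) => r.key));
```
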
>>>> 
>>>> One snag though: if we think ahead and try to see how we could implement
>>>> by-access-changes we get stuck: a view does not include rows for deleted
>>>> documents while _changes does. In addition, the update sequence for a
>>>> document is not available in a map function. So a regular view can not be
>>>> used here.
>>>> 
>>>> The filtering of deleted docs from a view index happens in
>>>> couch_mrview:map_fold/3. So if we could augment that for our internal view
>>>> requests, that could get us a long way towards reusing the rest of the
>>>> couch_mrview/couch_index machinery.
>>>> 
>>>> Note to self: make sure view compaction doesn’t remove deleted docs. A
>>>> cursory glance at couch_mrview_compactor:compact_view_btree/5 suggests it
>>>> does no such thing, but we need to validate this, and if it doesn’t hold,
>>>> change view compaction to keep deleted entries.
>>>> 
>>>> * * *
>>>> 
>>>> We’ll start giving this a try by forking things off in
>>>> couch_mrview:query_all_docs/4 and pretending to call a view with a mocked
>>>> ddoc:
>>>> 
>>>> {
>>>> "_id": "_design/_access",
>>>> "language": "_access",
>>>> "views": {} // if needed
>>>> } // TODO see which other fields it needs
>>>> 
>>>> We’ll try this road to see if we get to the point where we get a “view
>>>> index not found” error, because we didn’t actually have a view index yet.
>>>> We’ll then try and see if we can produce one. We could try the other way
>>>> around too, building the index first and then trying to query, but the
>>>> approach doesn’t make much of a difference.
>>>> 
>>>> First demo working:
>>>> https://gist.github.com/janl/20b218a3f0eafbf963ee28780261f9fc
>>>> 
>>>> 
>>>> Next Steps:
>>>> - make sure the startkey/endkey/descending argument handling is all
>>>> correct and complete
>>>> - add key un-munging, so the user/role prefix gets filtered out on reads
>>>> - handle roles:
>>>>  - instead of querying the _access view once, we need to issue a
>>>> multi-query, probably via #mrargs.multi_get; read up on how that is used
>>>> - then we could start thinking about by-access-seq:
>>>>  - we need access to the update-seq in
>>>> couch_access_native_proc:map_doc, might require view protocol upgrade, or
>>>> we have a post-process function that tags on the update-seq, we’ll see.
>>>>  - the admin/access split we’re doing in query_all_docs should probably
>>>> happen in couch_db:changes_since/5
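
For the roles item in the list above, one way to picture the multi-query is one key range per identity, using the usual view collation trick that an object sorts after any string. The helper name `accessKeyRanges` is made up, and how these ranges would actually be passed into couch_mrview is left open.

```javascript
// One [startkey, endkey] range per identity (username first, then each role).
// Keys in the index look like [user_or_role, doc_id]; {} collates after every
// string in view collation, so [identity, {}] closes the range for that prefix.
function accessKeyRanges(userCtx) {
  return [userCtx.name, ...userCtx.roles].map((identity) => ({
    startkey: [identity],
    endkey: [identity, {}],
  }));
}

console.log(JSON.stringify(accessKeyRanges({ name: 'b', roles: ['management'] })));
```

The results of the individual range reads would still need to be merged and de-duplicated by doc id before they look like a normal _all_docs response.
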
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> # More specification details
>>>> 
>>>> 
>>>> Documents in databases with _access enabled are private/admin-only by
>>>> default, and can be made public with the special role _public.
>>>> 
>>>> TODO: shared id space or auto-prefix ids
>>>> 
>>>> 
>>>> 
>> 
> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
