Hi,

Firstly, CouchDB today does not have multi-tenancy as a feature. Cloudant does, and achieves it by inserting the tenant's name as a prefix on the database name (so "rnewson/db1" is a different database from "sleroux/db1"), with appropriate stripping of the prefix in various responses. I would like to see multi-tenancy carried into CouchDB as a first-class feature, though.
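A minimal sketch of that prefixing idea, purely for illustration. The function names (to_internal, to_external, list_dbs) are made up here and are not Cloudant's or CouchDB's actual code; the only thing taken from the above is the "tenant/dbname" shape and the stripping of the prefix in responses.

```python
# Hypothetical helpers illustrating tenant-prefixed database names.
# Assumption: the tenant label is a plain string without "/" in it.

def to_internal(tenant, db_name):
    """Qualify a tenant-visible database name with the tenant's label."""
    if "/" in tenant:
        raise ValueError("tenant label must not contain '/'")
    return f"{tenant}/{db_name}"


def to_external(tenant, internal_name):
    """Strip the tenant prefix before the name appears in a response."""
    prefix = f"{tenant}/"
    if not internal_name.startswith(prefix):
        raise PermissionError("database belongs to another tenant")
    return internal_name[len(prefix):]


def list_dbs(tenant, all_internal_names):
    """Return only this tenant's databases, with the prefix removed."""
    prefix = f"{tenant}/"
    return sorted(
        name[len(prefix):]
        for name in all_internal_names
        if name.startswith(prefix)
    )
```

With this, to_internal("rnewson", "db1") and to_internal("sleroux", "db1") give two distinct internal names, and each tenant only ever sees "db1".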
With that preamble done, each tenant will have a unique label pretty much by definition, and this would be included in all the keys. Running that, or other properties, through a cryptographically secure message digest algorithm achieves nothing but obfuscation and, as you note, the possibility (however remote) of a collision. Crypto isn't magic, even if it looks like magic.

FDB provides the notion of a "Directory", which is a mechanism to help with very long keys, given the key length constraint of 10k. So, instead of representing a doc of {"foo":12} in "db1" of my "rnewson" account simply as;

    /couchdb/rnewson/db1/doc1/foo => 12

we could create a Directory for the prefix "/couchdb/rnewson/db1" instead;

    dirspace/couchdb/rnewson/db1 => 0x01
    0x01/doc1/foo => 12

We're overdue for the Document Model RFC that would make this explicit.

Finally, I think we're past the "proposition" stage as there is broad agreement (and no disagreement) from the conversations already had. We are a little behind on writing and publishing the RFCs that will describe the full work, though.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 18 Mar 2019, at 17:32, Steven Le Roux wrote:
> Hi everyone.
>
> I'm new here and just discovered the ongoing proposition for CouchDB to
> rely upon FDB.
>
> With my team, we were considering providing an HTTP API over FDB in the
> form of the CouchDB API definition, so I'm very pleased to see there is
> already an ongoing effort for this (even if still a proposition). I've
> tried to catch up with all the good discussions on how you could make this
> work, mapping to the K/V model, but sorry if I have missed a point.
>
> I'm curious how you're planning to manage multi-tenancy while
> ensuring good scalability and avoiding hotspotting.
>
> I've read an idea from Mickael with CryptoHash to map the model this way:
>
> {bucket_id}/{cryptohash} : value
>
> We currently use this CryptoHash mechanism to manage some data in a
> multi-tenancy context applied to Time Series.
>
> Here is a simple diagram that summarizes it:
>
> {raw_data} -> ingress component -> {hashed_metadata+data} -> HBase
>                                 -> {crypted_metadata} -> HBase
>                                 -> {crypted_metadata} -> Directory service
>
> Query -> egress component -> HBase
>
> raw_data is in the metric{tags} format, like in Prometheus/OpenTSDB/Warp10
> style.
> hashed metadata is a double 64- or 128-bit hash of hash(metric) +
> hash(tags).
> The default is 64 bits, but it can lead to collisions in the keyspace above
> 1B unique series, where 128-bit hashes are safer.
> egress will query the Directory service to get the series list to be read in
> the store.
>
> During authentication, a custom "application" label is embedded into the
> labels that end up in the data model and is then hashed, which avoids
> conflicts between users. Hashed metadata are suffixed with a timestamp
> because it's convenient for Time Series data.
> What makes it very useful is:
> - it can still use scans per series (metrics+tags)
> - it avoids hotspotting the cluster and ensures a very good distribution
> among nodes
> - it provides authentication through a directory service that acts as an
> indirection
> - keys are consistent while metrics or tags can be very long
>
> I think this kind of model can perfectly apply to FDB for documents given
> that Namespace would be a user application/bucket/... :
>
> hash ( {NS} + {...} + {DOC_ID} ) / fields / ...
>
> Drawbacks are that it may require a bit more storage for keys, but hashing
> could be adjusted given the use case.
> Moreover, managing rights at the
> document level would also require additional fields or a few bytes to manage
> this, while using a directory index (could be in memory inside CouchDB,
> outside relying on something like Elastic, or available directly inside FDB).
>
> I realize that just FDB as a backend is a considerable amount of work and
> pushing multi-tenancy adds even more work, maybe into CouchDB itself. For
> example, Tokens could embed rights and bucket ids that would be used by
> CouchDB to authorize and build the underlying data model for storing with
> scalability and optimizations in mind. Also, did anyone consider reaching
> out to the FDB guys to try to align the CouchDB document representation with
> the Document Layer (
> https://foundationdb.github.io/fdb-document-layer/data-modeling.html )?
> This would make CouchDB also MongoDB API compatible.
>
> I don't know where discussions are, but maybe we could help :)
>
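
To make the Directory indirection from the reply above a little more concrete, here is a toy sketch. It uses a plain Python dict as a stand-in for the key-value store and is not the FoundationDB Directory layer API; ToyDirectory, create_or_open and doc_key are hypothetical names, and the one-byte prefix allocator is a simplification chosen only to match the 0x01 in the example.

```python
# Toy model of the "Directory" indirection: a long path prefix is mapped
# once to a short allocated prefix, which is then used in every key.
# Assumption: an in-memory dict stands in for the real key-value store.

class ToyDirectory:
    def __init__(self):
        self.kv = {}       # stand-in for the ordered key-value store
        self.next_id = 1   # next short prefix to hand out

    def create_or_open(self, path):
        """Return the short prefix for `path`, allocating one if needed."""
        dir_key = b"dirspace" + path.encode()
        short = self.kv.get(dir_key)
        if short is None:
            if self.next_id > 255:
                # Toy limit only; a real allocator is not bounded like this.
                raise OverflowError("toy allocator ran out of prefixes")
            short = bytes([self.next_id])
            self.next_id += 1
            self.kv[dir_key] = short
        return short


def doc_key(directory, tenant, db, doc_id, field):
    """Build the key for one field of one document, using the short prefix."""
    prefix = directory.create_or_open(f"/couchdb/{tenant}/{db}")
    return prefix + f"/{doc_id}/{field}".encode()


d = ToyDirectory()
key = doc_key(d, "rnewson", "db1", "doc1", "foo")
d.kv[key] = 12
```

After this runs, the store holds one entry mapping dirspace/couchdb/rnewson/db1 to the short prefix and one entry for the field itself under that prefix, so the tenant and database names are stored once rather than repeated in every key.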