Hi,

Firstly, CouchDB today does not have multi-tenancy as a feature. Cloudant does 
and achieves this by inserting the tenant's name as a prefix on the database 
name (so "rnewson/db1" is a different database to "sleroux/db1"), with 
appropriate stripping of the prefix in various responses. I would like to see 
multi-tenancy carried into CouchDB as a first-class feature, though.
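
The prefixing described above can be sketched roughly as follows. This is a toy model only: the function names are hypothetical and are not taken from Cloudant's or CouchDB's code.

```python
# Toy model of tenant prefixing. All names here are hypothetical.

def to_internal(tenant: str, db_name: str) -> str:
    """Build the internal database name by prefixing the tenant's name."""
    return f"{tenant}/{db_name}"


def to_external(tenant: str, internal_name: str) -> str:
    """Strip the tenant prefix before a name is returned in a response."""
    prefix = f"{tenant}/"
    if not internal_name.startswith(prefix):
        raise ValueError("database does not belong to this tenant")
    return internal_name[len(prefix):]


def list_dbs(tenant: str, all_internal_names: list) -> list:
    """Return only this tenant's databases, with the prefix stripped."""
    prefix = f"{tenant}/"
    return [to_external(tenant, name)
            for name in all_internal_names
            if name.startswith(prefix)]
```

With this, "db1" for "rnewson" and "db1" for "sleroux" map to two distinct internal names, and each tenant only ever sees "db1".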

With that preamble done, each tenant will have a unique label pretty much by 
definition, and this would be included in all the keys. Running that, or other 
properties, through a cryptographically secure message digest algorithm 
achieves nothing but obfuscation and, as you note, the possibility (however 
remote) of a collision. Crypto isn't magic, even if it looks like magic.
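
To put a rough number on "however remote", here is a small sketch using the standard birthday-bound approximation. It assumes an ideal, uniformly distributed hash; the function name is illustrative, and the 1B-key and 64/128-bit figures are the ones from the quoted message below.

```python
import math


def collision_probability(n_keys: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_keys share a hash value.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1)),
    assuming an ideal, uniformly distributed hash function.
    """
    exponent = -n_keys * (n_keys - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


p64 = collision_probability(10**9, 64)    # on the order of a few percent
p128 = collision_probability(10**9, 128)  # vanishingly small
```

Under those assumptions, 1B keys with a 64-bit hash already carry a collision risk of a few percent, whereas a unique tenant label used directly as the key prefix carries none.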

FDB provides the notion of a "Directory", a mechanism to help with very 
long keys, given the key length limit of 10,000 bytes.

So, instead of representing a doc of {"foo":12} in "db1" of my "rnewson" 
account simply as:

/couchdb/rnewson/db1/doc1/foo => 12

we could create a Directory for the prefix "/couchdb/rnewson/db1" instead:

dirspace/couchdb/rnewson/db1 => 0x01
0x01/doc1/foo => 12
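
As a minimal in-memory sketch of that indirection (a toy stand-in, not the FoundationDB directory layer itself; the class and method names are hypothetical):

```python
class ToyDirectory:
    """Toy stand-in for the directory idea: long path -> short prefix."""

    def __init__(self):
        self._prefixes = {}  # path tuple -> short allocated prefix
        self._next_id = 1
        self.kv = {}         # stand-in for the key-value store

    def create_or_open(self, path: tuple) -> bytes:
        """Return the short prefix for a path, allocating one if needed."""
        if path not in self._prefixes:
            n = self._next_id
            length = max(1, (n.bit_length() + 7) // 8)
            self._prefixes[path] = n.to_bytes(length, "big")
            self._next_id += 1
        return self._prefixes[path]

    def set(self, path: tuple, key: tuple, value) -> None:
        """Store a value under the short prefix instead of the full path."""
        self.kv[(self.create_or_open(path),) + key] = value

    def get(self, path: tuple, key: tuple):
        """Read a value back through the same path -> prefix mapping."""
        return self.kv[(self.create_or_open(path),) + key]


d = ToyDirectory()
db = ("couchdb", "rnewson", "db1")
d.set(db, ("doc1", "foo"), 12)
```

After this, the stored key is (b"\x01", "doc1", "foo") rather than the full path, so the per-key cost of a long tenant or database name is paid once in the directory mapping.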

We're overdue for the Document Model RFC that would make this explicit.

Finally, I think we're past the "proposition" stage as there is broad 
agreement (and no disagreement) from the conversations already had. We are a 
little behind on writing and publishing the RFCs that will describe the full 
work, though.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 18 Mar 2019, at 17:32, Steven Le Roux wrote:
> Hi everyone.
> 
> I'm new here and just discovered the ongoing proposition for CouchDB to
> rely upon FDB.
> 
> With my team, we were considering providing an HTTP API over FDB in the
> form of the CouchDB API definition, so I'm very pleased to see there is
> already an ongoing effort for this (even if still a proposition). I've
> tried to catch up with all the good discussions on how you could make this
> work, mapping to the K/V model, but sorry if I could have missed a point.
> 
> I'm curious how you're planning to manage multi-tenancy while
> ensuring good scalability and avoiding hotspotting.
> 
> I've read an idea from Mickael with CryptoHash to map the model this way :
> 
> {bucket_id}/{cryptohash}  : value
> 
> We currently use this CryptoHash mechanism to manage some data in a
> multi-tenancy context applied to Time Series.
> 
> Here is a simple diagram that summarizes it :
> 
> {raw_data} -> ingress component -> {hashed_metadata+data} -> HBase
>                                 -> {crypted_metadata}     -> HBase
>                                 -> {crypted_metadata}     -> Directory service
> 
> Query -> egress component -> HBase
> 
> raw_data is in the metric{tags} format, like in Prometheus/OpenTSDB/Warp10
> style.
> hashed metadata is a pair of 64- or 128-bit hashes: hash(metric) +
> hash(tags).
> Default is 64 bits, but that can lead to collisions in the keyspace above
> 1B unique series, where 128-bit hashes are safer.
> egress will query the Directory service to get the series list to be read
> in the store.
> 
> During authentication, a custom "application" label is embedded into a
> label that ends up in the data model and is then hashed, which avoids
> conflicts between users. Hashed metadata are suffixed with a timestamp
> because it's convenient for Time Series data.
> What makes it very useful is :
>  - it can still use scans per series (metrics+tags)
>  - it avoids hotspotting the cluster and ensures a very good distribution
> among nodes
>  - it provides authentication through a directory service that acts as an
> indirection
>  - keys are consistent while metrics or tags can be very long
> 
> I think this kind of model can perfectly apply to FDB for documents given
> that Namespace would be a user application/bucket/...  :
> 
> hash ( {NS} + {...} + {DOC_ID} ) / fields / ...
> 
> Drawbacks are that it may require a bit more storage for keys, but hashing
> could be adjusted given the use case. Moreover, managing rights at the
> document level would also require additional fields or a few bytes, along
> with a directory index (which could be in memory inside CouchDB, outside
> relying on something like Elastic, or available directly inside FDB).
> 
> I realize that just FDB as a backend is a considerable amount of work and
> pushing multi tenancy adds even more work maybe into CouchDB itself. For
> example, Tokens could embed rights and buckets ids, that would be used by
> CouchDB to authorize and build the underlying data model for storing with
> scalability and optimizations in mind. Also, has anyone considered reaching
> out to the FDB guys to try to align the CouchDB document representation
> with the Document Layer (
> https://foundationdb.github.io/fdb-document-layer/data-modeling.html )?
> This would also make CouchDB MongoDB API compatible.
> 
> I don't know where the discussions are, but maybe we could help :)
>
