Re: [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-28 Thread Adam Kocoloski
I’ve gone ahead and submitted an RFC for the design discussed here with a small 
modification:

https://github.com/apache/couchdb/issues/1957

Cheers, Adam

> On Feb 11, 2019, at 2:37 PM, Adam Kocoloski  wrote:
> 
> Agreed, I don’t have an answer for this. I propose to drop the optimization for 
> now, given the implementation complexity of any solution that does not cause 
> a performance degradation.
> 
> Adam
> 
>> On Feb 11, 2019, at 2:11 PM, Ilya Khlopotov  wrote:
>> 
>>> We could represent these using the following set of KVs:
>>> 
>>> ("foo", "active") = true
>>> ("foo", "owner") = kCONFLICT
>>> ("foo", "owner", "1-abc") = "alice"
>>> ("foo", "owner", "1-def") = "bob"
>> I still cannot see how we can figure out whether a conflict for a JSON path 
>> is present without reading previous revisions. The complex way of solving the 
>> issue is to use some sort of succinct, atomically updated structure which we 
>> can read quickly. The structure would have to be capable of answering the 
>> following question:
>> - what are the hashes of the different revisions of a subtree for a given 
>> JSON path
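
For illustration, one shape such a structure might take is a parallel index holding one subtree hash per revision per JSON path, so that a single range read answers the question. A minimal sketch in Python against the FoundationDB bindings, where the "_hash" subspace and its upkeep are entirely hypothetical:

    import fdb
    fdb.api_version(620)

    # Hypothetical layout: ("_hash", doc_id, path..., revid) = sha256(subtree)
    @fdb.transactional
    def subtree_hashes(tr, doc_id, path):
        """Map revid -> subtree hash for one JSON path via a single range read."""
        prefix = ("_hash", doc_id) + tuple(path)
        out = {}
        for kv in tr[fdb.tuple.range(prefix)]:
            key = fdb.tuple.unpack(kv.key)
            if len(key) == len(prefix) + 1:  # direct (..., revid) entries only
                out[key[-1]] = kv.value
        return out

A conflict exists at the path exactly when the returned hashes differ. The catch, as noted above, is keeping this index atomically up to date: every edit would have to rewrite the hashes for each path segment it touches.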
>> 
>> 
>> 
>> On 2019/02/04 23:22:09, Adam Kocoloski  wrote: 
>>> I think it’s fine to start a focused discussion here as it might help 
>>> inform some of the broader debate over in that thread.
>>> 
>>> As a reminder, today CouchDB writes the entire body of each document 
>>> revision on disk as a separate blob. Edit conflicts that have common fields 
>>> between them do not share any storage on disk. The revision tree is encoded 
>>> into a compact format and a copy of it is stored directly in both the by_id 
>>> tree and the by_seq tree. Each leaf entry in the revision tree contains a 
>>> pointer to the position of the associated doc revision on disk.
>>> 
>>> As a further reminder, CouchDB 2.x clusters can generate edit conflict 
>>> revisions just from multiple clients concurrently updating the same 
>>> document in a single cluster. This won’t happen when FoundationDB is 
>>> running under the hood, but users who deploy multiple CouchDB or PouchDB 
>>> servers and replicate between them can of course still produce conflicts 
>>> just like they could in CouchDB 1.x, so we need a solution.
>>> 
>>> Let’s consider the two sub-topics separately: 1) storage of edit conflict 
>>> bodies and 2) revision trees
>>> 
>>> ## Edit Conflict Storage
>>> 
>>> The simplest possible solution would be to store each document revision 
>>> separately, like we do today. We could store document bodies with (“docid”, 
>>> “revid”) as the key prefix, and each transaction could clear the key range 
>>> associated with the base revision against which the edit is being 
>>> attempted. This would work, but I think we can try to be a bit more clever 
>>> and save on storage space given that we’re splitting JSON documents into 
>>> multiple KV pairs.
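
As a rough sketch of that simplest scheme, assuming the FoundationDB Python bindings and a ("docid", "revid", field) key layout (helper names illustrative):

    import fdb
    fdb.api_version(620)
    db = fdb.open()

    @fdb.transactional
    def update_doc(tr, doc_id, base_rev, new_rev, fields):
        """Clear the base revision's key range, then write the new revision."""
        tr.clear_range_startswith(fdb.tuple.pack((doc_id, base_rev)))
        for field, value in fields.items():
            tr[fdb.tuple.pack((doc_id, new_rev, field))] = fdb.tuple.pack((value,))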
>>> 
>>> One thought I’d had is to introduce a special enum Value which indicates 
>>> that the subtree “beneath” the given Key is in conflict. For example, 
>>> consider the documents
>>> 
>>> {
>>>   "_id": "foo",
>>>   "_rev": "1-abc",
>>>   "owner": "alice",
>>>   "active": true
>>> }
>>> 
>>> and 
>>> 
>>> {
>>>   "_id": "foo",
>>>   "_rev": "1-def",
>>>   "owner": "bob",
>>>   "active": true
>>> }
>>> 
>>> We could represent these using the following set of KVs:
>>> 
>>> ("foo", "active") = true
>>> ("foo", "owner") = kCONFLICT
>>> ("foo", "owner", "1-abc") = "alice"
>>> ("foo", "owner", "1-def") = "bob"
>>> 
>>> This approach also extends to conflicts where the two versions have 
>>> different data types. Consider a more complicated example where bob dropped 
>>> the “active” field and changed the “owner” field to an object:
>>> 
>>> {
>>>   "_id": "foo",
>>>   "_rev": "1-def",
>>>   "owner": {
>>>     "name": "bob",
>>>     "email": "b...@example.com"
>>>   }
>>> }
>>> 
>>> Now the set of KVs for “foo” looks like this (note that a missing field 
>>> needs to be handled explicitly):
>>> 
>>> ("foo", "active") = kCONFLICT
>>> ("foo", "active", "1-abc") = true
>>> ("foo", "active", "1-def") = kMISSING
>>> ("foo", "owner") = kCONFLICT
>>> ("foo", "owner", "1-abc") = "alice"
>>> ("foo", "owner", "1-def", "name") = "bob"
>>> ("foo", "owner", "1-def", "email") = "b...@example.com"
>>> 
>>> I like this approach for the common case where documents share most of 
>>> their data but have a conflict in a specific field or set of fields.
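
To make that layout concrete, here is a hedged sketch of how a reader might reassemble one revision from the shared KVs. The sentinel encodings and value packing are assumptions (the thread leaves them undecided), and it handles only a single level of conflict markers, as in the examples above:

    import fdb
    fdb.api_version(620)

    KCONFLICT = b"\x00kCONFLICT"  # assumed sentinel encodings
    KMISSING = b"\x00kMISSING"

    @fdb.transactional
    def read_rev(tr, doc_id, rev):
        """Reassemble one revision of a document from the shared KV layout."""
        kvs = [(fdb.tuple.unpack(kv.key)[1:], kv.value)  # strip doc_id
               for kv in tr[fdb.tuple.range((doc_id,))]]
        conflicted = [path for path, v in kvs if v == KCONFLICT]
        doc = {}
        for path, value in kvs:
            if value == KCONFLICT:
                continue
            for c in conflicted:
                if path[:len(c)] == c and len(path) > len(c):
                    # under a conflicted prefix, the next element is a revid
                    path = c + path[len(c) + 1:] if path[len(c)] == rev else None
                    break
            if path is None or value == KMISSING:
                continue
            node = doc
            for part in path[:-1]:  # build nested objects as needed
                node = node.setdefault(part, {})
            node[path[-1]] = fdb.tuple.unpack(value)[0]
        return doc

Run against the KVs above, this yields {"owner": "alice", "active": True} for "1-abc" and {"owner": {"name": "bob", "email": ...}} for "1-def".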
>>> 
>>> I’ve encountered one important downside, though: an edit that replicates in 
>>> and conflicts with the entire document can cause a bit of a data explosion. 
>>> Consider a case where I have 10 conflicting versions of a 100KB document, 
>>> but the conflicts are all related to a single scalar value. Now I replicate 
>>> in an empty document, and suddenly I have a kCONFLICT at the root. In this 
>>> model I now need to list out every path of every one of the 10 existing 
>>> revisions and I end up with a 1MB update. Yuck. That’s technically no worse 
>>> i

Re: [DISCUSS] Attachment support in CouchDB with FDB

2019-02-28 Thread Dave Cottlehuber
On Thu, 28 Feb 2019, at 13:19, Robert Newson wrote:
> Thanks to you both, and I agree. 
> 
> Adam's "I would like to see a basic “native” attachment provider with 
> the limitations described in 2), as well as an “object store” provider 
> targeting the S3 API." is my position/preference too. 

ditto. node-local storage works for me; this is the single-node case, which is 
important to have not just for devs but for any small environment.

there is a plethora of clustered file systems waiting to eat your data if one 
node isn't enough, and while I don't enjoy the S3 API, it is widespread, with 
many options for using and self-hosting.

range queries are useful to me (thanks Bob!), but if it's a deal killer I'd 
find a workaround, probably a proxy HTTP server.

random thought: if we stored a URL in FDB as a pointer, then whipping up a 
generic proxy would be reasonably easy and it could deal with file systems and 
S3 alike:

file:///usr/local/filesystem
https://my.cdn.com/
s3://aws.clone.com/

this would need suitable credentials in Couch and some way of knowing which 
credentials go with which db or remote... o0O
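
A hedged sketch of that pointer idea, with a key layout invented purely for illustration:

    import fdb
    fdb.api_version(620)

    # Store only a URL; a proxy layer dereferences file://, https://, or s3://.
    @fdb.transactional
    def set_attachment_url(tr, dbname, doc_id, att_name, url):
        tr[fdb.tuple.pack((dbname, doc_id, att_name, "url"))] = url.encode("utf-8")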

in terms of storing full content, say 100s of MB, in FDB as Bob outlined: is 
the main concern handling potentially failed partial uploads, like the lost 
space in our current B-tree? or are there other issues as well?

if so, one could imagine using a temporary key while receiving chunks, and only 
on completion moving those into the correct store?
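
That staging idea could look roughly like the following. One liberty taken: rather than physically moving the chunks on completion (which could itself blow the transaction limits), this sketch publishes a small pointer to them; all names are assumptions:

    import fdb
    import uuid

    fdb.api_version(620)
    db = fdb.open()

    CHUNK = 16 * 1024  # the 16K value size floated elsewhere in the thread

    @fdb.transactional
    def write_chunk(tr, upload_id, n, chunk):
        tr[fdb.tuple.pack(("_staging", upload_id, n))] = chunk

    def stage_upload(db, data):
        """Receive chunks under a temporary key, one small txn per chunk."""
        upload_id = uuid.uuid4().hex
        for n, off in enumerate(range(0, len(data), CHUNK)):
            write_chunk(db, upload_id, n, data[off:off + CHUNK])
        return upload_id

    @fdb.transactional
    def commit_upload(tr, dbname, doc_id, att_name, upload_id):
        # On completion, publish a pointer to the staged chunks; readers
        # follow it, and abandoned uploads are swept by a background task.
        tr[fdb.tuple.pack((dbname, doc_id, att_name))] = \
            fdb.tuple.pack(("_staging", upload_id))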

A+
Dave


Re: [DISCUSS] Attachment support in CouchDB with FDB

2019-02-28 Thread Joan Touzet
Chiming in to agree with option 2, and if that's too small for
you, you should be able to use either a cloud backend or a
local file system approach.

The cloud backend should be abstracted to the point where you could
build support for something other than S3 - for instance, B2[1]
would be lovely - but S3 is obviously the API to go after first.
(Lots of SMBs I know are picking B2 because of how cheap it is.)

For the local file system approach, NFS/CIFS/iSCSI mounts (provided
by your favourite SAN) or something like AWS's EFS would make sense.

-Joan

[1]: https://www.backblaze.com/b2/cloud-storage.html

- Original Message -
> From: "Adam Kocoloski" 
> To: dev@couchdb.apache.org
> Sent: Thursday, February 28, 2019 6:41:15 AM
> Subject: Re: [DISCUSS] Attachment support in CouchDB with FDB
> 
> I would like to see a basic “native” attachment provider with the
> limitations described in 2), as well as an “object store” provider
> targeting the S3 API. I think the consistency considerations are
> tractable if you’re comfortable with the possibility that
> attachments could possibly be orphaned in the object store in the
> case of a failed transaction.
> 
> I had not considered the “just write them on the file system”
> provider but that’s probably partly my cloud-native blinders. I
> think the main question there is redundancy; I would argue against
> trying to do any sort of replication across local disks. Users who
> happen to have an NFS-style mount point accessible to all the
> CouchDB nodes could use this option reliably, though.
> 
> We should calculate a safe maximum attachment size for the native
> provider — as I understand things the FDB transaction size includes
> both keys and values, so our effective attachment size limit will be
> smaller.
> 
> Adam
> 
> > On Feb 28, 2019, at 6:21 AM, Robert Newson 
> > wrote:
> > 
> > Hi,
> > 
> > Yes, I agree we should have a framework like that. Folks should be
> > able to choose S3 or COS (IBM), etc.
> > 
> > I am personally on the hook for the implementation for CouchDB and
> > for IBM Cloudant and expect them to be different, so the
> > framework, IMO, is a given.
> > 
> > B.
> > 
> >> On 28 Feb 2019, at 10:33, Jan Lehnardt  wrote:
> >> 
> >> Thanks for getting this started, Bob!
> >> 
> >> In fear of derailing this right off the bat, is there a potential
> >> 4) approach where on the CouchDB side there is a way to specify
> >> “attachment backends”, one of which could be 2), but others could
> >> be “node local file storage”*, others could be S3-API compatible,
> >> etc?
> >> 
> >> *a bunch of heavy handwaving about how to ensure consistency and
> >> fault tolerance here.
> >> 
> >> * * *
> >> 
> >> My hypothetical 4) could also be a later addition, and we’ll do
> >> one of 1-3 first.
> >> 
> >> 
> >> * * *
> >> 
> >> From 1-3, I think 2 is most pragmatic in terms of keeping
> >> desirable functionality, while limiting it so it can be useful in
> >> practice.
> >> 
> >> I feel strongly about not dropping attachment support. While not
> >> ideal in all cases, it is an extremely useful and reasonably
> >> popular feature.
> >> 
> >> Best
> >> Jan
> >> —
> >> 
> >>> On 28. Feb 2019, at 11:22, Robert Newson 
> >>> wrote:
> >>> 
> >>> Hi All,
> >>> 
> >>> We've not yet discussed attachments in terms of the foundationdb
> >>> work so here's where we do that.
> >>> 
> >>> Today, CouchDB allows you to store large binary values, stored as
> >>> a series of much smaller chunks. These "attachments" cannot be
> >>> indexed, they can only be sent and received (you can fetch the
> >>> whole thing or you can fetch arbitrary subsets of them).
> >>> 
> >>> On the FDB side, we have a few constraints. A transaction cannot
> >>> be more than 10MB and cannot take more than 5 seconds.
> >>> 
> >>> Given that, there are a few paths to attachment support going
> >>> forward;
> >>> 
> >>> 1) Drop native attachment support.
> >>> 
> >>> I suspect this is not going to be a popular approach but it's
> >>> worth hearing a range of views. Instead of direct attachment
> >>> support, a user could store the URL to the large binary content
> >>> and could simply fetch that URL directly.
> >>> 
> >>> 2) Write attachments into FDB but with limits.
> >>> 
> >>> The next simplest is to write the attachments into FDB as a
> >>> series of key/value entries, where the key is {database_name,
> >>> doc_id, attachment_name, 0..N} and the value is a short byte
> >>> array (say, 16K to match current). The 0..N is just a counter
> >>> such that we can do an fdb range get / iterator to retrieve the
> >>> attachment. An embellishment would restore the http Range header
> >>> options, if we still wanted that (disclaimer: I implemented the
> >>> Range thing many years ago, I'm happy to drop support if no one
> >>> really cares for it in 2019).
> >>> 
> >>> This would be subject to the 10MB and 5s limit, which is less
> >>> than you _can_ do today with attachments but not, in my opinion,
> >>> any less than people actually do (with some notable outliers like
> >>> npm in the past).

Re: [DISCUSS] Attachment support in CouchDB with FDB

2019-02-28 Thread Robert Newson
Thanks to you both, and I agree. 

Adam's "I would like to see a basic “native” attachment provider with the 
limitations described in 2), as well as an “object store” provider targeting 
the S3 API." is my position/preference too. 

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 28 Feb 2019, at 11:41, Adam Kocoloski wrote:
> I would like to see a basic “native” attachment provider with the 
> limitations described in 2), as well as an “object store” provider 
> targeting the S3 API. I think the consistency considerations are 
> tractable if you’re comfortable with the possibility that attachments 
> could possibly be orphaned in the object store in the case of a failed 
> transaction.
> 
> I had not considered the “just write them on the file system” provider 
> but that’s probably partly my cloud-native blinders. I think the main 
> question there is redundancy; I would argue against trying to do any 
> sort of replication across local disks. Users who happen to have an 
> NFS-style mount point accessible to all the CouchDB nodes could use 
> this option reliably, though.
> 
> We should calculate a safe maximum attachment size for the native 
> provider — as I understand things the FDB transaction size includes 
> both keys and values, so our effective attachment size limit will be 
> smaller.
> 
> Adam
> 
> > On Feb 28, 2019, at 6:21 AM, Robert Newson  wrote:
> > 
> > Hi,
> > 
> > Yes, I agree we should have a framework like that. Folks should be able to 
> > choose S3 or COS (IBM), etc. 
> > 
> > I am personally on the hook for the implementation for CouchDB and for IBM 
> > Cloudant and expect them to be different, so the framework, IMO, is a 
> > given. 
> > 
> > B. 
> > 
> >> On 28 Feb 2019, at 10:33, Jan Lehnardt  wrote:
> >> 
> >> Thanks for getting this started, Bob!
> >> 
> >> In fear of derailing this right off the bat, is there a potential 4) 
> >> approach where on the CouchDB side there is a way to specify “attachment 
> >> backends”, one of which could be 2), but others could be “node local file 
> >> storage”*, others could be S3-API compatible, etc?
> >> 
> >> *a bunch of heavy handwaving about how to ensure consistency and fault 
> >> tolerance here.
> >> 
> >> * * *
> >> 
> >> My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 
> >> first.
> >> 
> >> 
> >> * * *
> >> 
> >> From 1-3, I think 2 is most pragmatic in terms of keeping desirable 
> >> functionality, while limiting it so it can be useful in practice.
> >> 
> >> I feel strongly about not dropping attachment support. While not ideal in 
> >> all cases, it is an extremely useful and reasonably popular feature.
> >> 
> >> Best
> >> Jan
> >> —
> >> 
> >>> On 28. Feb 2019, at 11:22, Robert Newson  wrote:
> >>> 
> >>> Hi All,
> >>> 
> >>> We've not yet discussed attachments in terms of the foundationdb work so 
> >>> here's where we do that.
> >>> 
> >>> Today, CouchDB allows you to store large binary values, stored as a 
> >>> series of much smaller chunks. These "attachments" cannot be indexed, 
> >>> they can only be sent and received (you can fetch the whole thing or you 
> >>> can fetch arbitrary subsets of them).
> >>> 
> >>> On the FDB side, we have a few constraints. A transaction cannot be more 
> >>> than 10MB and cannot take more than 5 seconds.
> >>> 
> >>> Given that, there are a few paths to attachment support going forward;
> >>> 
> >>> 1) Drop native attachment support. 
> >>> 
> >>> I suspect this is not going to be a popular approach but it's worth 
> >>> hearing a range of views. Instead of direct attachment support, a user 
> >>> could store the URL to the large binary content and could simply fetch 
> >>> that URL directly.
> >>> 
> >>> 2) Write attachments into FDB but with limits.
> >>> 
> >>> The next simplest is to write the attachments into FDB as a series of 
> >>> key/value entries, where the key is {database_name, doc_id, 
> >>> attachment_name, 0..N} and the value is a short byte array (say, 16K to 
> >>> match current). The 0..N is just a counter such that we can do an fdb 
> >>> range get / iterator to retrieve the attachment. An embellishment would 
> >>> restore the http Range header options, if we still wanted that 
> >>> (disclaimer: I implemented the Range thing many years ago, I'm happy to 
> >>> drop support if no one really cares for it in 2019).
> >>> 
> >>> This would be subject to the 10MB and 5s limit, which is less than you 
> >>> _can_ do today with attachments but not, in my opinion, any less than 
> >>> people actually do (with some notable outliers like npm in the past).
> >>> 
> >>> 3) Full functionality
> >>> 
> >>> This would be the same as today. Attachments of arbitrary size (up to the 
> >>> disk capacity of the fdb cluster). It would require some extra cleverness 
> >>> to work over multiple txn transactions and in such a way that an aborted 
> >>> upload doesn't leave partially uploaded data in fdb forever. I have not 
> >>> sat down and designed this yet, hence I would very much like to hear from 
> >>> the community as to which of these paths are sufficient.

Re: [DISCUSS] Attachment support in CouchDB with FDB

2019-02-28 Thread Adam Kocoloski
I would like to see a basic “native” attachment provider with the limitations 
described in 2), as well as an “object store” provider targeting the S3 API. I 
think the consistency considerations are tractable if you’re comfortable with 
the possibility that attachments could possibly be orphaned in the object store 
in the case of a failed transaction.

I had not considered the “just write them on the file system” provider but 
that’s probably partly my cloud-native blinders. I think the main question 
there is redundancy; I would argue against trying to do any sort of replication 
across local disks. Users who happen to have an NFS-style mount point 
accessible to all the CouchDB nodes could use this option reliably, though.

We should calculate a safe maximum attachment size for the native provider — as 
I understand things the FDB transaction size includes both keys and values, so 
our effective attachment size limit will be smaller.
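
A back-of-envelope version of that calculation, where the per-chunk key size is an assumption:

    # Effective native-attachment limit under the 10 MB transaction cap,
    # counting both keys and values.
    TXN_LIMIT = 10 * 1024 * 1024  # FDB transaction size limit
    CHUNK = 16 * 1024             # proposed value size per chunk
    KEY_BYTES = 100               # assumed packed key size per chunk

    max_chunks = TXN_LIMIT // (CHUNK + KEY_BYTES)  # 636 chunks
    print(max_chunks * CHUNK)                      # 10,420,224 bytes, ~9.9 MB

In practice the limit would be lower still, since the same transaction also has to carry the document update itself.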

Adam

> On Feb 28, 2019, at 6:21 AM, Robert Newson  wrote:
> 
> Hi,
> 
> Yes, I agree we should have a framework like that. Folks should be able to 
> choose S3 or COS (IBM), etc. 
> 
> I am personally on the hook for the implementation for CouchDB and for IBM 
> Cloudant and expect them to be different, so the framework, IMO, is a given. 
> 
> B. 
> 
>> On 28 Feb 2019, at 10:33, Jan Lehnardt  wrote:
>> 
>> Thanks for getting this started, Bob!
>> 
>> In fear of derailing this right off the bat, is there a potential 4) 
>> approach where on the CouchDB side there is a way to specify “attachment 
>> backends”, one of which could be 2), but others could be “node local file 
>> storage”*, others could be S3-API compatible, etc?
>> 
>> *a bunch of heavy handwaving about how to ensure consistency and fault 
>> tolerance here.
>> 
>> * * *
>> 
>> My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 
>> first.
>> 
>> 
>> * * *
>> 
>> From 1-3, I think 2 is most pragmatic in terms of keeping desirable 
>> functionality, while limiting it so it can be useful in practice.
>> 
>> I feel strongly about not dropping attachment support. While not ideal in 
>> all cases, it is an extremely useful and reasonably popular feature.
>> 
>> Best
>> Jan
>> —
>> 
>>> On 28. Feb 2019, at 11:22, Robert Newson  wrote:
>>> 
>>> Hi All,
>>> 
>>> We've not yet discussed attachments in terms of the foundationdb work so 
>>> here's where we do that.
>>> 
>>> Today, CouchDB allows you to store large binary values, stored as a series 
>>> of much smaller chunks. These "attachments" cannot be indexed, they can 
>>> only be sent and received (you can fetch the whole thing or you can fetch 
>>> arbitrary subsets of them).
>>> 
>>> On the FDB side, we have a few constraints. A transaction cannot be more 
>>> than 10MB and cannot take more than 5 seconds.
>>> 
>>> Given that, there are a few paths to attachment support going forward;
>>> 
>>> 1) Drop native attachment support. 
>>> 
>>> I suspect this is not going to be a popular approach but it's worth hearing 
>>> a range of views. Instead of direct attachment support, a user could store 
>>> the URL to the large binary content and could simply fetch that URL 
>>> directly.
>>> 
>>> 2) Write attachments into FDB but with limits.
>>> 
>>> The next simplest is to write the attachments into FDB as a series of 
>>> key/value entries, where the key is {database_name, doc_id, 
>>> attachment_name, 0..N} and the value is a short byte array (say, 16K to 
>>> match current). The 0..N is just a counter such that we can do an fdb range 
>>> get / iterator to retrieve the attachment. An embellishment would restore 
>>> the http Range header options, if we still wanted that (disclaimer: I 
>>> implemented the Range thing many years ago, I'm happy to drop support if no 
>>> one really cares for it in 2019).
>>> 
>>> This would be subject to the 10MB and 5s limit, which is less than you 
>>> _can_ do today with attachments but not, in my opinion, any less than 
>>> people actually do (with some notable outliers like npm in the past).
>>> 
>>> 3) Full functionality
>>> 
>>> This would be the same as today. Attachments of arbitrary size (up to the 
>>> disk capacity of the fdb cluster). It would require some extra cleverness 
>>> to work over multiple txn transactions and in such a way that an aborted 
>>> upload doesn't leave partially uploaded data in fdb forever. I have not sat 
>>> down and designed this yet, hence I would very much like to hear from the 
>>> community as to which of these paths are sufficient.
>>> 
>>> -- 
>>> Robert Samuel Newson
>>> rnew...@apache.org
>> 
>> -- 
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
> 



Re: [DISCUSS] Attachment support in CouchDB with FDB

2019-02-28 Thread Robert Newson
Hi,

Yes, I agree we should have a framework like that. Folks should be able to 
choose S3 or COS (IBM), etc. 

I am personally on the hook for the implementation for CouchDB and for IBM 
Cloudant and expect them to be different, so the framework, IMO, is a given. 

B. 

> On 28 Feb 2019, at 10:33, Jan Lehnardt  wrote:
> 
> Thanks for getting this started, Bob!
> 
> In fear of derailing this right off the bat, is there a potential 4) approach 
> where on the CouchDB side there is a way to specify “attachment backends”, 
> one of which could be 2), but others could be “node local file storage”*, 
> others could be S3-API compatible, etc?
> 
> *a bunch of heavy handwaving about how to ensure consistency and fault 
> tolerance here.
> 
> * * *
> 
> My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 
> first.
> 
> 
> * * *
> 
> From 1-3, I think 2 is most pragmatic in terms of keeping desirable 
> functionality, while limiting it so it can be useful in practice.
> 
> I feel strongly about not dropping attachment support. While not ideal in all 
> cases, it is an extremely useful and reasonably popular feature.
> 
> Best
> Jan
> —
> 
>> On 28. Feb 2019, at 11:22, Robert Newson  wrote:
>> 
>> Hi All,
>> 
>> We've not yet discussed attachments in terms of the foundationdb work so 
>> here's where we do that.
>> 
>> Today, CouchDB allows you to store large binary values, stored as a series 
>> of much smaller chunks. These "attachments" cannot be indexed, they can only 
>> be sent and received (you can fetch the whole thing or you can fetch 
>> arbitrary subsets of them).
>> 
>> On the FDB side, we have a few constraints. A transaction cannot be more 
>> than 10MB and cannot take more than 5 seconds.
>> 
>> Given that, there are a few paths to attachment support going forward;
>> 
>> 1) Drop native attachment support. 
>> 
>> I suspect this is not going to be a popular approach but it's worth hearing 
>> a range of views. Instead of direct attachment support, a user could store 
>> the URL to the large binary content and could simply fetch that URL directly.
>> 
>> 2) Write attachments into FDB but with limits.
>> 
>> The next simplest is to write the attachments into FDB as a series of 
>> key/value entries, where the key is {database_name, doc_id, attachment_name, 
>> 0..N} and the value is a short byte array (say, 16K to match current). The 
>> 0..N is just a counter such that we can do an fdb range get / iterator to 
>> retrieve the attachment. An embellishment would restore the http Range 
>> header options, if we still wanted that (disclaimer: I implemented the Range 
>> thing many years ago, I'm happy to drop support if no one really cares for 
>> it in 2019).
>> 
>> This would be subject to the 10MB and 5s limit, which is less than you _can_ 
>> do today with attachments but not, in my opinion, any less than people 
>> actually do (with some notable outliers like npm in the past).
>> 
>> 3) Full functionality
>> 
>> This would be the same as today. Attachments of arbitrary size (up to the 
>> disk capacity of the fdb cluster). It would require some extra cleverness to 
>> work over multiple txn transactions and in such a way that an aborted upload 
>> doesn't leave partially uploaded data in fdb forever. I have not sat down 
>> and designed this yet, hence I would very much like to hear from the 
>> community as to which of these paths are sufficient.
>> 
>> -- 
>> Robert Samuel Newson
>> rnew...@apache.org
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 



Re: [DISCUSS] Per-doc access control

2019-02-28 Thread Jan Lehnardt
Thanks Adam and Robert for sorting this one.

Michael, the idea is to give mutually untrusting users access to an 
as-close-to-verbatim CouchDB API for their section of a shared database. So you 
get full doc CRUD, _changes, views, replication, the lot, but only for 
documents that you have access to. So there is no sneaking behind the back and 
getting all the data if you don’t already have access to it, in which case you 
already have access to it :)

Best
Jan
—

> On 27. Feb 2019, at 22:55, Adam Kocoloski  wrote:
> 
> 
>> On Feb 27, 2019, at 3:47 PM, Michael Fair  wrote:
>> 
>>> 
>>> 
>>>> This might be what is already planned (it hasn't sounded like it to me
>>>> though).
>>>> And I definitely think changing the perspective to make "databases" a
>>>> function of the access control system and to make views based on "access
>>>> controlled collection results" instead of "databases" would be quite
>>>> powerful...
>>>> 
>>>> Regards,
>>>> Mike
>>> 
>>> Hi Mike, what you’ve described here is very very similar to what Jan is
>>> building.
>>> 
>>> Adam
>>> 
>>> 
>> I read back through the links that Jan posted again; the details I was
>> looking for are probably somewhere in the sharding conversation that my
>> eyes glazed over on, or somewhere in the notes of the roadmap discussion,
>> which made it a bit hard to find just the parts related to this (I most
>> likely scrolled through it). ;-)
>> 
>> Thanks for clarifying for me, and for letting me chime in!
>> 
>> Mike
> 
> Those details are really hard to find — I can only find them because I know 
> exactly where to look in the minutes of a meeting that I attended well over a 
> year ago :) Probably a good case for an RFC so we have a current pointer to 
> the plan.
> 
> Adam

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/



Re: [DISCUSS] Attachment support in CouchDB with FDB

2019-02-28 Thread Jan Lehnardt
Thanks for getting this started, Bob!

In fear of derailing this right off the bat, is there a potential 4) approach 
where on the CouchDB side there is a way to specify “attachment backends”, one 
of which could be 2), but others could be “node local file storage”*, others 
could be S3-API compatible, etc?

*a bunch of heavy handwaving about how to ensure consistency and fault 
tolerance here.

* * *

My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 
first.


* * *

From 1-3, I think 2 is most pragmatic in terms of keeping desirable 
functionality, while limiting it so it can be useful in practice.

I feel strongly about not dropping attachment support. While not ideal in all 
cases, it is an extremely useful and reasonably popular feature.

Best
Jan
—

> On 28. Feb 2019, at 11:22, Robert Newson  wrote:
> 
> Hi All,
> 
> We've not yet discussed attachments in terms of the foundationdb work so 
> here's where we do that.
> 
> Today, CouchDB allows you to store large binary values, stored as a series of 
> much smaller chunks. These "attachments" cannot be indexed, they can only be 
> sent and received (you can fetch the whole thing or you can fetch arbitrary 
> subsets of them).
> 
> On the FDB side, we have a few constraints. A transaction cannot be more than 
> 10MB and cannot take more than 5 seconds.
> 
> Given that, there are a few paths to attachment support going forward;
> 
> 1) Drop native attachment support. 
> 
> I suspect this is not going to be a popular approach but it's worth hearing a 
> range of views. Instead of direct attachment support, a user could store the 
> URL to the large binary content and could simply fetch that URL directly.
> 
> 2) Write attachments into FDB but with limits.
> 
> The next simplest is to write the attachments into FDB as a series of 
> key/value entries, where the key is {database_name, doc_id, attachment_name, 
> 0..N} and the value is a short byte array (say, 16K to match current). The 
> 0..N is just a counter such that we can do an fdb range get / iterator to 
> retrieve the attachment. An embellishment would restore the http Range header 
> options, if we still wanted that (disclaimer: I implemented the Range thing 
> many years ago, I'm happy to drop support if no one really cares for it in 
> 2019).
> 
> This would be subject to the 10MB and 5s limit, which is less than you _can_ 
> do today with attachments but not, in my opinion, any less than people 
> actually do (with some notable outliers like npm in the past).
> 
> 3) Full functionality
> 
> This would be the same as today. Attachments of arbitrary size (up to the 
> disk capacity of the fdb cluster). It would require some extra cleverness to 
> work over multiple txn transactions and in such a way that an aborted upload 
> doesn't leave partially uploaded data in fdb forever. I have not sat down and 
> designed this yet, hence I would very much like to hear from the community as 
> to which of these paths are sufficient.
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/



[DISCUSS] Attachment support in CouchDB with FDB

2019-02-28 Thread Robert Newson
Hi All,

We've not yet discussed attachments in terms of the foundationdb work so here's 
where we do that.

Today, CouchDB allows you to store large binary values, stored as a series of 
much smaller chunks. These "attachments" cannot be indexed, they can only be 
sent and received (you can fetch the whole thing or you can fetch arbitrary 
subsets of them).

On the FDB side, we have a few constraints. A transaction cannot be more than 
10MB and cannot take more than 5 seconds.

Given that, there are a few paths to attachment support going forward;

1) Drop native attachment support. 

I suspect this is not going to be a popular approach but it's worth hearing a 
range of views. Instead of direct attachment support, a user could store the 
URL to the large binary content and could simply fetch that URL directly.

2) Write attachments into FDB but with limits.

The next simplest is to write the attachments into FDB as a series of key/value 
entries, where the key is {database_name, doc_id, attachment_name, 0..N} and 
the value is a short byte array (say, 16K to match current). The 0..N is just a 
counter such that we can do an fdb range get / iterator to retrieve the 
attachment. An embellishment would restore the http Range header options, if we 
still wanted that (disclaimer: I implemented the Range thing many years ago, 
I'm happy to drop support if no one really cares for it in 2019).

This would be subject to the 10MB and 5s limit, which is less than you _can_ do 
today with attachments but not, in my opinion, any less than people actually do 
(with some notable outliers like npm in the past).
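
For concreteness, a minimal sketch of option 2 using the FoundationDB Python bindings (function names are illustrative, not a settled design):

    import fdb
    fdb.api_version(620)
    db = fdb.open()

    CHUNK = 16 * 1024  # the 16K value size suggested above

    @fdb.transactional
    def put_attachment(tr, dbname, doc_id, att_name, data):
        """Store an attachment as {database_name, doc_id, attachment_name, 0..N}."""
        tr.clear_range_startswith(fdb.tuple.pack((dbname, doc_id, att_name)))
        for n, off in enumerate(range(0, len(data), CHUNK)):
            tr[fdb.tuple.pack((dbname, doc_id, att_name, n))] = data[off:off + CHUNK]

    @fdb.transactional
    def get_attachment(tr, dbname, doc_id, att_name):
        """A single range read returns the chunks in counter order."""
        rng = fdb.tuple.range((dbname, doc_id, att_name))
        return b"".join(kv.value for kv in tr[rng])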

3) Full functionality

This would be the same as today. Attachments of arbitrary size (up to the disk 
capacity of the fdb cluster). It would require some extra cleverness to work 
over multiple txn transactions and in such a way that an aborted upload doesn't 
leave partially uploaded data in fdb forever. I have not sat down and designed 
this yet, hence I would very much like to hear from the community as to which 
of these paths are sufficient.

-- 
  Robert Samuel Newson
  rnew...@apache.org