Re: [DISCUSS] Rebase CouchDB on top of FoundationDB

Robert Newson Wed, 23 Jan 2019 11:35:05 -0800

Hi Eli,

I agree with Jan that those are great topics and we owe them a response.


In my opening post I called out custom reduces as something we don’t know how 
to carry over yet. That doesn’t mean they are being removed per se. That 
decision would happen on the mailing list and only after the dev community has 
had a chance to solve it. 

There are several parts of the couchdb api that the developers want to remove, 
quite separately from the fdb proposal. All those need calling out too in a 
dedicated thread. Of course it could help this proposal if there are fewer 
features to implement on fdb. 

There will be separate threads in the days and weeks to come so I think more 
details on those can wait. 

This thread is really about the fdb switch and what the community thinks about 
that. 

B. 

> On 23 Jan 2019, at 18:36, Jan Lehnardt <m...@jan.io> wrote:
> 
> Hi Eli,
> 
> Thanks for chiming in. These are all good topics and are in some form or 
> another already on our list to be discussed.  Re query servers: it is for now 
> really just custom reduces and arbitrary startkey/endkey ranges. JS views 
> aren't going anywhere.
> 
> Cheers
> Jan
> —
> 
>> On 23. Jan 2019, at 18:54, Eli Stevens (Gmail) <wickedg...@gmail.com> wrote:
>> 
>> I'd like to request that there be threads where it's appropriate to discuss:
>> 
>> - Managing the refactoring/merge process to avoid the previous situation
>> where 1.x was mostly dead, but 2.x wasn't going to land for a few years.
>> - Other features to deprecate at the same time as losing JS reduce (I
>> assume that this really means "all external query servers" are going away?).
>> - What the support for users who will be stuck on 2.x will be.
>> 
>> Apologies for the noise if those are already on the list of topics.  :)
>> 
>> Cheers,
>> Eli
>> 
>>> On Wed, Jan 23, 2019 at 5:33 AM Jan Lehnardt <j...@apache.org> wrote:
>>> 
>>> Hi Bob,
>>> 
>>> this is all very exciting!
>>> 
>>> First up, full disclosure, the CouchDB PMC has had about two weeks to
>>> think about this already, so if any of the following doesn’t sound like a
>>> knee-jerk reaction, that’s why.
>>> 
>>> I’m personally tentatively optimistic about this proposal and I’m willing
>>> to work through all open questions from governance, contribution management
>>> to the technical bits to see if we as the CouchDB project arrive at a point
>>> where we are comfortable going down this path.
>>> 
>>> The PMC has already identified a set of discussion areas for this dev@
>>> mailing list to go through before any definite decision can be made.
>>> Separate emails for those discussions are going to be posted on this list
>>> shortly, so I won’t go into further detail here.
>>> 
>>> If anyone sees a need for discussion beyond the threads that will appear
>>> here, please speak up at your earliest convenience. This proposal would
>>> mean a big step for our project, and we must make sure to hear all voices.
>>> 
>>> Once we’ve gone through all this, the resulting answers to all the open
>>> questions coming up will end up in a consensus finding process on this
>>> mailing list, which will signify the final project decision.
>>> 
>>> * * *
>>> 
>>> That said, I’d like to highlight one of these topics: IBM/Cloudant’s
>>> contributions going forward.
>>> 
>>> Looking at how 2.0 came to be, the contributions were mostly taken on good
>>> faith (and legal review), and from the trust Cloudant built up operating a
>>> large number of large instances of clusters of what would eventually become
>>> CouchDB 2.0. It has clearly paid off for CouchDB and our current level of
>>> success wouldn’t be possible without IBM/Cloudant.
>>> 
>>> However, some of the ways we work with the IBM team leave things to be
>>> desired. Specifically, the Apache CouchDB community is frequently not
>>> involved in design discussions around new features. Those happen inside IBM
>>> and we “only” get a PR that then goes through the regular review process.
>>> Again, this has served us well, but we can do even better, so I’d like to
>>> take the opportunity of this larger proposal to suggest we actually do
>>> better. As promised, a more detailed thread about this is going to come up,
>>> and it’ll be the right place to go through the minutiae of this.
>>> 
>>> With this structural change, I believe we are in a great position to work
>>> through the details of this proposal and the subsequent design and
>>> engineering steps.
>>> 
>>> * * *
>>> 
>>> Finally, I want to reiterate Bob’s point: while this proposal is largely
>>> driven by IBM, IBM has no power to unilaterally force the CouchDB project
>>> to accept this proposal and they have already signalled and worked towards
>>> making this a mutually beneficial endeavour. The CouchDB project has
>>> different objectives from IBM and it is up to us to come up with a proposal
>>> that satisfies all of our objectives as well as IBM’s, should this motion
>>> pass.
>>> 
>>> Best
>>> Jan
>>> —
>>> 
>>> 
>>>> On 23. Jan 2019, at 11:00, Robert Samuel Newson <rnew...@apache.org>
>>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> CouchDB 2.0 introduced clustering: the ability to scale a single
>>> database across multiple nodes, both increasing the maximum size of a
>>> database and adding native fault tolerance. This welcome and considerable
>>> step forward was not without its trade-offs. In the years since 2.0 was
>>> released, users have frequently encountered the following issues as a direct
>>> consequence of the 2.0 clustering approach:
>>>> 
>>>> 1. Conflict revisions can be created on normal concurrent updates issued
>>> to a single database, since each replica of a database shard independently
>>> chooses whether to accept a given update, and all replicas will eventually
>>> propagate updates that any one of them has chosen to accept.
>>>> 2. Secondary indexes ("views") do not scale the same way as document
>>> lookups, as they are sharded by doc id, not emitted view key (thus forcing
>>> a consultation of all shard ranges for each query).
>>>> 3. The changes feed is no longer totally ordered and, worse, could
>>> replay earlier changes in the event of a node failure (even a temporary
>>> one).
>>>> 
>>>> The idea is to use FoundationDB as the new CouchDB foundational layer,
>>> letting it take care of data storage and placement. An introduction to
>>> FoundationDB would take up too much space here so I will summarise it as a
>>> highly scalable, ordered key-value store with transactional semantics that
>>> provides strong consistency and scales from a single node to many. It is
>>> licensed under the ASLv2 but is not an Apache project.
>>>> 
>>>> By using FoundationDB we can solve all three of the problems listed
>>> above and deliver semantics much closer to CouchDB 1.x's behaviour while
>>> improving upon the scalability advantages that 2.0 introduced. The
>>> essential character of CouchDB would be preserved (MVCC for documents,
>>> replication between CouchDB databases) but the underlying plumbing would
>>> change significantly. In addition, this new foundation will allow us to add
>>> long wished-for features more easily. For example, multi-document
>>> transactions become possible, as does efficient field-level reading and
>>> writing. A further thought is the ability to update views transactionally
>>> with the database update.
>>>> 
>>>> For those familiar with the CouchDB 2.0 architecture, the proposal is,
>>> in effect, to change all the functions in fabric.erl so that they work
>>> against a (possibly remote) FoundationDB cluster instead of the current
>>> implementation of calling into the original CouchDB 1.x code (couch_btree,
>>> couch_file, etc).
>>>> 
>>>> This is a large change and, for full disclosure, the IBM Cloudant team
>>> are proposing it. We have done our due diligence in investigating
>>> FoundationDB as well as a detailed investigation into how CouchDB semantics
>>> would be built on top of FoundationDB. Any and all decisions on that must
>>> take place here on the CouchDB developer mailing list, of course, but we
>>> are confident that this is feasible.
>>>> During those investigations we have identified a small number of CouchDB
>>> features that we do not yet see a way to do on FoundationDB, the main one
>>> being custom (JavaScript) reduces. This is a direct consequence of no
>>> longer rolling our own persistence layer (couch_btree and friends) and
>>> would likely apply to any alternative technology.
>>>> 
>>>> I think this would be a great advance for CouchDB, preserving what makes
>>> CouchDB special but taking advantage of the superbly engineered
>>> FoundationDB software at the bottom of the stack.
>>>> 
>>>> Regards,
>>>> Robert Newson
>>> 
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>> 
>>> 
>
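
Issue 2 in the quoted proposal (views are sharded by doc id, not by emitted key) can be illustrated with a toy model. This is only a sketch under stated assumptions, not CouchDB code: the shard count, the hash-based placement in `shard_for`, and the helper names are all hypothetical. It shows why a lookup by view key has to consult every shard, while a lookup by doc id needs only one.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(doc_id: str) -> int:
    """Pick a shard from the doc id (a stand-in for real shard placement)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def build_view_shards(docs):
    """Store each (emitted_key, doc_id) row on the shard that owns the doc id."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for doc_id, emitted_key in docs:
        shards[shard_for(doc_id)].append((emitted_key, doc_id))
    for rows in shards:
        rows.sort()  # each shard is ordered locally, not globally
    return shards


def query_view(shards, key):
    """Look up a view key: the key says nothing about placement, so ask all shards."""
    consulted = 0
    hits = []
    for rows in shards:
        consulted += 1
        hits.extend(doc_id for k, doc_id in rows if k == key)
    return sorted(hits), consulted


docs = [("doc1", "red"), ("doc2", "blue"), ("doc3", "red")]
shards = build_view_shards(docs)
hits, consulted = query_view(shards, "red")
print(hits, "found after consulting", consulted, "shards")
print("doc1 itself lives on shard", shard_for("doc1"))
```

In a globally ordered key-value store, rows keyed by the emitted view key would sit next to each other, which is the property the proposal relies on.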

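Issue 1 in the quoted proposal (conflict revisions arising from ordinary concurrent updates) can be sketched in the same spirit. The model below is a deliberate simplification and an assumption throughout: a single document, a hypothetical `Replica` class that tracks only revision ancestry, and no quorum logic. It shows two replicas each accepting a different update to the same parent revision, then exchanging updates and ending up with two leaf revisions, whereas one serialized writer rejects the second update.

```python
class Replica:
    """Toy replica holding the revision tree of one document (hypothetical model)."""

    def __init__(self):
        self.parent_of = {}  # revision id -> parent revision id (None for the first)

    def leaves(self):
        """Leaf revisions are those that no other revision names as its parent."""
        parents = set(self.parent_of.values())
        return sorted(rev for rev in self.parent_of if rev not in parents)

    def update(self, parent_rev, new_rev):
        """Accept a write only if it extends a current leaf (or creates the doc)."""
        current = self.leaves()
        if (parent_rev is None and not current) or parent_rev in current:
            self.parent_of[new_rev] = parent_rev
            return True
        return False

    def sync_from(self, other):
        """Propagate every revision the other replica has accepted."""
        self.parent_of.update(other.parent_of)


# Two replicas decide independently, as in the 2.0 clustering approach.
a, b = Replica(), Replica()
a.update(None, "1-x")
b.sync_from(a)
accepted_a = a.update("1-x", "2-alice")  # accepted by replica a
accepted_b = b.update("1-x", "2-bob")    # also accepted, by replica b
a.sync_from(b)
b.sync_from(a)
print("replicated leaves:", a.leaves())  # two leaves means a conflict

# One serialized writer sees the updates in order and rejects the stale one.
c = Replica()
c.update(None, "1-x")
c.update("1-x", "2-alice")
print("second update accepted:", c.update("1-x", "2-bob"))
print("serialized leaves:", c.leaves())
```

A store with transactional semantics behaves like the serialized case: the second writer is told its parent revision is stale instead of silently creating a conflict.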