Re: Important update on couchdb's foundationdb work

Chintan Mishra from Rebhu Sun, 13 Mar 2022 09:27:45 -0700

As users, my team and I were keenly looking forward to CouchDB v4 with FoundationDB.

Given the current situation, it is only reasonable to come up with the best alternative.


How about refactoring CouchDB to work with multiple storage engines?

The default CouchDB would support whatever the PMC agrees upon, while the community could tinker with different backend storage engines. FoundationDB could then be one of the backing engines used with CouchDB; other storage engines could be RocksDB, Apache Derby, etc.


Thank you.

--
Chintan Mishra

On 13/03/22 17:09, Robert Newson wrote:

Thank you for this feedback.

I think it’s reasonable to worry about tying CouchDB to FoundationDB for some 
of the reasons you mentioned, but not all of them. We did worry, at the start, 
about the lack of a governance policy around FoundationDB, something that would 
help ensure the project is not beholden to a single corporate entity that might 
abandon the project or take it in directions that make it unsuitable for CouchDB in 
the future. There hasn’t been much progress on that, but likewise the project 
has stayed true to form.

CouchDB is critically dependent on Erlang/OTP, among other components, which 
similarly lack the kind of governance or oversight that Apache projects themselves 
work within. At no point have I feared that the “project will end up in FoundationDB 
integrating CouchDB rather than the other way around”. FoundationDB is not a 
database; it is explicitly only foundational support to build databases on top of.

“If even you guys weren't treated as a priority, I doubt that my feature 
requests and other input will matter even one bit as a user.” I’m not sure whom you 
are referring to with “you guys”, but I remind everyone that the CouchDB contributors from 
IBM Cloudant are the main contributors to CouchDB 2.0 and 3.0, have been for 
years, and are, in many cases, either CouchDB committers or PMC members. They are 
“us” as much as any other contributor. That the Cloudant team has moved focus from 
CouchDB 4.0 (as it would have been) to 3.0 is a re-establishment of the status quo 
ante.

“I doubt that my feature requests and other input will matter even one bit as a 
user.” I strongly disagree here. Community contributions are hugely valuable and 
valued; the rewrite of the lower layers of CouchDB would not have changed that 
significantly. CouchDB-FDB is still written in Erlang, and the HTTP layer is largely the 
same code as before. The parts that interact with FoundationDB are confined to a 
single library application (erlfdb), which exposes the C language bindings as Erlang 
functions and data structures. Unless you are working at that level, you can mostly 
ignore it.

Finally, while I don’t think we’ve explicitly described it this way, 
CouchDB-FDB effectively _is_ a “layer” on top of FDB in the same sense that 
FDB’s own “document layer” (which is MongoDB-like) is.

B.

On 13 Mar 2022, at 11:17, Reddy B. <redd...@live.fr> wrote:

Hello!

Thanks a lot for this update and overview of the situation. As a user (our 
company has been using CouchDB since circa 2015 as the main database of our 
three-tier web apps), I feel it may be preferable to move the CouchDB-FDB work to a 
separate project with a different name. As Jan has mentioned, the internals 
and day-to-day management of FDB may in certain regards be at odds with the 
philosophy and user experience that CouchDB wants to provide.

Moving this effort to a different project would give the people interested in it 
more flexibility to introduce breaking changes and limitations, taking 
full advantage of the philosophy of FDB. I feel the right idea is: if you have 
outscaled CouchDB, move to CouchDB-FDB (or another more specialized DB). 
CouchDB-FDB’s advantage over the alternatives would simply be that 
it implements both the replication protocol and the HTTP API.

This project may, or even should, “simply” become something under the FoundationDB 
umbrella, as a layer similar to the MongoDB-compatible document layer of FoundationDB 
[1].

And this fact is also the cause of the unease I personally have about this 
FoundationDB migration: it looks like CouchDB will have much less control over 
its destiny and even its philosophy. This is different from, say, an encrypted 
messaging app deciding to replace its home-made encryption with an established 
and more robust open-source solution. From day one, I have felt that this project would 
end up with FoundationDB integrating CouchDB rather than the other way around. I 
even suspected that maybe the dev team was no longer interested in CouchDB and 
wanted to find it a new home.

My friendly feedback as a user is that I trust the Apache governance model much more than 
I trust Apple, especially when the welcome meal they offer me is that features will 
be removed and limitations introduced. The political background and what I would call 
“corporate risk” (key capabilities not implemented upstream, changes in 
priority or vision, difficulty influencing the upstream roadmap, etc.) are also key 
factors when choosing a DB solution as a user.

If even you guys weren't treated as a priority, I doubt that my feature 
requests and other input will matter even one bit as a user. And I would have 
zero chance of having the expertise required to modify the FDB core myself and 
get my changes approved to make my CouchDB-layer-related requests possible. 
Right now, by contrast, I can get my hands dirty and eventually get something done 
if I really want to. The governance here is very friendly and welcoming, and it 
inspires trust.

So, to summarize, I feel that to realize the full potential of this vision, 
rather than settling on compromises that satisfy no one, it may be better to 
treat it as a separate project and let CouchDB remain CouchDB. I also feel that 
the project would lose too much control and sovereignty with such a migration, 
especially in light of the facts reported.

The scaling challenges and limitations that motivated this effort could probably 
be addressed differently with a fresh outlook. For example, nowadays there are 
even application-level middleware libraries, like Microsoft Orleans, that can 
coordinate ACID distributed transactions from the application layer. My 
point is that these challenges may be overcome over time by approaching things 
in a creative manner.

Users may be able to work around some of them by adjusting the topology of their 
clusters (using a single writer, a huge single node with a distributed file system, 
etc.). For other challenges, application-layer solutions may exist, or the 
solution may simply be shipping extremely user-friendly graphical management 
tools that make, for example, conflict resolution a breeze for the 
admin.

My 2 cents

[1]: https://github.com/FoundationDB/fdb-document-layer



12 Mar 2022 10:26:35 Jan Lehnardt <j...@apache.org>:

Thanks Bob for passing this along.

I’m looking forward to renewed interest in the 3.x codebase :)

For our 4.x plans, we’ll have to discuss here what we want to do with it, and 
I’m looking to everyone for input. Even if you’ve never spoken up on this 
list before, I’d like to hear from you.

* * *

First off, as a project, CouchDB is not obliged to follow IBM’s lead and abandon 
the FDB-CouchDB effort. At the same time, it is not obliged to take what they 
leave behind and finish it.

I know for some the 4.x release is highly anticipated and we as a project hoped 
to make a generational jump for our underlying storage and distribution 
technologies. During initial discussions about FDB-Couch and during its 
development, we anticipated certain developments on the FDB side (especially 
allowing longer transactions for consistent _changes responses with their new 
Redwood storage engine). It is my understanding that these developments have 
not materialised in the way we would have liked. The consequence is that 
certain API guarantees that 3.x CouchDB gives (consistent full-database 
snapshots in _changes) are not possible to build with native FDB features. I 
can’t speak to the very specifics of this, and I hope we can dig into all this 
together in this thread, but my takeaway from this is that *if* we continue 
with FDB-Couch, I think we will have to reevaluate its compatibility story, as 
we had hoped to make it mainly a seamless (but better) API upgrade from 3.x.

We also learned that operating an FDB cluster is a significant effort that 
somewhat goes against CouchDB’s mostly “just works” nature. We had asked the 
IBM team to share their operational FDB learnings with the CouchDB project, so 
we could build up community knowledge around this, but this has not materialised 
either.

I’m personally still excited about the opportunities we have with FDB-Couch, 
but as a project, we might have to come up with a more realistic positioning of 
FDB-CouchDB. Less a “new and improved drop-in replacement” and maybe more a “if 
you exceed the scale/capacity of 3.x CouchDB, you can upgrade to FDB-CouchDB at 
the expense of a few API differences and higher operational cost”. This trade-off might 
be worthwhile for large users of CouchDB, and thus it might be worth 
having both of these codebases live alongside each other.

However, that comes with a number of consequences:

- The 3.x/4.x naming doesn’t quite work if these are meant to continue 
alongside each other.

- Maybe FDB-Couch gets its own separate project name and versioning, with a 
clear delineation between them.

- We would have to maintain two projects complete with release management, 
vulnerability management, the lot. At the moment, CouchDB has just about enough 
folks contributing to move forward at a reasonable pace. Doubling that effort 
might be tricky. While we had an influx of contributors recently, this would 
probably need more dedicated planning and outreach.

- New API features would have to be implemented twice if we want to keep a 
majority API overlap. Adding features is hard enough; having to do it twice, 
onto two different subsystems, is not a fun proposition for the folks who add them. 
Some features (say, multi-doc transactions) would only be possible 
in one of the projects (FDB-Couch). What would our policy be for deliberate API 
feature divergence?

- Probably more that elude me at the moment.

While there are non-trivial points among these, they are not impossible tasks 
*if* we find enough of the right folks to carry the work forward.

* * *

For myself, I still see a lot of potential in the 3.x codebase and I’m looking 
forward to renewed roadmap discussions there. I know I have a long list of 
things I’d like to see added.

From my professional observation, the thing that our (Neighbourhoodie) 
customers tend to run into the most is the scaling limits of the 
database-per-user pattern. We have a proposal for per-doc-authentication that 
helps mitigate a subset of those use-cases, which would be a great help 
overall. I have worked on a draft PR of this over the years, but it mostly 
stalled out during the pandemic. I’m planning to restart work on this shortly. 
If anyone wants to contribute with time and/or money, please do get in touch.

The other major issue with 3.x, as reported by IBM, is _changes feed rewinds when 
nodes are rotated into and out of clusters. We have already fixed a number of changes 
rewind bugs relatively recently. I don’t know if we have got them all now, or if 
there are theoretical limits to how far we can take this given our consistency 
model, but it’d be worth spending some time on at least getting rid of all 
rewind-to-zero cases.

* * *

I’m also looking forward to all your input on the discussion here. I’m sure 
this will explode into a lot of detailed discussions quickly, so maybe as a 
guide to come back to when we get closer to having to make a decision, here are 
three ways forward that I see:

1. Follow IBM in abandoning FDB-Couch, refocus all effort on Erlang-Couch (3.x).

2. Take FDB-Couch development over fully, come up with a story for how 
FDB-Couch and Erlang-Couch can coexist and when users should choose which one.

3. Hand over the FDB-Couch codebase to an independent team that then can do 
what they like with it (if this materialises from this discussion).

* * *

Best
Jan
—

On 10. Mar 2022, at 17:24, Robert Newson <rnew...@apache.org> wrote:

Hi,

For those that are following closely, and particularly those that build or use 
CouchDB from our git repo, you'll be aware that CouchDB embarked on an attempt 
to build a next-generation version of CouchDB using the FoundationDB database 
engine as its new base.

The principal sponsors of this work, the Cloudant team at IBM, have informed us 
that, unfortunately, they will not be continuing to fund the development of 
this version and are refocusing their efforts on CouchDB 3.x.

Cloudant developers will continue to contribute as they always have done and 
the CouchDB PMC thanks them for their efforts.

As the Project Management Committee for the CouchDB project, we are now asking 
the developer community how we’d like to proceed in light of this new 
information.

Regards,
Robert Newson
Apache CouchDB PMC
