Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Robert Newson Fri, 01 Feb 2019 01:29:54 -0800

Hi,

"rebasing is not just a politically correct way of saying that CouchDb is being 
retired"


Emphatically, no. We see this as an evolution of CouchDB, delivering CouchDB 
1.0 semantics around conflicts and changes feeds but in a way that scales 
better than CouchDB 2.0's approach.

We intend to preserve what makes CouchDB special, which includes being able to 
"drop" documents in without having to declare their format. In my post from 
yesterday I suggested _optional_ schema declarations to improve efficiency and 
to address some of the constraints on doc and field size that might arise based 
on how we plan to map documents into foundationdb key-value entries.
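
As a rough illustration of what mapping a document into key-value entries
could look like, here is a minimal Python sketch. It is not the design under
discussion: the `flatten` function, the tuple key layout `(doc_id, path...)`
and the JSON-encoded values are all assumptions made for this example, and a
plain dict stands in for foundationdb.

```python
import json


def flatten(doc_id, value, path=()):
    """Yield one (key, value) pair per scalar leaf of a JSON document.

    Hypothetical layout: the key is the tuple (doc_id, *path) and the
    value is the JSON encoding of the leaf.
    """
    if isinstance(value, dict) and value:
        for field, child in value.items():
            yield from flatten(doc_id, child, path + (field,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from flatten(doc_id, child, path + (index,))
    else:
        # Scalars (and empty containers) become a single entry.
        yield (doc_id,) + path, json.dumps(value)


doc = {"name": "alice", "address": {"city": "Paris"}, "tags": ["a", "b"]}

# A dict stands in for the key-value store in this sketch.
entries = dict(flatten("doc1", doc))

for key, encoded in entries.items():
    print(key, "=>", encoded)
```

Each scalar gets its own entry and the field path becomes part of the key, so
a deeply nested or very large field is where limits on key and value size
would start to matter.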

The notion of "schemaless" for CouchDB has never meant that users don't have to 
think about how they map their data into CouchDB documents; it just relieved 
them of the burden of teaching CouchDB about them. That notion will remain.

CouchDB has a long history, and a fair few ideas that were clever at the start 
are looking less relevant today (as you mentioned, couchapps, the _show, _list, 
_update, _rewrite sorts of things), as the ecosystem in which CouchDB lives has 
expanded so hugely in the last ten years. It is right for the CouchDB project to 
re-evaluate the feature set we present and remove things that are of little 
value or are better done with other technology. That is just basic project 
maintenance, though.

Thank you for raising this concern; you are certainly not adding toxicity. It 
would be toxic if there were no expression of concerns about this change. Please 
continue to follow and contribute to this discussion.

B.

-- 
  Robert Samuel Newson
  [email protected]

On Fri, 1 Feb 2019, at 09:11, Reddy B. wrote:
> By the way, if the FDB migration was to happen, will CouchDb continue to 
> be a schema-less database where we can just drop our documents and map/
> reduce them without further ceremony?
> 
> I mean for the long-term, is there a commitment to keeping this feature? 
> This is a big deal, the basics of CouchDb. I think this is the first 
> assumption you make when you use CouchDb as of today.
> 
> I'm not trying to add toxicity to this very positive, constructive and 
> high quality discussion, but just some humble feedback. As a user, when 
> I see this being questioned, along with the other limitations introduced 
> by FDB I am starting to wonder if rebasing is not just a politically 
> correct way of saying that CouchDb is being retired. For many once core 
> features now become optional extensions to be implemented.
> 
> Which makes me wonder "what's the core" and question the benefit/cost 
> analysis of the switch in light of the current vision of the project. 
> For it's starting to look like FDB may not only be used as an 
> implementation convenience but as a new vision for CouchDb (deprecating 
> the former vision). In light of this the benefit-cost analysis would 
> make sense but such a change in vision has not been publicly announced.
> 
> And this would mean that today's core features are likely to go the way 
> of Couchapps tomorrow if the vision has indeed changed. This is a very 
> problematic uncertainty as an end-user thinking about long-term support for 
> new projects. I totally appreciate that this is a dev mailing list where 
> ideas are bounced and technical details worked out, but it's important 
> for us as users to see commitments on vision, thus my question. I also 
> took advantage of this opportunity to voice the more general concern 
> aforementioned.
> 
> But the specific question is: what's the vision for "schema-less" usage 
> of CouchDb?
> 
> Thanks
> 
> 
> 
> ________________________________
> From: Ilya Khlopotov <[email protected]>
> Sent: Wednesday, 30 January 2019 22:08
> To: [email protected]
> Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON 
> documents
> 
> > I think I prefer the idea of indexing all document's keys using the same
> > identifier set.  In general I think applications have the behavior that
> > some keys are referenced far more than other keys and giving those keys in
> > each document the same value I think could eventually prove useful for
> > making many features faster and easier than expected.
> 
> This approach would require inventing schema evolution features 
> similar to those of the recently open-sourced Record Layer 
> https://www.foundationdb.org/files/record-layer-paper.pdf
> I am sure some CouchDB users do the following (because CouchDB is a 
> NoSQL, i.e. schema-less, database):
> - rename fields
> - reuse field names for something else when they update application
> - remove fields
> - have documents of different structure in one database
> 
> > I think regardless of whether the mapping is document local or global,
> > having FDB return those individual values is faster/easier than having
> > Couch Range fetch the mapping and do the translation work itself.
> In the case of a global mapping we would:
> - call get_schema against a different subspace (i.e. contact different nodes)
> - extract all scalar values by issuing an FDB range query (most likely 
> all values are co-located)
> - stitch the document together and return it to the user
> 
> In the case of a local mapping we don't need to call get_schema. The schema 
> would be returned by the range query.
> 
> We would have to stitch the document together in either case.
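
The two read paths described above can be sketched roughly as follows. This
is a toy model only: a plain dict stands in for FDB, `range_query` is a
hypothetical helper that filters by key prefix, and the key layouts
(`("schema", id)`, `("docs", doc_id, ...)`) are invented for illustration
rather than taken from any proposal in this thread.

```python
def range_query(store, prefix):
    """Return every entry whose tuple key starts with `prefix`."""
    return {k: v for k, v in store.items() if k[:len(prefix)] == prefix}


def read_global(store, doc_id):
    # Lookup 1: the schema lives in its own subspace.
    schema = range_query(store, ("schema",))
    # Lookup 2: the document's scalar values.
    values = range_query(store, ("docs", doc_id))
    # Stitch: translate field ids back into field names.
    return {schema[("schema", k[-1])]: v for k, v in values.items()}


def read_local(store, doc_id):
    # Single lookup: mapping and values share the document's prefix.
    rows = range_query(store, ("docs", doc_id))
    names = {k[-1]: v for k, v in rows.items() if k[2] == "schema"}
    # Stitching is still needed, as in the global case.
    return {names[k[-1]]: v for k, v in rows.items() if k[2] == "value"}


global_store = {
    ("schema", 1): "name",
    ("schema", 2): "city",
    ("docs", "doc1", 1): "alice",
    ("docs", "doc1", 2): "Paris",
}
local_store = {
    ("docs", "doc1", "schema", 1): "name",
    ("docs", "doc1", "schema", 2): "city",
    ("docs", "doc1", "value", 1): "alice",
    ("docs", "doc1", "value", 2): "Paris",
}

doc_global = read_global(global_store, "doc1")
doc_local = read_local(local_store, "doc1")
```

Both functions rebuild the same document; the difference is that the global
variant needs a second lookup in another subspace, while the local variant
repeats the mapping inside every document.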
> 
> Can you elaborate if my understanding is not correct (I didn't quite 
> understand the "Couch Range fetch" part of your question)?
> 
> best regards,
> iilyak
> 
> On 2019/01/30 20:11:18, Michael Fair <[email protected]> wrote:
> > On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov <[email protected]> wrote:
> >
> > > The FoundationDB Record Layer uses a global schema for JSON documents. They
> > > also have a nice way of creating indexes and schema evolution support.
> > > However, this support comes at the cost of extra lookups in a different
> > > subspace. With a local mapping table we are almost certain (except for a
> > > corner case) that the schema and JSON fields would be co-located on a
> > > single node, due to the common prefix.
> > >
> >
> > In general I think I prefer the global, but separate, key mapping idea and
> > use FDB's "cache the important, frequently accessed data, across
> > distributed memory" features.
> >
> > I think I prefer the idea of indexing all document's keys using the same
> > identifier set.  In general I think applications have the behavior that
> > some keys are referenced far more than other keys and giving those keys in
> > each document the same value I think could eventually prove useful for
> > making many features faster and easier than expected.
> >
> > While I really like the independence and locality of a document local
> > mapping, when I think about the process of transforming a document's keys
> > into that mapping's values, I don't see a particular advantage regarding
> > where in the DB that key mapping came from.  I'm assuming the process will
> > flatten the key paths of the document into an array and then request the
> > value of each key as multiple parallel queries against FDB at once.  I
> > think regardless of whether the mapping is document local or global, having
> > FDB return those individual values is faster/easier than having Couch Range
> > fetch the mapping and do the translation work itself.
> >
> > I could even see some periodic "reorganizing" engine that could renumber
> > frequently used keys to make the reverse transformation back into a value
> > that much faster.
> >
> >
> > > > Personally I wonder if the 10KB limit on field paths is anything more
> > > than a theoretical concern. It’s hard for me to imagine a useful schema
> > > that would get anywhere near that deep, but maybe I’m insufficiently
> > > creative :)
> >
> >
> > +1
> >
> >
> > > There’s certainly a storage overhead from repeating the upper portion of a
> > > path over and over again, but that’s also something the storage engine can
> > > optimize away through prefix elision. The current production storage
> > > engine in FoundationDB does not do this elision, but the new one in
> > > development does.
> > >
> >
> > Assuming it only does "prefix" and not "segment", then I don't think this
> > will help because the DOCID for each key in JSON_PATH will be different,
> > making the "prefix" to each path across different documents distinct.  The
> > prefix matching engine will only be able to match up to the key element
> > before the DOCID.
> >
> > Does/Could/Would the engine allow an app to use FDB itself to create a
> > mapping identifier for key "segments" or some other method to "skip past"
> > the distinct parts of keys to in a sense "reroot" the search?
> >
> > If FDB was to "bake in" this "key segment mapping" idea as something it
> > exposed to the application layer; that'd be awesome!  Lots of applications
> > could probably make use of that.
> >
> > Mike
> >
