Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Jan Lehnardt Fri, 01 Feb 2019 01:29:01 -0800

Heya Reddy,

totally a valid question. JSON documents with revisions, secondary indexing, 
multi-server sync, PouchDB support, the lot: none of this is in question, 
even long term.


Any introduction of schema awareness on the storage level would be incidental 
to the schema-less nature of CouchDB documents.

For the short and long term, we are committed to keeping existing apps as 
compatible as possible, while enabling new use cases going forward.

CouchDB as you know it is not being retired.

Best
Jan
—

> On 1. Feb 2019, at 10:11, Reddy B. <redd...@live.fr> wrote:
> 
> By the way, if the FDB migration were to happen, would CouchDB continue to be 
> a schema-less database where we can just drop our documents and map/reduce 
> them without further ceremony?
> 
> I mean for the long term, is there a commitment to keeping this feature? This 
> is a big deal, the basics of CouchDB. I think this is the first assumption 
> you make when you use CouchDB as of today.
> 
> I'm not trying to add toxicity to this very positive, constructive and 
> high-quality discussion, just some humble feedback. As a user, when I see 
> this being questioned, along with the other limitations introduced by FDB, I 
> am starting to wonder if rebasing is not just a politically correct way of 
> saying that CouchDB is being retired. For many once-core features are now 
> becoming optional extensions to be implemented.
> 
> Which makes me wonder "what's the core" and question the benefit/cost 
> analysis of the switch in light of the current vision of the project. For 
> it's starting to look like FDB may be used not only as an implementation 
> convenience but as a new vision for CouchDB (deprecating the former vision). 
> In light of this the benefit/cost analysis would make sense, but such a 
> change in vision has not been publicly announced.
> 
> And this would mean that today's core features are likely to go the way of 
> Couchapps tomorrow if the vision has indeed changed. This is a very 
> problematic uncertainty for an end user thinking about long-term support for 
> new projects. I totally appreciate that this is a dev mailing list where 
> ideas are bounced and technical details worked out, but it's important for us 
> as users to see commitments on vision, thus my question. I also took 
> advantage of this opportunity to voice the more general concern above.
> 
> But the specific question is: what's the vision for "schema-less" usage of 
> CouchDB?
> 
> Thanks
> 
> 
> 
> ________________________________
> From: Ilya Khlopotov <iil...@apache.org>
> Sent: Wednesday, 30 January 2019 22:08
> To: dev@couchdb.apache.org
> Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON 
> documents
> 
>> I think I prefer the idea of indexing all documents' keys using the same
>> identifier set.  In general I think applications have the behavior that
>> some keys are referenced far more than other keys and giving those keys in
>> each document the same value I think could eventually prove useful for
>> making many features faster and easier than expected.
> 
> This approach would require inventing schema evolution features similar to 
> those of the recently open-sourced Record Layer: 
> https://www.foundationdb.org/files/record-layer-paper.pdf
> I am sure some CouchDB users do the following (because CouchDB is a NoSQL, 
> i.e. schema-less, database):
> - rename fields
> - reuse field names for something else when they update application
> - remove fields
> - have documents of different structure in one database
> 
>> I think regardless of whether the mapping is document local or global, having
>> FDB return those individual values is faster/easier than having Couch Range
>> fetch the mapping and do the translation work itself.
> In the case of a global mapping we would:
> - call get_schema against a different subspace (i.e. contact different nodes)
> - extract all scalar values by issuing an FDB range query (most likely all 
> values are co-located)
> - stitch the document together and return it to the user
> 
> In the case of a local mapping we don't need to call get_schema. The schema 
> would be returned by the range query.
> 
> We would have to stitch the document together in either case.
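
The two read paths described above can be sketched as follows. This is only an illustration under stated assumptions: the store is a plain Python dict standing in for an ordered key-value store (it is not the FoundationDB API), the key layouts and the names `range_read`, `read_global` and `read_local` are hypothetical, and documents are limited to flat objects with scalar values.

```python
# Hypothetical stand-in for an ordered key-value store. Keys are tuples and a
# "range read" is a prefix scan. None of this is the real FoundationDB API.

def range_read(store, prefix):
    """Return (key, value) pairs whose key starts with prefix, in key order."""
    rows = [(k, v) for k, v in store.items() if k[:len(prefix)] == prefix]
    return sorted(rows, key=lambda kv: kv[0])

def read_global(store, doc_id):
    """Global mapping: field names live in a separate 'schema' subspace."""
    # Read 1: fetch the schema from a different subspace.
    schema = {k[-1]: name for k, name in range_read(store, ("schema",))}
    # Read 2: fetch the document's values, keyed by field id.
    values = range_read(store, ("docs", doc_id))
    # Stitch the document together; also report how many reads were issued.
    return {schema[k[-1]]: v for k, v in values}, 2

def read_local(store, doc_id):
    """Local mapping: field names are stored under the document's own prefix."""
    # Single read: the mapping and the values share the (docs, doc_id) prefix.
    rows = range_read(store, ("docs", doc_id))
    names = {k[-1]: v for k, v in rows if k[2] == "map"}
    doc = {names[k[-1]]: v for k, v in rows if k[2] == "val"}
    return doc, 1

# Assumed key layout for the global variant.
global_store = {
    ("schema", 0): "name",
    ("schema", 1): "age",
    ("docs", "doc1", 0): "Ada",
    ("docs", "doc1", 1): 36,
}

# Assumed key layout for the document-local variant.
local_store = {
    ("docs", "doc1", "map", 0): "name",
    ("docs", "doc1", "map", 1): "age",
    ("docs", "doc1", "val", 0): "Ada",
    ("docs", "doc1", "val", 1): 36,
}

doc_g, reads_g = read_global(global_store, "doc1")
doc_l, reads_l = read_local(local_store, "doc1")
```

Both variants end with the same stitching step; the difference in this sketch is only the extra read against the schema subspace in the global case.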
> 
> Can you elaborate if my understanding is not correct (I didn't quite 
> understand the "Couch Range fetch" part of your question)?
> 
> best regards,
> iilyak
> 
> On 2019/01/30 20:11:18, Michael Fair <mich...@daclubhouse.net> wrote:
>> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov <iil...@apache.org> wrote:
>> 
>>> The FoundationDB Record Layer uses a global schema for JSON documents. It
>>> also has a nice way of creating indexes and schema evolution support.
>>> However, this support comes at the cost of extra lookups in a different
>>> subspace. With a local mapping table we are almost certain (except for a
>>> corner case) that the schema and JSON fields would be collocated on a
>>> single node, due to the common prefix.
>>> 
>> 
>> In general I think I prefer the global, but separate, key mapping idea and
>> use FDB's "cache the important, frequently accessed data, across
>> distributed memory" features.
>> 
>> I think I prefer the idea of indexing all documents' keys using the same
>> identifier set.  In general I think applications have the behavior that
>> some keys are referenced far more than other keys and giving those keys in
>> each document the same value I think could eventually prove useful for
>> making many features faster and easier than expected.
>> 
>> While I really like the independence and locality of a document local
>> mapping, when I think about the process of transforming a document's keys
>> into that mapping's values, I don't see a particular advantage regarding
>> where in the DB that key mapping came from.  I'm assuming the process will
>> flatten the key paths of the document into an array and then request the
>> value of each key as multiple parallel queries against FDB at once.  I
>> think regardless of whether the mapping is document local or global, having
>> FDB return those individual values is faster/easier than having Couch Range
>> fetch the mapping and do the translation work itself.
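
A minimal sketch of the flattening step assumed in the paragraph above: a document's key paths are flattened into an array, and each key name is translated to an identifier through a mapping table. The function names `flatten` and `encode_paths` are hypothetical, only nested objects are descended into (arrays are treated as leaf values for simplicity), and the mapping is a plain dict that could equally be document-local or global.

```python
def flatten(value, path=()):
    """Flatten nested dicts into a list of (path, leaf_value) pairs.

    Only dicts are descended into; lists and scalars are treated as leaves.
    An empty dict produces no rows in this simplified sketch.
    """
    if isinstance(value, dict):
        rows = []
        for key, child in value.items():
            rows.extend(flatten(child, path + (key,)))
        return rows
    return [(path, value)]

def encode_paths(rows, key_ids):
    """Replace each key name in each path with its identifier.

    key_ids maps key name -> integer id and is extended on first sight,
    so the same name always receives the same id.
    """
    encoded = []
    for path, leaf in rows:
        ids = tuple(key_ids.setdefault(part, len(key_ids)) for part in path)
        encoded.append((ids, leaf))
    return encoded

doc = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}
key_ids = {}
rows = flatten(doc)
encoded = encode_paths(rows, key_ids)
```

With a shared `key_ids` table, frequently used keys such as `name` get the same identifier in every document, which is the property the paragraph above argues for.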
>> 
>> I could even see some periodic "reorganizing" engine that could renumber
>> frequently used keys to make the reverse transformation back into a value
>> that much faster.
>> 
>> 
>>>> Personally I wonder if the 10KB limit on field paths is anything more
>>> than a theoretical concern. It’s hard for me to imagine a useful schema
>>> that would get anywhere near that deep, but maybe I’m insufficiently
>>> creative :)
>> 
>> 
>> +1
>> 
>> 
>>> There’s certainly a storage overhead from repeating the upper portion of a
>>> path over and over again, but that’s also something the storage engine can
>>> optimize away through prefix elision. The current production storage engine
>>> in FoundationDB does not do this elision, but the new one in development
>>> does.
>>> 
>> 
>> Assuming it only does "prefix" and not "segment", then I don't think this
>> will help because the DOCID for each key in JSON_PATH will be different,
>> making the "prefix" to each path across different documents distinct.  The
>> prefix matching engine will only be able to match up to the key element
>> before the DOCID.
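
To illustrate the point about prefix elision, here is a small sketch that measures how many leading key elements two keys share. The key layout `(database, DOCID, JSON_PATH...)` is an assumption made for illustration, and real prefix compression operates on encoded bytes rather than tuple elements, so this shows only the shape of the argument.

```python
def shared_prefix_len(a, b):
    """Count how many leading elements two key tuples have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Assumed key layout: (database, DOCID, JSON_PATH segments...)
doc1_city = ("db", "doc1", "address", "city")
doc1_zip = ("db", "doc1", "address", "zip")
doc2_city = ("db", "doc2", "address", "city")

# Within one document, the DOCID and the upper part of the path are shared.
within_doc = shared_prefix_len(doc1_city, doc1_zip)

# Across documents, sharing stops at the element before the DOCID,
# even though the JSON path after it is identical.
across_docs = shared_prefix_len(doc1_city, doc2_city)
```

Under this layout the repeated `address`/`city` segments in different documents cannot be elided by prefix matching alone, which is why a segment-level mapping is raised as an alternative.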
>> 
>> Does/Could/Would the engine allow an app to use FDB itself to create a
>> mapping identifier for key "segments" or some other method to "skip past"
>> the distinct parts of keys to in a sense "reroot" the search?
>> 
>> If FDB were to "bake in" this "key segment mapping" idea as something it
>> exposed to the application layer; that'd be awesome!  Lots of applications
>> could probably make use of that.
>> 
>> Mike
>> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
