I think BSON already stores the field names within the serialized data
values, which is indeed more generic but would of course take more space.
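
To make the space trade-off concrete, here is a rough sketch. It is not the actual BSON or PDX wire format: the class, the method names and the 4-byte type id are all assumptions made up for illustration, and it only counts bytes for field names and values.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy byte counting only; NOT the real BSON or PDX wire format.
class EncodingSizeSketch {

  private static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  // Self-describing: every document repeats its own field names.
  static int selfDescribingBytes(List<Map<String, String>> docs) {
    int total = 0;
    for (Map<String, String> doc : docs) {
      for (Map.Entry<String, String> field : doc.entrySet()) {
        total += utf8Length(field.getKey()) + utf8Length(field.getValue());
      }
    }
    return total;
  }

  // Registry-based: each distinct field list is stored once, and every
  // document carries an assumed 4-byte type id plus its values.
  static int registryBytes(List<Map<String, String>> docs) {
    Set<List<String>> registry = new HashSet<>();
    int total = 0;
    for (Map<String, String> doc : docs) {
      List<String> fieldNames = new ArrayList<>(doc.keySet());
      if (registry.add(fieldNames)) { // first time this shape is seen
        for (String name : fieldNames) {
          total += utf8Length(name);
        }
      }
      total += 4; // assumed size of a type id
      for (String value : doc.values()) {
        total += utf8Length(value);
      }
    }
    return total;
  }

  // Builds n documents that all share the same two fields.
  static List<Map<String, String>> uniformDocs(int n) {
    List<Map<String, String>> docs = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      Map<String, String> doc = new LinkedHashMap<>();
      doc.put("name", "a");
      doc.put("city", "b");
      docs.add(doc);
    }
    return docs;
  }

  public static void main(String[] args) {
    List<Map<String, String>> docs = uniformDocs(100);
    System.out.println("self-describing: " + selfDescribingBytes(docs));
    System.out.println("registry-based:  " + registryBytes(docs));
  }
}
```

With 100 uniformly shaped documents this toy count gives 1000 bytes for the self-describing form and 608 for the registry form. If every document had a different shape, the registry form would pay for the field names per document too, plus the type id.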

These conversations are very interesting, especially considering how many
popular serialization formats exist out there (Parquet, Avro, Protobuf,
etc.), but I'm not sure the serialization itself was the main point of
Udo's proposal. I think it was more the problem that today JSONFormatter +
PDXTypes is the only way to do it, and that it can cause an "explosion of
types" on unstructured data.
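
As a rough illustration of that "explosion of types", here is a small sketch. It assumes that every distinct set of field names ends up as its own registered type (real PDX types also record field types, which this ignores), and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a "type" here is just the set of field names in a document.
class TypeExplosionSketch {

  // Generates every document shape that can be built from the given optional
  // fields (each one present or absent) and counts the distinct shapes.
  // The field names are expected to be distinct and different from "id".
  static int countDistinctShapes(String[] optionalFields) {
    Set<Set<String>> registry = new HashSet<>();
    int combinations = 1 << optionalFields.length;
    for (int mask = 0; mask < combinations; mask++) {
      Set<String> shape = new TreeSet<>();
      shape.add("id"); // a field that is always present
      for (int i = 0; i < optionalFields.length; i++) {
        if ((mask & (1 << i)) != 0) {
          shape.add(optionalFields[i]);
        }
      }
      registry.add(shape); // one registry entry per distinct field set
    }
    return registry.size();
  }

  public static void main(String[] args) {
    String[] optional = {"email", "phone", "address", "nickname"};
    System.out.println("distinct types: " + countDistinctShapes(optional));
  }
}
```

Under this assumption k optional fields can produce up to 2^k registered types, so 4 optional fields already allow 16 and 10 allow 1024.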

It seems to me that making the JSONFormatter smarter about this is a quick
path, but it would not address the bigger picture of making serialization
options modular in Geode, which could be its own new proposal as well.
Just a thought.
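
For what a modular approach might look like, here is a minimal sketch built around the FieldReadable idea Jake describes in the quoted message below. Only the interface name and getValue(String) come from his mail; the two implementations, the Customer class and the fieldEquals helper are hypothetical and are not existing Geode APIs.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.Objects;

// Interface as suggested in the quoted message.
interface FieldReadable {
  Object getValue(String field);
}

// Hypothetical adapter for a document already parsed into a map (e.g. JSON).
class MapFieldReadable implements FieldReadable {
  private final Map<String, ?> values;

  MapFieldReadable(Map<String, ?> values) {
    this.values = values;
  }

  @Override
  public Object getValue(String field) {
    return values.get(field);
  }
}

// Hypothetical adapter that reads a declared field of a POJO via reflection.
class PojoFieldReadable implements FieldReadable {
  private final Object target;

  PojoFieldReadable(Object target) {
    this.target = target;
  }

  @Override
  public Object getValue(String field) {
    try {
      Field f = target.getClass().getDeclaredField(field);
      f.setAccessible(true);
      return f.get(target);
    } catch (ReflectiveOperationException e) {
      return null; // unknown or unreadable field: treat as absent
    }
  }
}

// Example domain object, made up for this sketch.
class Customer {
  private final String name;
  private final int age;

  Customer(String name, int age) {
    this.name = name;
    this.age = age;
  }
}

class FieldAccessSketch {
  // A query-like check that only depends on FieldReadable, not on the storage form.
  static boolean fieldEquals(FieldReadable value, String field, Object expected) {
    return Objects.equals(value.getValue(field), expected);
  }

  public static void main(String[] args) {
    FieldReadable fromMap = new MapFieldReadable(Map.of("name", "Ada", "age", 36));
    FieldReadable fromPojo = new PojoFieldReadable(new Customer("Ada", 36));
    System.out.println(fieldEquals(fromMap, "name", "Ada"));
    System.out.println(fieldEquals(fromPojo, "name", "Ada"));
  }
}
```

The point of the sketch is that fieldEquals never needs to know whether the value was a map or a POJO; a PDX or BSON adapter could in principle be added the same way without touching the caller.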

On Tue, Jan 3, 2017 at 7:21 PM, Jacob Barrett <jbarr...@pivotal.io> wrote:

> I don't know that I would be concerned with optimization of unstructured
> data from the start. Given that the data is unstructured it means that it
> can be restructured at a later time. You could have a lazy task running on
> the server that restructures unstructured data to be more uniform and
> compact.
>
> I also don't think there are many good reasons to try to wedge this into PDX.
> The only reason I see for wedging this into PDX is to avoid progress on
> modularizing and extending Geode.
>
> If all the places where we access fields on a stored object (query, indexing,
> etc.) were made a bit more generic, then any object that supports a simple
> getValue(field)-like interface could be accessed without deserialization or
> specialization.
>
> Consider:
> public interface FieldReadable {
>   public Object getValue(String field);
> }
>
> You could have an implementation that can getValue on PDX, POJO, JSON,
> BSON, XML, etc. There is no concern at this level with underlying storage
> type or the original unserialized form of the object (if any).
>
> -Jake
>
>
>
>
> On Tue, Jan 3, 2017 at 4:46 PM Dan Smith <dsm...@pivotal.io> wrote:
>
> > Hi Hitesh,
> >
> > There are a few different ways to store self describing data. One way might
> > be to just store the json string, or convert it to bson, and then enhance
> > the query engine to handle those formats. Another way might be to extend PDX
> > to support self describing serialized values. We could add a selfDescribing
> > boolean flag to RegionService.createPdxInstanceFactory. If that flag is
> > set, we will not register the PDX type in the type registry but instead
> > store it as part of the value. The JSONFormatter could set that flag to
> > true or expose it as an option.
> >
> > Storing self describing documents is a different approach than Udo's
> > original proposal. I do agree there is value in being able to store
> > consistently structured json documents the way we do now to save memory. I
> > think maybe I would be happier if the original proposal was more of an
> > external tool or wrapper focused on sanitizing json documents without being
> > concerned with type ids or a central registry service. I could picture just
> > having a single sanitize method that takes a json string and a standard json
> > schema <http://json-schema.org/> and returns a cleaned up json document.
> > That seems like it would be a lot easier to implement and wouldn't require
> > the user to add typeIds to their json documents.
> >
> > I still feel like storing self describing values might serve more users. It
> > is probably more work than a simple sanitize method like above though.
> >
> > -Dan
> >
> >
> > On Tue, Jan 3, 2017 at 4:07 PM, Hitesh Khamesra
> > <hitesh...@yahoo.com.invalid> wrote:
> >
> > > >>If we give people the option to store
> > > and query self describing values, then users with inconsistent json could
> > > just use that option and pay the extra storage cost.
> > > Dan, are you saying we should expose some interface to
> > > serialize/deserialize the data and to query a field in it, e.g.
> > > getFieldValue(fieldname)? Some sort of ExternalSerializer with
> > > getFieldValue() capability.
> > >
> > >
> > >       From: Dan Smith <dsm...@pivotal.io>
> > >  To: dev@geode.apache.org
> > >  Sent: Wednesday, December 21, 2016 6:20 PM
> > >  Subject: Re: New proposal for type definitons
> > >
> > > I'm assuming the type ids here are a different set than the type ids used
> > > with regular PDX serialization so they won't conflict if the pdx registry
> > > assigns 1 to some class and a user puts @typeId: 1 in their json?
> > >
> > > I'm concerned that this won't really address the type explosion issue.
> > > Users that are able to go to the effort of adding these typeIds to all of
> > > their json are probably users that can produce consistently formatted json
> > > in the first place. Users that have inconsistently formatted json are
> > > probably not going to want or be able to add these type ids.
> > >
> > > It might be better for us to pursue a way to store arbitrary documents that
> > > are self describing. Our current approach for json documents is assuming
> > > that the documents are all consistently formatted. We infer a schema
> > > for their documents, storing the field names in the type registry and the
> > > field values in the serialized data. If we give people the option to store
> > > and query self describing values, then users with inconsistent json could
> > > just use that option and pay the extra storage cost.
> > >
> > > -Dan
> > >
> > > On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com>
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > I've just completed a new proposal on the wiki for a new mechanism that
> > > > could be used to define a type definition for an object.
> > > > https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON
> > > >
> > > > Primarily the new type definition proposal will hopefully help with the
> > > > "structuring" of JSON document definitions in a manner that will allow
> > > > users to submit JSON documents for data types without the need to provide
> > > > every field of the whole domain object type.
> > > >
> > > > Please review and comment as required.
> > > >
> > > > --Udo
> > > >
> > > >
> > >
> > >
> > >
> > >
> >
>



-- 
~/William
