Hi Hitesh,

There are a few different ways to store self-describing data. One way might
be to just store the JSON string, or convert it to BSON, and then enhance
the query engine to handle those formats. Another way might be to extend PDX
to support self-describing serialized values. We could add a selfDescribing
boolean flag to RegionService.createPdxInstanceFactory. If that flag is
set, we would not register the PDX type in the type registry but instead
store it as part of the value. The JSONFormatter could set that flag to
true or expose it as an option.
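
To make the trade-off concrete, here is a toy model of the two storage
modes. It is only a sketch in Python, not Geode code: type_registry,
register_type, serialize and get_field_value are hypothetical names, and the
JSON-encoded blobs stand in for whatever binary layout PDX would actually use.

```python
import json

# Toy model only: none of these names are Geode APIs.
type_registry = {}  # type id -> tuple of field names


def register_type(field_names):
    """Return the id for this exact field list, registering it if new."""
    for type_id, names in type_registry.items():
        if names == field_names:
            return type_id
    type_id = len(type_registry) + 1
    type_registry[type_id] = field_names
    return type_id


def serialize(doc, self_describing=False):
    """Serialize a dict either with inline field names or via the registry."""
    if self_describing:
        # Field names travel with the value; the registry is untouched.
        return json.dumps({"fields": doc})
    type_id = register_type(tuple(doc.keys()))
    # Only the type id and the values are stored with the entry.
    return json.dumps({"typeId": type_id, "values": list(doc.values())})


def get_field_value(blob, field_name):
    """Read one field back from either representation."""
    stored = json.loads(blob)
    if "fields" in stored:
        return stored["fields"].get(field_name)
    names = type_registry[stored["typeId"]]
    return dict(zip(names, stored["values"])).get(field_name)
```

In this model every distinct field list adds a registry entry unless
self_describing is set, in which case the cost moves into each stored value
instead.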

Storing self describing documents is a different approach than Udo's
original proposal. I do agree there is value in being able to store
consistently structured json documents the way we do now to save memory. I
think maybe I would be happier if the original proposal was more of an
external tool or wrapper focused on sanitizing json documents without being
concerned with type ids or a central registry service. I could picture just
having a single sanitize method that takes a json string and a standard json
schema <http://json-schema.org/> and returns a cleaned up json document.
That seems like it would be a lot easier to implement and wouldn't require
the user to add typeIds to their json documents.
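
A rough sketch of what such a sanitize method could look like, in Python
for brevity. The function name and behaviour are my assumptions, not an
existing API: it understands only the "type", "properties" and "items"
keywords of JSON Schema, assumes each subschema is a JSON object, and simply
drops anything undeclared or of the wrong type.

```python
import json

# Hypothetical helper; handles only "type", "properties" and "items".
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def _matches(value, type_name):
    if type_name == "null":
        return value is None
    # bool is a subclass of int in Python, so exclude it for numeric types.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    expected = _JSON_TYPES.get(type_name)
    return expected is not None and isinstance(value, expected)


def _clean(value, schema):
    """Return (keep, cleaned_value) for value under this schema fragment."""
    type_name = schema.get("type")
    if isinstance(type_name, str) and not _matches(value, type_name):
        return False, None
    if isinstance(value, dict) and "properties" in schema:
        cleaned = {}
        for key, sub_schema in schema["properties"].items():
            if key in value:
                keep, sub_value = _clean(value[key], sub_schema)
                if keep:
                    cleaned[key] = sub_value
        return True, cleaned
    if isinstance(value, list) and isinstance(schema.get("items"), dict):
        results = [_clean(item, schema["items"]) for item in value]
        return True, [item for keep, item in results if keep]
    return True, value


def sanitize(json_text, schema_text):
    """Drop fields that the schema does not declare or whose type is wrong."""
    keep, cleaned = _clean(json.loads(json_text), json.loads(schema_text))
    if not keep:
        raise ValueError("document does not match the schema's top-level type")
    return json.dumps(cleaned, sort_keys=True)
```

A real implementation would more likely delegate to an existing JSON Schema
validator, but the point is that nothing here needs a type id or a registry.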

I still feel like storing self describing values might serve more users. It
is probably more work than a simple sanitize method like above though.

-Dan


On Tue, Jan 3, 2017 at 4:07 PM, Hitesh Khamesra <hitesh...@yahoo.com.invalid
> wrote:

> >>If we give people the option to store
> and query self describing values, then users with inconsistent json could
> just use that option and pay the extra storage cost.
> Dan, are you saying we should expose some interface to
> serialize/deserialize the data and to query a field in it, e.g.
> getFieldValue(fieldname)? Some sort of ExternalSerializer with
> getFieldValue() capability.
>
>
>       From: Dan Smith <dsm...@pivotal.io>
>  To: dev@geode.apache.org
>  Sent: Wednesday, December 21, 2016 6:20 PM
>  Subject: Re: New proposal for type definitons
>
> I'm assuming the type ids here are a different set than the type ids used
> with regular PDX serialization so they won't conflict if the pdx registry
> assigns 1 to some class and a user puts @typeId: 1 in their json?
>
> I'm concerned that this won't really address the type explosion issue.
> Users that are able to go to the effort of adding these typeIds to all of
> their json are probably users that can produce consistently formatted json
> in the first place. Users that have inconsistently formatted json are
> probably not going to want or be able to add these type ids.
>
> It might be better for us to pursue a way to store arbitrary documents that
> are self describing. Our current approach for json documents is assuming
> that the documents are all consistently formatted. We infer a schema
> for their documents and store the field names in the type registry and the
> field values in the serialized data. If we give people the option to store
> and query self describing values, then users with inconsistent json could
> just use that option and pay the extra storage cost.
>
> -Dan
>
> On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com>
> wrote:
>
> > Hey there,
> >
> > I've just completed a new proposal on the wiki for a new mechanism that
> > could be used to define a type definition for an object.
> > https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON
> >
> > Primarily the new type definition proposal will hopefully help with the
> > "structuring" of JSON document definitions in a manner that will allow
> > users to submit JSON documents for data types without the need to provide
> > every field of the whole domain object type.
> >
> > Please review and comment as required.
> >
> > --Udo
> >
> >
>
>
>
>
