I like this direction.
We need to think through hierarchies and containers but I think this would
be a very useful abstraction.
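
A minimal sketch of the FieldReadable idea from Jake's message below,
extended with dotted paths to show one way hierarchies and containers might
be handled. Only the FieldReadable interface comes from the thread.
MapFieldReadable, PojoFieldReadable, the Customer sample class, and the
dotted-path convention are assumptions for illustration, not Geode APIs.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Interface proposed in the thread (with Object capitalized so it compiles).
interface FieldReadable {
    Object getValue(String field);
}

// Hypothetical reader over a Map, standing in for a parsed JSON/BSON document.
// A dotted name such as "address.city" walks into nested maps.
final class MapFieldReadable implements FieldReadable {
    private final Map<String, Object> document;

    MapFieldReadable(Map<String, Object> document) {
        this.document = document;
    }

    @Override
    public Object getValue(String field) {
        Object current = document;
        for (String part : field.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // the path continues past a leaf value
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}

// Hypothetical reader over a plain Java object, using reflection on
// fields declared directly on the object's class.
final class PojoFieldReadable implements FieldReadable {
    private final Object target;

    PojoFieldReadable(Object target) {
        this.target = target;
    }

    @Override
    public Object getValue(String field) {
        try {
            Field f = target.getClass().getDeclaredField(field);
            f.setAccessible(true);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            return null; // unknown field is treated as absent in this sketch
        }
    }
}

// Sample domain class, used only to exercise PojoFieldReadable.
final class Customer {
    private final String name;

    Customer(String name) {
        this.name = name;
    }
}
```

With this in place, a query or index layer would only need to call
getValue("address.city") on a MapFieldReadable or getValue("name") on a
PojoFieldReadable, without knowing which storage form is behind it.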

--
Mike Stolz
Principal Engineer - Gemfire Product Manager
Mobile: 631-835-4771
On Jan 3, 2017 8:21 PM, "Jacob Barrett" <jbarr...@pivotal.io> wrote:

> I don't know that I would be concerned with optimization of unstructured
> data from the start. Given that the data is unstructured it means that it
> can be restructured at a later time. You could have a lazy task running on
> the server that restructures unstructured data to be more uniform and
> compact.
>
> I also don't think there are many good reasons to try to wedge this into PDX.
> The only reason I see for wedging this into PDX is to avoid progress on
> modularizing and extending Geode.
>
> If all the places where we access fields on a stored object (query, indexing,
> etc.) were made a bit more generic, then any object that supports a simple
> getValue(field)-like interface could be accessed without deserialization or
> specialization.
>
> Consider:
> public interface FieldReadable {
>     public Object getValue(String field);
> }
>
> You could have an implementation that can getValue on PDX, POJO, JSON,
> BSON, XML, etc. There is no concern at this level with underlying storage
> type or the original unserialized form of the object (if any).
>
> -Jake
>
>
>
>
> On Tue, Jan 3, 2017 at 4:46 PM Dan Smith <dsm...@pivotal.io> wrote:
>
> > Hi Hitesh,
> >
> > There are a few different ways to store self describing data. One way might
> > be to just store the json string, or convert it to bson, and then enhance
> > the query engine to handle those formats. Another way might be to extend PDX
> > to support self describing serialized values. We could add a selfDescribing
> > boolean flag to RegionService.createPdxInstanceFactory. If that flag is
> > set, we will not register the PDX type in the type registry but instead
> > store it as part of the value. The JSONFormatter could set that flag to
> > true or expose it as an option.
> >
> > Storing self describing documents is a different approach than Udo's
> > original proposal. I do agree there is value in being able to store
> > consistently structured json documents the way we do now to save memory. I
> > think maybe I would be happier if the original proposal was more of an
> > external tool or wrapper focused on sanitizing json documents without being
> > concerned with type ids or a central registry service. I could picture just
> > having a single sanitize method that takes a json string and a standard json
> > schema <http://json-schema.org/> and returns a cleaned up json document.
> > That seems like it would be a lot easier to implement and wouldn't require
> > the user to add typeIds to their json documents.
> >
> > I still feel like storing self describing values might serve more users. It
> > is probably more work than a simple sanitize method like above though.
> >
> > -Dan
> >
> >
> > On Tue, Jan 3, 2017 at 4:07 PM, Hitesh Khamesra
> > <hitesh...@yahoo.com.invalid> wrote:
> >
> > > >>If we give people the option to store and query self describing
> > > values, then users with inconsistent json could just use that option and
> > > pay the extra storage cost.
> > > Dan, are you saying we should expose some interface to serialize and
> > > deserialize the data and to query a field in it, e.g.
> > > getFieldValue(fieldname)? Some sort of ExternalSerializer with
> > > getFieldValue() capability.
> > >
> > >
> > >       From: Dan Smith <dsm...@pivotal.io>
> > >  To: dev@geode.apache.org
> > >  Sent: Wednesday, December 21, 2016 6:20 PM
> > >  Subject: Re: New proposal for type definitons
> > >
> > > I'm assuming the type ids here are a different set than the type ids used
> > > with regular PDX serialization so they won't conflict if the pdx registry
> > > assigns 1 to some class and a user puts @typeId: 1 in their json?
> > >
> > > I'm concerned that this won't really address the type explosion issue.
> > > Users that are able to go to the effort of adding these typeIds to all of
> > > their json are probably users that can produce consistently formatted json
> > > in the first place. Users that have inconsistently formatted json are
> > > probably not going to want or be able to add these type ids.
> > >
> > > It might be better for us to pursue a way to store arbitrary documents that
> > > are self describing. Our current approach for json documents is assuming
> > > that the documents are all consistently formatted. We infer a schema
> > > for their documents and store the field names in the type registry and the
> > > field values in the serialized data. If we give people the option to store
> > > and query self describing values, then users with inconsistent json could
> > > just use that option and pay the extra storage cost.
> > >
> > > -Dan
> > >
> > > On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com>
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > I've just completed a new proposal on the wiki for a new mechanism that
> > > > could be used to define a type definition for an object.
> > > > https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON
> > > >
> > > > Primarily the new type definition proposal will hopefully help with the
> > > > "structuring" of JSON document definitions in a manner that will allow
> > > > users to submit JSON documents for data types without the need to provide
> > > > every field of the whole domain object type.
> > > >
> > > > Please review and comment as required.
> > > >
> > > > --Udo
> > > >
> > > >
> > >
> > >
> > >
> > >
> >
>

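The trade-off Dan describes earlier in this thread, between keeping field
names in a central type registry and storing them with every value, can be
sketched roughly as follows. This is an illustration under assumptions, not
how PDX is implemented: TypeRegistrySketch and StoredValueSketch are
hypothetical names, and a List stands in for the serialized bytes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a central type registry: each distinct ordered
// list of field names is stored once and referenced by a small integer id.
final class TypeRegistrySketch {
    private final Map<List<String>, Integer> idsByFieldNames = new LinkedHashMap<>();

    int register(List<String> fieldNames) {
        Integer existing = idsByFieldNames.get(fieldNames);
        if (existing != null) {
            return existing;
        }
        int id = idsByFieldNames.size() + 1;
        idsByFieldNames.put(List.copyOf(fieldNames), id);
        return id;
    }

    int typeCount() {
        return idsByFieldNames.size();
    }
}

final class StoredValueSketch {
    // Registry style: the stored value holds a type id followed by the
    // field values only. Every new field combination adds a registry entry.
    static List<Object> encodeWithRegistry(TypeRegistrySketch registry,
                                           Map<String, Object> doc) {
        List<Object> out = new ArrayList<>();
        out.add(registry.register(new ArrayList<>(doc.keySet())));
        out.addAll(doc.values());
        return out;
    }

    // Self describing style: field names travel with the values, so nothing
    // is registered, at the cost of repeating the names in every value.
    static List<Object> encodeSelfDescribing(Map<String, Object> doc) {
        List<Object> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            out.add(e.getKey());
            out.add(e.getValue());
        }
        return out;
    }
}
```

Documents with identical field lists share one registry entry, while each
differently shaped document adds another. That growth is the "type
explosion" concern, and the self describing form avoids it by paying for
the field names in every stored value.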