I like this direction. We need to think through hierarchies and containers but I think this would be a very useful abstraction.
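To make the hierarchy and container question concrete, here is a minimal sketch of the FieldReadable idea quoted below, assuming dotted paths such as "address.city" for nested objects and numeric path segments for list elements. The path syntax and the MapFieldReadable class are hypothetical illustrations, not existing Geode APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface taken from the thread; not an existing Geode API.
interface FieldReadable {
    Object getValue(String field);
}

// Illustrative implementation over an already-parsed document made of
// nested Maps (objects) and Lists (containers).
final class MapFieldReadable implements FieldReadable {
    private final Map<String, Object> fields;

    MapFieldReadable(Map<String, Object> fields) {
        this.fields = fields;
    }

    @Override
    public Object getValue(String field) {
        Object current = fields;
        // Assumed path syntax: segments separated by '.', e.g. "phones.1".
        for (String part : field.split("\\.")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(part);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(part);
                } catch (NumberFormatException e) {
                    return null; // non-numeric segment applied to a list
                }
                if (index < 0 || index >= list.size()) {
                    return null; // index out of range
                }
                current = list.get(index);
            } else {
                return null; // missing field, or a scalar with path left over
            }
        }
        return current;
    }
}

final class FieldReadableDemo {
    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Portland");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Ada");
        doc.put("address", address);
        doc.put("phones", Arrays.asList("555-0100", "555-0101"));

        FieldReadable readable = new MapFieldReadable(doc);
        System.out.println(readable.getValue("address.city")); // Portland
        System.out.println(readable.getValue("phones.1"));     // 555-0101
        System.out.println(readable.getValue("address.zip"));  // null
    }
}
```

Returning null for a missing path keeps the sketch small, but it cannot distinguish "field absent" from "field present with a null value"; a real design would probably need to decide how that should surface to query and indexing code.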
--
Mike Stolz
Principal Engineer - Gemfire Product Manager
Mobile: 631-835-4771

On Jan 3, 2017 8:21 PM, "Jacob Barrett" <jbarr...@pivotal.io> wrote:

> I don't know that I would be concerned with optimization of unstructured
> data from the start. Given that the data is unstructured it means that it
> can be restructured at a later time. You could have a lazy task running on
> the server that restructures unstructured data to be more uniform and
> compact.
>
> I also don't think there are many good reasons to try to wedge this into
> PDX. The only reason I see for wedging this into PDX is to avoid progress
> on modularizing and extending Geode.
>
> If all the places where we access fields on a stored object (query,
> indexing, etc.) were made a bit more generic, then any object that supports
> a simple getValue(field)-like interface could be accessed without
> deserialization or specialization.
>
> Consider:
>
>     public interface FieldReadable {
>         public Object getValue(String field);
>     }
>
> You could have an implementation that can getValue on PDX, POJO, JSON,
> BSON, XML, etc. There is no concern at this level with underlying storage
> type or the original unserialized form of the object (if any).
>
> -Jake
>
> On Tue, Jan 3, 2017 at 4:46 PM Dan Smith <dsm...@pivotal.io> wrote:
>
> > Hi Hitesh,
> >
> > There are a few different ways to store self describing data. One way
> > might be to just store the json string, or convert it to bson, and then
> > enhance the query engine to handle those formats. Another way might be to
> > extend PDX to support self describing serialized values. We could add a
> > selfDescribing boolean flag to RegionService.createPdxInstanceFactory. If
> > that flag is set, we will not register the PDX type in the type registry
> > but instead store it as part of the value. The JSONFormatter could set
> > that flag to true or expose it as an option.
> >
> > Storing self describing documents is a different approach than Udo's
> > original proposal.
> > I do agree there is value in being able to store
> > consistently structured json documents the way we do now to save memory.
> > I think maybe I would be happier if the original proposal was more of an
> > external tool or wrapper focused on sanitizing json documents without
> > being concerned with type ids or a central registry service. I could
> > picture just having a single sanitize method that takes a json string and
> > a standard json schema <http://json-schema.org/> and returns a cleaned up
> > json document. That seems like it would be a lot easier to implement and
> > wouldn't require the user to add typeIds to their json documents.
> >
> > I still feel like storing self describing values might serve more users.
> > It is probably more work than a simple sanitize method like above though.
> >
> > -Dan
> >
> > On Tue, Jan 3, 2017 at 4:07 PM, Hitesh Khamesra
> > <hitesh...@yahoo.com.invalid> wrote:
> >
> > > >> If we give people the option to store and query self describing
> > > >> values, then users with inconsistent json could just use that
> > > >> option and pay the extra storage cost.
> > >
> > > Dan, are you saying we should expose some interface to
> > > serialize/deserialize the data and to query a field in it, e.g.
> > > getFieldValue(fieldname)? Some sort of ExternalSerializer with
> > > getFieldValue() capability.
> > >
> > > From: Dan Smith <dsm...@pivotal.io>
> > > To: dev@geode.apache.org
> > > Sent: Wednesday, December 21, 2016 6:20 PM
> > > Subject: Re: New proposal for type definitons
> > >
> > > I'm assuming the type ids here are a different set than the type ids
> > > used with regular PDX serialization so they won't conflict if the pdx
> > > registry assigns 1 to some class and a user puts @typeId: 1 in their
> > > json?
> > >
> > > I'm concerned that this won't really address the type explosion issue.
> > > Users that are able to go to the effort of adding these typeIds to all
> > > of their json are probably users that can produce consistently
> > > formatted json in the first place. Users that have inconsistently
> > > formatted json are probably not going to want or be able to add these
> > > type ids.
> > >
> > > It might be better for us to pursue a way to store arbitrary documents
> > > that are self describing. Our current approach for json documents is
> > > assuming that the documents are all consistently formatted. We infer a
> > > schema for their documents and store the field names in the type
> > > registry and the field values in the serialized data. If we give people
> > > the option to store and query self describing values, then users with
> > > inconsistent json could just use that option and pay the extra storage
> > > cost.
> > >
> > > -Dan
> > >
> > > On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com>
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > I've just completed a new proposal on the wiki for a new mechanism
> > > > that could be used to define a type definition for an object.
> > > > https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON
> > > >
> > > > Primarily the new type definition proposal will hopefully help with
> > > > the "structuring" of JSON document definitions in a manner that will
> > > > allow users to submit JSON documents for data types without the need
> > > > to provide every field of the whole domain object type.
> > > >
> > > > Please review and comment as required.
> > > >
> > > > --Udo