Hi Dan,

If data is self-describing, then it is not necessary to map that data to PDX; the translation would just add an unnecessary mapping layer. The primary purpose of the PDX format is that a field value can be read without deserializing the data. We just need to expose some API to "get FieldValue" and let the user implement this interface for any format (as you pointed out, it can be any format, like BSON, Google protobuf, etc.).

JSONFormatter maps JSON data to PDX, and it works very well with Geode. But there are some issues with PDX typeId generation which we need to tackle separately. Here are some issues:
1. One String field can generate three typeIds, as it can have three states (fieldExist, null-value, fieldNotExist). So if one JSON document has 10 fields, then theoretically it can generate 1000 PDX typeIds. We are now planning to fix this issue.
2. We create a PDX typeId for each integer value by its range (byte, short, int). This can help to reduce the number of PDX typeIds. Btw, this creates a problem with the query engine as well, as the query engine cares about type.
3. Field ordering in a JSON document can create different PDX typeIds. Possibly we can solve this issue by sorting the fields.

Thanks.
Hitesh.

From: Dan Smith <dsm...@pivotal.io>
To: dev@geode.apache.org; Hitesh Khamesra <hitesh...@yahoo.com>
Sent: Tuesday, January 3, 2017 4:46 PM
Subject: Re: New proposal for type definitons

Hi Hitesh,

There are a few different ways to store self describing data. One way might be to just store the json string, or convert it to bson, and then enhance the query engine to handle those formats. Another way might be to extend PDX to support self describing serialized values. We could add a selfDescribing boolean flag to RegionService.createPdxInstanceFactory. If that flag is set, we will not register the PDX type in the type registry but instead store it as part of the value. The JSONFormatter could set that flag to true or expose it as an option.

Storing self describing documents is a different approach than Udo's original proposal. I do agree there is value in being able to store consistently structured json documents the way we do now to save memory. I think maybe I would be happier if the original proposal was more of an external tool or wrapper focused on sanitizing json documents without being concerned with type ids or a central registry service. I could picture just having a single sanitize method that takes a json string and a standard json schema and returns a cleaned up json document.
That seems like it would be a lot easier to implement and wouldn't require the user to add typeIds to their json documents.

I still feel like storing self describing values might serve more users. It is probably more work than a simple sanitize method like above though.

-Dan

On Tue, Jan 3, 2017 at 4:07 PM, Hitesh Khamesra <hitesh...@yahoo.com.invalid> wrote:

>>If we give people the option to store and query self describing values, then users with inconsistent json could just use that option and pay the extra storage cost.

Dan, are you saying we should expose some interface to serialize/deserialize and to "query some field in the data - getFieldValue(fieldname)"? Some sort of ExternalSerializer with getFieldValue() capability.

From: Dan Smith <dsm...@pivotal.io>
To: dev@geode.apache.org
Sent: Wednesday, December 21, 2016 6:20 PM
Subject: Re: New proposal for type definitons

I'm assuming the type ids here are a different set than the type ids used with regular PDX serialization, so they won't conflict if the pdx registry assigns 1 to some class and a user puts @typeId: 1 in their json?

I'm concerned that this won't really address the type explosion issue. Users that are able to go to the effort of adding these typeIds to all of their json are probably users that can produce consistently formatted json in the first place. Users that have inconsistently formatted json are probably not going to want or be able to add these type ids.

It might be better for us to pursue a way to store arbitrary documents that are self describing. Our current approach for json documents is assuming that the documents are all consistently formatted. We infer a schema for their documents and store the field names in the type registry and the field values in the serialized data. If we give people the option to store and query self describing values, then users with inconsistent json could just use that option and pay the extra storage cost.
-Dan

On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com> wrote:

> Hey there,
>
> I've just completed a new proposal on the wiki for a new mechanism that
> could be used to define a type definition for an object.
> https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON
>
> Primarily the new type definition proposal will hopefully help with the
> "structuring" of JSON document definitions in a manner that will allow
> users to submit JSON documents for data types without the need to provide
> every field of the whole domain object type.
>
> Please review and comment as required.
>
> --Udo
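
For reference, the type explosion that Hitesh lists at the top of the thread (per-field states, and field ordering) can be illustrated with a small standalone sketch. This is not Geode code and does not touch the PDX API: TypeShapeSketch, shapeCount, shapeKey and sortedShapeKey are hypothetical names, and the "shape key" string is only an assumed stand-in for whatever the type registry uses to tell two types apart. On the arithmetic: if each of 10 fields can independently be in one of three states, the count is 3^10 = 59,049; with two states per field it is 2^10 = 1,024, which is on the order of the 1000 mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: models a "type" as the list of field names and value types.
class TypeShapeSketch {

    // Number of distinct shapes when each of `fields` fields can independently
    // be in one of `statesPerField` states (e.g. present, null, absent).
    static long shapeCount(int statesPerField, int fields) {
        long count = 1;
        for (int i = 0; i < fields; i++) {
            count = Math.multiplyExact(count, (long) statesPerField);
        }
        return count;
    }

    // Hypothetical shape key: "name:Type;" for each field, in document order.
    // A null value gets its own tag, so it yields a different key than a String.
    static String shapeKey(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            String tag = (v == null) ? "null" : v.getClass().getSimpleName();
            sb.append(e.getKey()).append(':').append(tag).append(';');
        }
        return sb.toString();
    }

    // Same key, but computed after sorting the fields by name, so that
    // documents differing only in field order map to one shape.
    static String sortedShapeKey(Map<String, Object> doc) {
        return shapeKey(new TreeMap<>(doc));
    }

    public static void main(String[] args) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("name", "x");
        a.put("age", 1);

        Map<String, Object> b = new LinkedHashMap<>();
        b.put("age", 2);
        b.put("name", "y");

        System.out.println(shapeCount(3, 10));     // 59049
        System.out.println(shapeCount(2, 10));     // 1024
        System.out.println(shapeKey(a));           // name:String;age:Integer;
        System.out.println(shapeKey(b));           // age:Integer;name:String;
        System.out.println(sortedShapeKey(a).equals(sortedShapeKey(b))); // true
    }
}
```

Sorting only addresses the ordering issue; the null-versus-String and absent-field cases still produce distinct keys in this model, which is the part a self describing format would sidestep.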