This is pretty interesting actually. It brings back the good parts of formal schema design.
--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771

On Fri, Dec 23, 2016 at 11:09 AM, Bruce Schuchardt <bschucha...@pivotal.io> wrote:

> I wonder if it would be helpful to use JSON Schema
> <http://json-schema.org/> as a starting point for this effort?
>
> On 12/22/2016 at 6:45 PM, Udo Kohlmeyer wrote:
>
>> Ok, I will try and explain all of this better.
>>
>> --Udo
>>
>> On 12/22/16 16:42, Darrel Schneider wrote:
>>
>>> The @refTypeId is hard to understand. It is unclear to me how it interacts
>>> with other things like "dataType" and "subType". I think you can either
>>> specify a dataType/subType OR a @refTypeId. Is this correct? The current
>>> spec makes it look like you can specify both but your examples just show
>>> one or the other.
>>>
>>> If so, wouldn't it be clearer to just have one of the values of "dataType"
>>> or "subType" be "@nnnnn", where nnnnn is a number referring to an already
>>> defined typeId?
>>>
>>> Saying a field has to be of a specific type is much stronger than pdx
>>> currently supports. It only has support for specific basic types and then
>>> the generic "Object" type. If you plan on using the existing pdx registry
>>> then that type system would need to be expanded to deal with @refTypeId
>>> fields.
>>>
>>> Also the "formatter" field seems like a new feature that is not described
>>> in your proposal. You have a comment that says it applies to Dates and
>>> Doubles but it seems like your type syntax would allow you to specify it
>>> on any field type.
>>>
>>> On Thu, Dec 22, 2016 at 4:32 PM, Darrel Schneider <dschnei...@pivotal.io> wrote:
>>>
>>>> Your proposal seems to only handle types for JSON. I do not see that you
>>>> support all the pdx field types.
>>>> Also you have things like List and "subType" which currently have no
>>>> explicit support in the pdx type system.
>>>>
>>>> So do you intend this proposal to be specific to JSON?
>>>> If so, the gfsh and APIs need to make this clear. If not, then your
>>>> proposal should make sure it supports all the existing pdx types.
>>>>
>>>> On Thu, Dec 22, 2016 at 4:27 PM, Darrel Schneider <dschnei...@pivotal.io> wrote:
>>>>
>>>>> Something I did not see in your proposal was the rules that would be used
>>>>> when a JSON document uses "@typeId" to determine if that type is valid for
>>>>> the current document.
>>>>> For example I think you want to allow the type to have a field that does
>>>>> not exist in the document.
>>>>> I think you also want to say that if the document has a field that does
>>>>> not exist in the type then an exception is thrown.
>>>>>
>>>>> You may also have exceptions for when the document's field data cannot
>>>>> be represented in the type's field. For example the type may say the field
>>>>> is Boolean but the document may have a String whose value is "foobar".
>>>>> Before, the field type was derived from the actual value in the document,
>>>>> but now you can have a mismatch.
>>>>>
>>>>> On Thu, Dec 22, 2016 at 4:16 PM, Darrel Schneider <dschnei...@pivotal.io> wrote:
>>>>>
>>>>>> One danger of this solution is users may think they can modify a
>>>>>> previously defined type. Since they specify the type they may think they
>>>>>> can just edit the file and reload the types with modified definitions. In
>>>>>> most cases if data has already been serialized using the old type then
>>>>>> modifying the type will lead to data that can no longer be deserialized.
>>>>>>
>>>>>> Are you thinking that these new user defined types would be loaded into
>>>>>> the PDX registry and remembered? If you later tried to reload the same
>>>>>> type and it differs then the reload fails? If so then I think this would
>>>>>> keep users from making illegal changes.
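
The reload rule raised in the 4:16 PM message (remember each user-defined type, and fail a later load whose definition differs) could look roughly like the sketch below. It is illustrative Python, not Geode code: `TypeRegistry`, `register`, and `TypeConflictError` are hypothetical names, a type definition is assumed to be a plain mapping of field name to type name, and field order is assumed not to matter.

```python
class TypeConflictError(Exception):
    """Raised when a type id is re-registered with a different definition."""


class TypeRegistry:
    """Hypothetical in-memory stand-in for a registry of user-defined types."""

    def __init__(self):
        self._types = {}  # type id -> frozen definition

    def register(self, type_id, fields):
        # Freeze the definition so it can be compared regardless of field order
        # (an assumption of this sketch).
        definition = tuple(sorted(fields.items()))
        existing = self._types.get(type_id)
        if existing is None:
            self._types[type_id] = definition
        elif existing != definition:
            # Data may already be serialized with the old definition,
            # so a changed definition is rejected rather than applied.
            raise TypeConflictError(f"type {type_id} is already defined differently")
        # Reloading an identical definition is a harmless no-op.


registry = TypeRegistry()
registry.register(1, {"firstName": "String", "surname": "String"})
registry.register(1, {"surname": "String", "firstName": "String"})  # identical, accepted

try:
    registry.register(1, {"firstName": "String", "surname": "Integer"})
    rejected = False
except TypeConflictError:
    rejected = True
```

Here the third call is rejected because it changes the type of `surname`, which is the kind of illegal edit the message is worried about.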
>>>>>>
>>>>>> On Thu, Dec 22, 2016 at 4:11 PM, Darrel Schneider <dschnei...@pivotal.io> wrote:
>>>>>>
>>>>>>> When generating a pdx type for a JSON document couldn't we sort the
>>>>>>> field names from the JSON document so that field order would not
>>>>>>> generate different pdx types?
>>>>>>> Also when choosing a pdx field type if we always picked a "wider" type
>>>>>>> then it would reduce the number of types generated because of different
>>>>>>> field types.
>>>>>>>
>>>>>>> On Thu, Dec 22, 2016 at 10:02 AM, Udo Kohlmeyer <ukohlme...@pivotal.io> wrote:
>>>>>>>
>>>>>>>> Hi there Dan,
>>>>>>>>
>>>>>>>> You are correct, the thought is there to add a flag to the registry to
>>>>>>>> indicate that a definition is custom and thus should not conflict with
>>>>>>>> the existing ids. Even if the types were to be stored with the current
>>>>>>>> Pdx type definitions, upon loading/registration of the custom type
>>>>>>>> definitions, any conflict will be reported and the custom set will not
>>>>>>>> be registered until all issues are addressed.
>>>>>>>>
>>>>>>>> I also had the opinion of the "if they can provide me a typeId, then
>>>>>>>> surely they can provide me with a fully populated JSON document".
>>>>>>>> Referencing the example document from the wiki, a user can be created
>>>>>>>> with just a first name and surname. It is not required to provide
>>>>>>>> currentAddress, previousAddresses, dob, etc. Whilst one could force the
>>>>>>>> client to provide all fields in the JSON document, it is not always
>>>>>>>> possible nor feasible to do so. In the POJO world we have a structured
>>>>>>>> data definition and the generation of a type definition is simple. This
>>>>>>>> is because, from a serialization perspective, we always make sure that
>>>>>>>> all fields are serialized.
>>>>>>>> BUT if we were to change the serialization, i.e. not serialize a
>>>>>>>> field because it is null, the type definition behavior would be exactly
>>>>>>>> the same as JSON. Only, in this case, because we changed the type
>>>>>>>> definition for the 'com.demo.User' object (at runtime) the
>>>>>>>> deserialization step for previous versions would fail.
>>>>>>>>
>>>>>>>> I believe that if we were to be able to describe WHAT the structure of
>>>>>>>> a JSON document should be and define the type according to that
>>>>>>>> definition, we could improve performance (as we don't have to determine
>>>>>>>> type definitions for every JSON document), be more flexible in consuming
>>>>>>>> JSON documents that are only partially populated and lastly not
>>>>>>>> potentially cause a vast number of JSON-based type definitions to be
>>>>>>>> generated.
>>>>>>>>
>>>>>>>> In addition to just the JSON benefits, having a formal way of
>>>>>>>> describing the type definitions will allow us to better maintain the
>>>>>>>> currently registered type definitions. In addition to this, it would
>>>>>>>> allow customers/clients to create type definitions, by hand, if they
>>>>>>>> were to have lost their type registry.
>>>>>>>>
>>>>>>>> As a final thought, the addition of the external type registration
>>>>>>>> process is not meant to replace the current behavior, but rather to
>>>>>>>> enhance its capabilities. If no external types have been defined OR the
>>>>>>>> client does not provide a '@typeId' tag, the current JSON type
>>>>>>>> definition behavior will stay the same.
>>>>>>>>
>>>>>>>> --Udo
>>>>>>>>
>>>>>>>> On 12/21/16 18:20, Dan Smith wrote:
>>>>>>>>
>>>>>>>>> I'm assuming the type ids here are a different set than the type ids
>>>>>>>>> used with regular PDX serialization so they won't conflict if the pdx
>>>>>>>>> registry assigns 1 to some class and a user puts @typeId: 1 in their
>>>>>>>>> json?
>>>>>>>>>
>>>>>>>>> I'm concerned that this won't really address the type explosion issue.
>>>>>>>>> Users that are able to go to the effort of adding these typeIds to all
>>>>>>>>> of their json are probably users that can produce consistently
>>>>>>>>> formatted json in the first place. Users that have inconsistently
>>>>>>>>> formatted json are probably not going to want or be able to add these
>>>>>>>>> type ids.
>>>>>>>>>
>>>>>>>>> It might be better for us to pursue a way to store arbitrary documents
>>>>>>>>> that are self describing. Our current approach for json documents is
>>>>>>>>> assuming that the documents are all consistently formatted. We infer a
>>>>>>>>> schema for their documents, storing the field names in the type
>>>>>>>>> registry and the field values in the serialized data. If we give people
>>>>>>>>> the option to store and query self describing values, then users with
>>>>>>>>> inconsistent json could just use that option and pay the extra storage
>>>>>>>>> cost.
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>> On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey there,
>>>>>>>>>>
>>>>>>>>>> I've just completed a new proposal on the wiki for a new mechanism
>>>>>>>>>> that could be used to define a type definition for an object.
>>>>>>>>>> https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON
>>>>>>>>>>
>>>>>>>>>> Primarily the new type definition proposal will hopefully help with
>>>>>>>>>> the "structuring" of JSON document definitions in a manner that will
>>>>>>>>>> allow users to submit JSON documents for data types without the need
>>>>>>>>>> to provide every field of the whole domain object type.
>>>>>>>>>>
>>>>>>>>>> Please review and comment as required.
>>>>>>>>>>
>>>>>>>>>> --Udo
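
Two suggestions from the 4:11 PM message in this thread, sorting the field names and always picking a "wider" field type, can be illustrated with a small sketch. It is plain Python rather than Geode code: `widest_type` and `type_signature` are hypothetical helpers, and the widening choices (every JSON number becomes a double) are assumptions made for illustration, not what PDX does today.

```python
import json


def widest_type(value):
    """Map a JSON value to one assumed 'wide' type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "double"  # assumed widening for every numeric value
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "List"
    return "Object"  # null values and nested documents


def type_signature(document_text):
    """Return an order-independent tuple of (field name, wide type) pairs."""
    document = json.loads(document_text)
    return tuple(sorted((name, widest_type(value)) for name, value in document.items()))


# Same fields in a different order, with an integer in one and a float in the other.
a = type_signature('{"name": "Ann", "age": 41}')
b = type_signature('{"age": 41.5, "name": "Bob"}')
same_type = a == b
```

Both documents produce the same signature, so under these assumptions only one type would be generated for them instead of two.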