Re: New proposal for type definitions

Michael Stolz Fri, 23 Dec 2016 09:06:06 -0800

This is pretty interesting actually. It brings back the good parts of
formal schema design.


--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771

On Fri, Dec 23, 2016 at 11:09 AM, Bruce Schuchardt <bschucha...@pivotal.io>
wrote:

> I wonder if it would be helpful to use JSON Schema
> <http://json-schema.org/> as a starting point for this effort?
>
>
>
> On 12/22/2016 at 6:45 PM, Udo Kohlmeyer wrote:
>
>> Ok, I will try and explain all of this better.
>>
>> --Udo
>>
>>
>> On 12/22/16 16:42, Darrel Schneider wrote:
>>
>>> The @refTypeId is hard to understand. It is unclear to me how it interacts
>>> with other things like "dataType" and "subType". I think you can either
>>> specify a dataType/subType OR a @refTypeId. Is this correct? The current
>>> spec makes it look like you can specify both, but your examples only show
>>> one or the other.
>>>
>>> If so, wouldn't it be clearer to allow the value of "dataType" or
>>> "subType" to be "@nnnnn", where nnnnn is a number referring to an already
>>> defined typeId?
>>>
>>> Saying a field has to be of a specific type is much stronger than what pdx
>>> currently supports. It only supports specific basic types plus the generic
>>> "Object" type. If you plan on using the existing pdx registry, then that
>>> type system would need to be expanded to deal with @refTypeId fields.
>>>
>>> Also, the "formatter" field seems like a new feature that is not described
>>> in your proposal. You have a comment that says it applies to Dates and
>>> Doubles, but it seems like your type syntax would allow you to specify it
>>> on any field type.
>>>
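To make the "@nnnnn" suggestion above concrete, here is a minimal Java sketch. `FieldTypeSpec` and `parse` are hypothetical names, not part of Geode or of the wiki proposal; the sketch only assumes that a "dataType" or "subType" value is either a basic type name or `@` followed by the digits of an already defined typeId.

```java
import java.util.Set;

// Hypothetical model of a field's declared type: either a basic type name
// (e.g. "String") or a reference "@nnnnn" to an already defined typeId.
// Not part of Geode or of the wiki proposal.
final class FieldTypeSpec {
    final String basicType;   // set when the field uses a basic type
    final Integer refTypeId;  // set when the field references another type

    private FieldTypeSpec(String basicType, Integer refTypeId) {
        this.basicType = basicType;
        this.refTypeId = refTypeId;
    }

    boolean isReference() {
        return refTypeId != null;
    }

    // Parses a "dataType" or "subType" value. A leading '@' marks a reference
    // whose digits must name a typeId that has already been defined.
    static FieldTypeSpec parse(String dataType, Set<Integer> definedTypeIds) {
        if (dataType == null || dataType.isEmpty()) {
            throw new IllegalArgumentException("dataType must not be empty");
        }
        if (dataType.charAt(0) != '@') {
            return new FieldTypeSpec(dataType, null);
        }
        String digits = dataType.substring(1);
        if (digits.isEmpty() || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("malformed type reference: " + dataType);
        }
        // NumberFormatException (a subclass of IllegalArgumentException) on overflow.
        int id = Integer.parseInt(digits);
        if (!definedTypeIds.contains(id)) {
            throw new IllegalArgumentException("reference to undefined typeId: " + id);
        }
        return new FieldTypeSpec(null, id);
    }
}
```

Because a single string carries either a basic type or a reference, a definition written this way cannot specify both at once, which sidesteps the ambiguity about combining "dataType"/"subType" with @refTypeId.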
>>> On Thu, Dec 22, 2016 at 4:32 PM, Darrel Schneider <dschnei...@pivotal.io>
>>> wrote:
>>>
>>>> Your proposal seems to only handle types for JSON. I do not see that you
>>>> support all the pdx field types. Also, you have things like List and
>>>> "subType", which currently have no explicit support in the pdx type
>>>> system.
>>>>
>>>> So do you intend this proposal to be specific to JSON? If so, the gfsh
>>>> and APIs need to make this clear. If not, then your proposal should make
>>>> sure it supports all the existing pdx types.
>>>>
>>>> On Thu, Dec 22, 2016 at 4:27 PM, Darrel Schneider <dschnei...@pivotal.io>
>>>> wrote:
>>>>
>>>>> Something I did not see in your proposal is the set of rules that would
>>>>> be used, when a JSON document uses "@typeId", to determine whether that
>>>>> type is valid for the current document.
>>>>> For example, I think you want to allow the type to have a field that
>>>>> does not exist in the document.
>>>>> I think you also want to say that if the document has a field that does
>>>>> not exist in the type, then an exception is thrown.
>>>>>
>>>>> You may also need exceptions for when the document's field data cannot
>>>>> be represented by the type's field. For example, the type may say the
>>>>> field is Boolean but the document may have a String whose value is
>>>>> "foobar". Previously the field type was derived from the actual value in
>>>>> the document, but now you can have a mismatch.
>>>>>
>>>>>
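The validation rules suggested above could look roughly like the following Java sketch. It assumes, hypothetically, that a type definition is a map from field name to the Java class its values must have and that the JSON document has already been parsed into a map; `DocumentValidator` is an illustrative name, and accepting a JSON null for any declared field is an added assumption.

```java
import java.util.Map;

// Hypothetical sketch of the validation rules discussed in the thread.
// A type definition maps field names to the Java class a value must have;
// a parsed JSON document maps field names to values. Neither is a Geode API.
final class DocumentValidator {

    static final class TypeMismatchException extends RuntimeException {
        TypeMismatchException(String message) {
            super(message);
        }
    }

    static void validate(Map<String, Class<?>> typeFields, Map<String, Object> document) {
        // Fields declared by the type but missing from the document are allowed,
        // so only the fields the document actually contains are checked.
        for (Map.Entry<String, Object> field : document.entrySet()) {
            Class<?> expected = typeFields.get(field.getKey());
            if (expected == null) {
                // Rule: a document field that the type does not declare is an error.
                throw new TypeMismatchException("unknown field: " + field.getKey());
            }
            Object value = field.getValue();
            // Assumption: a JSON null is accepted for any declared field.
            if (value != null && !expected.isInstance(value)) {
                // Rule: the value must be representable as the declared type,
                // e.g. the String "foobar" cannot populate a Boolean field.
                throw new TypeMismatchException("field " + field.getKey()
                        + " expects " + expected.getSimpleName()
                        + " but got " + value.getClass().getSimpleName());
            }
        }
    }
}
```

`isInstance` is deliberately strict here: a document carrying an Integer for a field declared as Long would be rejected, so a real implementation would need its own conversion rules.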
>>>>> On Thu, Dec 22, 2016 at 4:16 PM, Darrel Schneider <dschnei...@pivotal.io>
>>>>> wrote:
>>>>>
>>>>>> One danger of this solution is that users may think they can modify a
>>>>>> previously defined type. Since they specify the type, they may think
>>>>>> they can just edit the file and reload the types with modified
>>>>>> definitions. In most cases, if data has already been serialized using
>>>>>> the old type, then modifying the type will lead to data that can no
>>>>>> longer be deserialized.
>>>>>>
>>>>>> Are you thinking that these new user-defined types would be loaded into
>>>>>> the PDX registry and remembered, so that if you later tried to reload
>>>>>> the same type and it differed, the reload would fail? If so, I think
>>>>>> this would keep users from making illegal changes.
>>>>>>
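One way to picture the "remember and reject modified reloads" behavior is this Java sketch of a hypothetical in-memory `CustomTypeRegistry` (it is not the PDX registry). Each definition is modeled, as an assumption, as a map of field name to declared type name, and a batch is registered all-or-nothing so that one conflicting definition leaves the registry unchanged.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory registry for user-defined types; not Geode's PDX
// registry. A definition is modeled as a map of field name -> type name.
final class CustomTypeRegistry {
    private final Map<Integer, Map<String, String>> definitions = new HashMap<>();

    // Registers a batch, e.g. the contents of a reloaded definitions file.
    // All-or-nothing: if any typeId is already registered with a different
    // definition, nothing from the batch is registered.
    synchronized void registerAll(Map<Integer, Map<String, String>> batch) {
        for (Map.Entry<Integer, Map<String, String>> e : batch.entrySet()) {
            Map<String, String> existing = definitions.get(e.getKey());
            // Map.equals compares entries only, so field order is ignored here.
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalStateException("typeId " + e.getKey()
                        + " is already registered with a different definition");
            }
        }
        // Identical re-registrations are no-ops; new typeIds are copied in.
        for (Map.Entry<Integer, Map<String, String>> e : batch.entrySet()) {
            definitions.putIfAbsent(e.getKey(), new HashMap<>(e.getValue()));
        }
    }

    // Returns a read-only view of a definition, or null if the id is unknown.
    synchronized Map<String, String> get(int typeId) {
        Map<String, String> d = definitions.get(typeId);
        return d == null ? null : Collections.unmodifiableMap(d);
    }
}
```

Reloading an unchanged file succeeds silently, while editing an existing definition fails before anything in the file is applied.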
>>>>>> On Thu, Dec 22, 2016 at 4:11 PM, Darrel Schneider <dschnei...@pivotal.io>
>>>>>> wrote:
>>>>>>
>>>>>>> When generating a pdx type for a JSON document, couldn't we sort the
>>>>>>> field names from the JSON document so that different field orders
>>>>>>> would not generate different pdx types?
>>>>>>> Also, when choosing a pdx field type, if we always picked a "wider"
>>>>>>> type, it would reduce the number of types generated because of
>>>>>>> differing field types.
>>>>>>>
>>>>>>>
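Both ideas above, sorting field names and choosing a wider field type, can be sketched in Java as a function that derives a type key from a parsed document. `JsonTypeKey`, the `WideType` enum, and the particular widening choices (all integral numbers to LONG, Float and Double to DOUBLE) are hypothetical and not taken from Geode's JSON support.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical sketch: ignore field order by sorting field names, and collapse
// numeric types to a "wider" one so that documents differing only in number
// size map to the same type. Not Geode's JSON implementation.
final class JsonTypeKey {

    enum WideType { BOOLEAN, LONG, DOUBLE, STRING, OBJECT }

    static WideType widen(Object value) {
        if (value instanceof Boolean) return WideType.BOOLEAN;
        // Byte, Short, Integer and Long all widen to LONG.
        if (value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long) return WideType.LONG;
        // Float and Double both widen to DOUBLE.
        if (value instanceof Float || value instanceof Double) return WideType.DOUBLE;
        if (value instanceof String) return WideType.STRING;
        return WideType.OBJECT; // nested objects, arrays, nulls, BigDecimal, etc.
    }

    // Two documents with the same field names and widened types produce the
    // same key regardless of the order in which their fields appear.
    static String typeKey(Map<String, Object> document) {
        StringJoiner key = new StringJoiner(",");
        for (Map.Entry<String, Object> e : new TreeMap<>(document).entrySet()) {
            key.add(e.getKey() + ":" + widen(e.getValue()));
        }
        return key.toString();
    }
}
```

Two documents that list the same fields in a different order, or that use an Integer in one and a Long in the other, end up with the same key and would therefore share one type.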
>>>>>>> On Thu, Dec 22, 2016 at 10:02 AM, Udo Kohlmeyer <ukohlme...@pivotal.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi there Dan,
>>>>>>>>
>>>>>>>> You are correct; the thought is to add a flag to the registry to
>>>>>>>> indicate that a definition is custom and thus should not conflict
>>>>>>>> with the existing ids. Even if the types were stored with the current
>>>>>>>> Pdx type definitions, any conflict would be reported upon
>>>>>>>> loading/registration of the custom type definitions, and the custom
>>>>>>>> set would not be registered until all issues were addressed.
>>>>>>>>
>>>>>>>> I also held the opinion "if they can provide me a typeId, then surely
>>>>>>>> they can provide me with a fully populated JSON document".
>>>>>>>> Referencing the example document from the wiki, a user can be created
>>>>>>>> with just a first name and surname. It is not required to provide
>>>>>>>> currentAddress, previousAddresses, dob, etc. Whilst one could force
>>>>>>>> the client to provide all fields in the JSON document, it is not
>>>>>>>> always possible or feasible to do so. In the POJO world we have a
>>>>>>>> structured data definition, and the generation of a type definition is
>>>>>>>> simple. This is because, from a serialization perspective, we always
>>>>>>>> make sure that all fields are serialized. BUT if we were to change the
>>>>>>>> serialization, i.e. not serialize a field because it is null, the type
>>>>>>>> definition behavior would be exactly the same as for JSON. Only in
>>>>>>>> this case, because we changed the type definition for the
>>>>>>>> 'com.demo.User' object (at runtime), the deserialization step for
>>>>>>>> previous versions would fail.
>>>>>>>>
>>>>>>>> I believe that if we were able to describe WHAT the structure of a
>>>>>>>> JSON document should be, and define the type according to that
>>>>>>>> definition, we could improve performance (as we would not have to
>>>>>>>> determine type definitions for every JSON document), be more flexible
>>>>>>>> in consuming JSON documents that are only partially populated, and,
>>>>>>>> lastly, avoid potentially generating a vast number of JSON-based type
>>>>>>>> definitions.
>>>>>>>>
>>>>>>>> Beyond the JSON benefits, having a formal way of describing type
>>>>>>>> definitions will allow us to better maintain the currently registered
>>>>>>>> type definitions. It would also allow customers/clients to create type
>>>>>>>> definitions by hand if they were to lose their type registry.
>>>>>>>>
>>>>>>>> As a final thought, the addition of the external type registration
>>>>>>>> process is not meant to replace the current behavior, but rather to
>>>>>>>> enhance its capabilities. If no external types have been defined OR
>>>>>>>> the client does not provide a '@typeId' tag, the current JSON type
>>>>>>>> definition behavior will stay the same.
>>>>>>>>
>>>>>>>> --Udo
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/21/16 18:20, Dan Smith wrote:
>>>>>>>>
>>>>>>>>> I'm assuming the type ids here are a different set than the type ids
>>>>>>>>> used with regular PDX serialization, so they won't conflict if the
>>>>>>>>> pdx registry assigns 1 to some class and a user puts @typeId: 1 in
>>>>>>>>> their json?
>>>>>>>>>
>>>>>>>>> I'm concerned that this won't really address the type explosion
>>>>>>>>> issue. Users that are able to go to the effort of adding these
>>>>>>>>> typeIds to all of their json are probably users that can produce
>>>>>>>>> consistently formatted json in the first place. Users that have
>>>>>>>>> inconsistently formatted json are probably not going to want or be
>>>>>>>>> able to add these type ids.
>>>>>>>>>
>>>>>>>>> It might be better for us to pursue a way to store arbitrary
>>>>>>>>> documents that are self-describing. Our current approach for json
>>>>>>>>> documents assumes that the documents are all consistently formatted.
>>>>>>>>> We infer a schema for their documents, store the field names in the
>>>>>>>>> type registry, and store the field values in the serialized data. If
>>>>>>>>> we give people the option to store and query self-describing values,
>>>>>>>>> then users with inconsistent json could just use that option and pay
>>>>>>>>> the extra storage cost.
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>> On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey there,
>>>>>>>>>
>>>>>>>>>> I've just completed a new proposal on the wiki for a new mechanism
>>>>>>>>>> that could be used to define a type definition for an object.
>>>>>>>>>> https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON
>>>>>>>>>>
>>>>>>>>>> Primarily, the new type definition proposal will hopefully help
>>>>>>>>>> with "structuring" JSON document definitions in a manner that
>>>>>>>>>> allows users to submit JSON documents for data types without needing
>>>>>>>>>> to provide every field of the whole domain object type.
>>>>>>>>>>
>>>>>>>>>> Please review and comment as required.
>>>>>>>>>>
>>>>>>>>>> --Udo
