I wonder if it would be helpful to use JSON Schema <http://json-schema.org/> as a starting point for this effort?
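
For instance, a user type along the lines of the example discussed further down in this thread could be sketched in plain JSON Schema roughly like this (the field names come from that example; the schema layout itself is only an illustration, not the syntax from the wiki proposal):

    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "title": "com.demo.User",
      "type": "object",
      "properties": {
        "firstName":         { "type": "string" },
        "surname":           { "type": "string" },
        "dob":               { "type": "string", "format": "date" },
        "currentAddress":    { "type": "object" },
        "previousAddresses": { "type": "array", "items": { "type": "object" } }
      },
      "required": ["firstName", "surname"]
    }

That would give us optional fields, validation and "$ref"-style references to other definitions without having to invent our own notation.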

On 12/22/2016 6:45 PM, Udo Kohlmeyer wrote:
Ok, I will try and explain all of this better.

--Udo


On 12/22/16 16:42, Darrel Schneider wrote:
The @refTypeId is hard to understand. It is unclear to me how it interacts
with other things like "dataType" and "subType". I think you can either
specify a dataType/subType OR a @refTypeId. Is this correct? The current
spec makes it look like you can specify both, but your examples just show one
or the other.

If so, wouldn't it be clearer to just have one of the values of "dataType" or "subType" be "@nnnnn", where nnnnn is a number referring to an already
defined typeId?
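
To make the question concrete (the layout below is my guess at the field-definition syntax, not copied from the wiki page; "someField" and the id 100 are just placeholders), the spec reads as if a single field could carry both at once:

    "someField": { "dataType": "List", "subType": "Object", "@refTypeId": 100 }

while the examples only ever show one or the other:

    "someField": { "dataType": "List", "subType": "Object" }
    "someField": { "@refTypeId": 100 }

The alternative I am suggesting would fold the reference into the existing attribute instead, e.g. "dataType": "@100".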

Saying a field has to be of a specific type is much stronger than what pdx
currently supports. It only has support for specific basic types and then the generic "Object" type. If you plan on using the existing pdx registry,
then that type system would need to be expanded to deal with @refTypeId
fields.

Also the "formatter" field seems like a new feature that is not described
in your proposal. You have a comment that says it applies to Dates and
Doubles but it seems like your type syntax would allow you to specify it on
any field type.

On Thu, Dec 22, 2016 at 4:32 PM, Darrel Schneider <dschnei...@pivotal.io>
wrote:

Your proposal seems to only handle types for JSON. I do not see that you
support all the pdx field types.
Also, you have things like List and "subType" which currently have no
explicit support in the pdx type system.

So do you intend this proposal to be specific to JSON? If so, the gfsh commands and APIs need to make this clear. If not, then your proposal should make sure it
supports all the existing pdx types.

On Thu, Dec 22, 2016 at 4:27 PM, Darrel Schneider <dschnei...@pivotal.io>
wrote:

Something I did not see in your proposal was the set of rules that would be used, when a JSON document uses "@typeId", to determine if that type is valid for
the current document.
For example, I think you want to allow the type to have a field that does
not exist in the document.
I think you also want to say that if the document has a field that does
not exist in the type, then an exception is thrown.

You may also have exceptions for when the document's field data cannot be represented in the type's field. For example, the type may say the field
is Boolean but the document may have a String whose value is "foobar".
Before, the field type was derived from the actual value in the document, but
now you can have a mismatch.
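
A made-up example of what I mean, assuming a registered type that declares an "active" field as Boolean (the field names and the dataType notation here are only placeholders):

    type definition (sketch):  "active": { "dataType": "Boolean" }

    document A:  { "@typeId": 100, "active": true }
    document B:  { "@typeId": 100, "active": "foobar" }
    document C:  { "@typeId": 100 }
    document D:  { "@typeId": 100, "nickname": "foo" }

Document A is fine. B would presumably throw, since "foobar" cannot be represented as a Boolean. C would be accepted if missing fields are allowed, and D would throw because "nickname" is not part of the type.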


On Thu, Dec 22, 2016 at 4:16 PM, Darrel Schneider <dschnei...@pivotal.io>
wrote:

One danger of this solution is that users may think they can modify a
previously defined type. Since they specify the type, they may think they can just edit the file and reload the types with modified definitions. In most cases, if data has already been serialized using the old type, then modifying the type will lead to data that can no longer be deserialized.

Are you thinking that these new user-defined types would be loaded into the PDX registry and remembered? If you later tried to reload the same type and it differs, then the reload fails? If so, then I think this would keep
users from making illegal changes.

On Thu, Dec 22, 2016 at 4:11 PM, Darrel Schneider <dschnei...@pivotal.io>
wrote:
When generating a pdx type for a JSON document, couldn't we sort the
field names from the JSON document so that field order would not generate
different pdx types?
Also, when choosing a pdx field type, if we always picked a "wider" type then it would reduce the number of types generated because of different
field types.
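
As a made-up example, these two documents could currently end up as two different pdx types purely because of field order and numeric width:

    { "surname": "Smith", "age": 30 }
    { "age": 90000000000, "surname": "Jones" }

If we sorted the field names and always widened integral JSON numbers to, say, long, both documents would map onto a single pdx type with the fields (age: long, surname: String).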


On Thu, Dec 22, 2016 at 10:02 AM, Udo Kohlmeyer <ukohlme...@pivotal.io>
wrote:

Hi there Dan,

You are correct, the thought is to add a flag to the registry to indicate that a definition is custom and thus should not conflict with the existing ids. Even if the types were to be stored with the current Pdx type definitions, upon loading/registration of the custom type definitions any conflict will be reported, and the custom set will not be registered
until all issues are addressed.

I also had the "if they can provide me a typeId, then
surely they can provide me with a fully populated JSON document" thought.
Referencing the example document from the wiki, a user can be created with just a first name and surname. It is not required to provide currentAddress, previousAddresses, dob, etc. Whilst one could force the client to provide all fields in the JSON document, it is not always possible nor feasible to do so. In the POJO world we have a structured data definition, and the
generation of a type definition is simple. This is done because, from a
serialization perspective, we always make sure that all fields are
serialized. BUT if we were to change the serialization, i.e. not serialize a field because it is null, the type definition behavior would be exactly the same as for JSON. Only, in this case, because we changed the type definition for the 'com.demo.User' object (at runtime), the deserialization step for
previous versions would fail.
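
So, with an externally registered definition for 'com.demo.User', a client could submit something as small as this (the id value 100 below is just a placeholder for whatever id that definition is assigned):

    {
      "@typeId": 100,
      "firstName": "Jane",
      "surname": "Doe"
    }

and currentAddress, previousAddresses, dob, etc. would simply be absent, instead of every document having to drive its own type definition.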

I believe that if we were able to describe WHAT the structure of a JSON document should be and define the type according to that definition,
we could improve performance (as we would not have to determine type
definitions for every JSON document), be more flexible in consuming JSON documents that are only partially populated, and lastly not potentially
cause a vast number of JSON-based type definitions to be generated.

In addition to just the JSON benefits, having a formal way of
describing the type definitions will allow us to better maintain the currently registered type definitions. It would also allow customers/clients to create type definitions by hand if they were to
lose their type registry.

As a final thought, the addition of the external type registration
process is not meant to replace the current behavior, but rather to enhance its capabilities. If no external types have been defined OR the client does not provide a '@typeId' tag, the current JSON type definition behavior
will stay the same.

--Udo


On 12/21/16 18:20, Dan Smith wrote:

I'm assuming the type ids here are a different set than the type ids used
with regular PDX serialization, so they won't conflict if the pdx registry
assigns 1 to some class and a user puts @typeId: 1 in their json?

I'm concerned that this won't really address the type explosion issue.
Users that are able to go to the effort of adding these typeIds to all of
their json are probably users that can produce consistently formatted json
in the first place. Users that have inconsistently formatted json are
probably not going to want or be able to add these type ids.

It might be better for us to pursue a way to store arbitrary documents that
are self describing. Our current approach for json documents assumes
that the documents are all consistently formatted. We infer a schema
for their documents, store the field names in the type registry, and store the field values in the serialized data. If we give people the option to store
and query self describing values, then users with inconsistent json could
just use that option and pay the extra storage cost.

-Dan

On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com>
wrote:

Hey there,
I've just completed a new proposal on the wiki for a new mechanism that
could be used to define a type definition for an object.
https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON

Primarily the new type definition proposal will hopefully help with the
"structuring" of JSON document definitions in a manner that will allow
users to submit JSON documents for data types without the need to provide
every field of the whole domain object type.

Please review and comment as required.

--Udo




