Re: New proposal for type definitions

Udo Kohlmeyer Thu, 22 Dec 2016 18:40:19 -0800

Correct,

The user will not be able to override an existing typeId + type definition. It will fail with an error message stating that a previous definition already exists.

I imagine the process would be to remove a typeId (a later feature), which could check whether any data exists for that typeId. If no data exists, it would remove the type; otherwise the type removal would fail.
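
The two rules above (no overriding an existing typeId, and removal only when no data uses the type) could be sketched roughly as follows. This is only an illustration: the TypeRegistry class, its method names, and the per-type data counter are hypothetical and are not taken from Geode's PDX registry.

```python
class TypeRegistry:
    """Hypothetical in-memory registry; not Geode's actual PDX registry."""

    def __init__(self):
        self._types = {}       # typeId -> type definition (field name -> field type)
        self._data_count = {}  # typeId -> number of stored entries using that type

    def register(self, type_id, definition):
        # An existing typeId can never be overridden.
        if type_id in self._types:
            raise ValueError(
                f"a previous definition already exists for typeId {type_id}")
        self._types[type_id] = dict(definition)
        self._data_count[type_id] = 0

    def record_data(self, type_id):
        # Called whenever an entry serialized with this type is stored.
        self._data_count[type_id] += 1

    def remove(self, type_id):
        # Removal is only allowed while no data uses the type.
        if self._data_count.get(type_id, 0) > 0:
            raise ValueError(
                f"cannot remove typeId {type_id}: data still exists")
        self._types.pop(type_id, None)
        self._data_count.pop(type_id, None)

    def is_registered(self, type_id):
        return type_id in self._types
```

With this shape, changing a definition would mean removing the typeId first and registering it again, which only succeeds while no data depends on it.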



On 12/22/16 16:16, Darrel Schneider wrote:

One danger of this solution is users may think they can modify a previously
defined type. Since they specify the type they may think they can just edit
the file and reload the types with modified definitions. In most cases if
data has already been serialized using the old type then modifying the type
will lead to data that can no longer be deserialized.
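
A simplified model of why editing a type after data has been written is dangerous, assuming the field names live in the registry and only the values are stored with the data. The functions and the tuple layout below are hypothetical and are not PDX's actual wire format.

```python
def serialize(registry, type_id, doc):
    # Only the values are stored; the field names are looked up in the registry.
    fields = registry[type_id]
    return (type_id, [doc[name] for name in fields])


def deserialize(registry, blob):
    type_id, values = blob
    fields = registry[type_id]
    if len(fields) != len(values):
        raise ValueError(
            "stored data does not match the current type definition")
    return dict(zip(fields, values))


registry = {1: ["firstName", "surname"]}
blob = serialize(registry, 1, {"firstName": "Jane", "surname": "Doe"})

# Editing the definition after data was written breaks the old data.
registry[1] = ["firstName", "surname", "dob"]
try:
    deserialize(registry, blob)
    readable = True
except ValueError:
    readable = False
```

In this model the mismatch is at least detected. A same-length edit (for example reordering fields) would silently map values to the wrong names, which is arguably worse.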

Are you thinking that these new user defined types would be loaded into the
PDX registry and remembered? If you later tried to reload the same type and
it differs then the reload fails? If so then I think this would keep users
from making illegal changes.

On Thu, Dec 22, 2016 at 4:11 PM, Darrel Schneider <dschnei...@pivotal.io>
wrote:

When generating a pdx type for a JSON document couldn't we sort the field
names from the JSON document so that field order would not generate
different pdx types?
Also when choosing a pdx field type if we always picked a "wider" type
then it would reduce the number of types generated because of different
field types.
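
A rough sketch of both ideas, sorting the field names and always picking a wider field type. The type names and the widening choices (every integer becomes "long", every float becomes "double") are assumptions made for the illustration, not what the JSON-to-PDX conversion currently does.

```python
import json


def widened_type(value):
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"      # widest integer type, whatever the magnitude
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "String"
    return "Object"        # nested objects, arrays and null, left coarse here


def type_signature(json_text):
    doc = json.loads(json_text)
    # Sorting the names makes the signature independent of field order.
    return tuple((name, widened_type(doc[name])) for name in sorted(doc))


first = type_signature('{"name": "Jane", "age": 7}')
second = type_signature('{"age": 300000, "name": "Joe"}')
same_type = first == second
```

Here both documents map to the same signature even though the field order differs and the two "age" values would fit different integer widths.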


On Thu, Dec 22, 2016 at 10:02 AM, Udo Kohlmeyer <ukohlme...@pivotal.io>
wrote:

Hi there Dan,

You are correct, the thought is there to add a flag to the registry to
indicate that a definition is custom and thus should not conflict with the
existing ids. Even if the types were to be stored with the current Pdx
type definitions, upon loading/registration of the custom type definitions,
any conflict will be reported and the custom set will not be registered
until all issues were addressed.
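
As an illustration of the all-or-nothing loading described above, here is a minimal sketch. The function name, the dictionary layout, and the "custom" flag key are hypothetical; the only behavior taken from this mail is that conflicts are reported and nothing from the custom set is registered until they are resolved.

```python
def register_custom_types(existing, custom):
    """Register every definition in `custom`, or none of them.

    Returns the sorted list of conflicting typeIds (empty on success).
    """
    conflicts = sorted(type_id for type_id in custom if type_id in existing)
    if conflicts:
        # Report everything at once and leave the registry untouched.
        return conflicts
    for type_id, fields in custom.items():
        existing[type_id] = {"custom": True, "fields": dict(fields)}
    return []


registry = {1: {"custom": False, "fields": {"id": "int"}}}
rejected = register_custom_types(
    registry, {1: {"name": "String"}, 2: {"dob": "Date"}})
accepted = register_custom_types(registry, {2: {"dob": "Date"}})
```

In the first call typeId 1 conflicts, so typeId 2 is not registered either; the second call succeeds once the conflicting entry has been dropped from the custom set.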

I also had the opinion of the "if they can provide me a typeId, then
surely they can provide me with a fully populated JSON document".
Referencing the example document from the wiki, a user can be created with
just a first name and surname. It is not required to provide currentAddress,
previousAddresses, dob, etc. Whilst one could force the client to provide
all fields in the JSON document, it is not always possible nor feasible to
do so. In the POJO world we have a structured data definition and the
generation of a type definition is simple. This is because, from a
serialization perspective, we always make sure that all fields are
serialized. BUT if we were to change the serialization, i.e. not serialize a
field because it is null, the type definition behavior would be exactly the
same as JSON. Only, in this case, because we changed the type definition
for the 'com.demo.User' object (at runtime) the deserialization step for
previous versions would fail.

I believe that if we were to be able to describe WHAT the structure of a
JSON document should be and define the type according to that definition,
we could improve performance (as we don't have to determine type
definitions for every JSON document), be more flexible in consuming JSON
documents that are only partially populated and lastly not potentially
cause a vast amount of JSON-based type definitions to be generated.
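
A sketch of how a declared structure could absorb partially populated documents. The field names, the type names, and the apply_definition helper are hypothetical; only the '@typeId' tag comes from the proposal.

```python
# Hypothetical external definition for a user document.
DEFINITIONS = {
    1: {
        "firstName": "String",
        "surname": "String",
        "dob": "Date",
        "currentAddress": "Address",
    }
}


def apply_definition(definitions, doc):
    type_id = doc["@typeId"]
    fields = definitions[type_id]
    unknown = set(doc) - set(fields) - {"@typeId"}
    if unknown:
        raise ValueError(
            f"fields not in type {type_id}: {sorted(unknown)}")
    # Every declared field is present in the result; missing ones are None.
    return {name: doc.get(name) for name in fields}


partial = {"@typeId": 1, "firstName": "Jane", "surname": "Doe"}
full_shape = apply_definition(DEFINITIONS, partial)
```

Because the shape comes from the definition rather than from each document, a document with only two fields and a fully populated one end up with the same type, and no per-document type inference is needed.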

In addition to just the JSON benefits, having a formal way of describing
the type definitions will allow us to better maintain the current
registered type definitions. In addition to this, it would allow
customers/clients to create type definitions, by hand, if they were to have
lost their type registry.

As a final thought, the addition of the external type registration process
is not meant to replace the current behavior, but rather to enhance its
capabilities. If no external types have been defined OR the client
does not provide a '@typeId' tag, the current JSON type definition behavior
will stay the same.
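
The fallback rule can be written down as a small dispatch. This is a sketch only: resolve_type is a hypothetical name, and the "inferred" branch merely stands in for the current behavior by returning the field names in document order. What should happen for a '@typeId' that is not registered is not covered in this mail, so the sketch does not handle that case.

```python
def resolve_type(definitions, doc):
    type_id = doc.get("@typeId")
    if definitions and type_id is not None:
        # External definitions exist and the client asked for one of them.
        return ("external", type_id)
    # Otherwise fall back to the current per-document type generation.
    return ("inferred", tuple(doc))


with_tag = resolve_type({1: {"firstName": "String"}},
                        {"@typeId": 1, "firstName": "Jane"})
without_tag = resolve_type({1: {"firstName": "String"}},
                           {"firstName": "Jane"})
no_definitions = resolve_type({}, {"@typeId": 1, "firstName": "Jane"})
```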

--Udo


On 12/21/16 18:20, Dan Smith wrote:

I'm assuming the type ids here are a different set than the type ids used
with regular PDX serialization so they won't conflict if the pdx registry
assigns 1 to some class and a user puts @typeId: 1 in their json?

I'm concerned that this won't really address the type explosion issue.
Users that are able to go to the effort of adding these typeIds to all of
their json are probably users that can produce consistently formatted json
in the first place. Users that have inconsistently formatted json are
probably not going to want or be able to add these type ids.

It might be better for us to pursue a way to store arbitrary documents that
are self describing. Our current approach for json documents is assuming
that the documents are all consistently formatted. We infer a schema
for their documents and store the field names in the type registry and the
field values in the serialized data. If we give people the option to store
and query self describing values, then users with inconsistent json could
just use that option and pay the extra storage cost.
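
To make the trade-off concrete, here is a toy comparison of the two storage styles. Both functions are hypothetical simplifications: the schema-based one keys the registry on the field names in document order, and the self-describing one simply keeps the JSON text.

```python
import json


def store_with_schema(registry, doc):
    # Field names go to the registry; only the values are stored per entry.
    fields = tuple(doc)
    type_id = registry.setdefault(fields, len(registry) + 1)
    return (type_id, list(doc.values()))


def store_self_describing(doc):
    # Field names travel with every entry; the registry is never touched.
    return json.dumps(doc)


registry = {}
store_with_schema(registry, {"name": "Jane", "age": 7})
store_with_schema(registry, {"age": 9, "name": "Joe"})   # same fields, new order
store_with_schema(registry, {"name": "Ann"})             # a field is missing
types_created = len(registry)

entry = store_self_describing({"name": "Ann"})
```

Three inconsistently shaped documents create three registry entries in the schema-based style, while the self-describing style creates none and instead repeats the field names in every stored entry.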

-Dan

On Tue, Dec 20, 2016 at 4:53 PM, Udo Kohlmeyer <ukohlme...@gmail.com>
wrote:

Hey there,

I've just completed a new proposal on the wiki for a new mechanism that
could be used to define a type definition for an object.
https://cwiki.apache.org/confluence/display/GEODE/Custom+External+Type+Definition+Proposal+for+JSON

Primarily the new type definition proposal will hopefully help with the
"structuring" of JSON document definitions in a manner that will allow
users to submit JSON documents for data types without the need to provide
every field of the whole domain object type.

Please review and comment as required.

--Udo
