I think we use JSON because it's easy. I'm not convinced that 90% of JSON
data is from JavaScript, unless you have data to back that up -- a lot of
scripting languages use it because it's convenient.

I'm not that familiar with how we handle JSON at present, but it's worth
revisiting our assumptions about how we handle it (and how we want to
apply similar assumptions to potential new data types in the future):
* Do we deserialize eagerly (as data comes in) or lazily (when we access
the data)? Do we store the deserialized version after the first lazy access?
* If a type annotation (or region constraint) doesn't match the JSON blob,
how do we proceed? We could:
  - throw an error
  - store the JSON with the object in the hopes that we'll be able to
handle it in the future (PDX does this with old versions of types)
  - make a best effort and carry on
  - silently fail
  - something else?
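The eager-versus-lazy question in the list above lends itself to a short
sketch. This is a minimal Java illustration, not Geode code, and every name
in it (`LazyValue`, etc.) is hypothetical: the raw JSON is stored at ingest,
parsed only on first access, and the parsed result is cached for later reads.

```java
import java.util.function.Function;

// Sketch of lazy deserialization with memoization. The raw JSON bytes are
// kept as-is on ingest, parsed on first access, and the parsed form is
// reused afterwards. All names here are hypothetical.
public class LazyValue<T> {
    private final String rawJson;              // stored at ingest time
    private final Function<String, T> parser;  // deserializer, e.g. JSON -> PDX
    private T parsed;                          // filled in on first access

    public LazyValue(String rawJson, Function<String, T> parser) {
        this.rawJson = rawJson;
        this.parser = parser;
    }

    // The eager variant would simply call parser.apply(rawJson) in the
    // constructor instead.
    public synchronized T get() {
        if (parsed == null) {
            parsed = parser.apply(rawJson);    // deserialize lazily...
        }
        return parsed;                         // ...and reuse afterwards
    }
}
```

The trade-off is visible in the shape of the class: eager pays the parse cost
up front whether or not the value is ever read, while lazy defers it but must
keep the raw bytes around.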

Perhaps we want a configuration option for strongly/weakly typed regions,
or regions with strict / lax validation?

It does seem like it could be useful for Geode to have a data conversion
service, rather than making clients do all the work to fit a particular
ingest format, and such a service would also make it easier to add support
for new serialization types in the future.

-Galen


On Wed, Jan 25, 2017 at 11:00 AM, Jacob Barrett <jbarr...@pivotal.io> wrote:

> So what you are trying to do is to define types for formats that don't have
> rich types (JSON, CSV, XML without Schema/DTD) and rigid structure for
> formats without rigid structure (JSON is unordered, CSV has no
> hierarchy, XML may not be strongly ordered) to map into a format that has
> rich types and structure (PDX).
>
> I worked through this problem some with the Greenplum database
> synchronization work since the parallel transport mechanism is CSV. I
> didn't really deal with ordering and forced order to match the query order
> but I did have to tackle type conversion. I had a set of default
> conversions from text to PDX but I had the added benefit of knowing the
> schema of the fields in the database, so mapping rules were easier. What I
> mean is that if I read text that was a whole number I didn't have to guess
> if the field was nullable, long, short, etc. since that was defined in the
> database. I did however provide some overrides for those fields to coerce
> them into specific types, like DECIMAL to long rather than BigDecimal, etc.
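The per-field override idea described above (e.g. coercing DECIMAL to long
rather than BigDecimal) could look roughly like this. A minimal Java sketch;
`FieldCoercions` and its methods are invented for illustration and are not
part of Geode or the Greenplum connector:

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Default text conversions plus per-field overrides, as described above.
public class FieldCoercions {
    // Default: decimal-looking text becomes BigDecimal.
    private static final Function<String, Object> DEFAULT = BigDecimal::new;

    private final Map<String, Function<String, Object>> overrides = new HashMap<>();

    // Register an override for a specific field, e.g. coerce to long.
    public void override(String field, Function<String, Object> conversion) {
        overrides.put(field, conversion);
    }

    // Fields without an override fall back to the default conversion.
    public Object convert(String field, String text) {
        return overrides.getOrDefault(field, DEFAULT).apply(text);
    }
}
```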
>
> I think what is really needed, more than the primitive type converters, is
> a way to describe the schema on both ends of the conversation. Both
> JSON and CSV lack any form of formal schema. So then I ask myself, does any
> formal schema for these make sense, because then we force on the end user
> the need to use something special when talking to Geode to formally
> describe their data. I feel it makes more sense to pick a strict "standard"
> we will follow and let the end user convert around that if they need
> something different.
>
> The best example is time in JSON. JSON does not describe a time type. There
> are some common standards, like ISO 8601 or a long count from the Unix
> epoch in UTC. Personally I would pick the epoch solution, since conversions
> are a lot easier from that to other representations than parsing text again
> from ISO 8601. Providing plugins for custom conversions strikes me as a
> good way to screw up something as complicated as dates. So if our REST API
> presented the date of one object in format X and the other in format Y, but
> lacked any formal exposed schema, how is the consumer of the REST service
> to determine the format rules and conversion to their local time format?
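To make the epoch-versus-ISO-8601 point concrete, here is a minimal sketch
using only `java.time` (nothing Geode-specific is assumed): going from epoch
millis to an ISO 8601 string is a single formatting call, while the reverse
direction requires parsing text again.

```java
import java.time.Instant;

// Epoch millis as the canonical form; ISO 8601 derived from it on demand.
public class EpochVsIso {
    public static String toIso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString(); // ISO 8601, UTC
    }

    public static long fromIso(String iso) {
        return Instant.parse(iso).toEpochMilli();            // parse text back
    }
}
```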
>
> The other types in JSON don't concern me, since 9/10 you are talking
> JavaScript, which doesn't support more types than JSON anyway, so numbers
> and strings are good enough. When converting from JSON to PDX, the formal
> schema of the PDX should be sufficient. The trick there may be getting the
> formal schema of the PDX if no type has been defined yet. So what I would
> argue for is a more configurable way to define PDX types that doesn't
> require the implementation of a POJO first. In your "deployment" you could
> use some XML or other configuration to formally describe a complex PDX
> object.
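A configuration-driven PDX definition along those lines might look something
like the fragment below. Every element and attribute name here is invented
for illustration; this is not an existing Geode configuration format.

```xml
<!-- Hypothetical sketch: declaring a PDX type up front so no POJO has to
     exist before JSON can be mapped into it. -->
<pdx-type name="com.example.Customer">
  <field name="id"        type="long"/>
  <field name="name"      type="String"/>
  <field name="createdAt" type="long"/> <!-- e.g. epoch millis -->
</pdx-type>
```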
>
> -Jake
>
>
> On Wed, Jan 25, 2017 at 10:34 AM Udo Kohlmeyer <ukohlme...@pivotal.io>
> wrote:
>
> > The thought was to have a framework that could convert any incoming
> > format, provided you have a converter for it.
> >
> > Be it JSON,XML,CSV... maybe even eventually POJO -> PDX...
> >
> > Yes, the starting point is humble... but it can be grown to be a service
> > that will convert data formats...
> >
> >
> > On 1/25/17 10:29, Jacob Barrett wrote:
> > > Does JAXB/JAX-RS not provide what you are looking for to define JSON to
> > > Object mapping?
> > >
> > >
> > > On Wed, Jan 25, 2017 at 7:59 AM Udo Kohlmeyer <u...@apache.org> wrote:
> > >
> > >> Hi there,
> > >>
> > >> I'm currently working on a proposal on a Data Conversion Service. This
> > >> service would primarily replace the ailing JSONFormatter, with the
> > >> ability to provide some rules around how fields are converted (String ->
> > >> Date, String -> Numeric, different Date formats). This of course could
> > >> be extended to not only JSON but any format or any type that we have
> > >> converters for.
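As a rough sketch of the kind of field-level rule such a service could offer
(String -> Date, String -> Numeric), assuming invented names throughout --
`ConversionRules` is not a proposed or existing Geode API:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Per-field conversion rules; fields without a rule pass through unchanged.
public class ConversionRules {
    private final Map<String, Function<String, Object>> rules = new HashMap<>();

    public ConversionRules rule(String field, Function<String, Object> fn) {
        rules.put(field, fn);
        return this;
    }

    public Object apply(String field, String raw) {
        return rules.getOrDefault(field, s -> s).apply(raw);
    }
}
```

A caller could then register, say,
`rule("born", s -> LocalDate.parse(s, DateTimeFormatter.ofPattern("dd/MM/yyyy")))`
for a String -> Date conversion with a configurable format.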
> > >>
> > >> As I'm working through this process it was brought to my attention
> > >> that Spring also has a great converter and formatter service, which
> > >> has many more miles of proven capability under its belt than what a
> > >> custom-written Data Conversion Service would bring.
> > >>
> > >> Pros:
> > >>
> > >>    * Already written and proven framework
> > >>    * Tapping into knowledge of Spring users to write custom data
> > >>      converters if the default converters don't match their needs
> > >>    * Dependent on a framework that is actively being worked on, thus
> > >>      less chance of "stale" frameworks/libs
> > >>
> > >> Cons:
> > >>
> > >>    * Write and maintain data conversion framework
> > >>    * Potentially having to deal with users and their Spring version
> > >>      conflicts
> > >>    * Core dependency on another framework outside of Geode
> > >>
> > >> Thoughts?!?
> > >>
> > >> --Udo
> > >>
> > >>
> >
> >
>