I think that one of the questions is whether we want to map the atomic ADM types to a single value in clean JSON or if we want them to map to structured types (objects, arrays).

My first reaction would have been to use the structured types for lossless JSON and only single values for clean JSON. If you look e.g. at the datetime types, those would also naturally lend themselves to a structured representation, but we seem to agree that the XMLSchema string version is a good choice. Staying consistent with that choice, I would try to find a simple string representation for the spatial types - ideally one that is easily parsed and widely accepted :) Looking for something that might be accepted I stumbled upon https://en.wikipedia.org/wiki/Well-known_text , but I'm actually not sure if that's a good fit (even if it seems to be supported by a number of DBMS ...).
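
To give an idea of what that could look like, here is a rough Python sketch (the helper names are made up for illustration, this is not existing code) that renders some of the spatial values as WKT strings. As far as I can tell the basic WKT geometry types have no direct counterpart for circle, and a rectangle would have to be written as a polygon, which may be part of why it is not an obvious fit.

```python
def _coords(point):
    # WKT separates the coordinates of one point with spaces.
    return " ".join(repr(float(c)) for c in point)


def point_wkt(point):
    # A third coordinate is marked with the "Z" tag.
    tag = "POINT Z" if len(point) == 3 else "POINT"
    return f"{tag} ({_coords(point)})"


def line_wkt(points):
    # Points of a line string are separated by commas.
    return "LINESTRING (" + ", ".join(_coords(p) for p in points) + ")"


def polygon_wkt(points):
    # WKT polygon rings are closed, so repeat the first point at the end
    # if the input does not already do so.
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return "POLYGON ((" + ", ".join(_coords(p) for p in ring) + "))"


print(point_wkt([41.0, 44.0]))
print(line_wkt([[10.1, 11.1], [10.2, 11.2]]))
print(polygon_wkt([[1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8]]))
```

Parsing such a string back into the ADM type would still need a matching function on our side.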

Thoughts?

Thanks,
Till

On 5 Aug 2015, at 1:30, Chris Hillery wrote:

Sure, I think that shouldn't be too hard, given some help with the
questions I raised.

To start the discussion, I wrote a query that outputs all ADM types to show how they are serialized to JSON (except the interval types, which throw a NotImplementedException if you try to serialize them to JSON currently) :

{ "string": string("Nancy"),
"float": 32.5f,
"double" : double("-2013.5938237483274"),
"boolean" : true,
"int8": int8("125"),
"int16": int16("32765"),
"int32": int32("294967295"),
"int64": int64("1700000000000000000"),
"unorderedList": {{"reading","writing"}},
"orderedList": ["Brad","Scott"],
"record": {  "number": 8389, "street": "Hill St.", "city": "Mountain View" },
"date": date("-2011-01-27"),
"time": time("12:20:30Z"),
"datetime": datetime("-1951-12-27T12:20:30"),
"duration": duration("P10Y11M12DT10H50M30S"),
"location2d": point("41.00,44.00"),
"location3d": point3d("44.00,13.00,41.00"),
"line" : line("10.1,11.1 10.2,11.2"),
"rectangle" : rectangle("5.1,11.8 87.6,15.6548"),
"polygon" : polygon("1.2,1.3 2.1,2.5 3.5,3.6 4.6,4.8"),
"circle" : circle("10.1,11.1 10.2"),
"binary" : hex("ABCDEF0123456789"),
"uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
}

And here is how that gets serialized in "lossless JSON":

{ "string": "Nancy",
"float": 32.5,
"double": -2013.5938237483274,
"boolean": true,
"int8": { "int8": 125 },
"int16": { "int16": 32765 },
"int32": { "int32": 294967295 },
"int64": { "int64": 1700000000000000000 },
"unorderedList": { "unorderedlist": [ "reading", "writing" ] },
"orderedList": { "orderedlist": [ "Brad", "Scott" ] },
"record": { "number": { "int64": 8389 }, "street": "Hill St.", "city": "Mountain View" },
"date": { "date": -125625945600000},
"time": { "time": 44430000},
"datetime": { "datetime": -123703587570000},
"duration": { "duration": { "months": 131, "millis": 1075830000} },
"location2d": { "point": [41.0, 44.0] },
"location3d": { "point3d": [44.0, 13.0, 41.0] },
"line": { "line": [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] } ] },
"rectangle": { "rectangle": [ { "point": [5.1, 11.8] }, { "point": [87.6, 15.6548] } ] },
"polygon": { "polygon": [ { "point": [1.2, 1.3] }, { "point": [2.1, 2.5] }, { "point": [3.5, 3.6] }, { "point": [4.6, 4.8] } ] },
"circle": { "circle": [10.1, { "point": [11.1, 10.2] } ] },
"binary": hex("ABCDEF0123456789"),
"uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
}

Some observations and proposals:

1. The "JSON" serializations of the hex() and uuid() types are still broken
(not even valid JSON).

2. IMHO the string, float, double, boolean, and record types are already
serialized the way you would want in "clean JSON".

3. IMHO orderedList and unorderedList should be serialized as simple JSON
arrays in "clean JSON".

4. The serializations of date, time, datetime, and duration, while valid
JSON, are not very useful. It would be better if they were serialized as
canonical date, time, or dateTime forms from XML Schema. In "clean JSON"
they would be serialized simply as strings with that value. In "lossless
JSON" they would be serialized as records as shown here, but with a string
value, e.g. { "date" : "-2011-01-27" }.

5. The int8/int16/int32/int64 types should be serialized as straight JSON
numbers in "clean JSON".

6. Interval types should be supported. I am open to suggestions as to how
best to represent them in both "clean JSON" and "lossless JSON".

7. I'm really not sure what the best serialization of the spatial types
would be in "clean JSON", but as a strawman, how about serializing all
points as simple arrays of JSON numbers? Then line, rectangle, and polygon
could either be an array of arrays, or else objects with names like
"start"/"end" for line and rectangle and "point1", "point2", etc. for
polygon. Circle, I think, should always be an object with the names
"center" and "radius". So, in "clean JSON", the last few lines of the above
query results would look like this:

"location2d" : [41.0, 44.0],
"location3d" : [44.0, 13.0, 41.0],
"line" : [ [10.1, 11.1], [10.2, 11.2] ],
"rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
"polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
"circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },

or like this:

"location2d" : [41.0, 44.0],
"location3d" : [44.0, 13.0, 41.0],
"line" : { "start" : [10.1, 11.1], "end" : [10.2, 11.2] },
"rectangle" : { "start" : [5.1, 11.8], "end" : [87.6, 15.6548] },
"polygon" : { "point1" : [1.2, 1.3], "point2" : [2.1, 2.5], "point3" : [3.5, 3.6], "point4" : [4.6, 4.8] },
"circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },

My preference would probably be the latter, just so that "circle" doesn't
seem like such an odd duck and "line" and "rectangle" don't become
ambiguous.

(Aside: I think the current serialization of "circle" is broken; it seems
to be scrambling the radius and the point values.)
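
To make proposals 3, 4, and 5 a bit more concrete, here is a rough Python sketch (not the actual serializer; all names are made up for illustration) that turns the "lossless JSON" shown above into a "clean" form. It unwraps the integer and list wrappers and renders time and duration as XML Schema-style strings. It assumes the millisecond encodings shown above, ignores fractional seconds and negative durations, and leaves date, datetime, and the spatial types untouched (Python's datetime cannot represent negative years, so those would need their own calendar arithmetic).

```python
INT_TAGS = {"int8", "int16", "int32", "int64"}
LIST_TAGS = {"orderedlist", "unorderedlist"}


def time_string(millis):
    # Milliseconds since midnight -> "hh:mm:ssZ" (UTC assumed).
    seconds = millis // 1000
    return "%02d:%02d:%02dZ" % (seconds // 3600, seconds % 3600 // 60, seconds % 60)


def duration_string(months, millis):
    # Split the months part into years/months and the millis part into
    # days/hours/minutes/seconds.
    years, months = divmod(months, 12)
    days, seconds = divmod(millis // 1000, 86400)
    hours, seconds = divmod(seconds, 3600)
    minutes, seconds = divmod(seconds, 60)
    return f"P{years}Y{months}M{days}DT{hours}H{minutes}M{seconds}S"


def to_clean(value):
    # Recursively strip the type wrappers of the lossless form.
    if isinstance(value, list):
        return [to_clean(v) for v in value]
    if isinstance(value, dict):
        if len(value) == 1:
            (tag, inner), = value.items()
            if tag in INT_TAGS and isinstance(inner, int):
                return inner
            if tag in LIST_TAGS and isinstance(inner, list):
                return [to_clean(v) for v in inner]
            if tag == "time" and isinstance(inner, int):
                return time_string(inner)
            if tag == "duration" and isinstance(inner, dict):
                return duration_string(inner["months"], inner["millis"])
        # Anything else is treated as an ordinary record.
        return {k: to_clean(v) for k, v in value.items()}
    return value


lossless = {
    "int8": {"int8": 125},
    "orderedList": {"orderedlist": ["Brad", "Scott"]},
    "record": {"number": {"int64": 8389}, "street": "Hill St."},
    "time": {"time": 44430000},
    "duration": {"duration": {"months": 131, "millis": 1075830000}},
}
print(to_clean(lossless))
```

One caveat with converting after the fact like this: a real record whose only field happens to be named "int8" (or "time", etc.) would be mistaken for a wrapper, so the real implementation should produce the clean form directly from the typed values.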

So there are a number of actions here even with the existing code, in
addition to supporting the new clean JSON output. I also found some issues
with the current AQL implementation and doc:

A. ADM allows numeric serializations like 5550d for double and 12i8 for
int8, but those are not valid in AQL it seems.

B. AQL doesn't seem to have any constructors for intervals; you can only
create them via functions like interval-from-date().

(A) and (B) both basically mean that not all valid ADM can be read as AQL,
which seems like it would be a desirable goal.

C. The ADM doc doesn't mention the "point3d" type.


Now accepting any input on the above, as well as the other issue about how
to select this form of output via the HTTP interface!

Ceej
aka Chris Hillery

On Wed, Aug 5, 2015 at 12:28 AM, Sattam Alsubaiee <[email protected]>
wrote:

Hi Chris,

Actually, it would be great if you could fix this since, as you mentioned,
you have touched this part of the code.
Please confirm.

Cheers,
Sattam

On Wed, Aug 5, 2015 at 10:23 AM, Chris Hillery <[email protected]> wrote:

I could take a look at this as well - it would be a natural extension of the
work I did earlier to clean up the existing JSON output. It probably wouldn't
be very difficult to do this in a relatively "dumb" way, but there also is
some amount of duplicated code between the various output formats and it
would be tempting to try and tidy that up a bit as well.

Three issues need to be addressed regardless of who does it or how:

1. We'd need to decide how to "strip down" all ADM types. In most numeric
cases it's pretty clear. For spatial types, it deserves a little bit of
thought. (It may be that the current "lossless" form is concise enough. For
example, the ADM instance { "foo" : point("5,5") } gets rendered in JSON as
{ "foo" : { "point" : [5.0, 5.0] } } . Is there something that would be
better?)

2. How would the user select this format vs. the current JSON form? When
using the HTTP interface, the main way to select the returned serialization
is via the HTTP Accept: header, and you select the "lossless JSON" form
with the MIME type application/json. If we have two different JSON
serializations, we'd need to invent a new MIME type, or introduce some kind
of additional flag, or something.

3. When using the HTTP interface, the current lossless JSON is in fact the
default output type. Should that remain the case, or should the "lossy"
JSON type be preferred?
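
As one possible shape for issues 2 and 3, here is a minimal Python sketch of the "additional flag" idea, using a media type parameter on application/json. The parameter name "lossless" and the function name are purely hypothetical, nothing like this exists in the HTTP interface today, and the sketch ignores q-values in the Accept: header.

```python
def choose_json_flavor(accept_header, default="lossless"):
    """Return "lossless" or "clean" for the given Accept: header value."""
    for media_range in accept_header.split(","):
        parts = [p.strip() for p in media_range.split(";")]
        if parts[0].lower() != "application/json":
            continue
        # Collect parameters such as "lossless=true" (hypothetical flag).
        params = {}
        for p in parts[1:]:
            if "=" in p:
                key, val = p.split("=", 1)
                params[key.strip().lower()] = val.strip().lower()
        if params.get("lossless") == "true":
            return "lossless"
        if params.get("lossless") == "false":
            return "clean"
        # Plain application/json: fall back to whatever we pick as default.
        return default
    return default


print(choose_json_flavor("application/json"))
print(choose_json_flavor("application/json; lossless=false"))
```

The default argument is where the decision from issue 3 would go: switching it to "clean" would make the lossy form the default for plain application/json requests.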

Ceej
aka Chris Hillery

On Wed, Aug 5, 2015 at 12:05 AM, Mike Carey <[email protected]> wrote:

Cool.  Sattam + Wail are going to sign up to do this, I believe! (They
want/need it first....)


On 8/1/15 9:38 AM, Till Westmann wrote:

Only a few thoughts:
1) Yes, we should definitely have that!
2) For the non-numeric extended atomic types we should find a reasonable
string serialization and we need to provide functions to parse that
serialization back to the extended atomic type (and I think that we already
have that e.g. for the datetime types).
3) I think that we already had that discussion a few times (I remember
arguing for it when I first joined the project) and it’s time to do it :)

Cheers,
Till

On Aug 1, 2015, at 9:17 AM, Mike Carey <[email protected]> wrote:

Hey - our JSON output format is currently designed to be non-lossy, in the
sense that it encodes all the details of the source types (since ADM is
JSON++ and there's quite a bit in that ++ section). We really also need an
option for "normal application users" that's lossy but produces the kind of
JSON that would be expected by consuming applications that "don't
appreciate" the many different kinds of numeric data, the existence of
spatial data, etc.  I.e., it'd be nice to have a default lossy
serialization into JSON as well.... (Note that if someone doesn't want to
suffer the loss, they can always do their own out-conversions of the data
in the return section of their AQL query to bridge the gap.)
Thoughts?





