On 12 Aug 2015, at 10:15, Chris Hillery wrote:

So I don't think we've reached anything like a consensus here regarding spatial types. I'll restate my opinion that coercing them into any of the
complex specifications that have been mentioned (GeoJSON, arcgis, even
Well-Known Text) is inappropriate for serialization. Also, most of those specifications are even more complex than the "lossless JSON" serialization
we already have, so would be doubly inappropriate for our "clean JSON"
variant.

That's what I don't think should be done, so what do I think should be
done? Well, the whole purpose of this exercise as initially suggested by Mike was to allow a form of JSON output that was more like what someone consuming this *as JSON* would expect. To me that means that the format should be something that (a) is useful for downstream JSON tools, while (b) being as simply-structured as possible. Also, a non-goal in my mind is that this output format be able to be returned to its original ADM form; it is
explicitly "lossy" in that sense.

Till suggested that the rule should be that all atomic ADM types got
serialized as atomic JSON, generally by creating a string representation of
the data. That works nicely for numerics as well as things that are
basically strings anyway, such as UUID and Hex. It also suggests an obvious way to handle date/time/duration types since there is something of a global
standard string representation for those.

However, upon thinking about it, I don't think that's either the simplest or the most useful way for us to represent spatial types. The best we could do
there would be something not entirely unlike a dramatic subset of
Well-Known Text, e.g. "POINT (30 10)". While that arguably meets criterion
(b) above, it definitely doesn't meet (a) since any downstream
JSON-accepting tool is going to have to do non-JSON string processing to
extract the actual meaning. I also come back again to the problem that
Circle cannot be represented unless we create a non-standard extension to
WKT.
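
To make the extra processing step concrete, here is a minimal Python sketch of what a consumer would have to do with a WKT-style string. The field name "location" and the regular expression are assumptions made up for illustration; the pattern only handles the plain 2-D POINT form and is not a general WKT parser.

```python
import json
import re

# Hypothetical document that encodes a point as a WKT-style string.
doc = json.loads('{"location": "POINT (30 10)"}')

# The JSON parser only yields a string, so a second, non-JSON parsing
# step is needed. This pattern covers just "POINT (x y)" with plain decimals.
POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$"
)

def parse_wkt_point(text):
    """Parse a 2-D WKT POINT string into an (x, y) tuple of floats."""
    match = POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a 2-D WKT POINT: {text!r}")
    return float(match.group(1)), float(match.group(2))

x, y = parse_wkt_point(doc["location"])
```

With an array such as [30, 10] the same coordinates would be available directly from the JSON parser, with no second pass.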

I really would like to get to a consistent set of rules on how we serialize ADM instances to JSON.
My proposal for those rules is:

1) structures are represented by JSON structures (objects and arrays)
2) values are represented by JSON values (string, number)
3) types that are not numeric are represented by a widely supported string representation.

And I think that those rules make sense. When consuming some JSON, the JSON parser natively supports the JSON structures and values. And if someone works in a specific domain (e.g. spatial), they probably have a parser for the widely supported string representation that they can use to parse the string value that they got from the JSON parser. If we invent our own structured representation, we might make things a little easier for people who manually craft their application for the first time, but we make it harder for people who are already working in the domain and want to use AsterixDB to store their data.
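
As a rough illustration of the three rules, here is a small Python sketch. It is not AsterixDB code: the Point class, the field names, and the choice of ISO 8601 and WKT as the "widely supported" string forms are all assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone

class Point:
    """Hypothetical stand-in for an ADM point value."""
    def __init__(self, x, y):
        self.x, self.y = x, y

def to_json_value(value):
    """Rule 3: non-numeric atomic types become a well-known string form."""
    if isinstance(value, datetime):
        return value.isoformat()                   # ISO 8601
    if isinstance(value, uuid.UUID):
        return str(value)                          # canonical UUID text
    if isinstance(value, Point):
        return f"POINT ({value.x} {value.y})"      # WKT, assumed for this sketch
    raise TypeError(f"no string form defined for {type(value).__name__}")

record = {
    "id": 1,                                       # rule 2: plain JSON number
    "tags": ["a", "b"],                            # rule 1: JSON array
    "when": datetime(2015, 8, 12, 10, 15, tzinfo=timezone.utc),
    "loc": Point(30, 10),
}

# json.dumps handles dicts, lists, numbers and strings itself and calls
# to_json_value only for the values it cannot encode natively.
text = json.dumps(record, default=to_json_value)
```

Note that the sketch has no case for a circle, since plain WKT offers no string form for one.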

Also, if our support for spatial types differs significantly from the "usual" support, we should consider whether we are doing the right thing. I think that we don't want to tell people dealing with spatial data how to do it. I'd like to support them by providing the right infrastructure.

Unfortunately I don't really have the right expertise on the subject, and if nobody else in the project has it, I think that we should at least try to find it somewhere else. Maybe we can find someone in the Apache SIS project (http://sis.apache.org). Looking at their PMC, Chris Mattmann is on the roster, so he might be able to tell us or to point us to the right people.

After considering the various things that have been discussed, I've gotta be honest: I still like my original proposal the best. It's a concise but usable consolidation of the data represented in ADM, which best I can tell
is what we're looking to implement.

"location2d" : [41.0, 44.0],
"location3d" : [44.0, 13.0, 41.0],
"line" : [ [10.1, 11.1], [10.2, 11.2] ],
"rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
"polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
"circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },

I'm not entirely happy that circle gets rendered as an object; something like "circle": [ [11.1, 10.2], 10.1 ] could work too. Or, if necessary, all shapes (not points) could be rendered as objects as per my secondary
proposal.
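
A minimal Python sketch of how the array-based proposal above could be produced. The dataclasses are hypothetical stand-ins for the ADM spatial types, not actual AsterixDB classes, and the function name is made up for the example.

```python
import json
from dataclasses import dataclass

@dataclass
class Point:
    coords: tuple        # (x, y) or (x, y, z)

@dataclass
class Line:
    start: Point
    end: Point

@dataclass
class Rectangle:
    corner_a: Point
    corner_b: Point

@dataclass
class Polygon:
    vertices: list       # list of Point

@dataclass
class Circle:
    center: Point
    radius: float

def to_clean_json(value):
    """Map a (hypothetical) spatial value to plain JSON arrays/objects."""
    if isinstance(value, Point):
        return list(value.coords)
    if isinstance(value, Line):
        return [to_clean_json(value.start), to_clean_json(value.end)]
    if isinstance(value, Rectangle):
        return [to_clean_json(value.corner_a), to_clean_json(value.corner_b)]
    if isinstance(value, Polygon):
        return [to_clean_json(v) for v in value.vertices]
    if isinstance(value, Circle):
        # The one shape that does not reduce to a list of points.
        return {"radius": value.radius, "center": to_clean_json(value.center)}
    raise TypeError(f"unsupported type: {type(value).__name__}")

circle_text = json.dumps(
    {"circle": to_clean_json(Circle(Point((11.1, 10.2)), 10.1))}
)
```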

The thing about this format is that it's really difficult to see (for humans or parsers) what spatial types are represented by these nested arrays.
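
A small Python sketch of that ambiguity, using the line and rectangle values from the proposal above. The helper names are made up for the example, and the check only looks at 2-D shapes; it guesses the type from structure alone, which is all a consumer has once the field name is not a reliable hint.

```python
import json

line_doc = json.loads('{"shape": [[10.1, 11.1], [10.2, 11.2]]}')
rect_doc = json.loads('{"shape": [[5.1, 11.8], [87.6, 15.6548]]}')
poly_doc = json.loads('{"shape": [[1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8]]}')

def is_pair(value):
    """True for a JSON array of exactly two numbers."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(n, (int, float)) for n in value)
    )

def candidate_types(value):
    """Return every 2-D spatial type this structure could stand for."""
    if isinstance(value, dict) and set(value) == {"radius", "center"}:
        return {"circle"}
    if is_pair(value):
        return {"point"}
    if isinstance(value, list) and all(is_pair(p) for p in value):
        if len(value) == 2:
            return {"line", "rectangle"}   # structurally identical
        if len(value) >= 3:
            return {"polygon"}
    return set()

line_guess = candidate_types(line_doc["shape"])
rect_guess = candidate_types(rect_doc["shape"])
```

Both guesses come back as the same two-element set, so a line and a rectangle cannot be told apart without outside knowledge.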

My 2c,
Till


On Fri, Aug 7, 2015 at 11:08 PM, Mike Carey wrote:

I am willing to retract my proposal... :-)
(Consider it retracted; I agree with Ted Dunning's comment, and similar
comments by others.)


On 8/7/15 10:35 PM, Chen Li wrote:

In today's weekly meeting Mike mentioned the idea of getting rid of
the "circle" data type.  It will be good to have a F2F discussion
before we make the final decision.

Chen


