On 12 Aug 2015, at 10:15, Chris Hillery wrote:
So I don't think we've reached anything like a consensus here regarding
spatial types. I'll restate my opinion that coercing them into any of
the complex specifications that have been mentioned (GeoJSON, arcgis,
even Well-Known Text) is inappropriate for serialization. Also, most of
those specifications are even more complex than the "lossless JSON"
serialization we already have, so would be doubly inappropriate for our
"clean JSON" variant.
That's what I don't think should be done, so what do I think should be
done? Well, the whole purpose of this exercise as initially suggested by
Mike was to allow a form of JSON output that was more like what someone
consuming this *as JSON* would expect. To me that means that the format
should be something that (a) is useful for downstream JSON tools, while
(b) being as simply-structured as possible. Also, a non-goal in my mind
is that this output format be able to be returned to its original ADM
form; it is explicitly "lossy" in that sense.
Till suggested that the rule should be that all atomic ADM types got
serialized as atomic JSON, generally by creating a string representation
of the data. That works nicely for numerics as well as things that are
basically strings anyway, such as UUID and Hex. It also suggests an
obvious way to handle date/time/duration types since there is something
of a global standard string representation for those.
However, upon thinking about it, I don't think that's the simplest or
the most useful way for us to represent spatial types. The best we could
do there would be something not entirely unlike a dramatic subset of
Well-Known Text, e.g. "POINT (30 10)". While that arguably meets
criterion (b) above, it definitely doesn't meet (a) since any downstream
JSON-accepting tool is going to have to do non-JSON string processing to
extract the actual meaning. I also come back again to the problem that
Circle cannot be represented unless we create a non-standard extension
to WKT.
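To make the "non-JSON string processing" concrete, here is a minimal
Python sketch of what a consumer would have to do with a WKT-style
string. The field name "location2d" and the helper parse_wkt_point are
hypothetical, and the regular expression covers only the simple 2D
"POINT (x y)" form, not WKT in general.

```python
import json
import re

# Hypothetical helper: handles only the simple 2D "POINT (x y)" form.
_POINT_RE = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$")


def parse_wkt_point(text):
    """Extract (x, y) from a string such as "POINT (30 10)"."""
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError("not a 2D WKT point: %r" % (text,))
    return float(match.group(1)), float(match.group(2))


doc = json.loads('{"location2d": "POINT (30 10)"}')

# The JSON parser only yields an opaque string here; the coordinates
# become available only after a second, WKT-specific parsing step.
x, y = parse_wkt_point(doc["location2d"])
```

With an array encoding such as [30, 10] the second step would not be
needed, which is the trade-off being argued here.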
I really would like to get to a consistent set of rules on how we
serialize ADM instances to JSON.
My proposal for those rules is:
1) structures are represented by JSON structures (objects and arrays)
2) values are represented by JSON values (string, number)
3) types that are not numeric are represented by a widely supported
string representation.
And I think that those rules make sense. When consuming some JSON, the
JSON parser natively supports the JSON structures and values. And if
someone works in a specific domain (e.g. spatial) they probably have a
parser for the widely supported string representation that they can use
to parse the string value that they got from the JSON parser.
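As a rough illustration of the three rules above, here is a minimal
Python sketch. It is not AsterixDB code: the function name to_clean_json
is hypothetical, and Python's date and UUID types merely stand in for
ADM atomic types, with ISO 8601 and the canonical UUID text assumed as
the "widely supported" string forms.

```python
import datetime
import json
import uuid


def to_clean_json(value):
    """Map a Python value to JSON-compatible data using the three rules."""
    # Rule 1: structures are represented by JSON objects and arrays.
    if isinstance(value, dict):
        return {str(k): to_clean_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_clean_json(v) for v in value]
    # Rule 2: values JSON supports natively are passed through unchanged.
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    # Rule 3: other atomic types become a widely supported string form.
    if isinstance(value, datetime.date):  # also covers datetime.datetime
        return value.isoformat()
    if isinstance(value, uuid.UUID):
        return str(value)
    raise TypeError("no string representation chosen for %r" % type(value))


record = {
    "id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "created": datetime.date(2015, 8, 12),
    "scores": (1, 2.5),
}
clean = to_clean_json(record)
text = json.dumps(clean, sort_keys=True)
```

Under these rules a spatial value would take the rule 3 branch, so the
open question is only which string representation to pick for it.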
If we invent our own structured representation, we might make things a
little easier for people who manually craft their application for the
first time, but we make it harder for people who are already working in
the domain and want to use AsterixDB to store their data.
Also, if our support for spatial types differs significantly from the
"usual" support, we should consider if we are doing the right thing. I think
that we don't want to tell people dealing with spatial data how to do
it. I'd like to support them by providing the right infrastructure.
Unfortunately I don't really have the right expertise on the subject and
if nobody else in the project has it, I think that we should at least
try to find it somewhere else.
Maybe we can find someone in the Apache SIS project
(http://sis.apache.org).
Looking at their PMC, Chris Mattmann is on the roster, so he might be
able to tell us or to point us to the right people.
After considering the various things that have been discussed, I've
gotta be honest: I still like my original proposal the best. It's a
concise but usable consolidation of the data represented in ADM, which
best I can tell is what we're looking to implement.
"location2d" : [41.0, 44.0],
"location3d" : [44.0, 13.0, 41.0],
"line" : [ [10.1, 11.1], [10.2, 11.2] ],
"rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
"polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
"circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
I'm not entirely happy that circle gets rendered as an object;
something like "circle": [ [11.1, 10.2], 10.1 ] could work too. Or, if
necessary, all shapes (not points) could be rendered as objects as per
my secondary proposal.
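For reference, the array-based encoding proposed above could be sketched
in Python as follows. The Point, Line and Circle tuples and the encode
function are hypothetical stand-ins for the ADM spatial types, not
existing AsterixDB classes; only three of the shapes are covered.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for the ADM spatial types.
Point = namedtuple("Point", ["x", "y"])
Line = namedtuple("Line", ["start", "end"])
Circle = namedtuple("Circle", ["center", "radius"])


def encode(shape):
    """Encode a shape as nested arrays; circle is the one object case."""
    if isinstance(shape, Point):
        return [shape.x, shape.y]
    if isinstance(shape, Line):
        return [encode(shape.start), encode(shape.end)]
    if isinstance(shape, Circle):
        return {"radius": shape.radius, "center": encode(shape.center)}
    raise TypeError("unsupported shape: %r" % (shape,))


record = {
    "location2d": encode(Point(41.0, 44.0)),
    "line": encode(Line(Point(10.1, 11.1), Point(10.2, 11.2))),
    "circle": encode(Circle(Point(11.1, 10.2), 10.1)),
}
text = json.dumps(record)
```

Switching circle to the alternative [ [11.1, 10.2], 10.1 ] form would
only change the last branch of encode.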
The thing about this format is that it's really difficult to see (for
humans or parsers) what spatial types are represented by these nested
arrays.
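A small Python sketch of that concern, using the values from the
proposal: once the field names are ignored, "line" and "rectangle" have
exactly the same structure. The helper structural_shape is hypothetical
and only describes nesting and JSON value types.

```python
import json

doc = json.loads("""{
  "line": [[10.1, 11.1], [10.2, 11.2]],
  "rectangle": [[5.1, 11.8], [87.6, 15.6548]],
  "polygon": [[1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8]]
}""")


def structural_shape(value):
    """Describe a value only by its nesting, ignoring the numbers."""
    if isinstance(value, list):
        return [structural_shape(v) for v in value]
    return type(value).__name__


# A line and a rectangle are both "two pairs of floats", so a consumer
# cannot tell them apart without the field name or an external schema.
same = structural_shape(doc["line"]) == structural_shape(doc["rectangle"])
```

Here same evaluates to True, so the type information would have to come
from somewhere outside the value itself.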
My 2c,
Till
On Fri, Aug 7, 2015 at 11:08 PM, Mike Carey <[email protected]> wrote:
I am willing to retract my proposal... :-)
(Consider it retracted; I agree with Ted Dunning's comment, and similar
comments by others.)
On 8/7/15 10:35 PM, Chen Li wrote:
In today's weekly meeting Mike mentioned the idea of getting rid of
the "circle" data type. It will be good to have a F2F discussion
before we make the final decision.
Chen