In designing the integration of Apache Daffodil into Drill, I'm trying to
figure out how queries would look operating on deeply nested data.

Here's an example.

This is the path to many geo-location latLong field pairs in some
"messageSet" data:

messageSet/noc_message[*]/message_content/content/vmf/payload/message/K05_17/overlay_message/r1_group/item[*]/points_group/item[*]/latLong

This is sort of like XPath, except that in the above I have put "[*]" to
indicate the child elements that are vectors. You can see there are 3
nested vectors here.

Beneath that path are these two fields, which are what I would want out of
my query, along with some fields from higher up in the nest.

entity_latitude_1/degrees
entity_longitude_1/degrees

The tutorial information here

    https://drill.apache.org/docs/selecting-nested-data-for-a-column/

describes how to index into JSON arrays with specific integer values, but I
don't want specific integers. I want every element of each array.
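
To be concrete about what I mean by "all values", here is a small Python
sketch of the result I'm after. This is not Drill, just the semantics of
"[*]" fanning out over every element of a vector. The toy data, the numbers
in it, and the helper names (select_all, lat_lon_pairs, point, message) are
all made up for illustration; only the path and field names come from my
real data.

```python
def select_all(node, steps):
    """Yield every value reached by following steps from node.

    A step ending in "[*]" names a vector and fans out over all its elements.
    """
    if not steps:
        yield node
        return
    step, rest = steps[0], steps[1:]
    if step.endswith("[*]"):
        for child in node[step[:-3]]:
            yield from select_all(child, rest)
    else:
        yield from select_all(node[step], rest)


PATH = (
    "messageSet/noc_message[*]/message_content/content/vmf/payload/message/"
    "K05_17/overlay_message/r1_group/item[*]/points_group/item[*]/latLong"
).split("/")


def lat_lon_pairs(root):
    """Return (lat, lon) degrees for every latLong under all three vectors."""
    return [
        (p["entity_latitude_1"]["degrees"], p["entity_longitude_1"]["degrees"])
        for p in select_all(root, PATH)
    ]


# Hypothetical toy data shaped like the path above.
def point(lat, lon):
    return {"latLong": {"entity_latitude_1": {"degrees": lat},
                        "entity_longitude_1": {"degrees": lon}}}


def message(groups):
    # groups is a list of lists of (lat, lon) tuples
    items = [{"points_group": {"item": [point(*ll) for ll in g]}} for g in groups]
    return {"message_content": {"content": {"vmf": {"payload": {"message": {
        "K05_17": {"overlay_message": {"r1_group": {"item": items}}}}}}}}}


data = {"messageSet": {"noc_message": [
    message([[(10, 20), (11, 21)], [(12, 22)]]),
    message([[(13, 23)]]),
]}}

pairs = lat_lon_pairs(data)
```

So with two messages holding three points_group vectors between them, I
would expect one output row per latLong, four rows in this toy case.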

Can someone show me what a hypothetical Drill query would look like that
pulls out all the values of this latLong pair?

My stab is:

SELECT pairs.entity_latitude_1.degrees AS lat,
       pairs.entity_longitude_1.degrees AS lon
FROM messageSet.noc_message[*].message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item[*].points_group.item[*].latLong
     AS pairs

I'm not at all sure about the vectors in that, though.

The other idea was this quasi-notation (that I'm making up on the fly here)
which treats each vector as a table.

SELECT pairs.entity_latitude_1.degrees AS lat,
       pairs.entity_longitude_1.degrees AS lon
FROM messageSet.noc_message AS messages,
     messages.message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item AS parents,
     parents.points_group.item AS items,
     items.latLong AS pairs

I have no idea if that makes any sense at all for Drill.

Any help greatly appreciated.

-Mike Beckerle
