Drill can internally handle scalars, arrays (AKA vectors) and maps (AKA tuples, structs). SQL, however, prefers to work with scalars: there is no good syntax to reach inside a complex object for, say, a WHERE condition without also projecting that item as a top-level scalar.
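A minimal sketch of that "project first, then filter" pattern, assuming a hypothetical table `something` whose records carry a map `obj` with a numeric field `score` (all three names are made up for illustration): the inner query lifts the nested item to a top-level scalar, and the outer query filters on the plain column.

```sql
-- Hypothetical names: `something`, `obj` and `score` are illustrative only.
SELECT s.score
FROM (
  -- Lift the nested field to a top-level scalar column.
  -- The table alias `t` lets the dotted path resolve against the table's columns.
  SELECT t.obj.score AS score
  FROM something t
) s
-- Filter on the projected scalar rather than on the nested path.
WHERE s.score > 0.5;
```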
The cool thing, for ML use cases, is that Drill's arrays can also be structured: a vector of input values each of which is a vector of data points along with a class label.

That said, if you have a record with a field "obj" that is a map (struct, object) that contains a field "coord" that is an array of two (or three) doubles, you can project it as:

    SELECT obj.coord FROM something

The value you get back will be an array. Drill's native API handles this just fine. JDBC does not really speak "vector". So, in that case, you could project the elements:

    SELECT obj.coord[0] AS x, obj.coord[1] AS y FROM something

I find it helpful to first think about how Drill's internal data vectors will look, then work from there to the SQL that will do what needs doing.

- Paul

On Tue, Jul 11, 2023 at 11:46 AM Charles Givre <cgi...@gmail.com> wrote:

> Hi Mike,
> When you say "you want all of them", can you clarify a bit about what
> you'd want the data to look like?
> Best,
> -- C
>
> > On Jul 11, 2023, at 12:33 PM, Mike Beckerle <mbecke...@apache.org> wrote:
> >
> > In designing the integration of Apache Daffodil into Drill, I'm trying to
> > figure out how queries would look operating on deeply nested data.
> >
> > Here's an example.
> >
> > This is the path to many geo-location latLong field pairs in some
> > "messageSet" data:
> >
> > messageSet/noc_message[*]/message_content/content/vmf/payload/message/K05_17/overlay_message/r1_group/item[*]/points_group/item[*]/latLong
> >
> > This is sort-of like XPath, except in the above I have put "[*]" to
> > indicate the child elements that are vectors. You can see there are 3
> > nested vectors here.
> >
> > Beneath that path are these two fields, which are what I would want out of
> > my query, along with some fields from higher up in the nest.
> >
> > entity_latitude_1/degrees
> > entity_longitude_1/degrees
> >
> > The tutorial information here
> >
> > https://drill.apache.org/docs/selecting-nested-data-for-a-column/
> >
> > describes how to index into JSON arrays with specific integer values, but I
> > don't want specific integers, I want all values of them.
> >
> > Can someone show me what a hypothetical Drill query would look like that
> > pulls out all the values of this latLong pair?
> >
> > My stab is:
> >
> > SELECT pairs.entity_latitude_1.degrees AS lat,
> > pairs.entity_longitude_1.degrees AS lon FROM
> > messageSet.noc_message[*].message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item[*].points_group.item[*].latLong
> > AS pairs
> >
> > I'm not at all sure about the vectors in that though.
> >
> > The other idea was this quasi-notation (that I'm making up on the fly here)
> > which treats each vector as a table.
> >
> > SELECT pairs.entity_latitude_1.degrees AS lat,
> > pairs.entity_longitude_1.degrees AS lon FROM
> > messageSet.noc_message AS messages,
> > messages.message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item
> > AS parents
> > parents.points_group.item AS items
> > items.latLong AS pairs
> >
> > I have no idea if that makes any sense at all for Drill
> >
> > Any help greatly appreciated.
> >
> > -Mike Beckerle
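
For the "all values, not specific integers" case in the question above, Drill's FLATTEN function may be the closest fit: it emits one row per element of an array, so the three nested vectors can be unwound one level per subquery. What follows is only a sketch under stated assumptions, not a tested query: `messageSet` stands in for however the Daffodil data would be exposed as a table, the aliases (`t`, `m`, `g`, `p`, `msg`, `grp`, `pt`) are invented here, the path names are copied from the question, and some of those names may collide with SQL keywords and need backtick quoting.

```sql
-- Sketch only: treating `messageSet` as a queryable table is an assumption.
SELECT p.pt.latLong.entity_latitude_1.degrees  AS lat,
       p.pt.latLong.entity_longitude_1.degrees AS lon
FROM (
  -- Level 3: one row per points_group.item[*]
  SELECT FLATTEN(g.grp.points_group.item) AS pt
  FROM (
    -- Level 2: one row per r1_group.item[*]
    SELECT FLATTEN(m.msg.message_content.content.vmf.payload.message.K05_17.overlay_message.r1_group.item) AS grp
    FROM (
      -- Level 1: one row per noc_message[*]
      SELECT FLATTEN(t.noc_message) AS msg
      FROM messageSet t
    ) m
  ) g
) p;
```

Fields from higher up in the nest could be carried along by adding them to the select list of the subquery at the level where they live and passing them outward, since FLATTEN repeats the other projected columns for every element it emits.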