One thing I'll add is that if you have nested data, Drill will allow you to 
query the data using both dotted notation and bracket notation.  IE:

SELECT foo.bar FROM....

AND

SELECT foo['bar'] FROM

Both will work, however, I STRONGLY recommend using the bracket notation.  I've 
found that the dotted notation will cause some strange and inconsistent errors. 
 The bracket notation is much more reliable.
Best,
-- C


> On Aug 18, 2023, at 5:00 PM, Mike Beckerle <mbecke...@apache.org> wrote:
> 
> I'm using Apache Daffodil in the mode where it outputs JSON data. (For the
> moment, until we build a tighter integration. This is my conceptual test
> framework for that integration.)
> 
> I have parsed data to create this JSON which represents 2-level nested
> repeating subrecords.
> 
> All the simple fields are int.
> 
> [{"a":1,  "b":2,  "c":[{"d":3,  "e":4,  "f":[{"g":5,  "h":6 },
>                                             {"g":7,  "h":8 }]},
>                       {"d":9,  "e":10, "f":[{"g":11, "h":12},
>                                             {"g":13, "h":14}]}]},
> {"a":21, "b":22, "c":[{"d":23, "e":24, "f":[{"g":25, "h":26 },
>                                             {"g":27, "h":28 }]},
>                       {"d":29, "e":30, "f":[{"g":31, "h":32},
>                                             {"g":33, "h":34}]}]}]
> 
> So, the top level is a vector of maps,
> within that, field "c" is a vector of maps,
> and within "c" is a field f which is a vector of maps.
> 
> The reason I created this is I'm trying to understand the arrays and how
> they work with Drill SQL.
> 
> I'm trying to figure out how to get this rowset of 3 rows from a query, and
> I'm stumped.
> 
>  a   b   d   e   g   h
> ( 1,  2,  3,  4,  5,  6)
> ( 1,  2,  9, 10, 13, 14)
> (21, 22, 29, 30, 33, 34)
> 
> This is the SQL that is my conceptual framework, but I'm sure it won't work.
> 
> SELECT a, b, c.d AS d, c.e AS e, c.f.g AS g, c.f.h AS h
> FROM ... the json file...
> WHERE g mod 10 == 3 OR g == 5
> 
> But I know it's not going to be that easy to get the query to traverse the
> vector inside the vector.
> 
> From the doc, the FLATTEN operator seems to be needed, but I can't really
> figure it out.
> 
> This is what all my data is like. Trees of nested vectors of sub-records.
> 
> Can anyone advise on what the SQL might look like, or where there's an
> example doing something like this I can learn from?
> 
> Thanks for any help
> 
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com

Attachment: signature.asc
Description: Message signed with OpenPGP

Reply via email to