I'm using Apache Daffodil in the mode where it outputs JSON data. (For the moment, until we build a tighter integration. This is my conceptual test framework for that integration.)
I have parsed data to create this JSON which represents 2-level nested repeating subrecords. All the simple fields are int. [{"a":1, "b":2, "c":[{"d":3, "e":4, "f":[{"g":5, "h":6 }, {"g":7, "h":8 }]}, {"d":9, "e":10, "f":[{"g":11, "h":12}, {"g":13, "h":14}]}]}, {"a":21, "b":22, "c":[{"d":23, "e":24, "f":[{"g":25, "h":26 }, {"g":27, "h":28 }]}, {"d":29, "e":30, "f":[{"g":31, "h":32}, {"g":33, "h":34}]}]}] So, the top level is a vector of maps, within that, field "c" is a vector of maps, and within "c" is a field f which is a vector of maps. The reason I created this is I'm trying to understand the arrays and how they work with Drill SQL. I'm trying to figure out how to get this rowset of 3 rows from a query, and I'm stumped. a b d e g h ( 1, 2, 3, 4, 5, 6) ( 1, 2, 9, 10, 13, 14) (21, 22, 29, 30, 33, 34) This is the SQL that is my conceptual framework, but I'm sure it won't work. SELECT a, b, c.d AS d, c.e AS e, c.f.g AS g, c.f.h AS h FROM ... the json file... WHERE g mod 10 == 3 OR g == 5 But I know it's not going to be that easy to get the query to traverse the vector inside the vector. >From the doc, the FLATTEN operator seems to be needed, but I can't really figure it out. This is what all my data is like. Trees of nested vectors of sub-records. Can anyone advise on what the SQL might look like, or where there's an example doing something like this I can learn from? Thanks for any help Mike Beckerle Apache Daffodil PMC | daffodil.apache.org OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl Owl Cyber Defense | www.owlcyberdefense.com