I'm using Apache Daffodil in the mode where it outputs JSON data. (For the
moment, until we build a tighter integration. This is my conceptual test
framework for that integration.)
I have parsed data to create this JSON which represents 2-level nested
repeating subrecords.
All the simple fields are int.
[{"a":1, "b":2, "c":[{"d":3, "e":4, "f":[{"g":5, "h":6 },
{"g":7, "h":8 }]},
{"d":9, "e":10, "f":[{"g":11, "h":12},
{"g":13, "h":14}]}]},
{"a":21, "b":22, "c":[{"d":23, "e":24, "f":[{"g":25, "h":26 },
{"g":27, "h":28 }]},
{"d":29, "e":30, "f":[{"g":31, "h":32},
{"g":33, "h":34}]}]}]
So, the top level is a vector of maps,
within that, field "c" is a vector of maps,
and within "c" is a field f which is a vector of maps.
The reason I created this is I'm trying to understand the arrays and how
they work with Drill SQL.
I'm trying to figure out how to get this rowset of 3 rows from a query, and
I'm stumped.
a b d e g h
( 1, 2, 3, 4, 5, 6)
( 1, 2, 9, 10, 13, 14)
(21, 22, 29, 30, 33, 34)
This is the SQL that is my conceptual framework, but I'm sure it won't work.
SELECT a, b, c.d AS d, c.e AS e, c.f.g AS g, c.f.h AS h
FROM ... the json file...
WHERE g mod 10 == 3 OR g == 5
But I know it's not going to be that easy to get the query to traverse the
vector inside the vector.
>From the doc, the FLATTEN operator seems to be needed, but I can't really
figure it out.
This is what all my data is like. Trees of nested vectors of sub-records.
Can anyone advise on what the SQL might look like, or where there's an
example doing something like this I can learn from?
Thanks for any help
Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com