Question about Drill internal data representation for Daffodil tree infosets

Mike Beckerle Tue, 10 Oct 2023 04:57:54 -0700

I am trying to understand the options for populating Drill data from a
Daffodil data parse.


Suppose you have this JSON

{"parent": {"sub1": {"a1":1, "a2":2}, "sub2": {"b1":3, "b2":4, "b3":5}}}

or this equivalent XML:

<parent>
  <sub1><a1>1</a1><a2>2</a2></sub1>
  <sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>
</parent>

Unlike those texts, Daffodil is going to have a tree data structure where a
parent node contains two child nodes sub1 and sub2, and each of those has
children a1, a2, and b1, b2, b3 respectively.
It's roughly analogous to the DOM tree of the XML, or to the tree of nested
JSON map nodes you'd get back from a JSON parse of that text.
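
To make that tree shape concrete, here is a minimal Python sketch of what a
JSON parse of that text (with all keys quoted) gives you back. It uses only
the standard json module, the variable names are just for illustration, and
it is not meant to represent Daffodil's or Drill's actual API.

```python
import json

# The example JSON text, with every key quoted so it parses.
text = '{"parent": {"sub1": {"a1": 1, "a2": 2}, "sub2": {"b1": 3, "b2": 4, "b3": 5}}}'

# Parsing yields nested map nodes (dicts), not text.
tree = json.loads(text)
parent = tree["parent"]

# The parent node has two child nodes, each with its own children.
child_names = list(parent.keys())
sub1_children = list(parent["sub1"].keys())
sub2_children = list(parent["sub2"].keys())

print(child_names)
print(sub1_children)
print(sub2_children)
```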

In Drill, querying that JSON like this:

select parent.sub1 from myStructure

gives you back a single column containing what seems to be a string like

|        sub1        |
----------------------
| { "a1":1, "a2":2}  |

So, my question is this: is sub1 actually a string in Drill (what is the
type of sub1?), or is it a Drill data row/map node value with two child
nodes that just happens to print out looking like a JSON string?
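
To illustrate the distinction I am asking about, here is a small Python
sketch. It is plain standard library and says nothing about how Drill itself
behaves; the names are made up. The point is only that a map node and a
string can display identically while being quite different values.

```python
import json

# Possibility 1: sub1 is a map node with two children.
sub1_as_map = {"a1": 1, "a2": 2}

# Possibility 2: sub1 is just a string that happens to look like JSON.
sub1_as_string = json.dumps(sub1_as_map)

# Both render the same way when displayed...
rendered_map = json.dumps(sub1_as_map)
same_display = rendered_map == sub1_as_string

# ...but only the map lets you reach a child directly.
a1_from_map = sub1_as_map["a1"]
a1_from_string = json.loads(sub1_as_string)["a1"]  # needs a parse first

print(type(sub1_as_map).__name__, type(sub1_as_string).__name__, same_display)
```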

Thanks for any insight here.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
