Consider this XML:

<test1>
  <int1 x="2">A</int1>
  <int1 x="7">B</int1>
  <int1 y="3">Y</int1>
  <char1 y="4">C</char1>
</test1>

And this Drill query:

SELECT * FROM cp.`xml/foo.xml`

I am using datalevel = 1.

The results I get (calling RowSet results.print() in my JUnit test) are:

#: `attributes` STRUCT<`int1_x` VARCHAR, `int1_y` VARCHAR, `char1_y` VARCHAR>, `int1` VARCHAR, `char1` VARCHAR
0: {"27", "3", "4"}, "ABY", "C"

So, questions:

First, why is it constructing 1 row rather than multiple? The only way I would expect to get just 1 row out is if I did a GROUP BY and the whole row set had only 1 key value.

Second, why is it concatenating the value strings? I would expect to have to write something like "SELECT '1' AS key, * FROM ...theTable... GROUP BY key", and only then would I expect concatenation, and only if everything is a string and concat is somehow the default grouping operation. Even then it's a stretch.

Here's what I expected to get out after inspecting the schema that was inferred from the data:

0: {"2", null, null}, "A", null
1: {"7", null, null}, "B", null
2: {null, "3", null}, "Y", null
3: {null, null, "4"}, null, "C"

Those correspond to the 3 columns "attributes", "int1", and "char1", where attributes is itself {int1_x, int1_y, char1_y}.

Third, how would I change my query to get out what I expect?

Lastly, what is the rationale for the name "int1_x" (also "int1_y" and "char1_y")? I expected to see two separate attributes columns, "attributes_int1" and "attributes_char1", as maps with non-prefixed children: x and y for int1, and y for char1.

I guess I just don't grok the rationale for how queries work against XML.

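To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is only a model inferred from the output above, not Drill's actual XML reader logic, and the function names observed_row and expected_rows are made up for illustration. The first function merges every child of <test1> into one row, concatenating values that land in the same column; the second emits one row per child, with null (None) in the columns that child does not populate.

```python
import xml.etree.ElementTree as ET

XML = """<test1>
  <int1 x="2">A</int1>
  <int1 x="7">B</int1>
  <int1 y="3">Y</int1>
  <char1 y="4">C</char1>
</test1>"""


def observed_row(root):
    """Model of the observed result: a single row for the whole document.

    Assumption: attribute columns are named <element>_<attribute>, and
    repeated values for the same column are concatenated as strings.
    """
    attributes, fields = {}, {}
    for child in root:
        for name, value in child.attrib.items():
            key = f"{child.tag}_{name}"
            attributes[key] = attributes.get(key, "") + value
        fields[child.tag] = fields.get(child.tag, "") + (child.text or "")
    return {"attributes": attributes, **fields}


def expected_rows(root):
    """Model of the expected result: one row per child element."""
    merged = observed_row(root)  # reused only to learn the column names
    attr_keys = list(merged["attributes"])
    field_keys = [k for k in merged if k != "attributes"]
    rows = []
    for child in root:
        attrs = {k: None for k in attr_keys}
        for name, value in child.attrib.items():
            attrs[f"{child.tag}_{name}"] = value
        row = {"attributes": attrs, **{k: None for k in field_keys}}
        row[child.tag] = child.text
        rows.append(row)
    return rows


root = ET.fromstring(XML)
print(observed_row(root))
for row in expected_rows(root):
    print(row)
```

The sketch reuses the flattened column names (int1_x, int1_y, char1_y) for both shapes so that the only difference between them is the row count and the concatenation.
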
The natural XML schema for this XML document is:

<xs:element name="test1">
  <xs:complexType>
    <xs:choice>
      <xs:element name="int1">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="x" type="xs:int"/>
              <xs:attribute name="y" type="xs:int"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="char1">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="y" type="xs:int"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>

I need to synthesize the same TupleMetadata from this schema that the current XML reader infers incrementally, so I really need to understand the rationale, because I wouldn't expect this choice to be entirely flattened including the attributes.

Thanks for any help

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com