benj created DRILL-8481:
---------------------------
Summary: Ability to query root attributes
Key: DRILL-8481
URL: https://issues.apache.org/jira/browse/DRILL-8481
Project: Apache Drill
Issue Type: Improvement
Components: Storage - XML
Affects Versions: 1.21.1
Reporter: benj
Hi,
It is possible to retrieve the attributes of fields, but not those of the root.
It would be useful to be able to retrieve the attributes found in the root
node of XML files.
In my typical use cases, I have many XML files, each containing a single XML
frame, often with one or more attributes in the root tag.
To recover these values, I am currently forced to preprocess the files to "copy"
the attributes into the fields of the XML record.
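For illustration only, the preprocessing workaround amounts to something like the following Python sketch. It uses only the standard library; the function name copy_root_attributes and the "root_" prefix are hypothetical, and it assumes each file fits in memory and that copying onto the direct children of the root is sufficient.

```python
import xml.etree.ElementTree as ET


def copy_root_attributes(xml_text: str, prefix: str = "root_") -> str:
    """Copy every attribute of the root element onto each direct child.

    `prefix` is a hypothetical naming convention used to avoid clashing
    with attributes the child already carries.
    """
    root = ET.fromstring(xml_text)
    for child in root:
        for name, value in root.attrib.items():
            key = prefix + name
            # Do not overwrite an attribute the child already has.
            if key not in child.attrib:
                child.set(key, value)
    # Serialize back to text so the file can be rewritten before querying.
    return ET.tostring(root, encoding="unicode")
```

Applied to a file such as the example below, each child of the root would carry root_Version and root_TimeStamp as ordinary attributes, which is exactly the extra step this improvement would make unnecessary.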
Even with multiple XML records under the root, it would be useful for the root
attributes to be accessible from each record.
Example (file aaa.xml):
{noformat}
<PPP Version="2023-001" TimeStamp="2023-06-09T21:17:14.416+02:00">
<P1 SubVersion="a1" MID="XX003" PN="156" SL="3"/>
<P2 SubVersion="b1"><Color>blue</Color></P2>
</PPP>
{noformat}
With the query:
{code:sql}
SELECT * FROM(SELECT filename, * FROM TABLE(dfs.test.`/aaa.xml`(type=>'xml',
dataLevel=>1)) as xml) AS x;
{code}
I can access:
* P1_SubVersion
* P1_MID
* P1_PN
* P1_SL
* P2_SubVersion
* P2.Color
But I cannot access:
* PPP_Version
* PPP_TimeStamp
Changing dataLevel does not solve the problem.
Regards,