[
https://issues.apache.org/jira/browse/DRILL-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charles Givre updated DRILL-8481:
---------------------------------
Fix Version/s: 1.21.2
> Ability to query XML root attributes
> ------------------------------------
>
> Key: DRILL-8481
> URL: https://issues.apache.org/jira/browse/DRILL-8481
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - XML
> Affects Versions: 1.21.1
> Reporter: benj
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.21.2
>
>
> Hi,
> It is currently possible to retrieve the attributes of every element except
> the root. It would be useful to also be able to retrieve the attributes found
> in the root node of XML files.
> In my common use cases, I have many XML files, each containing a single XML
> frame, often with one or more attributes in the root tag.
> To recover these values, I am currently forced to preprocess the files to
> "copy" the root attributes into the fields of the XML record.
> Even with multiple XML records under the root, it would be useful for the
> root attributes to be accessible from each record.
> Example (file aaa.xml):
> {noformat}
> <PPP Version="2023-001" TimeStamp="2023-06-09T21:17:14.416+02:00">
> <P1 SubVersion="a1" MID="XX003" PN="156" SL="3"/>
> <P2 SubVersion="b1"><Color>blue</Color></P2>
> </PPP>
> {noformat}
> With the query:
> {code:sql}
> SELECT * FROM(SELECT filename, * FROM TABLE(dfs.test.`/aaa.xml`(type=>'xml',
> dataLevel=>1)) as xml) AS x;
> {code}
> I can access:
> * P1_SubVersion
> * P1_MID
> * P1_PN
> * P1_SL
> * P2_SubVersion
> * P2.Color
> But I cannot access:
> * PPP_Version
> * PPP_TimeStamp
> Changing dataLevel does not solve the problem.
> Regards,
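
The preprocessing workaround mentioned in the description could look roughly like the sketch below. It is only an illustration, not the reporter's actual script: the function name `copy_root_attributes` and the `<root tag>_<attribute>` naming scheme are assumptions, and how Drill names the resulting columns afterwards has not been verified here.

```python
import xml.etree.ElementTree as ET

# Sample taken from the issue description (aaa.xml).
SAMPLE = """<PPP Version="2023-001" TimeStamp="2023-06-09T21:17:14.416+02:00">
<P1 SubVersion="a1" MID="XX003" PN="156" SL="3"/>
<P2 SubVersion="b1"><Color>blue</Color></P2>
</PPP>"""


def copy_root_attributes(xml_text: str) -> str:
    """Copy every root attribute onto each direct child of the root.

    The copied attributes are prefixed with the root tag name
    (a hypothetical convention, e.g. PPP_Version) so they do not
    clash with attributes the child already has.
    """
    root = ET.fromstring(xml_text)
    for child in root:
        for name, value in root.attrib.items():
            child.set(f"{root.tag}_{name}", value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(copy_root_attributes(SAMPLE))
```

After this step each record under the root carries the root attributes itself, which is the manual step the requested improvement would make unnecessary.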