[ 
https://issues.apache.org/jira/browse/NIFI-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226647#comment-17226647
 ] 

Pierre Villard commented on NIFI-7790:
--------------------------------------

Yes, we definitely want to support more than just Avro for defining schemas. 
Supporting XSDs would give us many features that Avro schemas lack. It would 
be a large effort, though, and I'm not sure anyone is working on it at the 
moment.

I'm not aware of an automated way of converting XSDs to Avro schemas. Let us 
know whether what you found works.
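
As a stopgap, the first document in the report can be described by a 
hand-written Avro schema, provided the two item records get distinct names 
(Avro does not allow two definitions of the same full name in one schema). 
Below is a minimal sketch in Python that only builds and prints such a schema. 
The record names authors_item and editors_item and the helper functions are 
made up for illustration, and it is an untested assumption that the XML reader 
accepts this schema for the reported input.

```python
import json

# Hypothetical helpers that build Avro schema fragments as plain dicts.
def record(name, fields):
    return {"type": "record", "name": name, "fields": fields}

def field(name, type_):
    return {"name": name, "type": type_}

# Hand-written schema for the first document. The two <item> elements get
# distinct record names (illustrative), so their differing fields cannot clash.
schema = record("root", [
    field("authors", record("authors", [
        field("item", record("authors_item", [field("name", "string")])),
    ])),
    field("editors", record("editors", [
        field("item", record("editors_item", [field("commercialName", "string")])),
    ])),
])

schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

If items can repeat, the item fields would presumably need to be Avro arrays 
of these records instead of single records.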

> XML record reader - failure on well-formed XML
> ----------------------------------------------
>
>                 Key: NIFI-7790
>                 URL: https://issues.apache.org/jira/browse/NIFI-7790
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.11.4
>            Reporter: Pierre Gramme
>            Priority: Major
>              Labels: records, xml
>         Attachments: bug-parse-xml.xml
>
>
> I am using ConvertRecord to parse XML flowfiles to Avro, with the 
> Infer Schema strategy. Some input flowfiles are routed to the failure output 
> queue even though they are well-formed: 
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>       <authors>
>               <item>
>                       <name>Neil Gaiman</name>
>               </item>
>       </authors>
>       <editors>
>               <item>
>                       <commercialName>Hachette</commercialName>
>               </item>
>       </editors>
> </root>
> {code}
> Note the use of authors/item/name on one side and 
> editors/item/commercialName on the other.
> By contrast, this document is parsed correctly: 
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>       <authors>
>               <item>
>                       <name>Neil Gaiman</name>
>               </item>
>       </authors>
>       <editors>
>               <item>
>                       <name>Hachette</name>
>               </item>
>       </editors>
> </root>
> {code}
> See the attached template for a minimal reproducible example.
>  
> My interpretation is that the failure in the first case is caused by two 
> independent XML node types sharing the same name (<item> in this case) while 
> having different types and occurring in parents of different types. In the 
> second case, both <item> elements actually have the same node type. I did 
> not use any Schema Inference Cache, so both item types should be inferred 
> independently. 
> Since the first document is legal XML (an XSD could be written for it) and 
> can also be represented in Avro, its conversion should not fail.
> I'll be happy to provide more details if needed.
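
The interpretation in the report can be illustrated with a small Python 
sketch that records the child element names seen under each element path 
instead of under each tag name. This is not NiFi's inference code, only an 
illustration; the function name child_names_by_path is hypothetical. Keyed by 
path, the two <item> elements stay separate and their fields do not collide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The failing document from the report, with whitespace compacted.
DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""

def child_names_by_path(xml_text):
    """Map each element path to the set of child tag names seen under it."""
    shapes = defaultdict(set)

    def walk(elem, path):
        for child in elem:
            shapes[path].add(child.tag)
            walk(child, path + "/" + child.tag)

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return dict(shapes)

shapes = child_names_by_path(DOC)
for path in sorted(shapes):
    print(path, sorted(shapes[path]))
```

Here root/authors/item has the single child name, while root/editors/item has 
the single child commercialName, so the two item types differ only when they 
are merged under the bare name item.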



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
