[
https://issues.apache.org/jira/browse/NIFI-11577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059775#comment-18059775
]
Daniel Stieglitz edited comment on NIFI-11577 at 2/19/26 9:08 PM:
------------------------------------------------------------------
The XML above appears to be malformed in its last tag
{code:xml}
<wd:wd:Report_Entry>{code}
because the namespace prefix is repeated and it is not an end tag. When I tried
to read it in NiFi with an XMLReader, I got the following exception:
{code:java}
java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at
[row,col]:[8,7]
Message: Element type "wd:wd" must be followed by either attribute
specifications, ">" or "/>".{code}
After removing the extra namespace prefix (but without making it an end tag), I
got the following:
{code:java}
java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at
[row,col]:[8,18]
Message: XML document structures must start and end within the same
entity.{code}
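The two parse failures above can be illustrated outside NiFi with a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than the Java StAX parser that NiFi's XMLReader relies on, so the error messages will differ from the ones quoted; only the well-formed versus malformed outcome is comparable. The helper name is_well_formed is hypothetical, and the XML declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Body of the reporter's input, without the final tag.
BODY = (
    '<wd:Report_Entry xmlns:wd="urn:com.workday.report/foo">'
    '<wd:Req_ID>REQ-7602</wd:Req_ID>'
    '<wd:Job_Requisition Descriptor="REQ-7602 Trader (Open)">'
    '<wd:ID type="WID">91384a20a89001a955bb7ded1401271f</wd:ID>'
    '<wd:ID type="Job_Requisition_ID">REQ-7602</wd:ID>'
    '</wd:Job_Requisition>'
)


def is_well_formed(last_tag):
    """Return True if BODY followed by last_tag parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(BODY + last_tag)
        return True
    except ET.ParseError:
        return False


# The three variants discussed above: the tag as reported, the tag with the
# duplicate prefix removed but still an opening tag, and a proper end tag.
results = {
    tag: is_well_formed(tag)
    for tag in ("<wd:wd:Report_Entry>", "<wd:Report_Entry>", "</wd:Report_Entry>")
}

for tag, ok in results.items():
    print(tag, "well-formed" if ok else "malformed")
```

Only the last variant, with both the duplicate prefix removed and the tag turned into an end tag, should parse.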
After making both corrections, using the schema above, and specifying an
attribute to which validation errors are written, I was able to reproduce what
the reporter saw. ValidateRecord reported the following message:
{code:java}
Records in this FlowFile were invalid for the following reasons: ; The
following 1 fields had values whose type did not match the schema:
[/job_requisition]{code}
I did notice, though, that when I ingested the XML, inferred its schema, and
then used that schema in ValidateRecord, there was no issue. The inferred
schema is below. One thing that stood out is that it types the Job_Requisition
element as a record, while the schema above assumes it's an array.
{code:json}
{
  "type": "record",
  "name": "nifiRecord",
  "namespace": "org.apache.nifi",
  "fields": [
    {
      "name": "Req_ID",
      "type": [
        "string",
        "null"
      ]
    },
    {
      "name": "Job_Requisition",
      "type": [
        {
          "type": "record",
          "name": "Job_RequisitionType",
          "fields": [
            {
              "name": "Descriptor",
              "type": [
                "string",
                "null"
              ]
            },
            {
              "name": "ID",
              "type": [
                {
                  "type": "array",
                  "items": {
                    "type": "record",
                    "name": "Job_Requisition_IDType",
                    "fields": [
                      {
                        "name": "type",
                        "type": [
                          "string",
                          "null"
                        ]
                      },
                      {
                        "name": "content_value",
                        "type": [
                          "string",
                          "null"
                        ]
                      }
                    ]
                  }
                },
                "null"
              ]
            }
          ]
        },
        "null"
      ]
    }
  ]
}{code}
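To make the record-versus-array difference concrete, here is a small sketch that compares the type of the Job_Requisition field in the two schemas. It is not how ValidateRecord works internally; it only inspects the schema JSON. The schemas are trimmed to the one field of interest (nested fields left empty), and the helper names effective_type and field_type are hypothetical.

```python
import json


def effective_type(avro_type):
    """Return the Avro type name after dropping "null" from a nullable union."""
    if isinstance(avro_type, list):
        # Assumption: the union is nullable with exactly one non-null branch.
        avro_type = [t for t in avro_type if t != "null"][0]
    if isinstance(avro_type, dict):
        return avro_type["type"]
    return avro_type


def field_type(schema, name):
    """Look up a top-level field by name or alias and return its effective type."""
    for field in schema["fields"]:
        if name == field["name"] or name in field.get("aliases", []):
            return effective_type(field["type"])
    raise KeyError(name)


# Trimmed copy of the reporter's schema: job_requisition is a nullable array.
declared = json.loads("""
{"type": "record", "name": "PendingHiresEntryType", "namespace": "ns",
 "fields": [{"name": "job_requisition",
             "type": ["null", {"type": "array",
                               "items": {"type": "record",
                                         "name": "WdDescribedIdType",
                                         "fields": []}}],
             "default": null,
             "aliases": ["Job_Requisition"]}]}
""")

# Trimmed copy of the inferred schema: Job_Requisition is a nullable record.
inferred = json.loads("""
{"type": "record", "name": "nifiRecord", "namespace": "org.apache.nifi",
 "fields": [{"name": "Job_Requisition",
             "type": [{"type": "record", "name": "Job_RequisitionType",
                       "fields": []},
                      "null"]}]}
""")

declared_type = field_type(declared, "Job_Requisition")
inferred_type = field_type(inferred, "Job_Requisition")
print("declared:", declared_type)
print("inferred:", inferred_type)
```

The input contains a single Job_Requisition element, which would explain why inference sees a record where the reporter's schema expects an array.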
> ValidateRecord fails to validate XML record of record and attributes;
> ConvertRecord recognises just fine
> --------------------------------------------------------------------------------------------------------
>
> Key: NIFI-11577
> URL: https://issues.apache.org/jira/browse/NIFI-11577
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: crissaegrim
> Assignee: Bryan Bende
> Priority: Major
> Attachments: image-2023-05-21-08-08-48-669.png
>
>
> h1. Input Data
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <wd:Report_Entry xmlns:wd="urn:com.workday.report/foo">
> <wd:Req_ID>REQ-7602</wd:Req_ID>
> <wd:Job_Requisition Descriptor="REQ-7602 Trader (Open)">
> <wd:ID type="WID">91384a20a89001a955bb7ded1401271f</wd:ID>
> <wd:ID type="Job_Requisition_ID">REQ-7602</wd:ID>
> </wd:Job_Requisition>
> <wd:wd:Report_Entry>
> {code}
> h1. Schema (avro)
> {code:json}
> {
> "type" : "record",
> "name" : "PendingHiresEntryType",
> "namespace" : "ns",
> "fields" : [ {
> "name" : "requisition_id",
> "type" : [ "null", "string" ],
> "default" : null,
> "aliases" : [ "Req_ID" ]
> }, {
> "name" : "job_requisition",
> "type" : [ "null", {
> "type" : "array",
> "items" : {
> "type" : "record",
> "name" : "WdDescribedIdType",
> "fields" : [ {
> "name" : "id_field",
> "type" : [ "null", {
> "type" : "array",
> "items" : {
> "type" : "record",
> "name" : "WdAttributedIdType",
> "fields" : [ {
> "name" : "content_value",
> "type" : [ "null", "string" ]
> }, {
> "name" : "type",
> "type" : "string"
> } ]
> }
> } ],
> "default" : null,
> "aliases" : [ "ID" ]
> }, {
> "name" : "description",
> "type" : [ "null", "string" ],
> "default" : null,
> "aliases" : [ "Descriptor" ]
> } ]
> }
> } ],
> "default" : null,
> "aliases" : [ "Job_Requisition" ]
> } ]
> }
> {code}
> h1. Expected Output
> It should successfully validate the record. Whereas `ConvertRecord` will
> successfully parse the record according to schema, `ValidateRecord` will
> report a failure.
> h1. Actual Output
> !image-2023-05-21-08-08-48-669.png!
> h1. Output Data of ConvertRecord (using same schema, no changes)
> {code:xml}
> <?xml version="1.0" ?>
> <PendingHiresEntryType>
> <requisition_id>REQ-7602</requisition_id>
> <job_requisition>
> <id_field>
> <content_value>91384a20a89001a955bb7ded1401271f</content_value>
> <type>WID</type>
> </id_field>
> <id_field>
> <content_value>REQ-7602</content_value>
> <type>Job_Requisition_ID</type>
> </id_field>
> <description>REQ-7602 Trader (Open)</description>
> </job_requisition>
> </PendingHiresEntryType>
> {code}
> h1. Schema (IDL)
> {code:java}
> record WdAttributedIdType {
> string? content_value;
> string type;
> }
> record WdDescribedIdType {
> union {null, array<WdAttributedIdType>} @aliases(["ID"]) id_field = null;
> string? @aliases(["Descriptor"]) description = null;
> }
> record PendingHiresEntryType {
> string? @aliases(["Req_ID"]) requisition_id = null;
> union { null, array<WdDescribedIdType> } @aliases(["Job_Requisition"])
> job_requisition = null;
> }
> {code}