[jira] [Commented] (NIFI-6241) ConvertRecord Schema Inference fails to infer complete schema, or simply fails
[ https://issues.apache.org/jira/browse/NIFI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408889#comment-17408889 ]

John Wise commented on NIFI-6241:
---------------------------------

[~dsargrad] - I'm not sure if you've figured this out in the interim, but the
JsonRecordSetWriter has a "Schema Write Strategy" property which can be set to
"Set 'avro.schema' Attribute". The writer will then write the inferred schema
into that attribute. We use that regularly for new or problematic datasets to
create or debug conversions.

> ConvertRecord Schema Inference fails to infer complete schema, or simply fails
> ------------------------------------------------------------------------------
>
>                 Key: NIFI-6241
>                 URL: https://issues.apache.org/jira/browse/NIFI-6241
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: David Sargrad
>            Priority: Major
>         Attachments: Reproduce_ConvertRecord_Shortcoming.xml,
> image-2019-04-24-13-38-16-605.png, image-2019-04-24-13-39-36-327.png,
> image-2019-04-24-13-41-00-704.png, image-2019-04-24-13-41-26-860.png,
> image-2019-04-24-13-43-28-531.png, image-2019-04-24-13-43-59-706.png,
> image-2019-04-24-17-03-10-728.png, image-2019-04-25-09-13-52-416.png,
> image-2019-04-25-09-19-15-406.png, image-2019-04-25-09-30-08-297.png,
> image-2019-05-20-09-01-02-488.png
>
>
> I've got a simple test flow as depicted below:
>
> !image-2019-04-24-13-38-16-605.png!
>
> The input XML is:
> !image-2019-04-24-13-41-26-860.png!
>
> The output JSON is almost correct, yet it is missing two critical fields
> (they both show up as "null"). The null fields are position and
> ncsmTrackData. It is also missing all of the attributes on fltdMessage.
>
> !image-2019-04-24-13-41-00-704.png!
>
> The configuration of my ConvertRecord is:
> !image-2019-04-24-13-43-28-531.png!
>
> My XMLReader configuration is:
> !image-2019-04-24-13-43-59-706.png!
>
> Questions:
> # Why are these two fields null?
> # Why are all the fltdMessage attributes being ignored?
>
> It would seem that this is a bug, or at least a major shortcoming, in the
> schema inference capability. If there were a way for me to view the inferred
> schema, then I could use that as a starting point. However, it's not clear
> from the documentation how to view that schema.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
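
With the writer configured as John Wise describes, the `avro.schema` attribute should hold the inferred schema as Avro JSON text. The following is a minimal sketch, using only Python's standard library, of listing the fields of such a schema so it can serve as a starting point for hand refinement. The schema text, the record name, and the field names are invented for illustration; they are not the schema NiFi inferred for this issue.

```python
import json

# Hypothetical value of the 'avro.schema' flow file attribute. The record name
# and field names are assumptions made up for this example.
AVRO_SCHEMA_ATTRIBUTE = """
{
  "type": "record",
  "name": "exampleRecord",
  "fields": [
    {"name": "flightRef", "type": ["null", "string"]},
    {"name": "position", "type": ["null", "string"]}
  ]
}
"""


def list_fields(schema_text):
    """Return (field name, field type) pairs from an Avro record schema."""
    schema = json.loads(schema_text)
    return [(field["name"], field["type"]) for field in schema.get("fields", [])]


if __name__ == "__main__":
    for name, avro_type in list_fields(AVRO_SCHEMA_ATTRIBUTE):
        print(name, avro_type)
```

A field whose inferred type is not the record type you expected is a quick hint that inference did not see any nested values for it.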
[jira] [Commented] (NIFI-6241) ConvertRecord Schema Inference fails to infer complete schema, or simply fails
[ https://issues.apache.org/jira/browse/NIFI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844065#comment-16844065 ]

David Sargrad commented on NIFI-6241:
-------------------------------------

Hi Otto. I'm not sure I follow your point. My XMLReader is configured with a
schema access strategy of "Infer Schema".
[jira] [Commented] (NIFI-6241) ConvertRecord Schema Inference fails to infer complete schema, or simply fails
[ https://issues.apache.org/jira/browse/NIFI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844044#comment-16844044 ]

Otto Fowler commented on NIFI-6241:
-----------------------------------

I'm confused. In your reproduction case, the generate flow file step has an
Avro schema that doesn't have the fields that you are missing.
[jira] [Commented] (NIFI-6241) ConvertRecord Schema Inference fails to infer complete schema, or simply fails
[ https://issues.apache.org/jira/browse/NIFI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843946#comment-16843946 ]

David Sargrad commented on NIFI-6241:
-------------------------------------

Is it easy to expose the structure of the inferred schema? I was looking to
use that inferred schema as a starting point for a schema that I further
refine by hand. I did not figure out how to do that.
[jira] [Commented] (NIFI-6241) ConvertRecord Schema Inference fails to infer complete schema, or simply fails
[ https://issues.apache.org/jira/browse/NIFI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843943#comment-16843943 ]

David Sargrad commented on NIFI-6241:
-------------------------------------

Hi. Thank you for this response.

I do think your idea of relaxing the root-tag requirement when there is only
one record in the flow file is a good one. I think people will often have a
structure such as the one in my example.

Regarding your second point, I am not sure I fully understand the expected
behavior as you describe it. I'd have expected the inference engine to give me
an inferred structure for the fltdMessage. Specifically, this would include
values for the following:

!image-2019-05-20-09-01-02-488.png!
[jira] [Commented] (NIFI-6241) ConvertRecord Schema Inference fails to infer complete schema, or simply fails
[ https://issues.apache.org/jira/browse/NIFI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826255#comment-16826255 ]

Matt Burgess commented on NIFI-6241:
------------------------------------

I believe the need for a root tag is because the record-based processors are
meant to work on flow files containing multiple records. Currently the
XMLReader expects a root tag even if there is only one record in the flow
file. Perhaps it is possible to relax this requirement if there is only one
record.

For the "missing" fields, to me it looks like no fields were inferred because
there are no fields with explicit values within, only self-closing tags with
attributes. I think that's expected behavior until we revamp the schema system
to support formats that have metadata about the fields themselves (e.g., XML
tag attributes). What fields/values were you expecting? Perhaps we could add a
property to extract attributes as fields or something.
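
The two points in Matt Burgess's comment can be illustrated outside NiFi. The sketch below uses Python's standard `xml.etree.ElementTree`, not NiFi code, to show (a) wrapping a single record in a root tag, and (b) that a self-closing tag carries no text value, so its data is only reachable if attributes are promoted to fields. The sample XML, its attribute names, the `records` root tag, and both helper functions are assumptions invented for this example; the real input is only available as a screenshot in the issue.

```python
import xml.etree.ElementTree as ET

# Invented sample record: only the element names fltdMessage and position come
# from the issue; the attributes and their values are made up.
SINGLE_RECORD = (
    '<fltdMessage msgType="trackInformation" flightRef="123">'
    '<position latitude="40.1" longitude="-75.2"/>'
    '</fltdMessage>'
)


def wrap_in_root(record_xml, root_tag="records"):
    """Wrap one record in a root element (hypothetical helper)."""
    return f"<{root_tag}>{record_xml}</{root_tag}>"


def element_to_fields(element):
    """Convert an element to a dict, promoting XML attributes to fields.

    Repeated child tags overwrite each other; a fuller version would
    collect them into a list.
    """
    fields = dict(element.attrib)
    for child in element:
        fields[child.tag] = element_to_fields(child)
    text = (element.text or "").strip()
    if text:
        fields["value"] = text
    return fields


if __name__ == "__main__":
    root = ET.fromstring(wrap_in_root(SINGLE_RECORD))
    for record in root:
        # A self-closing tag has no text content at all.
        print(record.find("position").text)
        print(element_to_fields(record))
```

Reading only element text, as in the first `print`, yields nothing for `position`, which matches the null fields reported in the issue; the second `print` shows what an "extract attributes as fields" option could produce.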