[jira] [Updated] (NIFI-10169) JoinEnrichment merged schema incorrect if first Enrichment record is null

Mark Payne (Jira) Sat, 25 Jun 2022 08:50:05 -0700


     [ 
https://issues.apache.org/jira/browse/NIFI-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mark Payne updated NIFI-10169:
------------------------------
    Description: 
From the users mailing list:
{quote}
I am performing some enrichments, and sometimes the enrichment lookup fails
because the item is not there. If the enrichment fails for the first item in an
array of records, then the merge fails to properly merge subsequent records in
the same array. The join processor uses an “infer” JSON reader for the original
leg, a schema-based reader for the enriched leg, and an “inherit” JSON writer.
I am using the “Insert” join strategy. Here is an example.
 
If the original record is
 
[{"transport_protocol_id":17,"Enrichment":{}},{"transport_protocol_id":6,"Enrichment":{}}]
 
Then both lookups succeed, and the enrichment record looks like this
 
[
  {"network_transport" : {"name" : "udp", "code" : 17, "alias" : "UDP", "comment" : "user datagram protocol"}},
  {"network_transport" : {"name" : "tcp", "code" : 6, "alias" : "TCP", "comment" : "transmission control protocol"}}
]
 
And the joined record looks like this.
 
[
  {"transport_protocol_id" : 17,"Enrichment" : {
    "network_transport" : {"name" : "udp","code" : 17,"alias" : "UDP","comment" : "user datagram protocol"}}},
  {"transport_protocol_id" : 6,"Enrichment" : {
    "network_transport" : {"name" : "tcp","code" : 6,"alias" : "TCP","comment" : "transmission control protocol"}}}
]
 
However if the first record has a key value that is out of range, such as this
 
[{"transport_protocol_id":9999,"Enrichment":{}},{"transport_protocol_id":6,"Enrichment":{}}]
 
Then the first record in the enriched leg will be null, even though the rest of
the records are correct. However, the enrichment is still valid JSON once I have
processed it in the enrichment leg.
 
[
  {"network_transport" : null},
  {"network_transport" : {"name" : "tcp", "code" : 6,"alias" : "TCP", "comment" : "transmission control protocol"}}
]
 
But the joined record does not properly process the subsequent records, and the 
content looks like this.
 
[
  {"transport_protocol_id" : 9999,"Enrichment" : {
    "network_transport" : null}},
  {"transport_protocol_id" : 6,"Enrichment" : {
    "network_transport" : "MapRecord[{name=tcp, alias=TCP, comment=transmission control protocol, code=6}]"}}
]
 
Is there any step I could use to ensure the join happens as expected? Or is
this the same situation as the JIRA I mentioned above? I am not able to use a
schema-based writer, as our real case has so many input record types and
enrichment options that the number of combinations, and hence schemas, could
not be managed.
{quote}

  was:
From the users mailing list:
{quote}I am performing some enrichments, and sometimes the enrichment look up 
fails as the item is not there. If the enrichment fails for the first item in 
an array of records then the merge fails to properly merge subsequent records 
in the same array. The join processor uses an “infer” JSON reader for the 
original leg, a schema based reader for the enriched leg, and a “inherit” JSON 
writer. I am using the “Insert” join strategy. Here is an example.

 

If the original record is

 

[\{"transport_protocol_id":17,"Enrichment":{}},\{"transport_protocol_id":6,"Enrichment":{}}]

 

Then both lookups succeed, and the enrichment record looks like this

 

[

  \{"network_transport" : {"name" : "udp", "code" : 17, "alias" : "UDP", 
"comment" : "user datagram protocol"}},

  \{"network_transport" : {"name" : "tcp", "code" : 6, "alias" : "TCP", 
"comment" : "transmission control protocol"}

]

 

And the joined record looks like this.

 

[

  {"transport_protocol_id" : 17,"Enrichment" : {

    "network_transport" : \{"name" : "udp","code" : 17,"alias" : 
"UDP","comment" : "user datagram protocol"}}},

  {"transport_protocol_id" : 6,"Enrichment" : {

    "network_transport" : \{"name" : "tcp","code" : 6,"alias" : "TCP","comment" 
: "transmission control protocol"}}}

]

 

However if the first record has a key value that is out of range, such as this

 

[\{"transport_protocol_id":9999,"Enrichment":{}},\{"transport_protocol_id":6,"Enrichment":{}}]

 

Then the first record in the enriched leg will be null, even if the rest of the 
records are correct. However the enrichment is still valid JSON once I have 
processed it in the enrichment leg.

 

[

  \{"network_transport" : null},

  \{"network_transport" : {"name" : "tcp", "code" : 6,"alias" : "TCP", 
"comment" : "transmission control protocol"}}

]

 

But the joined record does not properly process the subsequent records, and the 
content looks like this.

 

[

  {"transport_protocol_id" : 9999,"Enrichment" : {

    "network_transport" : null}},

  {"transport_protocol_id" : 6,"Enrichment" : {

    "network_transport" : "MapRecord[\{name=tcp, alias=TCP, 
comment=transmission control protocol, code=6}]"}}

]

 

Is there any step I could use to ensure the join happens as expected? Or is 
this the same situation as the JIRA I mentioned above? I am not able to use a 
schema based writer as our real case has too many input record types and 
enrichment options that the number of combinations, and hence schemas, could 
not be managed.
{quote}


> JoinEnrichment merged schema incorrect if first Enrichment record is null
> -------------------------------------------------------------------------
>
>                 Key: NIFI-10169
>                 URL: https://issues.apache.org/jira/browse/NIFI-10169
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>         Attachments: Enrich.json
>
>
> From the users mailing list:
> {quote}
> I am performing some enrichments, and sometimes the enrichment lookup fails
> because the item is not there. If the enrichment fails for the first item in
> an array of records, then the merge fails to properly merge subsequent records
> in the same array. The join processor uses an “infer” JSON reader for the
> original leg, a schema-based reader for the enriched leg, and an “inherit”
> JSON writer. I am using the “Insert” join strategy. Here is an example.
>  
> If the original record is
>  
> [{"transport_protocol_id":17,"Enrichment":{}},{"transport_protocol_id":6,"Enrichment":{}}]
>  
> Then both lookups succeed, and the enrichment record looks like this
>  
> [
>   {"network_transport" : {"name" : "udp", "code" : 17, "alias" : "UDP", "comment" : "user datagram protocol"}},
>   {"network_transport" : {"name" : "tcp", "code" : 6, "alias" : "TCP", "comment" : "transmission control protocol"}}
> ]
>  
> And the joined record looks like this.
>  
> [
>   {"transport_protocol_id" : 17,"Enrichment" : {
>     "network_transport" : {"name" : "udp","code" : 17,"alias" : "UDP","comment" : "user datagram protocol"}}},
>   {"transport_protocol_id" : 6,"Enrichment" : {
>     "network_transport" : {"name" : "tcp","code" : 6,"alias" : "TCP","comment" : "transmission control protocol"}}}
> ]
>  
> However if the first record has a key value that is out of range, such as this
>  
> [{"transport_protocol_id":9999,"Enrichment":{}},{"transport_protocol_id":6,"Enrichment":{}}]
>  
> Then the first record in the enriched leg will be null, even though the rest
> of the records are correct. However, the enrichment is still valid JSON once I
> have processed it in the enrichment leg.
>  
> [
>   {"network_transport" : null},
>   {"network_transport" : {"name" : "tcp", "code" : 6,"alias" : "TCP", "comment" : "transmission control protocol"}}
> ]
>  
> But the joined record does not properly process the subsequent records, and 
> the content looks like this.
>  
> [
>   {"transport_protocol_id" : 9999,"Enrichment" : {
>     "network_transport" : null}},
>   {"transport_protocol_id" : 6,"Enrichment" : {
>     "network_transport" : "MapRecord[{name=tcp, alias=TCP, comment=transmission control protocol, code=6}]"}}
> ]
>  
> Is there any step I could use to ensure the join happens as expected? Or is
> this the same situation as the JIRA I mentioned above? I am not able to use a
> schema-based writer, as our real case has so many input record types and
> enrichment options that the number of combinations, and hence schemas, could
> not be managed.
> {quote}
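
For illustration only: the output reported above is what one would expect if the merged schema were derived from the first enrichment record alone, so that a null "network_transport" is typed as a plain string and later records are stringified on write. The Python sketch below is a toy model of that behaviour, not NiFi code. The helpers infer_type, schema_from_first, schema_from_all, coerce and join_insert are hypothetical names, and the rule that a null value defaults to "string" is an assumption made for the example.

```python
def infer_type(value):
    """Toy type inference for one JSON value."""
    if value is None:
        return "string"  # assumption: a null with no other evidence defaults to string
    if isinstance(value, dict):
        return "record"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def schema_from_first(records):
    """Derive the schema from the first record only (models the reported symptom)."""
    return {name: infer_type(value) for name, value in records[0].items()}


def schema_from_all(records):
    """Derive the schema from every record; a non-null value overrides a null default.

    Conflicting non-null types are not reconciled here; the last one seen wins.
    """
    schema = {}
    for record in records:
        for name, value in record.items():
            if value is None:
                schema.setdefault(name, "string")
            else:
                schema[name] = infer_type(value)
    return schema


def coerce(value, type_name):
    """Write a value according to the schema; non-strings typed as string are stringified."""
    if value is None:
        return None
    if type_name == "string" and not isinstance(value, str):
        return str(value)
    return value


def join_insert(originals, enrichments, schema):
    """Toy 'Insert' join: place each enrichment record under the 'Enrichment' key."""
    joined = []
    for original, enrichment in zip(originals, enrichments):
        merged = dict(original)
        merged["Enrichment"] = {
            name: coerce(value, schema[name]) for name, value in enrichment.items()
        }
        joined.append(merged)
    return joined


originals = [
    {"transport_protocol_id": 9999, "Enrichment": {}},
    {"transport_protocol_id": 6, "Enrichment": {}},
]
enrichments = [
    {"network_transport": None},
    {"network_transport": {"name": "tcp", "code": 6, "alias": "TCP",
                           "comment": "transmission control protocol"}},
]

# Schema taken from the first (null) record: the second record's map becomes a string.
broken = join_insert(originals, enrichments, schema_from_first(enrichments))
# Schema taken from all records: the second record's map is kept as a nested record.
fixed = join_insert(originals, enrichments, schema_from_all(enrichments))
```

In this toy model, broken[1]["Enrichment"]["network_transport"] is a string, analogous to the "MapRecord[...]" value in the report, while fixed keeps it as a nested object. Whether the processor behaves this way internally is not stated in the report.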



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
