[ 
https://issues.apache.org/jira/browse/NIFI-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Bejan updated NIFI-11402:
-------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> PutBigQuery processor case sensitive and Append Record Count issues
> -------------------------------------------------------------------
>
>                 Key: NIFI-11402
>                 URL: https://issues.apache.org/jira/browse/NIFI-11402
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.18.0, 1.19.0, 1.20.0, 1.19.1, 1.21.0
>            Reporter: Julien G.
>            Assignee: Pierre Villard
>            Priority: Major
>             Fix For: 2.0.0, 1.22.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The {{PutBigQuery}} processor seems to have some issues. I detected 2 
> issues that can be quite blocking.
> For the first one, if you set a high value in the {{Append Record Count}} 
> property (in my case 500,000) and you have a big flowfile (in number of 
> records and size; in my case 54,000 records for a size of 74MB), you will 
> get an error because the message to send is too big. That is quite normal.
> {code:java}
> PutBigQuery[id=16da3694-c886-3b31-929e-0dc81be51bf7] Stream processing 
> failed: java.lang.RuntimeException: io.grpc.StatusRuntimeException: 
> INVALID_ARGUMENT: MessageSize is too large. Max allow: 10000000 Actual: 
> 13593340
> - Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: MessageSize is 
> too large. Max allow: 10000000 Actual: 13593340
> {code}
> So you replace the value with a smaller one, but the error message remains 
> the same. Even if you reduce your flowfile to a single record, you will still 
> get the error. The only way to fix this is to delete the processor, re-add 
> it, and reduce the value of the property before running it. There seems to 
> be an issue here.
> It would also be useful to document the limit on the size of the message 
> sent in the processor documentation, because the limit in the previous 
> implementation of the {{PutBigQueryStreaming}} and {{PutBigQueryBatch}} 
> processors was quite straightforward and tied to the size of the file sent. 
> Now the limit applies to the {{Message}}, which does not directly correspond 
> to the size of the FlowFile or the number of records in it.
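A minimal sketch of the kind of size-aware batching that would avoid this error (this is a hypothetical illustration, not NiFi's actual implementation): records are grouped so that each append request stays under the ~10 MB gRPC limit reported in the error, whatever the configured record count is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split serialized records into batches so each
// append request stays under the gRPC message-size limit (~10,000,000
// bytes per the INVALID_ARGUMENT error), in addition to honoring the
// configured maximum record count.
public class AppendBatcher {
    static final long MAX_MESSAGE_BYTES = 10_000_000L;

    // Returns batches whose total byte size stays below MAX_MESSAGE_BYTES
    // and whose record count stays at or below maxRecords.
    public static List<List<byte[]>> batchBySize(List<byte[]> records, int maxRecords) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] record : records) {
            boolean tooBig = currentBytes + record.length > MAX_MESSAGE_BYTES;
            boolean tooMany = current.size() >= maxRecords;
            if (!current.isEmpty() && (tooBig || tooMany)) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With 4 MB records, two fit under the 10 MB ceiling, so a 5-record flowfile would be sent as three appends instead of failing outright.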
> The second issue occurs if you use upper case in your field names. For 
> example, you have a table with the following schema:
> {code:java}
> timestamp | TIMESTAMP | REQUIRED
> original_payload | STRING | NULLABLE
> error_message | STRING | REQUIRED
> error_type | STRING | REQUIRED
> error_subType | STRING | REQUIRED
> {code}
> and try to put the following event in it:
> {code:java}
> {
>   "original_payload" : "XXXXXXXX",
>   "error_message" : "XXXXXX",
>   "error_type" : "XXXXXXXXXX",
>   "error_subType" : "XXXXXXXXXXX",
>   "timestamp" : "2023-04-07T10:31:45Z"
> }
> {code}
> (in my case this event was in Avro)
> You will get the following error telling you that the required field 
> {{error_subtype}} is missing:
> {code:java}
> Cannot convert record to message: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: error_subtype
> {code}
> So to fix it, you need to change your Avro schema and use {{error_subtype}} 
> instead of {{error_subType}} in it.
> BigQuery column names are not case sensitive, so it should be possible to 
> use upper case in a field name, but it is not. In the previous 
> implementation of {{PutBigQueryStreaming}} and {{PutBigQueryBatch}}, we were 
> able to use upper case in the schema fields, so it should still be possible.
> {color:#DE350B}If you get this error, the flowfile will not go in the failure 
> queue but just disappear.{color}
> Link to the slack thread: 
> https://apachenifi.slack.com/archives/C0L9VCD47/p1680866688318739
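The error message reports the missing field as lower-case ({{error_subtype}}), which suggests the generated proto descriptor lower-cases column names while record fields are matched case-sensitively. A hypothetical workaround sketch (not the actual fix in this ticket) is to normalize record keys to lower case before conversion:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical workaround sketch: normalize record field names to lower
// case so they match a proto descriptor that exposes lower-cased column
// names (as the "Message missing required fields: error_subtype" error
// suggests). Not the actual NIFI-11402 fix.
public class FieldNameNormalizer {
    public static Map<String, Object> lowerCaseKeys(Map<String, Object> record) {
        Map<String, Object> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            // e.g. "error_subType" -> "error_subtype"
            normalized.put(e.getKey().toLowerCase(), e.getValue());
        }
        return normalized;
    }
}
```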



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
