Re: Unable to send JSON to BigQuery

2019-07-03 Thread Denes Arvay
Hi Nicolas,

It seems that NiFi expects the "mode" field to be present, even
though according to the BigQuery doc [1] it's optional.
I'd suggest adding it to every name-type pair with its default value
"NULLABLE" (e.g. { "name": "Consent", "type": "record", *"mode":
"NULLABLE"*, "fields": [ { "name": "id", "type": "STRING", *"mode":
"NULLABLE"* }, ...).

Let me know if that solves the issue. If yes, I'll file a Jira ticket to fix
it.
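
For a schema as deeply nested as the one quoted below, adding the field by hand is error-prone. Here is a minimal sketch of doing it with a script, assuming the schema is available as a JSON string; the helper name add_default_mode is hypothetical and not part of NiFi or any BigQuery client library.

```python
import json


def add_default_mode(fields, default="NULLABLE"):
    """Return a copy of a BigQuery schema field list where every field,
    including nested ones, has a "mode" key (hypothetical helper)."""
    result = []
    for field in fields:
        updated = dict(field)
        # Only fill in "mode" when the field does not already declare one.
        updated.setdefault("mode", default)
        if "fields" in updated:
            updated["fields"] = add_default_mode(updated["fields"], default)
        result.append(updated)
    return result


# Shortened version of the schema from this thread.
schema = json.loads(
    '[{"name": "Consent", "type": "record",'
    ' "fields": [{"name": "id", "type": "STRING"}]}]'
)
patched = add_default_mode(schema)
print(json.dumps(patched, indent=2))
```

The function copies each field instead of modifying it in place, so the original schema stays untouched for comparison.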

Best,
Denes

[1]
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema

On Wed, Jul 3, 2019 at 11:07 AM Nicolas Delsaux wrote:

> I'm using Apache NiFi 1.9.2 and trying to post JSON content to a
> BigQuery table.
>
> Something seems to be wrong, since I get
>
>
> 2019-07-03 08:35:24,964 ERROR [Timer-Driven Process Thread-8] o.a.n.p.gcp.bigquery.PutBigQueryBatch PutBigQueryBatch[id=b2b1c6bf-016b-1000-e8c9-b3f9fb5b417e] null: java.lang.NullPointerException
> java.lang.NullPointerException: null
>     at org.apache.nifi.processors.gcp.bigquery.BigQueryUtils.mapToField(BigQueryUtils.java:42)
>     at org.apache.nifi.processors.gcp.bigquery.BigQueryUtils.listToFields(BigQueryUtils.java:68)
>     at org.apache.nifi.processors.gcp.bigquery.BigQueryUtils.schemaFromString(BigQueryUtils.java:80)
>     at org.apache.nifi.processors.gcp.bigquery.PutBigQueryBatch.onTrigger(PutBigQueryBatch.java:277)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
>     at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:209)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
>     at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
>
> Where can it come from? And how can I fix it?
>
>
> From the stack trace, I understand there is something wrong with my
> BigQuery schema (which BigQuery nevertheless accepts as valid).
>
>
> My schema is
>
>
> [
>   {
>     "name": "Consent",
>     "type": "record",
>     "fields": [
>       {
>         "name": "id",
>         "type": "STRING"
>       },
>       {
>         "name": "identity",
>         "type": "record",
>         "fields": [
>           {
>             "name": "id",
>             "type": "STRING"
>           },
>           {
>             "name": "type",
>             "type": "STRING"
>           },
>           {
>             "name": "businessUnit",
>             "type": "STRING"
>           }
>         ]
>       },
>       {
>         "name": "finality",
>         "type": "STRING"
>       },
>       {
>         "name": "source",
>         "type": "record",
>         "fields": [
>           {
>             "name": "id",
>             "type": "STRING"
>           },
>           {
>             "name": "type",
>             "type": "STRING"
>           },
>           {
>             "name": "origin",
>             "type": "STRING"
>           },
>           {
>             "name": "collaborator",
>             "type": "record",
>             "fields": [
>               {
>                 "name": "id",
>                 "type": "STRING"
>               },
>               {
>                 "name": "type",
>                 "type": "STRING"
>               }
>             ]
>           }
>         ]
>       },
>       {
>         "name": "consentDate",
>         "type": "TIMESTAMP"
>       },
>       {
>         "name": "expiryDate",
>         "type": "TIMESTAMP"
>       },
>       {
>         "name": "expired",
>         "type": "BOOLEAN"
>       },
>       {
>         "name": "createdBy",
>         "type": "STRING"
>       },
>       {
>         "name": "createdDate",
>         "type": "TIMESTAMP"
>       }
>     ]
>   }
> ]
>
>
> What could be causing the trouble?
>
>
> Thanks
>
>


Re: Unable to send JSON to BigQuery

2019-07-03 Thread Nicolas Delsaux

I'm investigating along the same lines.

I've added the mode field everywhere, but I still have the issue.

I'll try to create a minimal reproducing schema for your ticket (by
running unit tests).



Re: Unable to send JSON to BigQuery

2019-07-03 Thread Nicolas Delsaux
So I have a simple test that replicates the bug. Should I open the
issue in the Apache JIRA (I already have access to it)?



Re: Unable to send JSON to BigQuery

2019-07-03 Thread Denes Arvay
Yes, and please attach the test cases too.
Does this mean that your original issue hasn't been resolved yet by adding
the "mode" fields?


Re: Unable to send JSON to BigQuery

2019-07-03 Thread Nicolas Delsaux
Well, if you take a look at my schema, the error is subtle but obvious
(once I had added the tests and modified the code).

I had set "Consent" to be of type "record", not "RECORD". Yes, it was a
case issue.

So I've modified the code in BigQueryUtils to uppercase the type in all
cases, and to throw an exception if the string corresponds to no known type.

Finally, I've set a default value of NULLABLE for mode.

All these changes fix the bug described in
https://issues.apache.org/jira/browse/NIFI-6422

I'm also trying to create the pull request.
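
The three changes described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Java patch to BigQueryUtils; the function name normalize_field and the KNOWN_TYPES set (a partial list of BigQuery type names) are assumptions made for the example.

```python
# Partial, illustrative list of BigQuery field types (assumption: not exhaustive).
KNOWN_TYPES = {
    "STRING", "BYTES", "INTEGER", "FLOAT", "BOOLEAN",
    "TIMESTAMP", "DATE", "TIME", "DATETIME", "RECORD",
}


def normalize_field(field):
    """Normalize one schema field (hypothetical helper):
    uppercase the type, reject unknown types, default mode to NULLABLE."""
    type_name = str(field["type"]).upper()
    if type_name not in KNOWN_TYPES:
        # Fail with an explicit message instead of a NullPointerException-style crash.
        raise ValueError(
            f"Unknown BigQuery type {field['type']!r} for field {field['name']!r}"
        )
    normalized = {
        "name": field["name"],
        "type": type_name,
        "mode": field.get("mode", "NULLABLE"),
    }
    if "fields" in field:
        normalized["fields"] = [normalize_field(f) for f in field["fields"]]
    return normalized


consent = normalize_field(
    {"name": "Consent", "type": "record",
     "fields": [{"name": "id", "type": "STRING"}]}
)
print(consent)
```

With this approach the lowercase "record" from the schema in this thread is accepted, while a misspelled type produces an error that names the offending field.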
