Need a sample JSON input file for InferAvroSchema

2018-08-13 Thread Kuhfahl, Bob
I’m trying to develop a sample JSON input file to feed into InferAvroSchema,
so that I can then feed the result into PutDatabaseRecord.
Need a hello world example ☺

But, to get started, I’d be happy to get InferAvroSchema working.  I’m “trial
and error”-ing the input file hoping to get lucky, but..

There are no log messages, and the JSON flow files are going to failure.  I’m
reading the code for InferAvroSchema, but it just calls
JsonUtil.inferSchema(), so I’ll keep digging down that path. If someone has a
sample input that demonstrates how it’s supposed to work, I’d be grateful!







Re: Need a sample JSON input file for InferAvroSchema

2018-08-13 Thread Matt Burgess
Bob,

InferAvroSchema can infer types like boolean, integer, long, float, double,
and I believe for JSON can correctly descend into arrays and nested
maps/structs/objects. Here is an example record from NiFi provenance data
that has most of those covered (except bool and float/double, but you can
add those):

{
  "eventId" : "7422645d-056e-423b-b280-6305f9daccaa",
  "eventOrdinal" : 0,
  "eventType" : "CREATE",
  "timestampMillis" : 1496934288944,
  "timestamp" : "2017-06-08T15:04:48.944Z",
  "durationMillis" : -1,
  "lineageStart" : 1496934288930,
  "componentId" : "8821e5d8-015c-1000-30b0-f7211bbf43e5",
  "componentType" : "GenerateFlowFile",
  "componentName" : "_GenerateFlowFile",
  "entityId" : "b99a56c6-e032-4396-915e-24186974b84a",
  "entityType" : "org.apache.nifi.flowfile.FlowFile",
  "entitySize" : 52,
  "updatedAttributes" : {
    "path" : "./",
    "uuid" : "b99a56c6-e032-4396-915e-24186974b84a",
    "filename" : "924304881186293"
  },
  "previousAttributes" : { },
  "actorHostname" : "localhost",
  "contentURI" : "http://localhost:8989/nifi-api/provenance-events/0/content/output",
  "previousContentURI" : "http://localhost:8989/nifi-api/provenance-events/0/content/input",
  "parentIds" : [ ],
  "childIds" : [ ],
  "platform" : "nifi",
  "application" : "NiFi Flow"
}

Note that the timestamps are inferred as longs, because InferAvroSchema does
not support Avro logical types (such as timestamp, date, decimal). I'd like to
see an InferRecordSchema that is record-aware, supports time/date types, etc.
I wrote up a Jira a while back to cover it [1] but haven't gotten around to
implementing it yet.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-4109
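
To make the type mapping described above concrete, here is a minimal Python sketch of JSON-to-Avro type inference. It is not the code InferAvroSchema runs (that goes through JsonUtil.inferSchema()); the function name infer_avro_type, the 32-bit cutoff between int and long, the "null" fallback for empty arrays, and the sample fields isActive and ratio are all assumptions made for illustration.

```python
import json


def infer_avro_type(value, name="record"):
    """Return a simplified Avro-style type for a parsed JSON value (illustrative only)."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        # Assumption: values outside the signed 32-bit range become "long".
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        # No logical types: an ISO timestamp string stays a plain string.
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Assumption: an empty array gives no evidence, so fall back to "null" items.
        items = infer_avro_type(value[0], name) if value else "null"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        fields = [{"name": k, "type": infer_avro_type(v, k)} for k, v in value.items()]
        return {"type": "record", "name": name, "fields": fields}
    raise TypeError(f"unsupported JSON value: {value!r}")


# A trimmed version of the provenance record, plus two hypothetical fields
# (isActive, ratio) to cover the boolean and double cases.
sample = json.loads("""{
  "eventOrdinal": 0,
  "timestampMillis": 1496934288944,
  "timestamp": "2017-06-08T15:04:48.944Z",
  "updatedAttributes": {"path": "./"},
  "parentIds": [],
  "isActive": true,
  "ratio": 0.5
}""")

schema = infer_avro_type(sample, "provenance")
types = {field["name"]: field["type"] for field in schema["fields"]}
print(json.dumps(schema, indent=2))
```

In this sketch the millisecond timestamp comes out as a long and the ISO timestamp as a string, which matches the note above about logical types not being inferred.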




Re: Need a sample JSON input file for InferAvroSchema -> PutDatabaseRecord

2018-08-14 Thread Kuhfahl, Bob
Sorry for the newbie problems.
For me, I have to format my input file to be more like:

{"producer": [{
"fpa": "MATP",
"owner_producer": "US",
"prod_lvl_cap": "M",
"producer_datetime_last_chg": "20190101",
"producer_userid": "mytest",
"res_prod": "DJ",
"review_date": "20071015"
},
{
"fpa": "ELEC",
"owner_producer": "US",
"prod_lvl_cap": "M",
"producer_datetime_last_chg": "20190101",
"producer_userid": "fdolomite",
"res_prod": "DJ",
"review_date": "2018"
},
{
"fpa": "AFLD",
"owner_producer": "US",
"prod_lvl_cap": "M",
"producer_datetime_last_chg": "20190101",
"producer_userid": "brenda",
"res_prod": "YF",
"review_date": "20140918"
}]}

With this shape, the input parses.  Anything shaped like the example in the
previous email does not make it past InferAvroSchema.
Once I do this, I can configure the JsonPathReader in PutDatabaseRecord to pick
up the schema from ${inferred.avro.schema}.
All of this works, and I’m confident PutDatabaseRecord is talking to the
database, because I am getting the error:

Record does not have a value for the Required column 'owner_producer'

Only the database knows that this is a required field.
The data is in the flow, but it is not being found.
Something is not lined up right…

The schema coming out of InferAvroSchema is:

{
  "type": "record",
  "name": "anything",
  "fields": [{
    "name": "producer",
    "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "producer",
        "fields": [{
          "name": "fpa",
          "type": "string",
          "doc": "Type inferred from '\"MATP\"'"
        }, {
          "name": "owner_producer",
          "type": "string",
          "doc": "Type inferred from '\"US\"'"
        }, {
          "name": "prod_lvl_cap",
          "type": "string",
          "doc": "Type inferred from '\"M\"'"
        }, {
          "name": "producer_datetime_last_chg",
          "type": "string",
          "doc": "Type inferred from '\"20190101\"'"
        }, {
          "name": "producer_userid",
          "type": "string",
          "doc": "Type inferred from '\"mytest\"'"
        }, {
          "name": "res_prod",
          "type": "string",
          "doc": "Type inferred from '\"DJ\"'"
        }, {
          "name": "review_date",
          "type": "string",
          "doc": "Type inferred from '\"20071015\"'"
        }]
      }
    },
    "doc": "Type inferred from '[{\"fpa\":\"MATP\",\"owner_producer\":\"US\",\"prod_lvl_cap\":\"M\",\"producer_datetime_last_chg\":\"20190101\",\"producer_userid\":\"mytest\",\"res_prod\":\"DJ\",\"review_date\":\"20071015\"},{\"midb_sk\":\"10035001359911\",\"midb_source_entity\":\"FacAka\",\"fpa\":\"ELEC\",\"owner_producer\":\"US\",\"prod_lvl_cap\":\"M\",\"producer_datetime_last_chg\":\"20190101\",\"producer_userid\":\"fdolomite\",\"res_prod\":\"DJ\",\"review_date\":\"2018\"},{\"fpa\":\"AFLD\",\"owner_producer\":\"US\",\"prod_lvl_cap\":\"M\",\"producer_datetime_last_chg\":\"20190101\",\"producer_userid\":\"brenda\",\"res_prod\":\"YF\",\"review_date\":\"20140918\"}]'"
  }]
}
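
One possible reading of the error, which is not confirmed anywhere in this thread: the inferred schema describes a single record whose only field is "producer" (an array), so a lookup for a column named owner_producer at the top level of that record finds nothing. The Python sketch below only illustrates that shape mismatch. The helper names unwrap_records and missing_columns are hypothetical, the sample is trimmed from the input above, and how to do the equivalent unwrapping inside a NiFi flow is not covered here.

```python
import json


def unwrap_records(document, wrapper_key):
    """Return the list of flat records stored under wrapper_key (hypothetical helper)."""
    records = document[wrapper_key]
    if not isinstance(records, list):
        raise ValueError(f"expected a list under {wrapper_key!r}")
    return records


def missing_columns(record, required):
    """List the required column names that are absent at the top level of record."""
    return [column for column in required if column not in record]


# Trimmed version of the wrapped input: one outer object holding an array.
wrapped = json.loads("""{"producer": [
  {"fpa": "MATP", "owner_producer": "US", "review_date": "20071015"},
  {"fpa": "ELEC", "owner_producer": "US", "review_date": "2018"}
]}""")

required = ["owner_producer"]

# Treating the outer object as the record: its only key is "producer".
top_level_missing = missing_columns(wrapped, required)

# Treating each array element as a record: the column is present.
flat = unwrap_records(wrapped, "producer")
per_record_missing = [missing_columns(record, required) for record in flat]

print("outer object is missing:", top_level_missing)
print("each inner record is missing:", per_record_missing)
```

If this reading is right, the rows to insert are the elements of the "producer" array, not the outer object that wraps them.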



Re: Need a sample JSON input file for InferAvroSchema -> PutDatabaseRecord

2018-08-14 Thread Matt Burgess