Re: NiFi ram usage

2017-08-30 Thread Adam Lamar
Jeff,

This was a new installation, so I actually hadn't set up any flows yet. NiFi
wouldn't start immediately after installation (before I could configure any
flows) because the system had too little RAM. The 1.1GB figure is private
(RSS) memory usage, which exceeded the 1GB instance limit (and the instance
had no swap configured).

Is there any system requirements documentation? I couldn't find any docs on
minimum system specs, so I'm wondering whether this RAM usage is known and
expected, and whether there are any ways to bring it down.
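
For reference, the heap settings I'm referring to live in conf/bootstrap.conf;
on my 1.3.0 install the defaults look roughly like this (from memory, so
double-check your copy):

java.arg.2=-Xms512m
java.arg.3=-Xmx512m

Lowering -Xms/-Xmx there is the obvious knob, though I assume it wouldn't
touch whatever non-heap usage makes up the rest of the 1.1GB.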

Thanks in advance,
Adam


Re: NiFi ram usage

2017-08-30 Thread Jeff
Hi Adam,

Can you provide some more detail about what your NiFi flow is like?  Are
you using custom processors?  I regularly use NiFi with the default
bootstrap settings without issue, but if you're bringing lots of data into
memory, having lots of flowfiles processed concurrently, etc., memory
usage can ramp up.  Flow design can have quite a bit of impact on how much
RAM you need allocated to the JVM to avoid OOMEs.

On Wed, Aug 30, 2017 at 3:15 PM Adam Lamar  wrote:

> Hi everybody,
>
> I recently started up a new cloud Linux instance with 1GB of ram to do
> some quick tasks in NiFi. I noticed NiFi kept dying without much
> information in the logs - it just seemed to stop during startup.
>
> Eventually I realized the system was running out of memory and OOM killing
> the process, hence the lack of information in the NiFi logs. Empirically
> version 1.3.0 needs about 1.1GB of RAM to start, and my flow caused an
> additional 200MB of ram usage.
>
> Are there any recommendations to get NiFi running with a lighter
> footprint? I noted the default 512MB heap limits in the bootstrap config
> (which I didn't change) so I'm guessing the ram usage is related to NiFi's
> plethora of processors.
>
> Cheers,
> Adam
>


Re: JSON array chunking

2017-08-30 Thread Neil Derraugh
I should have mentioned I tried starting with a JsonPathReader before the
AvroReader.  I had a property I was calling "root" with a value of $.  I can
post details about that too if it would be helpful.
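
In case it helps anyone reading along: my understanding is that the record
readers and writers want a plain record schema rather than the array wrapper
InferAvroSchema produces, so a hand-trimmed schema along these lines (untested,
fields copied from the inferred schema quoted below) is the shape I believe
they expect:

{
  "type": "record",
  "name": "root",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "account_id", "type": "string" },
    { "name": "contact_id", "type": "string" },
    { "name": "date_modified", "type": "long" },
    { "name": "deleted", "type": "int" }
  ]
}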

On Wed, Aug 30, 2017 at 8:08 PM, Neil Derraugh <
neil.derra...@intellifylearning.com> wrote:

> I have arbitrary JSON arrays that I want to split into chunks.  I've been
> (unsuccessfully) trying to figure this out with InferAvroSchema ->
> SplitJson(AvroReader, JsonRecordSetWriter).
>
> Here's an example payload:
> [{
> "id": "56740f4b-48de-0502-afdc-59a463b3f6dc",
> "account_id": "b0dad7e2-7bb9-4ca9-b9fd-134870656eb2",
> "contact_id": "a0ebd53a-77c5-e2ea-4787-59a463053b1b",
> "date_modified": 1503959931000,
> "deleted": 0
>   },
>   {
> "id": "1ac80e25-7f28-f5c6-bac0-59a4636ef31f",
> "account_id": "71d4904e-f8f1-4209-bff9-4d080057ea84",
> "contact_id": "e429bfe6-9c89-8b81-9ee6-59a463fc7fd8",
> "date_modified": 1503959873000,
> "deleted": 0
>   }]
>
> Here's the schema that gets inferred (the AvroReader's Avro Record Name is
> "root"):
> {
>   "type": "array",
>   "items": {
> "type": "record",
> "name": "root",
> "fields": [
>   {
> "name": "id",
> "type": "string",
> "doc": "Type inferred from '\"56740f4b-48de-0502-afdc-
> 59a463b3f6dc\"'"
>   },
>   {
> "name": "account_id",
> "type": "string",
> "doc": "Type inferred from '\"b0dad7e2-7bb9-4ca9-b9fd-
> 134870656eb2\"'"
>   },
>   {
> "name": "contact_id",
> "type": "string",
> "doc": "Type inferred from '\"a0ebd53a-77c5-e2ea-4787-
> 59a463053b1b\"'"
>   },
>   {
> "name": "date_modified",
> "type": "long",
> "doc": "Type inferred from '1503959931000'"
>   },
>   {
> "name": "deleted",
> "type": "int",
> "doc": "Type inferred from '0'"
>   }
> ]
>   }
> }
>
> When I use ${inferred.avro.schema} for both the AvroReader and the
> JsonRecordSetWriter I get:
> SplitRecord[id=b3453515-caaa-1e1f-8bb6-26dec275a0d5] Failed to create
> Record Writer for StandardFlowFileRecord[uuid=45d7a0d2-258a-4f40-b5f9-
> 4886eb2c2a76,claim=StandardContentClaim [resourceClaim=
> StandardResourceClaim[id=1504118228480-325, container=default,
> section=325], offset=0, length=86462199],offset=0,
> name=accounts-contacts.json.avro,size=86462199]; routing to failure:
> org.apache.nifi.schema.access.SchemaNotFoundException: 
> org.apache.avro.AvroRuntimeException:
> Not a record: {"type":"array","items":{"type":"record","name":"root","
> fields":[{"name":"id","type":"string","doc":"Type inferred from
> '\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"
> account_id","type":"string","doc":"Type inferred from
> '\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"
> contact_id","type":"string","doc":"Type inferred from
> '\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"},{"name":"
> date_modified","type":"long","doc":"Type inferred from
> '1503959931000'"},{"name":"deleted","type":"int","doc":"Type inferred
> from '0'"}]}}.
>
> The stack trace:
> 2017-08-30 19:42:21,692 ERROR [Timer-Driven Process Thread-9]
> o.a.nifi.processors.standard.SplitRecord 
> SplitRecord[id=b3453515-caaa-1e1f-8bb6-26dec275a0d5]
> Failed to create Record Writer for StandardFlowFileRecord[uuid=
> a5f720cf-98a8-4c29-bd91-098c7f25448d,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1504121074997-336,
> container=default, section=336], offset=1013917, 
> length=454],offset=0,name=626851422080935,size=454];
> routing to failure: org.apache.nifi.schema.access.SchemaNotFoundException:
> org.apache.avro.AvroRuntimeException: Not a record:
> {"type":"array","items":{"type":"record","name":"root","
> fields":[{"name":"id","type":"string","doc":"Type inferred from
> '\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"
> account_id","type":"string","doc":"Type inferred from
> '\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"
> contact_id","type":"string","doc":"Type inferred from
> '\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"},{"name":"
> date_modified","type":"long","doc":"Type inferred from
> '1503959931000'"},{"name":"deleted","type":"int","doc":"Type inferred
> from '0'"}]}}
> org.apache.nifi.schema.access.SchemaNotFoundException: 
> org.apache.avro.AvroRuntimeException:
> Not a record: {"type":"array","items":{"type":"record","name":"root","
> fields":[{"name":"id","type":"string","doc":"Type inferred from
> '\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"
> account_id","type":"string","doc":"Type inferred from
> '\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"
> contact_id","type":"string","doc":"Type inferred from
> '\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"},{"name":"
> date_modified","type":"long","doc":"Type inferred from
> '1503959931000'"},{"name":"deleted","type":"int","doc":"Type inferred
> from '0'"}]}}
> at 

JSON array chunking

2017-08-30 Thread Neil Derraugh
I have arbitrary JSON arrays that I want to split into chunks.  I've been
(unsuccessfully) trying to figure this out with InferAvroSchema ->
SplitJson(AvroReader, JsonRecordSetWriter).

Here's an example payload:
[
  {
    "id": "56740f4b-48de-0502-afdc-59a463b3f6dc",
    "account_id": "b0dad7e2-7bb9-4ca9-b9fd-134870656eb2",
    "contact_id": "a0ebd53a-77c5-e2ea-4787-59a463053b1b",
    "date_modified": 1503959931000,
    "deleted": 0
  },
  {
    "id": "1ac80e25-7f28-f5c6-bac0-59a4636ef31f",
    "account_id": "71d4904e-f8f1-4209-bff9-4d080057ea84",
    "contact_id": "e429bfe6-9c89-8b81-9ee6-59a463fc7fd8",
    "date_modified": 1503959873000,
    "deleted": 0
  }
]

Here's the schema that gets inferred (the AvroReader's Avro Record Name is
"root"):
{
  "type": "array",
  "items": {
    "type": "record",
    "name": "root",
    "fields": [
      {
        "name": "id",
        "type": "string",
        "doc": "Type inferred from '\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"
      },
      {
        "name": "account_id",
        "type": "string",
        "doc": "Type inferred from '\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"
      },
      {
        "name": "contact_id",
        "type": "string",
        "doc": "Type inferred from '\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"
      },
      {
        "name": "date_modified",
        "type": "long",
        "doc": "Type inferred from '1503959931000'"
      },
      {
        "name": "deleted",
        "type": "int",
        "doc": "Type inferred from '0'"
      }
    ]
  }
}

When I use ${inferred.avro.schema} for both the AvroReader and the
JsonRecordSetWriter I get:
SplitRecord[id=b3453515-caaa-1e1f-8bb6-26dec275a0d5] Failed to create
Record Writer for
StandardFlowFileRecord[uuid=45d7a0d2-258a-4f40-b5f9-4886eb2c2a76,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1504118228480-325,
container=default, section=325], offset=0,
length=86462199],offset=0,name=accounts-contacts.json.avro,size=86462199];
routing to failure: org.apache.nifi.schema.access.SchemaNotFoundException:
org.apache.avro.AvroRuntimeException: Not a record:
{"type":"array","items":{"type":"record","name":"root","fields":[{"name":"id","type":"string","doc":"Type
inferred from
'\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"account_id","type":"string","doc":"Type
inferred from
'\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"contact_id","type":"string","doc":"Type
inferred from
'\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"},{"name":"date_modified","type":"long","doc":"Type
inferred from '1503959931000'"},{"name":"deleted","type":"int","doc":"Type
inferred from '0'"}]}}.

The stack trace:
2017-08-30 19:42:21,692 ERROR [Timer-Driven Process Thread-9]
o.a.nifi.processors.standard.SplitRecord
SplitRecord[id=b3453515-caaa-1e1f-8bb6-26dec275a0d5] Failed to create
Record Writer for
StandardFlowFileRecord[uuid=a5f720cf-98a8-4c29-bd91-098c7f25448d,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1504121074997-336,
container=default, section=336], offset=1013917,
length=454],offset=0,name=626851422080935,size=454]; routing to failure:
org.apache.nifi.schema.access.SchemaNotFoundException:
org.apache.avro.AvroRuntimeException: Not a record:
{"type":"array","items":{"type":"record","name":"root","fields":[{"name":"id","type":"string","doc":"Type
inferred from
'\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"account_id","type":"string","doc":"Type
inferred from
'\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"contact_id","type":"string","doc":"Type
inferred from
'\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"},{"name":"date_modified","type":"long","doc":"Type
inferred from '1503959931000'"},{"name":"deleted","type":"int","doc":"Type
inferred from '0'"}]}}
org.apache.nifi.schema.access.SchemaNotFoundException:
org.apache.avro.AvroRuntimeException: Not a record:
{"type":"array","items":{"type":"record","name":"root","fields":[{"name":"id","type":"string","doc":"Type
inferred from
'\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"account_id","type":"string","doc":"Type
inferred from
'\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"contact_id","type":"string","doc":"Type
inferred from
'\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"},{"name":"date_modified","type":"long","doc":"Type
inferred from '1503959931000'"},{"name":"deleted","type":"int","doc":"Type
inferred from '0'"}]}}
at
org.apache.nifi.schema.access.AvroSchemaTextStrategy.getSchema(AvroSchemaTextStrategy.java:55)
at
org.apache.nifi.serialization.SchemaRegistryService.getSchema(SchemaRegistryService.java:112)
at sun.reflect.GeneratedMethodAccessor1466.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:89)
at com.sun.proxy.$Proxy144.getSchema(Unknown Source)
at

Re: Adding a Receive Date Time stamp to MQTT message.

2017-08-30 Thread Bruce Lowther
Koji, thank you for your feedback.  With a little more review and a few more
brain calories, I realized that I had incorrectly accepted the default option
for 'Skip Header Line'.

Skip Header Line: false  --translates to--> consider the first line to be a header.

In my case that is not possible, as my MQTT feed does not provide a header
line, just a single data row in TSV format.  I think the UpdateAttribute
processor was seeing a 'Temp' header value and trying to parse it as a
float.

Once I found that configuration issue, I am now getting a ReceiveDT stamp
as soon as the MQTT message is received from the source.
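
For anyone following along, the attribute itself is just an UpdateAttribute
property added right after the MQTT consumer; mine looks roughly like the
following (exact format string from memory, so treat it as approximate):

ReceiveDT = ${now():format("yyyy/MM/dd HH:mm:ss")}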

Thank you for your suggestions; they spurred me on to try different alternatives.

Bruce W. Lowther

On Mon, Aug 28, 2017 at 2:49 AM, Koji Kawamura 
wrote:

> Hi Bruce,
>
> By looking at the stacktrace, the exception complains that
> CSVRecordReader could not read record correctly at PutDatabaseRecord.
> Probably you need to use JsonTreeReader at PutDatabaseRecord since you
> used JsonRecordSetWriter at UpdateRecord.
> The error looks as if PutDatabaseRecord tried to parse JSON as CSV.
>
> CSV -> UpdateRecord -> JSON -> PutDatabaseRecord
>
> Thanks,
> Koji
>
> On Mon, Aug 28, 2017 at 1:43 AM, Bruce Lowther 
> wrote:
> > 1.3.0
> >
> > 06/05/2017 12:31:48 EDT
> >
> > Tagged nifi-1.3.0-RC1
> >
> >
> > I am building a workflow for receiving MQTT messages then insert into
> > database.  Records transmitted by my simple IOT device arrive on MQTT
> queue
> > as a csv string and I'm trying to use record operators to insert into
> > database.  My IOT device has no clock, so I am augmenting the received
> MQTT
> > messages by adding a Date time stamp property on the flow files using
> > UpdateAttribute immediately after receipt.
> >
> > (I know I could use insert to database time.. but I want to try to get
> the
> > time stamp closest to the origination of data)
> >
> > I am using AVRO schema registry and added property 'mqtt_house_sensor'
> >
> > {
> >  "type": "record",
> >  "namespace": "bruceco",
> >  "name": "mqttReading",
> >  "fields": [
> >{ "name": "DID", "type": "string" },
> >{ "name": "Version", "type": "string" },
> >{ "name": "Temp", "type": "float" },
> >{ "name": "RHumid", "type": "float" },
> >{ "name": "ReceiveDT", "type": "string" }
> >  ]
> > }
> >
> > sqlite event schema
> >
> > sqlite> .schema mqtt_event
> >
> > CREATE TABLE mqtt_event (id integer primary key autoincrement, dbdt
> datetime
> > default (datetime('now','localtime')), DID text, Version text, Temp
> float,
> > RHumid float, ReceiveDT datetime);
> >
> > Using the PutDatabaseRecord processor I can successfully insert rows into
> > database with  this schema.
> >
> > However I cannot successfully update mqtt records with the datetime stamp
> > that I capture upon receive.
> >
> > Example successful rows:
> >
> > sqlite> select * from mqtt_event limit 2;
> >
> > 1|2017-08-27 11:45:09|v0.1|0x01|71.950844727|53.707629395|
> >
> > 2|2017-08-27 11:45:10|v0.1|0x01|72.683051758|54.084741211|
> >
> >
> > I am trying to use the use UpdateRecord processor to transfer date time
> > stamp property to record.
> > On the UpdateRecord processor, I set Replacement Value Strategy: Literal
> > Value; RecordReader to CSVReader and RecordWriter to JsonRecordSetWriter.
> >
> > I have added a property name:
> > /ReceiveDT
> >
> > And Value:
> > ${ReceiveDT:format("/MM/dd HH:mm:ss")}
> >
> > Actually I've tried several variations of the value but I can't seem to
> get
> > them to work.
> >
> > Here is my current error result from the nifi-app.log:
> >
> > 2017-08-27 12:37:27,376 ERROR [Timer-Driven Process Thread-7]
> > o.a.n.p.standard.PutDatabaseRecord PutDatabaseRecord[id=015d1000-7fbf
> >
> > -10d2-f081-23fe14b66269] Failed to process session due to
> > org.apache.nifi.processor.exception.ProcessException: Failed to process
> S
> >
> > tandardFlowFileRecord[uuid=35dae543-f631-4843-9f8f-afc72f2fec77,claim=
> StandardContentClaim
> > [resourceClaim=StandardResourceClaim[id=
> >
> > 1503842607224-1, container=default, section=1], offset=168373,
> > length=122],offset=0,name=9285334964575,size=122] due to java.lang.R
> >
> > untimeException: java.io.IOException: (line 2) invalid char between
> > encapsulated token and delimiter: {}
> >
> > org.apache.nifi.processor.exception.ProcessException: Failed to process
> > StandardFlowFileRecord[uuid=35dae543-f631-4843-9f8f-afc72f2
> >
> > fec77,claim=StandardContentClaim
> > [resourceClaim=StandardResourceClaim[id=1503842607224-1,
> container=default,
> > section=1], offset=168
> >
> > 373, length=122],offset=0,name=9285334964575,size=122] due to
> > java.lang.RuntimeException: java.io.IOException: (line 2) invalid cha
> >
> > r between encapsulated token and delimiter
> >
> > at
> > org.apache.nifi.processor.util.pattern.ExceptionHandler.
> lambda$createOnGroupError$14(ExceptionHandler.java:226)
> >
> > 

NiFi ram usage

2017-08-30 Thread Adam Lamar
Hi everybody,

I recently started up a new cloud Linux instance with 1GB of RAM to do some
quick tasks in NiFi. I noticed NiFi kept dying without much information in
the logs - it just seemed to stop during startup.

Eventually I realized the system was running out of memory and OOM-killing
the process, hence the lack of information in the NiFi logs. Empirically,
version 1.3.0 needs about 1.1GB of RAM to start, and my flow caused an
additional 200MB of RAM usage.

Are there any recommendations to get NiFi running with a lighter footprint?
I noted the default 512MB heap limits in the bootstrap config (which I
didn't change), so I'm guessing the RAM usage is related to NiFi's plethora
of processors.

Cheers,
Adam


DBCPConnectionPool SqlServer and Kerberos

2017-08-30 Thread Noe Detore
Hello

Does anyone have experience or know if DBCPConnectionPool using SqlServer
can be configured to authenticate with Kerberos?
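
What I'm picturing, untested, is a pool pointed at the Microsoft JDBC driver
with the Kerberos options in the connection URL, roughly like this (property
names from the DBCPConnectionPool UI and URL options from the Microsoft driver
docs as I remember them, so please double-check):

Database Connection URL: jdbc:sqlserver://dbhost:1433;databaseName=mydb;integratedSecurity=true;authenticationScheme=JavaKerberos
Database Driver Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Database Driver Location(s): /path/to/mssql-jdbc.jar

I assume the NiFi JVM would also need krb5/JAAS configuration (or a valid
ticket cache) for the driver to pick up, which is the part I'm least sure
about.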

Thanks
Noe


RE: Jolt Question

2017-08-30 Thread Jones, Patrick L.
I've got another question with Jolt.

I have a JSON document where sometimes I have
"value": {
  "$numberLong": "-75928320"
}

I want to change it to "value": "-75928320"
I want to make this change wherever value.$numberLong occurs.

So if I have input like:

{
  "birthDate": [
    {
      "thisOne": "x",
      "totherOne": [
        {
          "somthing": "19451210",
          "value": {
            "$numberLong": "-75928320"
          }
        }
      ],
      "value": {
        "$numberLong": "-75928320"
      }
    }
  ]
}


What I want is:
{
  "birthDate": [
    {
      "thisOne": "x",
      "totherOne": [
        {
          "somthing": "19451210",
          "value": "-75928320"
        }
      ],
      "value": "-75928320"
    }
  ]
}


I'm close with the below but not quite there:
{
  "*": {
    "*": {
      "@": "&",
      "value": {
        "\\$numberLong": "&1"
      }
    }
  }
}

Any suggestions?
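
Another angle I've been toying with, though I haven't verified it and it is
tied to this exact structure rather than the generic match I'm after, is a
plain shift spec:

[
  {
    "operation": "shift",
    "spec": {
      "birthDate": {
        "*": {
          "thisOne": "birthDate[&1].thisOne",
          "totherOne": {
            "*": {
              "somthing": "birthDate[&3].totherOne[&1].somthing",
              "value": {
                "\\$numberLong": "birthDate[&4].totherOne[&2].value"
              }
            }
          },
          "value": {
            "\\$numberLong": "birthDate[&2].value"
          }
        }
      }
    }
  }
]

The escaped \\$ is meant to match the literal $numberLong key instead of being
read as a Jolt operator.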


Thank you,

Pat

From: Yolanda Davis [mailto:yolanda.m.da...@gmail.com]
Sent: Friday, August 25, 2017 4:05 PM
To: users@nifi.apache.org
Subject: Re: Jolt Question

Ok Pat I think I have something for you.  You'll need to use the Chain 
operation in NiFi with the below:

[
  {
"operation": "modify-overwrite-beta",
"spec": {
  "*": {
"*": {
  "value": "=toString"
}
  }
}
  }, {
"operation": "modify-overwrite-beta",
"spec": {
  "*": {
"value": "=toString"
  }
}
  }
]

In your previous spec, the attempt to match both the array and the key:value
pair in one operation with a wildcard wasn't working. That's because when you
use a wildcard for a label it matches anything it encounters in the JSON and
applies the first operation listed. That's why one part would get converted but
not the other (the first section would apply to the array but not to the second
key:value section). So my approach was to use two separate operations: the
first matches the array and the second matches the key:value pair. Chaining
them together allows Jolt to match one at a time and hopefully gets you the
output you need.

Hope this helps! Please feel free to send any other questions.

-yolanda

On Fri, Aug 25, 2017 at 3:25 PM, Jones, Patrick L. wrote:
Yolanda,

Yes, I am looking for a more dynamic solution.  I have a bunch of different
"value" fields that I would like to turn into strings.  Using wildcards, if
possible, would certainly make my life easier.

Thank you,

Pat



From: Yolanda Davis [mailto:yolanda.m.da...@gmail.com]
Sent: Friday, August 25, 2017 3:20 PM
To: users@nifi.apache.org
Subject: Re: Jolt Question

Hi Pat,

Give the below spec a try:

 {
  "val": {
"*": {
  "value": "=toString"
}
  },
  "arr": {
"value": "=toString"
  }
}

It is more specific about matching the incoming labels.  In your spec you had a
section that accounted for the array entry (the section with the "val" label),
but you also needed something that could match the basic key/value entry (the
one with the "arr" label).  The above should work, but I can also work through
a more dynamic solution for you if you need one.

Please let me know if this helps,

Yolanda

On Fri, Aug 25, 2017 at 2:50 PM, pat wrote:
Howdy,

  I'm trying to get Jolt to change all of my "value" fields to strings.  I
can't figure out how to do it.
My source JSON is:
{
  "val": [{
    "value": 230
  }],
  "arr": {
    "value": 9878
  }
}

My modify-overwrite spec is:
{
  "*": {
    "value": "=toString",
    "*": {
      "value": "=toString"
    }
  }
}

and the result is:
{
  "val": [{
    "value": "230"
  }],
  "arr": {
    "value": 9878
  }
}

Any thoughts on how to get past this?  I can't seem to get both of these to
change to strings.


thank you



--
View this message in context: 
http://apache-nifi-users-list.2361937.n4.nabble.com/Jolt-Quesiton-tp2759.html
Sent from the Apache NiFi Users List mailing list archive at Nabble.com.



--
--
yolanda.m.da...@gmail.com
@YolandaMDavis




--
--
yolanda.m.da...@gmail.com
@YolandaMDavis