Re: Load balancer queues stuck on 1.9.2?

2019-06-10 Thread Mark Payne
Joe,

I did just get a PR up for this JIRA. If you are inclined to test the PR, 
please do and let us know how everything goes.

Thanks!
-Mark


On Jun 5, 2019, at 2:29 PM, Mark Payne <marka...@hotmail.com> wrote:

Hey Joe,

Thanks for the feedback here on the logs and the analysis. I think you're very 
right - the
connection in the second flow appears to be causing your first flow to stop 
transmitting.
I have been able to replicate it pretty consistently and am starting to work on 
a fix. Hopefully
will have a PR up very shortly. If you're in a position to do so, it would be 
great if you have
a chance to test it out. I just created a JIRA to track the issue, NIFI-6353 
[1].

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-6353

On Jun 4, 2019, at 8:13 PM, Joe Gresock <jgres...@gmail.com> wrote:

Ok, after a couple hours from the above restart, all the load balanced 
connections stopped sending again.

I enabled DEBUG on the above 2 classes, and found the following messages being 
spammed in the logs:
2019-06-04 23:39:15,497 DEBUG [Load-Balanced Client Thread-2] 
o.a.n.c.q.c.c.a.nio.LoadBalanceSession Will not communicate with Peer 
prod-6.ec2.internal:8443 for Connection e1d23323-5630-1703--0481bd04 
because session is penalized

The same message is also spammed for prod-7 on the same connection, but I don't 
see any other connections in the log.

Now, interestingly, these are the only messages I see for any of the 8 
"Load-Balanced Client Thread-X" threads, so this makes me wonder if this 
penalized session has consumed all of the available load balance threads 
(nifi.cluster.load.balance.max.thread.count=8), such that no other load 
balancing can occur for any of the other connections in the flow, at least from 
that server.
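
For reference, the load-balancing settings I'm describing live in nifi.properties; on our nodes they look roughly like this (I believe these are also the stock defaults, so double-check your own file):

    nifi.cluster.load.balance.port=6342
    nifi.cluster.load.balance.connections.per.node=4
    nifi.cluster.load.balance.max.thread.count=8
    nifi.cluster.load.balance.comms.timeout=30 sec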

On that hunch, I changed this connection (e1d...) to "Not load balanced" and 
bled out all the flow files on it, and the spammed log message stopped right 
away.  At the same time, several of my other load balanced queues began sending 
their flow files, as if a dam was released.

At this point, I stopped to consider why this connection would be penalized, 
and realized it was backpressured for unrelated reasons (part of our flow is 
stopped, which leads to backpressure all the way back to this queue).

Could it be that if any one load balanced connection is back-pressured, it 
could consume all of the available load balancer threads such that no other 
load balanced connection can function?

Joe

On Tue, Jun 4, 2019 at 6:00 PM Joe Gresock <jgres...@gmail.com> wrote:
Thanks!  I'm taking notes for next time.  For now, a full cluster restart 
appears to have resolved this case.

On Tue, Jun 4, 2019 at 5:55 PM Mark Payne <marka...@hotmail.com> wrote:

Joe,


You may want to try enabling DEBUG logging for the following classes:


org.apache.nifi.controller.queue.clustered.client.async.nio.LoadBalanceSession

org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient

That may provide some interesting information, especially if grepping for 
specific nodes. But I'll warn you - the logging can certainly be quite verbose.
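
For example, in conf/logback.xml that would be something along these lines (logback should pick the change up on its own within about half a minute, since the stock config has scanning enabled):

    <logger name="org.apache.nifi.controller.queue.clustered.client.async.nio.LoadBalanceSession" level="DEBUG"/>
    <logger name="org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient" level="DEBUG"/>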

Thanks
-Mark



On Jun 4, 2019, at 12:29 PM, Mark Payne <marka...@hotmail.com> wrote:

Well, that is certainly interesting. Thanks for the update. Please do let us 
know if it occurs again.

On Jun 4, 2019, at 12:23 PM, Joe Gresock <jgres...@gmail.com> wrote:

Ok.. I just tried disconnecting each node from the cluster, in turn.  The first 
three (prod-6, -7, and -8) didn't make a difference, but when I reconnected 
prod-5, the load balanced connection started flowing again.

I'll continue to monitor it and let you know if this happens again.

Thanks for the suggestions!

On Tue, Jun 4, 2019 at 4:14 PM Joe Gresock <jgres...@gmail.com> wrote:
prod-5 and -6 don't appear to be receiving any data in that queue, based on the 
status history.  Is there anything I should see in the logs to confirm this?

On Tue, Jun 4, 2019 at 4:05 PM Mark Payne <marka...@hotmail.com> wrote:
Joe,

So it looks like from the Diagnostics info, that there are currently 500 
FlowFiles queued up.
They all live on prod-8.ec2.internal:8443. Of those 500, 250 are waiting to go 
to prod-5.ec2.internal:8443,
and 250 are waiting to go to prod-6.ec2.internal:8443.

So this tells us that if there are any problems, they are likely occurring on 
one of those 3 nodes. It's also not
related to swapping if it's in this state with only 500 FlowFiles queued.

Are you able to confirm that you are indeed receiving data from the load 
balanced queue on both prod-5 and prod-6?


On Jun 4, 2019, at 11:47 AM, Joe Gresock <jgres...@gmail.com> wrote:

Thanks Mark.

I'm running on Linux.  I've followed your suggestion and added an 
UpdateAttribute processor to the flow, and attached the diagnostics for it.

I also don't see any errors in the logs.

On Tue, Jun 4, 2019 at 3:34 PM Mark Payne 

RE: Question on ValidateRecord w/ Timestamps

2019-06-10 Thread David Gallagher
Thanks, Mark. Appreciate you taking a look.

Dave

From: Mark Payne 
Sent: Monday, June 10, 2019 10:31 AM
To: users@nifi.apache.org
Subject: Re: Question on ValidateRecord w/ Timestamps

Hi David,

Thanks for creating the template. I can see the issue with a little bit of 
debugging. I went ahead and created a JIRA to address it [1]. Unfortunately, I 
don't know that there's a good way to work around this problem. Typically when 
a timestamp field is parsed in JSON, it gets converted into an appropriate 
Timestamp object. But with ValidateRecord, it parses a few things differently, 
and intentionally avoids some of the type coercion so that the processor is 
able to check the raw data to ensure that it is valid. For timestamps, though, 
this logically needs to be modified a bit.

Thanks
-Mark

[1] 
https://issues.apache.org/jira/browse/NIFI-6369


On Jun 7, 2019, at 2:58 PM, David Gallagher <dgallag...@cleverdevices.com> wrote:

Hi Mark - attached.

Thanks,

Dave

From: Mark Payne <marka...@hotmail.com>
Sent: Thursday, June 6, 2019 12:06 PM
To: users@nifi.apache.org
Subject: Re: Question on ValidateRecord w/ Timestamps

David,

Can you send a template of your flow and a sample piece of data?

Thanks
-Mark




On Jun 6, 2019, at 11:47 AM, David Gallagher <dgallag...@cleverdevices.com> wrote:

Thanks, Mark. I tried that but it causes even messages with the correct format 
to be rejected.

Dave

From: Mark Payne <marka...@hotmail.com>
Sent: Thursday, June 6, 2019 11:27 AM
To: users@nifi.apache.org
Subject: Re: Question on ValidateRecord w/ Timestamps

David,

Avro supports a logical type of "timestamp-millis" only for a long field, not 
for a String field.
So I think you'd need to use:

{"name": "activationDate", "type": { "type":"long", 
"logicalType":"timestamp-millis"} }

Thanks
-Mark




On Jun 6, 2019, at 11:21 AM, David Gallagher <dgallag...@cleverdevices.com> wrote:

Hi – I’ve got an incoming JSON message with a timestamp that I want to 
validate. I have a ValidateRecord (1.8.0) processor set up with a 
JSONPathReader for the message. The relevant field is defined in the schema as:

{"name": "activationDate", "type": { "type":"string", 
"logicalType":"timestamp-millis"} }

And in the JSONPathReader service, I have the timestamp format defined as:

MM/dd/yyyy HH:mm:ss'Z'

As that is our intended format. However, even with strict type checking turned 
on, the validator will validate a timestamp of "2019-30-06 15:02:39Z". In fact, it seems as though it will validate / pass on absolutely 
any string in that field. Is there a way to make the validation work?

Thanks,

Dave





Re: Keeping NiFi 1.9.2 console available

2019-06-10 Thread Joe Gresock
In htop, nifi is using between 50-95% of the CPU during heavy processing.

Once the UI comes back up, I can see that the G1 Young Generation fired
about 200 times since I started the cluster up 45 minutes ago, and it says
this was between 3-7 minutes of GC time (if I'm reading that right).

After the load died down, the heap has between 24-58% used, of a 24GB heap.
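
In case it's useful to watch this outside the UI, something like:

    jstat -gcutil <nifi-pid> 5000

(5-second samples) should show young/old generation occupancy and cumulative GC
time for the NiFi JVM; the pid can be found with jps, or from the nifi.pid file
NiFi writes on startup.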

I tried out decreasing the max number of timer threads, per [1], which I
suspect was a big part of my problem: We had our max timer driven threads
set to 150, which is way more than the recommended 2-4 times the number of
cores.  I decreased this to 32 and saw the above results.  Similar behavior
happened as before, but to a lesser extent, so I suspect I need to decrease
the max threads again and keep trying.


[1]
https://community.hortonworks.com/articles/221808/understanding-nifi-max-thread-pools-and-processor.html



On Mon, Jun 10, 2019 at 12:54 PM Joe Witt  wrote:

> ...also how does the heap and garbage collection look during this?
>
> On Sun, Jun 9, 2019, 5:58 PM Joe Witt  wrote:
>
>> Joe
>>
>> When you view top or other tools what is dominating the cpu?
>>
>> thanks
>> joe
>>
>> On Sun, Jun 9, 2019, 5:35 PM Joe Gresock  wrote:
>>
>>> I posted about this a while back on 1.6.0, but as far as I can tell it
>>> has only gotten worse in 1.9.2.
>>>
>>> I have a cluster of 7 nifi nodes running on CentOS 6 VMs.  When no data
>>> is being processed, the UI console is very responsive.  However, once I
>>> start processing lots of data, the CPU gets maxed by "user" processes and
>>> the console can no longer be used (either one or more nodes get
>>> disconnected due to timeouts talking to the rest of the cluster, or the UI
>>> simply goes away).
>>>
>>> The impression I get is that threads are so prioritized for the
>>> processes that nothing can be spared for interacting with the console, or
>>> keeping the nodes connected to each other.
>>>
>>> Now, my latest idea would be to decrease the number of threads that my
>>> processors use, to try to give more CPU to the cluster overhead.. does this
>>> seem like a reasonable approach?
>>>
>>> A separate observation is that in 1.6.0, it usually took about a minute
>>> for all of the nodes to connect enough that the UI was available, starting
>>> from when the last node became officially "up".  However, in 1.9.2, this
>>> takes about 15 minutes for the UI to become usable, during which time the
>>> UI claims no nodes are connected.
>>>
>>> I'm happy to provide thread dumps or other debugging output if anyone
>>> has an idea of how to improve this behavior.
>>>
>>> Thanks!
>>> Joe
>>>
>>>
>>>
>>

-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


Re: NiFi cluster goes 100% CPU in no time

2019-06-10 Thread Erik Anderson
If you point NiFi at a large dataset this is the expected behavior.

We use micro-clusters of NiFi (4-node clusters, 8 CPU, 48 GB RAM). Different 
clusters for different business purposes/use cases.

java.arg.2=-Xms2g
java.arg.3=-Xmx32g (48g machines - 20% less than system memory)

6 max threads per machine.
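
(For context, those heap settings are just entries in conf/bootstrap.conf, e.g.

    java.arg.2=-Xms2g
    java.arg.3=-Xmx32g

and any additional JVM flags, such as a garbage collector choice, can be added the
same way under their own unique java.arg.N number.)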

If you max one of these clusters then we are doing something wrong or need to 
move those use cases to their own cluster.

There is always a bottleneck somewhere and you will need to tweak the 
parameters of processors in your flow.

The NiFi toolkit+registry has a nice commandline interface to copy your flows 
between different clusters. We ruthlessly automated the creation/destruction of 
these NiFi clusters so it's trivial to bring up an ephemeral cluster, run a 
hungry flow, and destroy that cluster.
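
As a rough sketch of that workflow (command and option names quoted from memory, so check ./bin/cli.sh help for the exact syntax on your version):

    ./bin/cli.sh registry list-buckets -u http://registry.example.com:18080
    ./bin/cli.sh nifi pg-import -u http://new-cluster.example.com:8080 -b <bucket-id> -f <flow-id> -fv <version>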

Erik Anderson
Bloomberg


Re: Question on ValidateRecord w/ Timestamps

2019-06-10 Thread Mark Payne
Hi David,

Thanks for creating the template. I can see the issue with a little bit of 
debugging. I went ahead and created a JIRA to address it [1]. Unfortunately, I 
don't know that there's a good way to work around this problem. Typically when 
a timestamp field is parsed in JSON, it gets converted into an appropriate 
Timestamp object. But with ValidateRecord, it parses a few things differently, 
and intentionally avoids some of the type coercion so that the processor is 
able to check the raw data to ensure that it is valid. For timestamps, though, 
this logically needs to be modified a bit.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-6369

On Jun 7, 2019, at 2:58 PM, David Gallagher <dgallag...@cleverdevices.com> wrote:

Hi Mark - attached.

Thanks,

Dave

From: Mark Payne <marka...@hotmail.com>
Sent: Thursday, June 6, 2019 12:06 PM
To: users@nifi.apache.org
Subject: Re: Question on ValidateRecord w/ Timestamps

David,

Can you send a template of your flow and a sample piece of data?

Thanks
-Mark



On Jun 6, 2019, at 11:47 AM, David Gallagher <dgallag...@cleverdevices.com> wrote:

Thanks, Mark. I tried that but it causes even messages with the correct format 
to be rejected.

Dave

From: Mark Payne <marka...@hotmail.com>
Sent: Thursday, June 6, 2019 11:27 AM
To: users@nifi.apache.org
Subject: Re: Question on ValidateRecord w/ Timestamps

David,

Avro supports a logical type of "timestamp-millis" only for a long field, not 
for a String field.
So I think you'd need to use:

{"name": "activationDate", "type": { "type":"long", 
"logicalType":"timestamp-millis"} }

Thanks
-Mark



On Jun 6, 2019, at 11:21 AM, David Gallagher <dgallag...@cleverdevices.com> wrote:

Hi – I’ve got an incoming JSON message with a timestamp that I want to 
validate. I have a ValidateRecord (1.8.0) processor set up with a 
JSONPathReader for the message. The relevant field is defined in the schema as:

{"name": "activationDate", "type": { "type":"string", 
"logicalType":"timestamp-millis"} }

And in the JSONPathReader service, I have the timestamp format defined as:

MM/dd/yyyy HH:mm:ss'Z'

As that is our intended format. However, even with strict type checking turned 
on, the validator will validate a timestamp of "2019-30-06 15:02:39Z". In fact, it seems as though it will validate / pass on absolutely 
any string in that field. Is there a way to make the validation work?

Thanks,

Dave





Re: ExecuteSQL connected to AWS Athena Driver getting stuck frequently

2019-06-10 Thread Mark Payne
Hi Purushotham,

Thanks for attaching the thread dumps. We can see in all of the thread dumps 
the same pattern:

"Timer-Driven Process Thread-6" Id=155 WAITING  on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4f94fece
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
at com.simba.athena.athena.dataengine.AJStreamResultSet.dequeue(Unknown Source)
at com.simba.athena.athena.dataengine.AJStreamResultSet.<init>(Unknown Source)
at com.simba.athena.athena.dataengine.AJQueryExecutor.execute(Unknown Source)
at com.simba.athena.jdbc.common.SPreparedStatement.executeWithParams(Unknown 
Source)
at com.simba.athena.jdbc.common.SPreparedStatement.executeQuery(Unknown Source)
- waiting on com.simba.athena.jdbc.jdbc42.S42PreparedStatement@4a5576ae
at 
org.apache.commons.dbcp2.PoolableConnection.validate(PoolableConnection.java:287)
at 
org.apache.commons.dbcp2.PoolableConnectionFactory.validateConnection(PoolableConnectionFactory.java:389)
at 
org.apache.commons.dbcp2.PoolableConnectionFactory.validateObject(PoolableConnectionFactory.java:375)
at 
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:484)
at 
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:365)
at 
org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
at 
org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1563)
at 
org.apache.nifi.dbcp.DBCPConnectionPool.getConnection(DBCPConnectionPool.java:470)
at org.apache.nifi.dbcp.DBCPService.getConnection(DBCPService.java:49)
at sun.reflect.GeneratedMethodAccessor481.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:87)
at com.sun.proxy.$Proxy128.getConnection(Unknown Source)
at 
org.apache.nifi.processors.standard.AbstractExecuteSQL.onTrigger(AbstractExecuteSQL.java:222)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:209)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Number of Locked Synchronizers: 1
- java.util.concurrent.ThreadPoolExecutor$Worker@4054f861

So we see here that NiFi is using a Poolable Connection and calling "validate," 
which is attempting to run the query "SELECT 1" in your case. It looks like the 
Athena JDBC driver never returns. Unfortunately, I don't know that there is 
much that can be done about that on the NiFi side. Googling for issues around 
JDBC connections hanging with Athena did turn up troubleshooting documentation [1] 
about queries hanging that is resolved by changing ICMP/MTU settings. I would 
recommend trying the suggestions there, if you haven't already.


[1] https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-drop-issues.html
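
For what it's worth, as I understand it the fix that document describes mostly comes down to dropping the interface MTU from the jumbo-frame default to 1500, e.g. (eth0 is just an example interface name, and you would want to make the change persistent in the instance's network configuration):

    sudo ip link set dev eth0 mtu 1500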


On Jun 10, 2019, at 3:24 AM, Purushotham Pushpavanthar <pushpavant...@gmail.com> wrote:

Hi Mark,

I ran into the same issue today, which gave me a chance to capture thread dumps. 
Attached are the thread dumps from the 3-node cluster.
Our nodes are running as Docker containers on c5.xlarge instances. NiFi version 
1.9.2.

Regards,
Purushotham Pushpavanth



On Fri, 7 Jun 2019 at 17:51, Mark Payne <marka...@hotmail.com> wrote:
Purushotham,

Generally if you run into a situation where you have a stuck thread you will 
need to provide a thread dump to understand what is going on. It’s easiest to 
do that by running “bin/nifi.sh dump dump1.txt” and then attaching the created 
dump1.txt to the email.

Thanks
-Mark

Sent from my iPhone

On Jun 7, 2019, at 3:45 AM, Purushotham Pushpavanthar <pushpavant...@gmail.com> wrote:

Hi,

I've been using ExecuteSQL to execute some DDL statements whenever 

Re: NiFi cluster goes 100% CPU in no time

2019-06-10 Thread Mark Payne
I don't know that this is actually unexpected. What you observed is that you 
had millions of FlowFiles queued up to be processed, and NiFi was not processing 
them at 100% CPU utilization. This typically indicates one of two things: a) 
you haven't allocated enough threads, or b) you have a bottleneck other than 
CPU - likely Disk I/O.

Once you restarted NiFi, you were in a situation where you had improved your 
disk I/O. If you were previously not at 100% CPU utilization due to a Disk I/O 
bottleneck, and you then removed that bottleneck by improving disk I/O like you 
mentioned, then it makes sense that NiFi would now start consuming more CPU - 
even up to 100% - to handle those millions of FlowFiles that are queued up.
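
An easy way to confirm which of those you are hitting is to watch the repository
disks while the backlog drains, e.g.:

    iostat -xm 5

and see whether %util on the content/flowfile/provenance repository volumes is
pegged while the CPUs still have headroom.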



On Jun 10, 2019, at 9:07 AM, Joe Witt <joe.w...@gmail.com> wrote:

buffering flowfiles like that is supported by design and common so it would be 
ideal to figure out what happened.

On Mon, Jun 10, 2019, 9:02 AM Shanker Sneh <shanker.s...@zoomcar.com> wrote:
FlowFiles were close to ~7 million, with 8 threads (as I have 4 vCPUs in 1 box). 
Max heap allocated is 12 GB, so the usage was ~60%.

Joe, I think it has something to do with what Wookcock suggested. Clearing up 
content & FlowFiles seems to have made the CPU manageable.
Allow me 1-2 days and I shall report back if it solves the problem.

On Mon, Jun 10, 2019 at 6:23 PM Joe Witt <joe.w...@gmail.com> wrote:
how many flowfiles were in queue?  how many threads for nifi to use?   how was 
heap?

On Mon, Jun 10, 2019, 8:44 AM Shanker Sneh <shanker.s...@zoomcar.com> wrote:
Thanks Joe for reading through and helping me. :)


  *   NiFi hasn't been upgraded. It's 1.8.0 (community version of Hortonworks 
DataFlow).
  *   OS/Kernel is the same. Just that I have added more capacity to disk (with 
better IO).
  *   JVM continues to be the same. Java 8.
  *   When CPU is 100%, top shows just the NiFi java process. When I provided it 
with more cores (as high as 16), NiFi used all 16 cores and throttled at 1600%.

Meanwhile, I am trying to clear up all FlowFiles from disk and start the flows 
afresh.


On Mon, Jun 10, 2019 at 5:42 PM Joe Witt <joe.w...@gmail.com> wrote:
Sneh

It was stable for months but now is high...

has nifi been upgraded?  what version before vs now?

has the os/kernel been changed?

has the jvm been updated?

when cpu is 100 what does top show?

thanks

On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh <shanker.s...@zoomcar.com> wrote:
Thanks for the suggestions Joe.
Actually the issue is persistent even after reverting to the 
'older-regular-incremental-load' of the data flow (which used to work fine 
since months on similarly-configured hardware a few days back by utilising just 
~50% of resources).

These days, one of the 2-node cluster gets out of NiFi every now and then as 
the CPU peaks 100% for that particular machine. And subsequently the other node 
reaches 100% CPU too.
When I restart NiFi on a particular node, CPU tanks to 0 and then spikes to 
100% within few minutes - the data flowing through the pipeline is just too 
less to throttle my CPU ideally.

The machine config and NiFi config remains untouched - this has left me 
confused where the problem might be. Something which had been running smoothly 
since months, has become a challenge now.

On Fri, Jun 7, 2019 at 8:16 PM Joe Witt <joe.w...@gmail.com> wrote:
Shanker

It sounds like you've gone through some changes in general and have worked 
through those.  Now you have a flow running with a high volume of data (history 
load) and want to know which parts of the flow are most expensive/consuming the 
CPU.

You should be able to look at the statistics provided on the processors to see 
where the majority of CPU time is spent.  You can usually very easily reason 
over this if it is doing compression/encryption/etc.. and determine if you want 
to give it more threads/less threads/batch data together better, etc..

The configuration of the VMs, the NiFi instance itself, the flow, and the 
nature of the data are all important to see/understand to be of much help here.

THanks

On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh <shanker.s...@zoomcar.com> wrote:
Hello all,

I am facing strange issue with NiFi 1.8.0 (2 nodes)
My flows had been running fine since months.

Yesterday I had to do some history load which filled up my both disks (I have 
FlowFile repository as separate disk).

I increased the size of the root & flowfile disk both, and 'grew' the disk 
partition and 'extended' the file system (it's an EC2 Linux instance).
But post that my CPU has been spiking to complete 100% - even at regular load 
(earlier it used to be somewhere around 50%)
Also I did no change to the config values or thread count etc.

I upgraded the 2 nodes to see if that solves the problem - from 16 Gb box (4 
core) to 64 Gb (16 core).
But even the larger box is throttling on the CPU at 100%.

I tried clearing all repositories and restarted NiFi application and 

Re: NiFi cluster goes 100% CPU in no time

2019-06-10 Thread Joe Witt
how many flowfiles were in queue?  how many threads for nifi to use?   how
was heap?

On Mon, Jun 10, 2019, 8:44 AM Shanker Sneh  wrote:

> Thanks Joe for reading through and helping me. :)
>
>
>- NiFi hasn't been upgraded. It's 1.8.0 (community version of Hortonworks
>DataFlow).
>- OS/Kernel is the same. Just that I have added more capacity to disk
>(with better IO).
>- JVM continues to be the same. Java 8.
>- When CPU is 100%, top shows just the NiFi java process. When I provided
>it with more cores (as high as 16), NiFi used all 16 cores and throttled at
>1600%.
>
>
> Meanwhile, I am trying to clear up all FlowFiles from disk and start the
> flows afresh.
>
>
> On Mon, Jun 10, 2019 at 5:42 PM Joe Witt  wrote:
>
>> Sneh
>>
>> It was stable for months but now is high...
>>
>> has nifi been upgraded?  what version before vs now?
>>
>> has the os/kernel been changed?
>>
>> has the jvm been updated?
>>
>> when cpu is 100 what does top show?
>>
>> thanks
>>
>> On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh 
>> wrote:
>>
>>> Thanks for the suggestions Joe.
>>> Actually the issue is persistent even after reverting to the
>>> 'older-regular-incremental-load' of the data flow* (which used to work
>>> fine since months on similarly-configured hardware a few days back by
>>> utilising just ~50% of resources)*.
>>>
>>> These days, one of the 2-node cluster gets out of NiFi every now and
>>> then as the CPU peaks 100% for that particular machine. And subsequently
>>> the other node reaches 100% CPU too.
>>> When I restart NiFi on a particular node, CPU tanks to 0 and then spikes
>>> to 100% within few minutes - the data flowing through the pipeline is *just
>>> too less* to throttle my CPU ideally.
>>>
>>> The machine config and NiFi config remains untouched - this has left me
>>> confused where the problem might be. Something which had been running
>>> smoothly since months, has become a challenge now.
>>>
>>> On Fri, Jun 7, 2019 at 8:16 PM Joe Witt  wrote:
>>>
 Shanker

 It sounds like you've gone through some changes in general and have
 worked through those.  Now you have a flow running with a high volume of
 data (history load) and want to know which parts of the flow are most
 expensive/consuming the CPU.

 You should be able to look at the statistics provided on the processors
 to see where the majority of CPU time is spent.  You can usually very
 easily reason over this if it is doing compression/encryption/etc.. and
 determine if you want to give it more threads/less threads/batch data
 together better, etc..

 The configuration of the VMs, the NiFi instance itself, the flow, and
 the nature of the data are all important to see/understand to be of much
 help here.

 THanks

 On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh 
 wrote:

> Hello all,
>
> I am facing strange issue with NiFi 1.8.0 (2 nodes)
> My flows had been running fine since months.
>
> Yesterday I had to do some history load which filled up my both disks
> (I have FlowFile repository as separate disk).
>
> I increased the size of the root & flowfile disk both, and 'grew' the
> disk partition and 'extended' the file system (it's an EC2 Linux instance).
> But post that my CPU has been spiking to complete 100% - even at
> regular load (earlier it used to be somewhere around 50%)
> Also I did no change to the config values or thread count etc.
>
> I upgraded the 2 nodes to see if that solves the problem - from 16 Gb
> box (4 core) to 64 Gb (16 core).
> But even the larger box is throttling on the CPU at 100%.
>
> I tried clearing all repositories and restarted NiFi application and
> the EC2 - but no improvement.
>
> Kindly point me in the right direction. I am unable to pinpoint
> anything.
>
> --
> Best,
> Sneh
>

>>>
>>> --
>>> Best,
>>> Sneh
>>>
>>
>
> --
> Best,
> Sneh
>


Re: NiFi cluster goes 100% CPU in no time

2019-06-10 Thread Shanker Sneh
Thanks Joe for reading through and helping me. :)


   - NiFi hasn't been upgraded. It's 1.8.0 (community version of Hortonworks
   DataFlow).
   - OS/Kernel is the same. Just that I have added more capacity to disk
   (with better IO).
   - JVM continues to be the same. Java 8.
   - When CPU is 100%, top shows just the NiFi java process. When I provided
   it with more cores (as high as 16), NiFi used all 16 cores and throttled at
   1600%.


Meanwhile, I am trying to clear up all FlowFiles from disk and start the
flows afresh.


On Mon, Jun 10, 2019 at 5:42 PM Joe Witt  wrote:

> Sneh
>
> It was stable for months but now is high...
>
> has nifi been upgraded?  what version before vs now?
>
> has the os/kernel been changed?
>
> has the jvm been updated?
>
> when cpu is 100 what does top show?
>
> thanks
>
> On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh 
> wrote:
>
>> Thanks for the suggestions Joe.
>> Actually the issue is persistent even after reverting to the
>> 'older-regular-incremental-load' of the data flow* (which used to work
>> fine since months on similarly-configured hardware a few days back by
>> utilising just ~50% of resources)*.
>>
>> These days, one of the 2-node cluster gets out of NiFi every now and then
>> as the CPU peaks 100% for that particular machine. And subsequently the
>> other node reaches 100% CPU too.
>> When I restart NiFi on a particular node, CPU tanks to 0 and then spikes
>> to 100% within few minutes - the data flowing through the pipeline is *just
>> too less* to throttle my CPU ideally.
>>
>> The machine config and NiFi config remains untouched - this has left me
>> confused where the problem might be. Something which had been running
>> smoothly since months, has become a challenge now.
>>
>> On Fri, Jun 7, 2019 at 8:16 PM Joe Witt  wrote:
>>
>>> Shanker
>>>
>>> It sounds like you've gone through some changes in general and have
>>> worked through those.  Now you have a flow running with a high volume of
>>> data (history load) and want to know which parts of the flow are most
>>> expensive/consuming the CPU.
>>>
>>> You should be able to look at the statistics provided on the processors
>>> to see where the majority of CPU time is spent.  You can usually very
>>> easily reason over this if it is doing compression/encryption/etc.. and
>>> determine if you want to give it more threads/less threads/batch data
>>> together better, etc..
>>>
>>> The configuration of the VMs, the NiFi instance itself, the flow, and
>>> the nature of the data are all important to see/understand to be of much
>>> help here.
>>>
>>> THanks
>>>
>>> On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh 
>>> wrote:
>>>
 Hello all,

 I am facing strange issue with NiFi 1.8.0 (2 nodes)
 My flows had been running fine since months.

 Yesterday I had to do some history load which filled up my both disks
 (I have FlowFile repository as separate disk).

 I increased the size of the root & flowfile disk both, and 'grew' the
 disk partition and 'extended' the file system (it's an EC2 Linux instance).
 But post that my CPU has been spiking to complete 100% - even at
 regular load (earlier it used to be somewhere around 50%)
 Also I did no change to the config values or thread count etc.

 I upgraded the 2 nodes to see if that solves the problem - from 16 Gb
 box (4 core) to 64 Gb (16 core).
 But even the larger box is throttling on the CPU at 100%.

 I tried clearing all repositories and restarted NiFi application and
 the EC2 - but no improvement.

 Kindly point me in the right direction. I am unable to pinpoint
 anything.

 --
 Best,
 Sneh

>>>
>>
>> --
>> Best,
>> Sneh
>>
>

-- 
Best,
Sneh


Re: NiFi cluster goes 100% CPU in no time

2019-06-10 Thread Joe Witt
Sneh

It was stable for months but now is high...

has nifi been upgraded?  what version before vs now?

has the os/kernel been changed?

has the jvm been updated?

when cpu is 100 what does top show?

thanks

On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh  wrote:

> Thanks for the suggestions Joe.
> Actually the issue is persistent even after reverting to the
> 'older-regular-incremental-load' of the data flow* (which used to work
> fine since months on similarly-configured hardware a few days back by
> utilising just ~50% of resources)*.
>
> These days, one of the 2-node cluster gets out of NiFi every now and then
> as the CPU peaks 100% for that particular machine. And subsequently the
> other node reaches 100% CPU too.
> When I restart NiFi on a particular node, CPU tanks to 0 and then spikes
> to 100% within few minutes - the data flowing through the pipeline is *just
> too less* to throttle my CPU ideally.
>
> The machine config and NiFi config remains untouched - this has left me
> confused where the problem might be. Something which had been running
> smoothly since months, has become a challenge now.
>
> On Fri, Jun 7, 2019 at 8:16 PM Joe Witt  wrote:
>
>> Shanker
>>
>> It sounds like you've gone through some changes in general and have
>> worked through those.  Now you have a flow running with a high volume of
>> data (history load) and want to know which parts of the flow are most
>> expensive/consuming the CPU.
>>
>> You should be able to look at the statistics provided on the processors
>> to see where the majority of CPU time is spent.  You can usually very
>> easily reason over this if it is doing compression/encryption/etc.. and
>> determine if you want to give it more threads/less threads/batch data
>> together better, etc..
>>
>> The configuration of the VMs, the NiFi instance itself, the flow, and the
>> nature of the data are all important to see/understand to be of much help
>> here.
>>
>> THanks
>>
>> On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh 
>> wrote:
>>
>>> Hello all,
>>>
>>> I am facing strange issue with NiFi 1.8.0 (2 nodes)
>>> My flows had been running fine since months.
>>>
>>> Yesterday I had to do some history load which filled up my both disks (I
>>> have FlowFile repository as separate disk).
>>>
>>> I increased the size of the root & flowfile disk both, and 'grew' the
>>> disk partition and 'extended' the file system (it's an EC2 Linux instance).
>>> But post that my CPU has been spiking to complete 100% - even at regular
>>> load (earlier it used to be somewhere around 50%)
>>> Also I did no change to the config values or thread count etc.
>>>
>>> I upgraded the 2 nodes to see if that solves the problem - from 16 Gb
>>> box (4 core) to 64 Gb (16 core).
>>> But even the larger box is throttling on the CPU at 100%.
>>>
>>> I tried clearing all repositories and restarted NiFi application and the
>>> EC2 - but no improvement.
>>>
>>> Kindly point me in the right direction. I am unable to pinpoint anything.
>>>
>>> --
>>> Best,
>>> Sneh
>>>
>>
>
> --
> Best,
> Sneh
>


Re: NiFi cluster goes 100% CPU in no time

2019-06-10 Thread Shanker Sneh
Thanks for the suggestions Joe.
Actually the issue is persistent even after reverting to the
'older-regular-incremental-load' version of the data flow *(which used to work
fine for months on similarly-configured hardware, until a few days back, while
utilising just ~50% of resources)*.

These days, one node of the 2-node cluster drops out of NiFi every now and then
as its CPU peaks at 100%, and subsequently the other node reaches 100% CPU too.
When I restart NiFi on a particular node, the CPU drops to 0 and then spikes to
100% within a few minutes - the data flowing through the pipeline is *far too
little* to max out my CPU under normal circumstances.

The machine config and NiFi config remain untouched - this has left me confused
about where the problem might be. Something which had been running smoothly for
months has become a challenge now.

On Fri, Jun 7, 2019 at 8:16 PM Joe Witt  wrote:

> Shanker
>
> It sounds like you've gone through some changes in general and have worked
> through those.  Now you have a flow running with a high volume of data
> (history load) and want to know which parts of the flow are most
> expensive/consuming the CPU.
>
> You should be able to look at the statistics provided on the processors to
> see where the majority of CPU time is spent.  You can usually very easily
> reason over this if it is doing compression/encryption/etc.. and determine
> if you want to give it more threads/less threads/batch data together
> better, etc..
>
> The configuration of the VMs, the NiFi instance itself, the flow, and the
> nature of the data are all important to see/understand to be of much help
> here.
>
> THanks
>
> On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh 
> wrote:
>
>> Hello all,
>>
>> I am facing strange issue with NiFi 1.8.0 (2 nodes)
>> My flows had been running fine since months.
>>
>> Yesterday I had to do some history load which filled up my both disks (I
>> have FlowFile repository as separate disk).
>>
>> I increased the size of the root & flowfile disk both, and 'grew' the
>> disk partition and 'extended' the file system (it's an EC2 Linux instance).
>> But post that my CPU has been spiking to complete 100% - even at regular
>> load (earlier it used to be somewhere around 50%)
>> Also I did no change to the config values or thread count etc.
>>
>> I upgraded the 2 nodes to see if that solves the problem - from 16 Gb box
>> (4 core) to 64 Gb (16 core).
>> But even the larger box is throttling on the CPU at 100%.
>>
>> I tried clearing all repositories and restarted NiFi application and the
>> EC2 - but no improvement.
>>
>> Kindly point me in the right direction. I am unable to pinpoint anything.
>>
>> --
>> Best,
>> Sneh
>>
>

-- 
Best,
Sneh


Re: ExecuteSQL connected to AWS Athena Driver getting stuck frequently

2019-06-10 Thread Suman B N
+1
We too are facing the same problem.

On Mon, Jun 10, 2019 at 12:55 PM Purushotham Pushpavanthar <
pushpavant...@gmail.com> wrote:

> Hi Mark,
>
> I ran into same issue today. It helped me capture thread dumps. Attached
> are the thread dumps of 3 node cluster.
> Our nodes are running as docker containers in c5.xlarge instances. Nifi
> Version 1.9.2.
>
> Regards,
> Purushotham Pushpavanth
>
>
>
> On Fri, 7 Jun 2019 at 17:51, Mark Payne  wrote:
>
>> Purushotham,
>>
>> Generally if you run into a situation where you have a stuck thread you
>> will need to provide a thread dump to understand what is going on. It’s
>> easiest to do that by running “bin/nifi.sh dump dump1.txt” and then
>> attaching the created dump1.txt to the email.
>>
>> Thanks
>> -Mark
>>
>> Sent from my iPhone
>>
>> On Jun 7, 2019, at 3:45 AM, Purushotham Pushpavanthar <
>> pushpavant...@gmail.com> wrote:
>>
>> Hi,
>>
>> I've been using ExecuteSQL to execute some DDL statements whenever there is an
>> update to my S3.
>> This was working fine for me except for one glitch. It stops processing
>> any incoming flowfiles with running threads. Once the processor gets into
>> this state, it never recovers. It's not possible to stop the processor
>> without forcefully terminating it. It starts working fine once I restart it
>> through forceful termination. I went through the mail thread in the link
>> http://apache-nifi-users-list.2361937.n4.nabble.com/ExecuteSQL-question-how-do-I-stop-long-running-queries-td3039.html
>>  and
>> tried adding Validation Query, but it didn't help. I'm sending very light
>> weight DDL statements like ALTER TABLE ADD PARTITION. I don't think this is
>> causing much load on the Athena End.
>> I've attached my ExecuteSQL and DBConnectionPool configuration. Kindly
>> review it and help me resolve/let me know workaround.
>>
>>
>>
>> Regards
>> Purushotham Pushpavanth
>>
>> 
>>
>> 
>>
>> 
>>
>>

-- 
*Suman*
*OlaCabs*


Re: InvokeHTTP with SSL

2019-06-10 Thread Martijn Dekkers
HTTPS doesn’t work without a certificate. It’s just not possible. What is 
possible is for a non-secured webservice to listen on port 443 in which case 
you can use an http://:443 style invocation. However, the most likely 
scenario is that you are using a self-signed certificate, in which case you’ll 
have to add the certificates to the truststore. See 
http://apache-nifi-developer-list.39713.n7.nabble.com/How-to-trust-another-certificate-from-within-nifi-flows-td17950.html
 for details. 
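
If you do end up needing to trust a self-signed or internal-CA certificate, the usual recipe is roughly the following (host, alias and password are placeholders):

    openssl s_client -connect myservice.example.com:443 -showcerts </dev/null | openssl x509 -outform PEM > service.pem
    keytool -importcert -noprompt -alias myservice -file service.pem -keystore truststore.jks -storepass changeit

then point an SSL Context Service (e.g. StandardSSLContextService) at that truststore.jks and select it in InvokeHTTP.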


Martijn

On Mon, 10 Jun 2019, at 08:35, Tomislav Novosel wrote:
> Yeah, I was thinking about that. But what if the service doesn't have any 
> certificates at all?
> I think the service listens on the K8S cluster without SSL certs and inside our 
> corporate network.
> 
> BR,
> Tom
> 
> On Mon, 3 Jun 2019 at 16:16, Bryan Bende  wrote:
>> Hello,
>> 
>>  You should be specifying an SSL Context Service in the processor which
>>  points to a truststore that trusts the certificate of the service you
>>  are calling.
>> 
>>  Alternatively, if the CA certs system truststore trusts the service
>>  cert then it should also work.
>> 
>>  Thanks,
>> 
>>  Bryan
>> 
>>  On Mon, Jun 3, 2019 at 10:14 AM Tomislav Novosel  
>> wrote:
>>  >
>>  > Hi all,
>>  >
>>  > I have a case where I need to send a POST request to one endpoint which is 
>> located
>>  > on K8S cluster and behind reverse proxy. Only HTTPS can be used.
>>  > If I put value of endpoint using https:// I get error 'Unable to find 
>> valid certification path to requested target'.
>>  > I spoke to my admin/devops guy and he says there is no other way to 
>> access that endpoint other than URL he gave me.
>>  >
>>  > Is there a way to bypass SSL verification or something else?
>>  >
>>  > Thanks,
>>  > BR,
>>  > Tom


Re: InvokeHTTP with SSL

2019-06-10 Thread Tomislav Novosel
Yeah, I was thinking about that. But what if the service doesn't have any
certificates at all?
I think the service listens on the K8S cluster without SSL certs and inside
our corporate network.

BR,
Tom

On Mon, 3 Jun 2019 at 16:16, Bryan Bende  wrote:

> Hello,
>
> You should be specifying an SSL Context Service in the processor which
> points to a truststore that trusts the certificate of the service you
> are calling.
>
> Alternatively, if the CA certs system truststore trusts the service
> cert then it should also work.
>
> Thanks,
>
> Bryan
>
> On Mon, Jun 3, 2019 at 10:14 AM Tomislav Novosel 
> wrote:
> >
> > Hi all,
> >
> > I have a case where I need to send a POST request to one endpoint which is
> located
> > on K8S cluster and behind reverse proxy. Only HTTPS can be used.
> > If I put value of endpoint using https:// I get error 'Unable to find
> valid certification path to requested target'.
> > I spoke to my admin/devops guy and he says there is no other way to
> access that endpoint other than URL he gave me.
> >
> > Is there a way to bypass SSL verification or something else?
> >
> > Thanks,
> > BR,
> > Tom
>