Re: Question regarding krb tgt renewal for Hive processors and connection pools

2018-12-19 Thread Shawn Weeks
It’s NIFI-5134 that fixes this issue. Prior to that, the Hive connection pool
did not renew its Kerberos ticket correctly.
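
For reference, the failure was the pool caching a Kerberos login whose ticket
silently expires after 24 hours. Below is a minimal sketch of the kind of
renewal guard involved, assuming Hadoop's UserGroupInformation API; this is
illustrative only, not the actual NIFI-5134 patch, and the class and names are
hypothetical:

import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical pool wrapper: re-check the TGT before every borrow so the
// pool keeps working past the ticket lifetime.
public class KerberizedHivePool {

    private final UserGroupInformation ugi;
    private final String jdbcUrl;

    public KerberizedHivePool(String principal, String keytab, String jdbcUrl)
            throws IOException {
        // One keytab login up front; the UGI is reused for the pool's lifetime.
        this.ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
        this.jdbcUrl = jdbcUrl;
    }

    public Connection getConnection() throws Exception {
        // Cheap no-op while the ticket is fresh; renews or re-logs-in from
        // the keytab as expiry approaches.
        ugi.checkTGTAndReloginFromKeytab();
        // Open the JDBC connection as the Kerberos-authenticated subject.
        return ugi.doAs((PrivilegedExceptionAction<Connection>) () ->
                DriverManager.getConnection(jdbcUrl));
    }
}

The key call is checkTGTAndReloginFromKeytab(); a pool that skips it fails with
the GSSException quoted below once the original ticket expires.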

Sent from my iPhone

On Dec 19, 2018, at 5:15 PM, Pat White <patwh...@oath.com> wrote:

Thanks much Bryan and Shawn, we're currently on 1.6.0 with some cherry-picks
from 1.8.0 JIRAs.
Will check the archives as mentioned, thanks again.

patw

On Wed, Dec 19, 2018 at 4:45 PM Shawn Weeks <swe...@weeksconsulting.us> wrote:
There is a bug for this but I’m not sure which release fixed it. Something
after 1.5, I think. The patch is in the Hortonworks HDF 3.1.2 release.

If you search for me in the archives, you'll see I mentioned it a few months back.

Thanks
Shawn

Sent from my iPhone

> On Dec 19, 2018, at 3:59 PM, Pat White <patwh...@oath.com> wrote:
>
> Hi Folks,
>
> Using Kerberos auth in NiFi clusters communicating with HDFS and for Hive
> access, the ticket life is 24 hours. HDFS works fine; however, we're seeing
> issues with Hive where the TGT doesn't seem to renew, or fetch a new ticket,
> as the 24hr limit approaches. Hence, Hive access works fine until the 24
> hours expire and then fails to authenticate. For example, a SelectHiveQL
> processor using the Hive Database Connection Pooling Service will work for
> 24 hours after a cluster restart but then fail with:
>
> org.ietf.jgss.GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)
>
> Enabled Kerberos debugging, which shows the ticket is found, but no renewal
> or new fetch attempt seems to have been made. The Kerberos docs discuss
> setting javax.security.auth.useSubjectCredsOnly=false in order to allow the
> underlying mechanism to obtain credentials; however, bootstrap.conf
> explicitly sets this to 'true' to inhibit JAAS from using any fallback
> methods to authenticate.
>
> Trying an experiment with useSubjectCredsOnly=false, but would appreciate
> it if anyone has some guidance on how to get Hive's connection pools to
> renew the TGT or fetch a new ticket? Thank you.
>
> patw
>
>
>


Re: Question regarding krb tgt renewal for Hive processors and connection pools

2018-12-19 Thread Pat White
Thanks much Bryan and Shawn, we're currently on 1.6.0 with some cherry-picks
from 1.8.0 JIRAs.
Will check the archives as mentioned, thanks again.

patw

On Wed, Dec 19, 2018 at 4:45 PM Shawn Weeks wrote:

> There is a bug for this but I’m not sure which release fixed it. Something
> after 1.5, I think. The patch is in the Hortonworks HDF 3.1.2 release.
>
> If you search for me in the archives, you'll see I mentioned it a few months back.
>
> Thanks
> Shawn
>
> Sent from my iPhone
>
> > On Dec 19, 2018, at 3:59 PM, Pat White  wrote:
> >
> > Hi Folks,
> >
> > Using Kerberos auth in NiFi clusters communicating with HDFS and for Hive
> > access, the ticket life is 24 hours. HDFS works fine; however, we're
> > seeing issues with Hive where the TGT doesn't seem to renew, or fetch a
> > new ticket, as the 24hr limit approaches. Hence, Hive access works fine
> > until the 24 hours expire and then fails to authenticate. For example, a
> > SelectHiveQL processor using the Hive Database Connection Pooling Service
> > will work for 24 hours after a cluster restart but then fail with:
> >
> > org.ietf.jgss.GSSException: No valid credentials provided
> > (Mechanism level: Failed to find any Kerberos tgt)
> >
> > Enabled Kerberos debugging, which shows the ticket is found, but no
> > renewal or new fetch attempt seems to have been made. The Kerberos docs
> > discuss setting javax.security.auth.useSubjectCredsOnly=false in order to
> > allow the underlying mechanism to obtain credentials; however,
> > bootstrap.conf explicitly sets this to 'true' to inhibit JAAS from using
> > any fallback methods to authenticate.
> >
> > Trying an experiment with useSubjectCredsOnly=false, but would appreciate
> > it if anyone has some guidance on how to get Hive's connection pools to
> > renew the TGT or fetch a new ticket? Thank you.
> >
> > patw
> >
> >
> >
>


Re: Question regarding krb tgt renewal for Hive processors and connection pools

2018-12-19 Thread Bryan Bende
Hi Pat,

I’m personally not that familiar with Hive, but for those that are, they
will probably need to know what version of NiFi you are using since some
bugs have been fixed along the way.

Thanks,

Bryan

On Wed, Dec 19, 2018 at 4:59 PM Pat White  wrote:

> Hi Folks,
>
> Using Kerberos auth in NiFi clusters communicating with HDFS and for Hive
> access, the ticket life is 24 hours. HDFS works fine; however, we're seeing
> issues with Hive where the TGT doesn't seem to renew, or fetch a new
> ticket, as the 24hr limit approaches. Hence, Hive access works fine until
> the 24 hours expire and then fails to authenticate. For example, a
> SelectHiveQL processor using the Hive Database Connection Pooling Service
> will work for 24 hours after a cluster restart but then fail with:
>
> org.ietf.jgss.GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)
>
> Enabled Kerberos debugging, which shows the ticket is found, but no renewal
> or new fetch attempt seems to have been made. The Kerberos docs discuss
> setting javax.security.auth.useSubjectCredsOnly=false in order to allow the
> underlying mechanism to obtain credentials; however, bootstrap.conf
> explicitly sets this to 'true' to inhibit JAAS from using any fallback
> methods to authenticate.
>
> Trying an experiment with useSubjectCredsOnly=false, but would appreciate
> it if anyone has some guidance on how to get Hive's connection pools to
> renew the TGT or fetch a new ticket? Thank you.
>
> patw
>
>
>
> --
Sent from Gmail Mobile


Question regarding krb tgt renewal for Hive processors and connection pools

2018-12-19 Thread Pat White
Hi Folks,

Using Kerberos auth in NiFi clusters communicating with HDFS and for Hive
access, the ticket life is 24 hours. HDFS works fine; however, we're seeing
issues with Hive where the TGT doesn't seem to renew, or fetch a new
ticket, as the 24hr limit approaches. Hence, Hive access works fine until
the 24 hours expire and then fails to authenticate. For example, a
SelectHiveQL processor using the Hive Database Connection Pooling Service
will work for 24 hours after a cluster restart but then fail with:

org.ietf.jgss.GSSException: No valid credentials provided
(Mechanism level: Failed to find any Kerberos tgt)

Enabled Kerberos debugging, which shows the ticket is found, but no renewal
or new fetch attempt seems to have been made. The Kerberos docs discuss
setting javax.security.auth.useSubjectCredsOnly=false in order to allow the
underlying mechanism to obtain credentials; however, bootstrap.conf
explicitly sets this to 'true' to inhibit JAAS from using any fallback
methods to authenticate.
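
For reference, both settings live in conf/bootstrap.conf as JVM arguments,
roughly as follows; the numeric arg indexes are install-specific, and the
second flag is the usual way to enable the Kerberos debug trace mentioned
above:

java.arg.20=-Djavax.security.auth.useSubjectCredsOnly=true
java.arg.21=-Dsun.security.krb5.debug=true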

Trying an experiment with useSubjectCredsOnly=false, but would appreciate it
if anyone has some guidance on how to get Hive's connection pools to renew
the TGT or fetch a new ticket? Thank you.

patw


Re: Unable to List Queue on a connection

2018-12-19 Thread Vijay Chhipa
Andy

Thank you for the pointers, that worked perfectly. 

Cheers!

Vijay


> On Dec 17, 2018, at 6:23 PM, Andy LoPresto  wrote:
> 
> Hi Vijay,
> 
> Apache NiFi 1.x doesn’t have “roles”, so the “administrators” group doesn’t 
> carry any special significance [1], and connections do not have policies 
> assigned to them. You’ll need to assign the “View the data” and “Modify the 
> data” policies to yourself on the specified resources (the components where 
> the connection originates and terminates) [2]. If you do this on the root 
> process group (or more granularly, the specific process group containing this 
> connection), all child process groups and components will inherit that 
> permission unless you override it [3]. There are some examples of this 
> process to help [4]. Hope this helps. 
> 
> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#access-policies
> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#component-level-access-policies
> [3] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#access-policy-inheritance
> [4] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#access-policy-config-examples
> 
> Andy LoPresto
> alopre...@apache.org 
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Dec 17, 2018, at 2:45 PM, Vijay Chhipa wrote:
>> 
>> Hi 
>> 
>> I have a secure NiFi instance setup. 
>> 
>> I put myself in the Administrator group and created policies for the
>> administrator.
>> 
>> When I right-click on a connection and click on List Queue, I get the
>> following error:
>> 
>> Insufficient Permissions:
>> No applicable policies could be found. Contact the system administrator.
>> OK
>> 
>> What permission do I need to give myself to view the queue, and maybe
>> empty the queue?
>> 
>> 
> 





Re: Variables handling in MiNiFi

2018-12-19 Thread Aldrin Piri
Hi Luis,

There was an initial attempt at providing this support in MiNiFi [1] that
would likely be worth renewing discussion on, and I can certainly appreciate
the utility that bit of functionality would provide.

I would be interested in hearing a bit more about your intended use for the
variables to make sure we think about this the right way.  I think adding
this support in the overarching transformation that currently exists might
still fall short of what is needed.

As an aside, are you also making use of Registry (not specifically for
MiNiFi, but in general)?

Thanks!

[1] https://github.com/apache/nifi-minifi/pull/115

On Wed, Dec 19, 2018 at 12:39 PM Andrew Grande  wrote:

> Luis,
>
> Which version are you looking at? I think there was a recent discussion
> about supporting those in MiNiFi, not sure if it got implemented already,
> though.
>
> Andrew
>
> On Wed, Dec 19, 2018, 9:26 AM luis_size  wrote:
>
>> Hi
>>
>> I am a big fan of the NiFi variables registry. This makes instantiating
>> versions of the same workflow easier. However, when I save a PG as a
>> template, export it, and convert it to YML for a MiNiFi agent, the values
>> of variables are not exported.
>>
>> Is there support for variables in MiNiFi? Any ongoing work on this?
>>
>> Thanks
>> Luis
>>
>


Re: Variables handling in MiNiFi

2018-12-19 Thread Andrew Grande
Luis,

Which version are you looking at? I think there was a recent discussion
about supporting those in MiNiFi, not sure if it got implemented already,
though.

Andrew

On Wed, Dec 19, 2018, 9:26 AM luis_size  wrote:

> Hi
>
> I am a big fan of the NiFi variables registry. This makes instantiating
> versions of the same workflow easier. However, when I save a PG as a
> template, export it, and convert it to YML for a MiNiFi agent, the values
> of variables are not exported.
>
> Is there support for variables in MiNiFi? Any ongoing work on this?
>
> Thanks
> Luis
>


Variables handling in MiNiFi

2018-12-19 Thread luis_size
Hi
I am a big fan of the NiFi variables registry. This makes instantiating
versions of the same workflow easier. However, when I save a PG as a
template, export it, and convert it to YML for a MiNiFi agent, the values of
variables are not exported.
Is there support for variables in MiNiFi? Any ongoing work on this?
Thanks
Luis
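
For illustration, after transforming a template for MiNiFi today, any ${...}
reference from the flow arrives without its value, so the concrete value has
to be inlined by hand in config.yml. A rough sketch, with a hypothetical
processor, property, and path:

Processors:
- name: TailAppLog
  class: org.apache.nifi.processors.standard.TailFile
  Properties:
    File to Tail: /var/log/app/app.log   # was ${log.path} in the NiFi flow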

Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-19 Thread dan young
Ok, will keep you posted.

On Wed, Dec 19, 2018, 8:22 AM Mark Payne wrote:

> Hey Dan,
>
> Yes, we will want to get the diagnostics when you're in that state. It's
> probably not worth trying to turn on DEBUG logging
> unless you are in that state either. The thread dump shows all the threads
> with no work to do. Which is what I would expect.
> The question is: "Why does it not think there's work to do?" So the
> diagnostics and DEBUG logs will hopefully answer that,
> once you get back into that state again.
>
> Thanks
> -Mark
>
>
> On Dec 19, 2018, at 10:16 AM, dan young  wrote:
>
> Hello Mark,
>
> I'll try to grab that diagnostics...I assume we want to grab it when we
> see the stuck Flowfile in a queue, correct?
>
> Also, does the nifi thread dump provide anything? This was from the node
> that seemed to have the stuck Flowfile...
>
> Dano
>
> On Wed, Dec 19, 2018, 6:51 AM Mark Payne wrote:
>> Hey Josef, Dano,
>>
>> Firstly, let me assure you that while I may be the only one from the NiFi
>> side who's been engaging on debugging
>> this, I am far from the only one who cares about it! :) This is a pretty
>> big new feature that was added to the latest
>> release, so understandably there are probably not yet a lot of people who
>> understand the code well enough to
>> debug. I have tried replicating the issue, but have not been successful.
>> I have a 3-node cluster that ran for well over
>> a month without a restart, and I've also tried restarting it every few
>> hours for a couple of days. It has about 8 different
>> load-balanced connections, with varying data sizes and volumes. I've not
>> been able to get into this situation, though,
>> unfortunately.
>>
>> But yes, I think that we've seen this issue arise from each of the two of
>> you and one other on the mailing list, so it
>> is certainly something that we need to nail down ASAP. Unfortunately,
>> debugging an issue that involves communication
>> between multiple nodes is often difficult to fully understand, so it may
>> not be a trivial task to debug.
>>
>> Dano, if you are able to get to the diagnostics, as Josef mentioned, that
>> is likely to be pretty helpful. Off the top of my head,
>> there are a few possibilities that are coming to mind, as to what kind of
>> bug could cause such behavior:
>>
>> 1) Perhaps there really is no flowfile in the queue, but we somehow
>> miscalculated the size of the queue. The diagnostics
>> info would tell us whether or not this is the case. It will look into the
>> queues themselves to determine how many FlowFiles are
>> destined for each node in the cluster, rather than just returning the
>> pre-calculated count. Failing that, you could also stop the source
>> and destination of the queue, restart the node, and then see if the
>> FlowFile is entirely gone from the queue on restart, or if it remains
>> in the queue. If it is gone, then that likely indicates that the
>> pre-computed count is somehow off.
>>
>> 2) We are having trouble communicating with the node that we are trying
>> to send the data to. I would expect some sort of ERROR
>> log messages in this case.
>>
>> 3) The node is properly sending the FlowFile to where it needs to go, but
>> for some reason the receiving node is then re-distributing it
>> to another node in the cluster, which then re-distributes it again, so
>> that it never ends in the correct destination. I think this is unlikely
>> and would be easy to verify by looking at the "Summary" table [1] and
>> doing the "Cluster view" and constantly refreshing for a few seconds
>> to see if the queue changes on any node in the cluster.
>>
>> 4) For some entirely unknown reason, there exists a bug that causes the
>> node to simply see the FlowFile and just skip over it
>> entirely.
>>
>> For additional logging, we can enable DEBUG logging on
>> org.apache.nifi.controller.queue.clustered.client.async.nio.
>> NioAsyncLoadBalanceClientTask:
>> <logger name="org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask" level="DEBUG" />
>>
>> With that DEBUG logging turned on, it may or may not generate a lot of
>> DEBUG logs. If it does not, then that in and of itself tells us something.
>> If it does generate a lot of DEBUG logs, then it would be good to see
>> what it's dumping out in the logs.
>>
>> And a big Thank You to you guys for staying engaged on this and your
>> willingness to dig in!
>>
>> Thanks!
>> -Mark
>>
>> [1]
>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Summary_Page
>>
>>
>> On Dec 19, 2018, at 2:18 AM, <josef.zahn...@swisscom.com> wrote:
>>
>> Hi Dano
>>
>> Seems that the problem has been seen by a few people but until now nobody
>> from NiFi team really cared about it – except Mark Payne. He mentioned the
>> part below with the diagnostics, however in my case this doesn’t even work
>> (tried it on standalone unsecured cluster as well as on secured cluster)!
>> Can you get the diagnostics on your cluster?
>>
>> I guess at the end we have to open a Jira ticket to narrow it down.
>>
>> Cheers Josef
>>
>>
>> One thing that I would 

Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-19 Thread Mark Payne
Hey Dan,

Yes, we will want to get the diagnostics when you're in that state. It's 
probably not worth trying to turn on DEBUG logging
unless you are in that state either. The thread dump shows all the threads with 
no work to do. Which is what I would expect.
The question is: "Why does it not think there's work to do?" So the diagnostics 
and DEBUG logs will hopefully answer that,
once you get back into that state again.

Thanks
-Mark


On Dec 19, 2018, at 10:16 AM, dan young <danoyo...@gmail.com> wrote:

Hello Mark,

I'll try to grab that diagnostics...I assume we want to grab it when we see the 
stuck Flowfile in a queue, correct?

Also, does the nifi thread dump provide anything? This was from the node that 
seemed to have the stuck Flowfile...

Dano

On Wed, Dec 19, 2018, 6:51 AM Mark Payne <marka...@hotmail.com> wrote:
Hey Josef, Dano,

Firstly, let me assure you that while I may be the only one from the NiFi side 
who's been engaging on debugging
this, I am far from the only one who cares about it! :) This is a pretty big 
new feature that was added to the latest
release, so understandably there are probably not yet a lot of people who 
understand the code well enough to
debug. I have tried replicating the issue, but have not been successful. I have 
a 3-node cluster that ran for well over
a month without a restart, and I've also tried restarting it every few hours
for a couple of days. It has about 8 different
load-balanced connections, with varying data sizes and volumes. I've not been 
able to get into this situation, though,
unfortunately.

But yes, I think that we've seen this issue arise from each of the two of you 
and one other on the mailing list, so it
is certainly something that we need to nail down ASAP. Unfortunately, debugging 
an issue that involves communication
between multiple nodes is often difficult to fully understand, so it may not be 
a trivial task to debug.

Dano, if you are able to get to the diagnostics, as Josef mentioned, that is 
likely to be pretty helpful. Off the top of my head,
there are a few possibilities that are coming to mind, as to what kind of bug 
could cause such behavior:

1) Perhaps there really is no flowfile in the queue, but we somehow 
miscalculated the size of the queue. The diagnostics
info would tell us whether or not this is the case. It will look into the 
queues themselves to determine how many FlowFiles are
destined for each node in the cluster, rather than just returning the 
pre-calculated count. Failing that, you could also stop the source
and destination of the queue, restart the node, and then see if the FlowFile is 
entirely gone from the queue on restart, or if it remains
in the queue. If it is gone, then that likely indicates that the pre-computed 
count is somehow off.

2) We are having trouble communicating with the node that we are trying to send 
the data to. I would expect some sort of ERROR
log messages in this case.

3) The node is properly sending the FlowFile to where it needs to go, but for 
some reason the receiving node is then re-distributing it
to another node in the cluster, which then re-distributes it again, so that it 
never ends in the correct destination. I think this is unlikely
and would be easy to verify by looking at the "Summary" table [1] and doing the 
"Cluster view" and constantly refreshing for a few seconds
to see if the queue changes on any node in the cluster.

4) For some entirely unknown reason, there exists a bug that causes the node to 
simply see the FlowFile and just skip over it
entirely.

For additional logging, we can enable DEBUG logging on
org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask:

<logger name="org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask" level="DEBUG" />
With that DEBUG logging turned on, it may or may not generate a lot of DEBUG 
logs. If it does not, then that in and of itself tells us something.
If it does generate a lot of DEBUG logs, then it would be good to see what it's 
dumping out in the logs.

And a big Thank You to you guys for staying engaged on this and your 
willingness to dig in!

Thanks!
-Mark

[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Summary_Page


On Dec 19, 2018, at 2:18 AM, <josef.zahn...@swisscom.com> wrote:

Hi Dano

Seems that the problem has been seen by a few people but until now nobody from 
NiFi team really cared about it – except Mark Payne. He mentioned the part 
below with the diagnostics, however in my case this doesn’t even work (tried it 
on standalone unsecured cluster as well as on secured cluster)! Can you get the 
diagnostics on your cluster?

I guess at the end we have to open a Jira ticket to narrow it down.

Cheers Josef


One thing that I would recommend, to get more information, is to go to the REST 
endpoint (in your browser is fine)
/nifi-api/processors/<processor-id>/diagnostics

Where <processor-id> is the UUID of either the source or the destination of the
Connection in question. This gives us
a lot of information 

Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-19 Thread dan young
Hello Mark,

I'll try to grab that diagnostics...I assume we want to grab it when we see
the stuck Flowfile in a queue, correct?

Also, does the nifi thread dump provide anything? This was from the node
that seemed to have the stuck Flowfile...

Dano

On Wed, Dec 19, 2018, 6:51 AM Mark Payne wrote:

> Hey Josef, Dano,
>
> Firstly, let me assure you that while I may be the only one from the NiFi
> side who's been engaging on debugging
> this, I am far from the only one who cares about it! :) This is a pretty
> big new feature that was added to the latest
> release, so understandably there are probably not yet a lot of people who
> understand the code well enough to
> debug. I have tried replicating the issue, but have not been successful. I
> have a 3-node cluster that ran for well over
> a month without a restart, and I've also tried restarting it every few
> hours for a couple of days. It has about 8 different
> load-balanced connections, with varying data sizes and volumes. I've not
> been able to get into this situation, though,
> unfortunately.
>
> But yes, I think that we've seen this issue arise from each of the two of
> you and one other on the mailing list, so it
> is certainly something that we need to nail down ASAP. Unfortunately,
> debugging an issue that involves communication
> between multiple nodes is often difficult to fully understand, so it may
> not be a trivial task to debug.
>
> Dano, if you are able to get to the diagnostics, as Josef mentioned, that
> is likely to be pretty helpful. Off the top of my head,
> there are a few possibilities that are coming to mind, as to what kind of
> bug could cause such behavior:
>
> 1) Perhaps there really is no flowfile in the queue, but we somehow
> miscalculated the size of the queue. The diagnostics
> info would tell us whether or not this is the case. It will look into the
> queues themselves to determine how many FlowFiles are
> destined for each node in the cluster, rather than just returning the
> pre-calculated count. Failing that, you could also stop the source
> and destination of the queue, restart the node, and then see if the
> FlowFile is entirely gone from the queue on restart, or if it remains
> in the queue. If it is gone, then that likely indicates that the
> pre-computed count is somehow off.
>
> 2) We are having trouble communicating with the node that we are trying to
> send the data to. I would expect some sort of ERROR
> log messages in this case.
>
> 3) The node is properly sending the FlowFile to where it needs to go, but
> for some reason the receiving node is then re-distributing it
> to another node in the cluster, which then re-distributes it again, so
> that it never ends in the correct destination. I think this is unlikely
> and would be easy to verify by looking at the "Summary" table [1] and
> doing the "Cluster view" and constantly refreshing for a few seconds
> to see if the queue changes on any node in the cluster.
>
> 4) For some entirely unknown reason, there exists a bug that causes the
> node to simply see the FlowFile and just skip over it
> entirely.
>
> For additional logging, we can enable DEBUG logging on
> org.apache.nifi.controller.queue.clustered.client.async.nio.
> NioAsyncLoadBalanceClientTask:
> <logger name="org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask" level="DEBUG" />
>
> With that DEBUG logging turned on, it may or may not generate a lot of
> DEBUG logs. If it does not, then that in and of itself tells us something.
> If it does generate a lot of DEBUG logs, then it would be good to see what
> it's dumping out in the logs.
>
> And a big Thank You to you guys for staying engaged on this and your
> willingness to dig in!
>
> Thanks!
> -Mark
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Summary_Page
>
>
> On Dec 19, 2018, at 2:18 AM, <josef.zahn...@swisscom.com> wrote:
>
> Hi Dano
>
> Seems that the problem has been seen by a few people but until now nobody
> from NiFi team really cared about it – except Mark Payne. He mentioned the
> part below with the diagnostics, however in my case this doesn’t even work
> (tried it on standalone unsecured cluster as well as on secured cluster)!
> Can you get the diagnostics on your cluster?
>
> I guess at the end we have to open a Jira ticket to narrow it down.
>
> Cheers Josef
>
>
> One thing that I would recommend, to get more information, is to go to the
> REST endpoint (in your browser is fine)
> /nifi-api/processors/<processor-id>/diagnostics
>
> Where <processor-id> is the UUID of either the source or the destination
> of the Connection in question. This gives us
> a lot of information about the internals of Connection. The easiest way to
> get that Processor ID is to just click on the
> processor on the canvas and look at the Operate palette on the left-hand
> side. You can copy & paste from there. If you
> then send the diagnostics information to us, we can analyze that to help
> understand what's happening.
>
>
>
> 

Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-19 Thread Boris Tyukin
We were about to start using this feature, but I guess we will have to wait
since so many people are having issues with it and there are still no comments
from the NiFi developers who implemented it... Thanks for the heads up, guys.

On Tue, Dec 18, 2018 at 11:27 PM dan young  wrote:

> We're seeing this more frequently where flowfiles seem to be stuck in a
> load balanced queue.  The only resolution is to disconnect the node and
> then restart that node.  After this, the flowfile disappears from the
> queue.  Any ideas on what might be going on here or what additional
> information I might be able to provide to debug this?
>
> I've attached another thread dump and some screen shots
>
>
> Regards,
>
> Dano
>
>


Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-19 Thread Mark Payne
Hey Josef, Dano,

Firstly, let me assure you that while I may be the only one from the NiFi side 
who's been engaging on debugging
this, I am far from the only one who cares about it! :) This is a pretty big 
new feature that was added to the latest
release, so understandably there are probably not yet a lot of people who 
understand the code well enough to
debug. I have tried replicating the issue, but have not been successful. I have 
a 3-node cluster that ran for well over
a month without a restart, and I've also tried restarting it every few hours
for a couple of days. It has about 8 different
load-balanced connections, with varying data sizes and volumes. I've not been 
able to get into this situation, though,
unfortunately.

But yes, I think that we've seen this issue arise from each of the two of you 
and one other on the mailing list, so it
is certainly something that we need to nail down ASAP. Unfortunately, debugging 
an issue that involves communication
between multiple nodes is often difficult to fully understand, so it may not be 
a trivial task to debug.

Dano, if you are able to get to the diagnostics, as Josef mentioned, that is 
likely to be pretty helpful. Off the top of my head,
there are a few possibilities that are coming to mind, as to what kind of bug 
could cause such behavior:

1) Perhaps there really is no flowfile in the queue, but we somehow 
miscalculated the size of the queue. The diagnostics
info would tell us whether or not this is the case. It will look into the 
queues themselves to determine how many FlowFiles are
destined for each node in the cluster, rather than just returning the 
pre-calculated count. Failing that, you could also stop the source
and destination of the queue, restart the node, and then see if the FlowFile is 
entirely gone from the queue on restart, or if it remains
in the queue. If it is gone, then that likely indicates that the pre-computed 
count is somehow off.

2) We are having trouble communicating with the node that we are trying to send 
the data to. I would expect some sort of ERROR
log messages in this case.

3) The node is properly sending the FlowFile to where it needs to go, but for 
some reason the receiving node is then re-distributing it
to another node in the cluster, which then re-distributes it again, so that it 
never ends in the correct destination. I think this is unlikely
and would be easy to verify by looking at the "Summary" table [1] and doing the 
"Cluster view" and constantly refreshing for a few seconds
to see if the queue changes on any node in the cluster.

4) For some entirely unknown reason, there exists a bug that causes the node to 
simply see the FlowFile and just skip over it
entirely.

For additional logging, we can enable DEBUG logging on
org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask:

<logger name="org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask" level="DEBUG" />
With that DEBUG logging turned on, it may or may not generate a lot of DEBUG 
logs. If it does not, then that in and of itself tells us something.
If it does generate a lot of DEBUG logs, then it would be good to see what it's 
dumping out in the logs.

And a big Thank You to you guys for staying engaged on this and your 
willingness to dig in!

Thanks!
-Mark

[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Summary_Page


On Dec 19, 2018, at 2:18 AM, <josef.zahn...@swisscom.com> wrote:

Hi Dano

Seems that the problem has been seen by a few people but until now nobody from 
NiFi team really cared about it – except Mark Payne. He mentioned the part 
below with the diagnostics, however in my case this doesn’t even work (tried it 
on standalone unsecured cluster as well as on secured cluster)! Can you get the 
diagnostics on your cluster?

I guess at the end we have to open a Jira ticket to narrow it down.

Cheers Josef


One thing that I would recommend, to get more information, is to go to the REST 
endpoint (in your browser is fine)
/nifi-api/processors/<processor-id>/diagnostics

Where <processor-id> is the UUID of either the source or the destination of the
Connection in question. This gives us
a lot of information about the internals of Connection. The easiest way to get 
that Processor ID is to just click on the
processor on the canvas and look at the Operate palette on the left-hand side. 
You can copy & paste from there. If you
then send the diagnostics information to us, we can analyze that to help 
understand what's happening.
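
For example, with a placeholder host, port, and processor UUID, the request is
just a GET to:

https://nifi-host:8443/nifi-api/processors/0a1b2c3d-0123-4567-89ab-cdef01234567/diagnostics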



From: dan young <danoyo...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, 19 December 2018 at 05:28
To: NiFi Mailing List <users@nifi.apache.org>
Subject: flowfiles stuck in load balanced queue; nifi 1.8

We're seeing this more frequently where flowfiles seem to be stuck in a load 
balanced queue.  The only resolution is to disconnect the node and then restart 
that node.  After this, the flowfile disappears from the queue.  Any 

Re: Adding NiFi processors to MiNiFi

2018-12-19 Thread Aldrin Piri
Hi there,

The processor you are referencing was only provided in 1.8.0 as per the
associated JIRA [1].  You would need to use that version of those artifacts
in this scenario.

We are a bit lacking on docs, but the path you took is correct, just be
mindful of the version where a given component was introduced.  The hope is
we can provide some assistance to make this clearer in conjunction with the
work under way with extension registry as well as toward treating MiNiFi
Java as a specialized assembly of NiFi.

[1] https://issues.apache.org/jira/browse/NIFI-5566
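
For example, following the version guidance above, the three bundles listed in
the quoted message below would presumably need to be their 1.8.0 counterparts:

nifi-ssl-context-service-nar-1.8.0.nar
nifi-standard-nar-1.8.0.nar
nifi-standard-services-api-nar-1.8.0.nar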

On Wed, Dec 19, 2018 at 6:43 AM luis_size  wrote:

> Hi
>
> I am using MiNiFi 0.5 and would like to add the CryptographicHashContent
> processor to my MiNiFi flow. I added the following NARs from the NiFi lib
> folder to the MiNiFi lib folder:
>
> nifi-ssl-context-service-nar-1.7.0.nar
> nifi-standard-nar-1.7.0.nar
> nifi-standard-services-api-nar-1.7.0.nar
>
> but it is not able to find the processor:
>
> o.apache.nifi.controller.FlowController Could not create Processor of type
> org.apache.nifi.processors.standard.CryptographicHashContent for ID  ;
> creating "Ghost" implementation
> org.apache.nifi.controller.exception.ProcessorInstantiationException:
> Unable to find bundle for coordinate default:unknown:unversioned
>
> How do I know which NAR to add for this particular processor, and also in
> general? I looked at GitHub and thought that it should be in the standard
> NARs.
>
> Also, how do I find which version of the NiFi libs I should use for a
> particular MiNiFi version?
>
> Thanks
> Luis
>
>


Adding NiFi processors to MiNiFi

2018-12-19 Thread luis_size
Hi 
I am using MiNiFi 0.5 and would like to add the CryptographicHashContent
processor to my MiNiFi flow. I added the following NARs from the NiFi lib
folder to the MiNiFi lib folder:
nifi-ssl-context-service-nar-1.7.0.nar
nifi-standard-nar-1.7.0.nar
nifi-standard-services-api-nar-1.7.0.nar

but it is not able to find the processor:
o.apache.nifi.controller.FlowController Could not create Processor of type 
org.apache.nifi.processors.standard.CryptographicHashContent for ID  ; 
creating "Ghost" implementation
org.apache.nifi.controller.exception.ProcessorInstantiationException: Unable to 
find bundle for coordinate default:unknown:unversioned
How do I know which NAR to add for this particular processor, and also in
general? I looked at GitHub and thought that it should be in the standard NARs.

Also, how do I find which version of the NiFi libs I should use for a
particular MiNiFi version?

Thanks
Luis