DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
Hi guys,

I have a few questions about DistributedMapCacheServer.

First question: I am confused by the "Distributed" part. If I understand it
correctly, the server actually runs on a single node, and if that node fails,
it is game over. Is that right? And why is NiFi not using ZK for this, since
ZK is already used by the NiFi cluster? Most of the use cases / examples I see
use DistributedMapCacheServer as a lookup or state store, which is exactly
what ZK was designed for, and ZK provides redundancy, scalability, and 5-10k
ops per second on a 3-node cluster.

Second question: I did not find any tools to interact with it other than
Matt's Groovy tool.

Third question: how does a DistributedMapCacheServer that persists to the
file system handle concurrency and locking? Is it reliable enough to be
trusted?

And lastly, is there additional overhead to supporting
DistributedMapCacheServer as another system, or is it pretty much hands-off
once the controller service is set up?

Thanks!
Boris


Re: DistributedMapCacheServer questions

2018-11-29 Thread Bryan Bende
Boris,

Yes the "distributed" name is confusing... it is referring to the fact
that it is a cache that can be accessed across the cluster, rather
than a local cache on each node, but you are correct that that DMC
server is a single point of failure.

It is important to separate the DMC client and server, there are
multiple implementations of the DMC client that can interact with
different caches (Redis, HBase, etc), the trade-off being you then
have to run/maintain these external systems, instead of the DMC server
which is fully managed by NiFi.
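
For reference, using a DMC client from custom processor code looks roughly
like the sketch below (a minimal example against the
org.apache.nifi.distributed.cache.client interfaces; the helper class and the
string-only typing are illustrative, not part of NiFi):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import org.apache.nifi.distributed.cache.client.Deserializer;
    import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient;
    import org.apache.nifi.distributed.cache.client.Serializer;

    // Hypothetical helper: keys and values are serialized to bytes, which is
    // why any backing cache (DMC server, Redis, HBase) can sit behind the
    // same client interface.
    public class CacheExample {

        private static final Serializer<String> STRING_SERIALIZER =
                (value, out) -> out.write(value.getBytes(StandardCharsets.UTF_8));

        private static final Deserializer<String> STRING_DESERIALIZER =
                bytes -> bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);

        public static void store(DistributedMapCacheClient cache, String key,
                String value) throws IOException {
            cache.put(key, value, STRING_SERIALIZER, STRING_SERIALIZER);
        }

        public static String lookup(DistributedMapCacheClient cache, String key)
                throws IOException {
            return cache.get(key, STRING_SERIALIZER, STRING_DESERIALIZER);
        }
    }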

Regarding ZK... I don't think there is a good answer other than the
fact that DMC existed when NiFi was open sourced, and NiFi didn't
start using ZK for clustering until the 1.0.0 release, so originally
ZK wasn't in the picture. I assume we could implement a DMC client
that talked to ZK, just like we have done for Redis, HBase, and
others.

I'm not aware of any issues with the DMC server persisting to the file
system or handling concurrent connections; it should be stable.

Thanks,

Bryan



Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread David Gallagher
Hi - I'm using nifi-1.7.1 and nifi-registry-0.2.0. I'd like to have 'global'
DBCPConnectionPool instances at the NiFi Flow (root process group) level, then
import flows from the registry and have them use the global pools, e.g. in a
PutDatabaseRecord processor. When I try that, though, the processor is invalid
and the Database Connection Pooling Service shows 'Incompatible Controller
Service Configured'. If I manually choose the global controller service,
everything is fine, but is there a way to make the matching automatic?


Thanks,


Dave


Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Bryan Bende
Hi Dave,

Currently there isn't a built-in way to make this automatic...

The issue is that the versioned flow in registry has the
PutDatabaseRecord with the DBCP Pool property set to a UUID that only
existed in the original environment the flow was created in.

When you import the flow to another environment, that UUID is
obviously not going to exist, but it is also unclear how to select the
appropriate one. What if there were multiple DBCP connection pools
visible to where the versioned flow is being imported? There would be
no way to know which one to use.

I suppose there could be a convention that if there is only one matching
service of the given type, and it comes from the root process group, then use
that one, but it's still hard to know whether that is really the right
service. What if it was for a different database and someone didn't realize?

-Bryan



Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Bryan Bende
I meant to add that for the case where you are using NiFi's UI to import the
flows, we could consider building a nicer user experience that prompts the
user to select the appropriate services that are missing, fill in sensitive
properties, etc.

For the scenario where you are using the CLI, or some scripts, to import the
flows, we can probably build some kind of convention into those commands, or
add additional commands to help with the situation.


Re: DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
thanks for the explanation, Bryan! it helps!

Boris



Re: DistributedMapCacheServer questions

2018-11-29 Thread Bryan Bende
I also meant to add that NiFi does provide a "state manager" API to
processors, which when clustered will use ZooKeeper.

The difference between this and the DMC is that the state for a processor is
only accessible to that processor (or to all instances of the processor
across the cluster). It is stored by the processor's UUID.

So if the state doesn't need to be shared across different parts of the flow,
you can use this instead. Take a look at ProcessContext.getStateManager().
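
A minimal sketch of using it (Scope.CLUSTER is the ZooKeeper-backed scope when
clustered; the counter logic and class name are just an example, not NiFi
code):

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.nifi.components.state.Scope;
    import org.apache.nifi.components.state.StateManager;
    import org.apache.nifi.components.state.StateMap;

    public class StateExample {

        // Read-modify-write of this processor's own state; other processors
        // cannot see it, which is the key difference from the DMC.
        public static long incrementCounter(StateManager stateManager) throws IOException {
            StateMap state = stateManager.getState(Scope.CLUSTER);
            String current = state.get("count");
            long next = (current == null ? 0L : Long.parseLong(current)) + 1;

            Map<String, String> updated = new HashMap<>(state.toMap());
            updated.put("count", String.valueOf(next));
            // stateManager.replace(state, updated, Scope.CLUSTER) would give
            // compare-and-set semantics if concurrent updates are a concern
            stateManager.setState(updated, Scope.CLUSTER);
            return next;
        }
    }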



Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread David Gallagher
Thanks, Bryan! The main thing I'm trying to avoid is that if I leave the 
controller service in the process group, when I import it to a new environment 
(e.g. production) it would overwrite existing settings. Even if the connection 
properties were identical between environments, sensitive information does not 
propagate and the end user would have to re-enter passwords. This will register 
as a change in version control and cause problems later when I have a new 
version of the flow to import. I guess I can work around that by using the 
variable registry?

Dave




Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Ed B
Bryan,
What if we referred to controller services not only by UUID but also by name
(at least in the registry)? Then, during deployment, if no CS matches by
UUID, we could check all available CSs by name, and if exactly one matches by
both type and name, we could use it; otherwise the current behavior would
apply.
Thoughts?



Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Pierre Villard
Hey Dave,

Just want to jump in regarding:
"Even if the connection properties were identical between environments,
sensitive information does not propagate and the end user would have to
re-enter passwords. This will register as a change in version control and
cause problems later when I have a new version of the flow to import."

I believe that is not correct. It is true that during the initial import of
the flow, the user would have to re-enter passwords. However, this will not
be considered a change in version control, and the value won't be
changed/removed when upgrading to a new version.

Pierre





Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Bryan Bende
Dave,

It is true that sensitive properties don't propagate to the registry and have
to be re-entered upon import; however, entering those values should not
trigger a change that needs to be committed, and whatever is entered locally
should be retained across future upgrades of the versioned process group.

Ed,

I think the challenge is how to correctly capture this information. In NiFi
and NiFi Registry, a component's property values are a Map, so there would be
an entry like "dbcp-connection-pool" -> "1234-1233-1234-1234". Somewhere in
the versioned process group we would need another map that said
"1234-1233-1234-1234" -> "Foo DBCP" so that we could get the name later
during import. I guess every time a new version is saved you would go through
and identify all services referenced but not included, and then create this
map. Not saying it can't be done, just needs some thought.
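
To make the idea concrete, here is a hypothetical sketch of the two-map
lookup (none of these names exist in NiFi Registry today):

    import java.util.HashMap;
    import java.util.Map;

    public class ExternalServiceLookup {
        public static void main(String[] args) {
            // What the versioned component already stores today:
            Map<String, String> properties = new HashMap<>();
            properties.put("dbcp-connection-pool", "1234-1233-1234-1234");

            // The extra map that would have to be captured at save time:
            Map<String, String> externalServiceNames = new HashMap<>();
            externalServiceNames.put("1234-1233-1234-1234", "Foo DBCP");

            // At import time, resolve the stale UUID to a human-readable name
            // and match it against services visible in the target environment.
            String staleId = properties.get("dbcp-connection-pool");
            System.out.println(externalServiceNames.get(staleId)); // Foo DBCP
        }
    }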


Re: DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
Thanks, I already looked at the state manager, but unfortunately I need to
share some values between processors in my case.

I am also researching another option, which is to use our internal MySQL
database. I was thinking of creating an indexed table and a few simple Groovy
processors around it to put/get/remove values. That database is already set
up for online replication to another MySQL instance, and we can set it up for
HA easily. I know it sounds like more work than just using the NiFi
distributed cache, and I am not sure whether MySQL will handle 1000 requests
per second (even though they will be against a tiny table). But an HA setup
would be nice for us, and since the "distributed" cache is not really
distributed, I am not sure I like it.
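
For what it's worth, the put/get side of that idea is small; here is a sketch
over plain JDBC, assuming an illustrative table created as
CREATE TABLE nifi_cache (cache_key VARCHAR(255) PRIMARY KEY, cache_value TEXT):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class MySqlCache {

        // Upsert via MySQL's ON DUPLICATE KEY UPDATE; the primary key keeps
        // lookups indexed even at ~1000 requests/second on a tiny table.
        public static void put(Connection conn, String key, String value) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO nifi_cache (cache_key, cache_value) VALUES (?, ?) "
                            + "ON DUPLICATE KEY UPDATE cache_value = VALUES(cache_value)")) {
                ps.setString(1, key);
                ps.setString(2, value);
                ps.executeUpdate();
            }
        }

        public static String get(Connection conn, String key) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT cache_value FROM nifi_cache WHERE cache_key = ?")) {
                ps.setString(1, key);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }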

ZK is an option as well, I think, since we already have it (for NiFi, Kafka,
and HDFS). It looks like I could create some simple Groovy processors around
the ZK API. I do not expect a lot of put/get operations - maybe about 1000
per second max - and based on benchmarks I've seen, ZK should be able to
handle this.
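
A rough sketch of the ZK route using Apache Curator (the client library NiFi
itself uses for clustering); the connect string and znode path below are made
up:

    import java.nio.charset.StandardCharsets;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ZkCacheSketch {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();
            try {
                String path = "/nifi-cache/my-key";
                byte[] value = "my-value".getBytes(StandardCharsets.UTF_8);

                // put: create the znode (and parents) or overwrite its data
                if (client.checkExists().forPath(path) == null) {
                    client.create().creatingParentsIfNeeded().forPath(path, value);
                } else {
                    client.setData().forPath(path, value);
                }

                // get
                byte[] stored = client.getData().forPath(path);
                System.out.println(new String(stored, StandardCharsets.UTF_8));

                // remove
                client.delete().forPath(path);
            } finally {
                client.close();
            }
        }
    }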

I've looked at Redis as well and it is awesome but we are not excited to
add another system to maintain - we already have quite a few to keep our
admins busy :)

At least I have choices... :)

Thanks again for your help!



Re: DistributedMapCacheServer questions

2018-11-29 Thread Bryan Bende
Makes sense. You could possibly implement a
DatabaseDistributedMapCacheClient [1], which would then let you use the
Fetch/Put DistributedMapCache processors against your MySQL DB.

Although it is probably non-trivial to implement, and you could get
everything done faster with some Groovy processors :)

[1] 
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-distributed-cache-client-service-api/src/main/java/org/apache/nifi/distributed/cache/client/DistributedMapCacheClient.java
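
A skeleton of what that might look like, assuming the interface linked above
(the JDBC plumbing is elided and only two of the required methods are shown,
so the class is declared abstract):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.nifi.controller.AbstractControllerService;
    import org.apache.nifi.distributed.cache.client.Deserializer;
    import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient;
    import org.apache.nifi.distributed.cache.client.Serializer;

    // Hypothetical database-backed DMC client; the real interface also
    // requires putIfAbsent, getAndPutIfAbsent, containsKey, remove and close.
    public abstract class DatabaseDistributedMapCacheClient
            extends AbstractControllerService implements DistributedMapCacheClient {

        @Override
        public <K, V> void put(K key, V value, Serializer<K> keySerializer,
                Serializer<V> valueSerializer) throws IOException {
            byte[] k = toBytes(key, keySerializer);
            byte[] v = toBytes(value, valueSerializer);
            // upsert (k, v) into the backing table through a DBCP connection here
        }

        @Override
        public <K, V> V get(K key, Serializer<K> keySerializer,
                Deserializer<V> valueDeserializer) throws IOException {
            byte[] k = toBytes(key, keySerializer);
            byte[] v = null; // select the value bytes for k from the table here
            return valueDeserializer.deserialize(v);
        }

        private <T> byte[] toBytes(T value, Serializer<T> serializer) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            serializer.serialize(value, out);
            return out.toByteArray();
        }
    }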

Re: DistributedMapCacheServer questions

2018-11-29 Thread Otto Fowler
Maybe you can open a JIRA for a ZK client like Bryan mentions?




Re: DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
Thanks, guys.

Here is the new JIRA, as requested:
https://issues.apache.org/jira/browse/NIFI-5853





Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Ed B
Regardless of the current implementation... if this is an acceptable idea, I
can open a JIRA for the new feature and have a separate discussion under that
ticket. Does that make sense?




Best practices to replicate a large flow from a standalone NiFi to a cluster

2018-11-29 Thread Erik Anderson
We are just starting to play with NiFi clusters. I have a question about
migrating a data flow from a standalone instance of NiFi.

Let's say we create an advanced data flow in a standalone instance of NiFi in
a corporate environment. Corporate environments bring in new challenges, like
“StandardProxyConfiguration”, “StandardRestrictedSSLContext”, Kerberos,
HiveConnectionPool, etc.

What's the best way to “replicate” a very complex flow, set up in standalone
instances of NiFi, to a cluster setup using ZooKeeper? I would prefer to use
the NiFi Registry.

Erik Anderson
Bloomberg


Re: Best practices to replicate a large flow from a standalone NiFi to a cluster

2018-11-29 Thread Andy LoPresto
The NiFi Registry is the appropriate tool for this scenario. If both the 
standalone (“dev”) NiFi instance and the clustered (“prod”) NiFi instance have 
shared network access, you can instantiate a single NiFi Registry instance and 
add a Registry Client pointing to that Registry to each NiFi instance. The 
“dev” NiFi can commit a flow to the Registry, and the “prod” NiFi can 
instantiate it via “Add Process Group > Import…”. You can even set event hooks 
to automate this process if desired. 

If you have your “dev” and “prod” instances separated so that they do not have 
access to the same NiFi Registry, you would create multiple Registry instances, 
and use a tool like the NiFi CLI to replicate flow definitions from “dev” 
Registry to “prod” Registry. 

I’ve attached a diagram (originally created by Kevin Doran for Hortonworks, I 
believe) illustrating a possible deployment scenario. 




Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69




Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread David Gallagher
@Bryan Bende, @Pierre, you are correct! I'm seeing a version change because I
changed the database URL, not because of the passwords. Hopefully I can move
those settings to the variable registry instead.



Problems with NiFi Registry Conflicts after Processor Upgrades

2018-11-29 Thread Peter Wicks (pwicks)
Ran into a NiFi Registry issue while upgrading our instances to NiFi 1.8.0. 
ExecuteSQL had a number of new properties added to it in 1.8.0, so after 
upgrading, our versioned processor groups show as having local changes, 
which is good. We went ahead and checked the changes into the registry.

Enter the second instance... we upgraded a second instance. It also sees local 
changes, but now the processor group is in conflict, because we have local 
(identical) changes, and we have a newer version checked in. If you try to 
revert the local changes so you can sync things up... you can't, because these 
are properties on the Processor, and the default values automatically come 
back. So our second processor group is in conflict and we haven't found a way 
to bring it back in sync without deleting it and reloading it from the 
registry. Help would be appreciated.

Thanks,
  Peter


Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Bryan Bende
David, yes putting the DB URL into the variable registry should work.
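
For example (a rough sketch; variable names like db.host are just
illustrative, defined per environment on the root process group), the DBCP
service's Database Connection URL can simply be:

    jdbc:mysql://${db.host}:${db.port}/${db.name}

so the versioned flow stays identical across environments and only the
variable values differ.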

Ed, I'm ok with opening a JIRA for that, I feel like it may have come
up before, but can't remember if there is already a JIRA.
On Thu, Nov 29, 2018 at 3:13 PM David Gallagher
 wrote:
>
> @Bryan Bende, @Pierre, you are correct! I'm seeing a version change because I 
> changed the database URL, not due to the passwords. Hopefully I can move 
> those settings to the variable registry instead.
> 
> From: Bryan Bende 
> Sent: Thursday, November 29, 2018 1:50 PM
> To: users@nifi.apache.org
> Subject: Re: Using [Nifi Flow]-level controllers with NiFi registry flows
>
> Dave,
>
> That is true that sensitive properties don't propagate to the registry
> and will have to be re-entered upon import, however when the values
> are entered it should not trigger a change that needs to be committed,
> and whatever is entered locally should be retained across future
> upgrades of the versioned process group.
>
> Ed,
>
> I think the challenge is how to correctly capture this information. In
> NiFi and NiFi Registry, the property values are a Map
> so there would be an entry like "dbcp-connection-pool" ->
> "1234-1233-1234-1234", so somewhere in the versioned process group we
> would need another map that said "1234-1233-1234-1234" -> "Foo DBCP"
> so that we could get the name later during import. I guess every time
> a new version is saved you would go through and identify all services
> referenced but not included, and then create this map. Not saying it
> can't be done, just needs some thought.
> On Thu, Nov 29, 2018 at 1:48 PM Pierre Villard
>  wrote:
> >
> > Hey Dave,
> >
> > Just want to jump in regarding:
> > "Even if the connection properties were identical between environments, 
> > sensitive information does not propagate and the end user would have to 
> > re-enter passwords. This will register as a change in version control and 
> > cause problems later when I have a new version of the flow to import."
> >
> > I believe that is not correct. It is true that during the initial import of 
> > the flow, the user would have to re-enter passwords. However this will not 
> > be considered as a change in version control and the value won't be 
> > changed/removed when upgrading to a new version.
> >
> > Pierre
> >
> >
> >
> > On Thu, Nov 29, 2018 at 7:38 PM Ed B  wrote:
> >>
> >> Bryan,
> >> What if we refer controller services not only by UUID, but also by name 
> >> (at least in registry).
> >> then, during deployment, if matching CS by UUID doesn't exist, we could 
> >> check all available CS by a name, and if there is only one matching by 
> >> type and name CS, we could use it, otherwise - current functionality 
> >> should be fine.
> >> Thoughts?
> >>
> >> On Thu, Nov 29, 2018 at 11:55 AM Bryan Bende  wrote:
> >>>
> >>> I meant to add that for the case where you are using NiFi's UI to
> >>> import the flows, we could consider building a nice user experience
> >>> that prompted the user to select the appropriate services that are
> >>> missing, and fill in sensitive properties, etc.
> >>>
> >>> For the scenario where you are using the CLI, or some scripts, to
> >>> import the flows, then we can probably build some kind of convention
> >>> into those commands, or add additional commands to help with the
> >>> situation.
> >>> On Thu, Nov 29, 2018 at 12:51 PM Bryan Bende  wrote:
> >>> >
> >>> > Hi Dave,
> >>> >
> >>> > Currently there isn't a built in way to make this automatic...
> >>> >
> >>> > The issue is that the versioned flow in registry has the
> >>> > PutDatabaseRecord with the DBCP Pool property set to a UUID that only
> >>> > existed in the original environment the flow was created in.
> >>> >
> >>> > When you import the flow to another environment, that UUID is
> >>> > obviously not going to exist, but it is also unclear how to select the
> >>> > appropriate one. What if there were multiple DBCP connection pools
> >>> > visible to where the versioned flow is being imported? There would be
> >>> > no way to know which one to use.
> >>> >
> >>> > I suppose maybe there could be a convention that if there was only one
> >>> > matching service of the given type, and it came from the root process
> >>> > group, then use that one, but its still hard to know if this is really
> >>> > the right service. What if it was for a different database and someone
> >>> > didn't realize?
> >>> >
> >>> > -Bryan
> >>> >
> >>> > On Thu, Nov 29, 2018 at 12:34 PM David Gallagher
> >>> >  wrote:
> >>> > >
> >>> > > Hi - I'm using nifi-1.7.1 and nifi-registry-0.2.0. I'd like to have 
> >>> > > 'global' DBCPConnectionPool instances at the Nifi Flow level,  then 
> >>> > > import flows from the registry and have them use the global pools, 
> >>> > > e.g. in a PutDatabaseRecord processor. When I try that, though, the 
> >>> > > processor is invalid and the Database Connection Pooling Service 
> >>> > > shows 'Incompatible Controller 

Re: Problems with NiFi Registry Conflicts after Processor Upgrades

2018-11-29 Thread Bryan Bende
Peter,

I feel like this came up before, and unfortunately I'm not sure there
is currently a solution.

I think ultimately there needs to be some kind of force upgrade so you
can ignore the local changes and take whatever is available.

The only thing I can think of, but haven't tried, is if you had
upgraded the PG in the second instance before upgrading NiFi itself,
it would bring in the new properties that are not valid in that
version and the processor would show as invalid, then upgrade NiFi and
it would be valid again.

-Bryan
On Thu, Nov 29, 2018 at 3:13 PM Peter Wicks (pwicks)  wrote:
>
> Ran into a NiFi Registry issue while upgrading our instances to NiFi 1.8.0. 
> ExecuteSQL had a number of new properties added to it in 1.8.0, so after 
> upgrading, our versioned processor groups show as having local changes, 
> which is good. We went ahead and checked the changes into the registry.
>
>
>
> Enter the second instance... we upgraded a second instance. It also sees 
> local changes, but now the processor group is in conflict, because we have 
> local (identical) changes, and we have a newer version checked in. If you try 
> to revert the local changes so you can sync things up... you can't, because 
> these are properties on the Processor, and the default values automatically 
> come back. So our second processor group is in conflict and we haven't found 
> a way to bring it back in sync without deleting it and reloading it from the 
> registry. Help would be appreciated.
>
>
>
> Thanks,
>
>   Peter


RE: [EXT] Re: Problems with NiFi Registry Conflicts after Processor Upgrades

2018-11-29 Thread Peter Wicks (pwicks)
Bryan,

I agree, that is probably a solution. Unfortunately, there is no mass upgrade 
option, so we'd have to manually touch every versioned process group (or 
script it).
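
For example, something along these lines with the NiFi Toolkit CLI might
work (an untested sketch from memory; check cli.sh's built-in help for the
exact commands and flags):

# list the versioned process groups, then bump each one to the latest version
./bin/cli.sh nifi pg-list -u https://nifi-host:9443
./bin/cli.sh nifi pg-change-version -u https://nifi-host:9443 -pgid <group-id>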

Thanks,
  Peter

-Original Message-
From: Bryan Bende [mailto:bbe...@gmail.com] 
Sent: Thursday, November 29, 2018 1:29 PM
To: users@nifi.apache.org
Subject: [EXT] Re: Problems with NiFi Registry Conflicts after Processor 
Upgrades

Peter,

I feel like this came up before, and unfortunately I'm not sure there is 
currently a solution.

I think ultimately there needs to be some kind of force upgrade so you can 
ignore the local changes and take whatever is available.

The only thing I can think of, but haven't tried, is if you had upgraded the PG 
in the second instance before upgrading NiFi itself, it would bring in the new 
properties that are not valid in that version and the processor would show as 
invalid, then upgrade NiFi and it would be valid again.

-Bryan
On Thu, Nov 29, 2018 at 3:13 PM Peter Wicks (pwicks)  wrote:
>
> Ran into a NiFi Registry issue while upgrading our instances to NiFi 1.8.0. 
> ExecuteSQL had a number of new properties added to it in 1.8.0, so after 
> upgrading, our versioned processor groups show as having local changes, 
> which is good. We went ahead and checked the changes into the registry.
>
>
>
> Enter the second instance... we upgraded a second instance. It also sees 
> local changes, but now the processor group is in conflict, because we have 
> local (identical) changes, and we have a newer version checked in. If you try 
> to revert the local changes so you can sync things up... you can't, because 
> these are properties on the Processor, and the default values automatically 
> come back. So our second processor group is in conflict and we haven't found 
> a way to bring it back in sync without deleting it and reloading it from the 
> registry. Help would be appreciated.
>
>
>
> Thanks,
>
>   Peter


Re: hbaseclient service is failed to enable

2018-11-29 Thread Ravi Papisetti (rpapiset)
Hi Joe,



I am able to resolve this issue by changing hbase-client version to 
1.1.8-mapr-1703 in the nifi-hbase_1_1_2-client-service/pom.xml.





<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.8-mapr-1703</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.google.code.findbugs</groupId>
            <artifactId>jsr305</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Additionally added this dependency:

<dependency>
    <groupId>com.mapr.fs</groupId>
    <artifactId>mapr-hbase</artifactId>
    <version>6.0.1-mapr</version>
</dependency>



Now the below line of code in 
nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientService.java
is throwing a NullPointerException:

masterAddress = admin.getClusterStatus().getMaster().getHostAndPort();
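
The NPE on that chained call means either getClusterStatus() or getMaster()
returned null; with MapR-DB, which does not run a regular HMaster, a null
master seems plausible, though that is a guess on my part and not a
confirmed root cause. An untested null-guard sketch:

import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.ServerName;

// replacing the chained call above, so a missing master no longer NPEs:
final ClusterStatus status = admin.getClusterStatus();
final ServerName master = (status == null) ? null : status.getMaster();
masterAddress = (master == null) ? null : master.getHostAndPort();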



I see the same code in nifi-1.6.0, but that worked fine writing/reading data 
from MapR-DB. I'd appreciate any insights into why these issues appear with 
the nifi-1.7.1 code base.



Thanks,

Ravi Papisetti



On 25/11/18, 7:45 PM, "Ravi Papisetti (rpapiset)"  wrote:



Yes, if I change the version on hbase_client_service_1_1_2 to 
nifi-hbase_1_1_2-client-service-nar-1.6.0-mapr.nar (nifi 1.6 code compiled with 
mapr profile) it just connects fine.



If I change client service to 
nifi-hbase_1_1_2-client-service-nar-1.7.1-mapr.nar (nifi 1.7.1 compiled with 
mapr profile), this error appears.



See if the below stack trace comes through with proper formatting:

Sun Nov 25 22:36:44 UTC 2018, RpcRetryingCaller{globalStartTime=1543185404532, pause=100, retries=1}, org.apache.hadoop.hbase.MasterNotRunningException: org.apache.hadoop.hbase.MasterNotRunningException: Can't get connection to ZooKeeper: KeeperErrorCode = AuthFailed for /hbase
: {}
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Sun Nov 25 22:36:44 UTC 2018, RpcRetryingCaller{globalStartTime=1543185404532, pause=100, retries=1}, org.apache.hadoop.hbase.MasterNotRunningException: org.apache.hadoop.hbase.MasterNotRunningException: Can't get connection to ZooKeeper: KeeperErrorCode = AuthFailed for /hbase

    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
    at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3917)
    at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:413)
    at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:397)
    at org.apache.nifi.hbase.HBase_1_1_2_ClientService.onEnabled(HBase_1_1_2_ClientService.java:264)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:142)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:130)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:75)
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:52)
    at org.apache.nifi.controller.service.StandardControllerServiceNode$2.run(StandardControllerServiceNode.java:433)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.MasterNotRunningException: org.apache.hadoop.hbase.MasterNotRunningException: Can't get connection to ZooKeeper: KeeperErrorCode = AuthFailed for /hbase
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1533)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1553)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionMa

RedisConnectionPoolService - Incorrectly Attempts to Connect to localhost

2018-11-29 Thread Williams, Jim
Hello,

 

I'm trying to set up the
RedisConnectionPoolService/RedisDistributedMapCacheClientService.

 

Some basic observations:

*   This is a standalone Nifi 1.8.0 server
*   SELinux is disabled on the server
*   There are no iptables rules configured for blocking on the server
*   I am able to resolve the hostname of the Redis server to an IP
address on the Nifi server
*   I can connect to the Redis server from the Nifi server using telnet

 

The stack trace I see when the services are started is:

2018-11-29 21:16:03,527 WARN [Timer-Driven Process Thread-8] o.a.n.controller.tasks.ConnectableTask Administratively Yielding PutDistributedMapCache[id=0167105c-4a54-1adf-cb8d-1b45de7f0c99] due to uncaught Exception: org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:281)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:464)
    at org.apache.nifi.redis.service.RedisConnectionPoolService.getConnection(RedisConnectionPoolService.java:89)
    at sun.reflect.GeneratedMethodAccessor580.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:84)
    at com.sun.proxy.$Proxy98.getConnection(Unknown Source)
    at org.apache.nifi.redis.service.RedisDistributedMapCacheClientService.withConnection(RedisDistributedMapCacheClientService.java:343)
    at org.apache.nifi.redis.service.RedisDistributedMapCacheClientService.put(RedisDistributedMapCacheClientService.java:189)
    at sun.reflect.GeneratedMethodAccessor579.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:84)
    at com.sun.proxy.$Proxy96.put(Unknown Source)
    at org.apache.nifi.processors.standard.PutDistributedMapCache.onTrigger(PutDistributedMapCache.java:202)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at redis.clients.util.Pool.getResource(Pool.java:53)
    at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226)
    at redis.clients.jedis.JedisPool.getResource(JedisPool.java:16)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:271)
    ... 26 common frames omitted
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Connection refused (Connection refused)
    at redis.clients.jedis.Connection.connect(Connection.java:207)
    at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:93)
    at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1767)
    at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:868)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(

GetMongo - Pass-on Initial FlowFile?

2018-11-29 Thread Ryan Hendrickson
Hi all,
   I'm curious if the GetMongo processor allows you to pass in a
FlowFile Document and enrich the original FlowFile document with the
results from the Query Result - or more generally store the result as a
NiFi Attribute, instead of replacing the content in the FlowFile.

   I really want to "enrich" my results with data from Mongo, not replace
my FlowFile with the database record that matches.

   I had thought I had figured out a way to do this in the past, but can't
seem to replicate it, so I figured I'd ask the community.

Thanks,
Ryan


Re: How do you use a custom region?

2018-11-29 Thread Andrew McDonald

So I've installed nifi 1.8.0 and endpoint override didn't help

The GetSQS is throwing an exception

   com.amazonaws.services.sqs.model.AmazonSQSException: Credential
   should be scoped to a valid region, not 'us-east-1'.

I'm guessing that the endpoint override is not overriding the region b/c 
'us-east-1' is not my region.


Regards, Andrew

On 11/29/18 02:09, Sivaprasanna wrote:
Yeah. This was added in 1.8.0 for SQS. However, the reason why a 
custom enum was added on the NiFi side[1] was to have a proper 
readable region instead of just the region code i.e., Asia Pacific 
(Singapore) instead of ap-southeast-1. However, I raised a request 
later to the AWS Java SDK team to have a readable name. It was added 
recently, if I remember correctly. So ideally this enum has to be 
removed and the one on the official AWS SDK has to be leveraged 
completely. I have created a Jira[2] and started working on it. I'll 
raise a PR soon.


[1] https://issues.apache.org/jira/browse/NIFI-5129
[2] https://issues.apache.org/jira/browse/NIFI-5850

Thanks,

On Thu, Nov 29, 2018 at 2:20 AM Andrew McDonald wrote:


zenfenan added this later to 1.8.0, yay!

On 11/28/18 14:23, Andrew McDonald wrote:



This workaround doesn't work for sqs because it doesn't have
an endpoint override URL property


On 11/28/18 12:16, Michael Moser wrote:

Greetings!  This JIRA ticket [1] describes the recommended work
around for AWS regions that aren't in the list.

-- Mike

[1] - https://issues.apache.org/jira/browse/NIFI-4523



On Wed, Nov 28, 2018 at 11:44 AM Jon Logan <jmlo...@buffalo.edu> wrote:

Andrew,

I know there's a few regions not in the list. I'm not sure
which region you're targeting, but at least for the case of
one of the new regions, I submitted a PR for this. I haven't
dug into it deeply, but it seems like a better way to do
this might be to remove the enum entirely and get the region
list via the AWS API, or allow a free-form entry.

https://github.com/apache/nifi/pull/3187


Jon

On Wed, Nov 28, 2018 at 10:35 AM Andrew McDonald <amcdon...@ccri.com> wrote:

I'm trying to upgrade from 1.4.0 to 1.7.1 but the s3
processors can not
be initialized.

Nifi 1.7.1 uses AWS SDK 1.11.319 and is throwing an
IllegalArgumentException: no region provided

The region I'm using isn't in the enum, so is it
possible to use a
custom region?

Regards, Andrew




Re: GetMongo - Pass-on Initial FlowFile?

2018-11-29 Thread Otto Fowler
Sounds like you want to look at enrichment with the LookupRecord processors
and Mongo.
https://community.hortonworks.com/articles/146198/data-flow-enrichment-with-nifi-part-3-lookuprecord.html
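
Roughly (from memory, so treat the property names as approximate): set up a
MongoDBLookupService pointing at your collection, select it as LookupRecord's
Lookup Service, add a dynamic property that maps the lookup key to a field in
your record (e.g. key -> /id), and set Result RecordPath to wherever the
looked-up document should land (e.g. /enrichment). The original record is
kept and the Mongo result is merged into it instead of replacing the content.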


On November 29, 2018 at 17:12:38, Ryan Hendrickson (
ryan.andrew.hendrick...@gmail.com) wrote:

Hi all,
   I'm curious if the GetMongo processor allows you to pass in a
FlowFile Document and enrich the original FlowFile document with the
results from the Query Result - or more generally store the result as a
NiFi Attribute, instead of replacing the content in the FlowFile.

   I really want to "enrich" my results with data from Mongo, not replace
my FlowFile with the database record that matches.

   I had thought I had figured out a way to do this in the past, but can't
seem to replicate it, so I figured I'd ask the community.

Thanks,
Ryan


Re: Jolt transform question

2018-11-29 Thread Matt Burgess
Yves,

The following Jolt Chain spec should work:

[
  {
"operation": "shift",
"spec": {
  "contact": {
"*": {
  "value": "contact.@(1,label).value",
  "comment": "contact.@(1,label).comment"
}
  }
}
  }
]
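
Applied to your input, that should produce:

{
  "contact" : {
    "pro" : { "value" : "123456", "comment" : "abc" },
    "perso" : { "value" : "654321", "comment" : "def" }
  }
}

The @(1,label) transpose walks one level up from the matched key and uses
that object's "label" value as the new key; since "label" itself is never
shifted anywhere, it drops out of the output, which matches what you asked
for.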

Regards,
Matt

On Fri, Nov 23, 2018 at 8:21 AM Yves HAMEL  wrote:
>
>
> Hi all,
> I have a input json :
> {"contact": {
> "0": {"label":"pro",value:"123456","comment":"abc"},
> "1": {"label":"perso",value:"654321","comment":"def"},
> }}
> How can I write jolt transformation to get the following json
> {"contact":{
> "pro":{value:"123456","comment":"abc"},
> "perso":{value:"654321","comment":"def"}
> }}
> Thanks
> Yves HAMEL
>


Re: How do you use a custom region?

2018-11-29 Thread Sivaprasanna
Andrew, Can you please share the complete error message and also show us
how the properties look for the GetSQS processor?

Thanks.

On Fri, Nov 30, 2018 at 3:45 AM Andrew McDonald  wrote:

> So I've installed nifi 1.8.0 and endpoint override didn't help
>
> The GetSQS is throwing an exception
>
> com.amazonaws.services.sqs.model.AmazonSQSException: Credential should be
> scoped to a valid region, not 'us-east-1'.
>
> I'm guessing that the endpoint override is not overriding the region b/c
> 'us-east-1' is not my region.
>
> Regards, Andrew
> On 11/29/18 02:09, Sivaprasanna wrote:
>
> Yeah. This was added in 1.8.0 for SQS. However, the reason why a custom
> enum was added on the NiFi side[1] was to have a proper readable region
> instead of just the region code i.e., Asia Pacific (Singapore) instead of
> ap-southeast-1. However, I raised a request later to the AWS Java SDK
> team to have a readable name. It was added recently, if I remember
> correctly. So ideally this enum has to be removed and the one on the
> official AWS SDK has to be leveraged completely. I have created a Jira[2]
> and started working on it. I'll raise a PR soon.
>
> [1] https://issues.apache.org/jira/browse/NIFI-5129
> [2] https://issues.apache.org/jira/browse/NIFI-5850
>
> Thanks,
>
> On Thu, Nov 29, 2018 at 2:20 AM Andrew McDonald 
> wrote:
>
>> zenfenan added this later to 1.8.0, yay!
>> On 11/28/18 14:23, Andrew McDonald wrote:
>>
>>
>> This workaround doesn't work for sqs because it doesn't have an endpoint
>> override URL property
>>
>>
>> On 11/28/18 12:16, Michael Moser wrote:
>>
>> Greetings!  This JIRA ticket [1] describes the recommended work around
>> for AWS regions that aren't in the list.
>>
>> -- Mike
>>
>> [1] - https://issues.apache.org/jira/browse/NIFI-4523
>>
>>
>>
>> On Wed, Nov 28, 2018 at 11:44 AM Jon Logan  wrote:
>>
>>> Andrew,
>>>
>>> I know there's a few regions not in the list. I'm not sure which region
>>> you're targeting, but at least for the case of one of the new regions, I
>>> submitted a PR for this. I haven't dug into it deeply, but it seems like a
>>> better way to do this might be to remove the enum entirely and get the
>>> region list via the AWS API, or allow a free-form entry.
>>>
>>> https://github.com/apache/nifi/pull/3187
>>>
>>>
>>> Jon
>>>
>>> On Wed, Nov 28, 2018 at 10:35 AM Andrew McDonald 
>>> wrote:
>>>
 I'm trying to upgrade from 1.4.0 to 1.7.1 but the s3 processors can not
 be initialized.

 Nifi 1.7.1 uses AWS SDK 1.11.319 and is throwing an IllegalArgumentException:
 no region provided

 The region I'm using isn't in the enum, so is it possible to use a
 custom region?

 Regards, Andrew





Re: Using [Nifi Flow]-level controllers with NiFi registry flows

2018-11-29 Thread Ed B
Bryan,
I couldn't find an existing JIRA for such a feature, so I opened a new one.
Would appreciate it if you could review:
https://issues.apache.org/jira/browse/NIFI-5856

On Thu, Nov 29, 2018 at 2:20 PM Bryan Bende  wrote:

> David, yes putting the DB URL into the variable registry should work.
>
> Ed, I'm ok with opening a JIRA for that, I feel like it may have come
> up before, but can't remember if there is already a JIRA.
> On Thu, Nov 29, 2018 at 3:13 PM David Gallagher
>  wrote:
> >
> > @Bryan Bende, @Pierre, you are correct! I'm seeing a version change
> because I changed the database URL, not due to the passwords. Hopefully I
> can move those settings to the variable registry instead.
> > 
> > From: Bryan Bende 
> > Sent: Thursday, November 29, 2018 1:50 PM
> > To: users@nifi.apache.org
> > Subject: Re: Using [Nifi Flow]-level controllers with NiFi registry flows
> >
> > Dave,
> >
> > That is true that sensitive properties don't propagate to the registry
> > and will have to be re-entered upon import, however when the values
> > are entered it should not trigger a change that needs to be committed,
> > and whatever is entered locally should be retained across future
> > upgrades of the versioned process group.
> >
> > Ed,
> >
> > I think the challenge is how to correctly capture this information. In
> > NiFi and NiFi Registry, the property values are a Map
> > so there would be an entry like "dbcp-connection-pool" ->
> > "1234-1233-1234-1234", so somewhere in the versioned process group we
> > would need another map that said "1234-1233-1234-1234" -> "Foo DBCP"
> > so that we could get the name later during import. I guess every time
> > a new version is saved you would go through and identify all services
> > referenced but not included, and then create this map. Not saying it
> > can't be done, just needs some thought.
> > On Thu, Nov 29, 2018 at 1:48 PM Pierre Villard
> >  wrote:
> > >
> > > Hey Dave,
> > >
> > > Just want to jump in regarding:
> > > "Even if the connection properties were identical between
> environments, sensitive information does not propagate and the end user
> would have to re-enter passwords. This will register as a change in version
> control and cause problems later when I have a new version of the flow to
> import."
> > >
> > > I believe that is not correct. It is true that during the initial
> import of the flow, the user would have to re-enter passwords. However this
> will not be considered as a change in version control and the value won't
> be changed/removed when upgrading to a new version.
> > >
> > > Pierre
> > >
> > >
> > >
> > > On Thu, Nov 29, 2018 at 7:38 PM Ed B  wrote:
> > >>
> > >> Bryan,
> > >> What if we refer controller services not only by UUID, but also by
> name (at least in registry).
> > >> then, during deployment, if matching CS by UUID doesn't exist, we
> could check all available CS by a name, and if there is only one matching
> by type and name CS, we could use it, otherwise - current functionality
> should be fine.
> > >> Thoughts?
> > >>
> > >> On Thu, Nov 29, 2018 at 11:55 AM Bryan Bende 
> wrote:
> > >>>
> > >>> I meant to add that for the case where you are using NiFi's UI to
> > >>> import the flows, we could consider building a nice user experience
> > >>> that prompted the user to select the appropriate services that are
> > >>> missing, and fill in sensitive properties, etc.
> > >>>
> > >>> For the scenario where you are using the CLI, or some scripts, to
> > >>> import the flows, then we can probably build some kind of convention
> > >>> into those commands, or add additional commands to help with the
> > >>> situation.
> > >>> On Thu, Nov 29, 2018 at 12:51 PM Bryan Bende 
> wrote:
> > >>> >
> > >>> > Hi Dave,
> > >>> >
> > >>> > Currently there isn't a built in way to make this automatic...
> > >>> >
> > >>> > The issue is that the versioned flow in registry has the
> > >>> > PutDatabaseRecord with the DBCP Pool property set to a UUID that
> only
> > >>> > existed in the original environment the flow was created in.
> > >>> >
> > >>> > When you import the flow to another environment, that UUID is
> > >>> > obviously not going to exist, but it is also unclear how to select
> the
> > >>> > appropriate one. What if there were multiple DBCP connection pools
> > >>> > visible to where the versioned flow is being imported? There would
> be
> > >>> > no way to know which one to use.
> > >>> >
> > >>> > I suppose maybe there could be a convention that if there was only
> one
> > >>> > matching service of the given type, and it came from the root
> process
> > >>> > group, then use that one, but its still hard to know if this is
> really
> > >>> > the right service. What if it was for a different database and
> someone
> > >>> > didn't realize?
> > >>> >
> > >>> > -Bryan
> > >>> >
> > >>> > On Thu, Nov 29, 2018 at 12:34 PM David Gallagher
> > >>> >  wrote:
> > >>> > >
> > >>> > > Hi - I'm using nifi-1.7.1 and nifi-re

Data provenance screen is always blank with HTTPS.

2018-11-29 Thread Dnyaneshwar Pawar
Hi

  We are not able to see the data provenance events on the NiFi UI, especially 
when we moved to secure connections. Access policies have been created for the 
users, but provenance does not show any events on screen.
Is there anything more we need to do?

Regards,
Dnyaneshwar Pawar



Re: Data provenance screen is always blank with HTTPS.

2018-11-29 Thread Andy LoPresto
Is the result that you get an empty table window, or a blank (literally white) 
screen?

Can you perform the same action with the Developer Tools open and the network 
tab activated to see the response from the server, or use curl to perform the 
same request on the command line and examine the JSON response? That should 
indicate if the actual lineage is being returned or it’s an error message for 
unauthorized access. 
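
For example, something along these lines (host, port, and token are
placeholders; the provenance endpoint is asynchronous, so the POST creates a
query whose results you then fetch):

curl -k -X POST -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"provenance":{"request":{"maxResults":100}}}' \
  https://nifi-host:9443/nifi-api/provenance

curl -k -H "Authorization: Bearer $TOKEN" \
  https://nifi-host:9443/nifi-api/provenance/<query-id>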

This could be a permissions issue, or it could be a caching issue. Try 
clearing the browser cache and reloading the UI. 

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Nov 29, 2018, at 22:48, Dnyaneshwar Pawar 
>  wrote:
> 
> Hi
>  
>   We are not able to see the data provenance events on the NiFi UI, especially 
> when we moved to secure connections. Access policies have been created for 
> the users, but provenance does not show any events on screen.
> Is there anything more we need to do?
>  
> Regards,
> Dnyaneshwar Pawar
>