Re: SIGSEGV in Jetty

2018-04-25 Thread Phil H
Hi Joe,

Early days, but that change appears to have fixed the issue. I note it
occurred sooner if I used the web UI a lot (for example if I was
configuring processors or checking queues).

Thanks again

Phil.

On Thu, 26 Apr 2018 at 12:47, Joe Witt  wrote:

> Phil,
>
> I'd definitely say this is not any kind of known issue.  One thing you
> might want to check is whether you're using G1 GC. We have in most cases
> moved away from it on Java 8 because of some long-standing G1 bugs that
> our Lucene indexing of provenance data could trigger.
> We've switched to using the default GC in many cases.
>
> In your NiFi conf/bootstrap.conf, look for
>
> java.arg.13=-XX:+UseG1GC
>
> comment that line out by doing
>
> #java.arg.13=-XX:+UseG1GC
>
> And see if that helps.
>
> Thanks
>
> On Wed, Apr 25, 2018 at 10:43 PM, Phil H  wrote:
> > Hi there,
> >
> > I am getting regular (maybe every ten minutes?) crashes in NiFi 1.3.0.
> This
> > just started happening unrelated to any change to the software
> environment
> > (i.e.: we haven't installed new code/processors).
> >
> > It roughly coincides with an increase in flow file throughput and my
> > subsequent use of multiple concurrent tasks on some processors. However, I
> > set those back to 1, and the problem has persisted.
> >
> > I can't get the error file off the customer network, but the problematic
> > frame is:
> >
> > org.eclipse.jetty.io.IdleTimeout.deactivate()V+8
> >
> > Is this a known issue, and if so what are my options?
> >
> > TIA,
> > Phil
>


Re: Apache Nifi - How to pass maven contrib-check after adding text file to resources

2018-04-25 Thread Joe Witt
Mans,

See here for an example [1]

The Apache RAT plugin is what actually detects the files and
checks their licenses, etc.

In the provided example we're excluding a couple of test files because
they cannot carry license headers but are legitimate. You'd most likely
want to do the same.

Thanks

[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-avro-bundle/nifi-avro-processors/pom.xml
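
As a rough sketch of what that exclusion looks like in a pom.xml (the
apache-rat-plugin coordinates are standard; the excluded path is only an
illustrative placeholder):

<plugin>
    <groupId>org.apache.rat</groupId>
    <artifactId>apache-rat-plugin</artifactId>
    <configuration>
        <excludes combine.children="append">
            <!-- a legit test resource that cannot carry a license header -->
            <exclude>src/test/resources/my-test-data.txt</exclude>
        </excludes>
    </configuration>
</plugin>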

On Wed, Apr 25, 2018 at 10:51 PM, M Singh  wrote:
> Hi:
>
> I am working on a project and would like to add a text resource file but am
> not sure how to "register" it so that it passes the Maven contrib-check?
>
> Please let me know where I can find documentation on it.
>
> Thanks
>
> Mans


Apache Nifi - How to pass maven contrib-check after adding text file to resources

2018-04-25 Thread M Singh
Hi:

I am working on a project and would like to add a text resource file but am not
sure how to "register" it so that it passes the Maven contrib-check?

Please let me know where I can find documentation on it.

Thanks

Mans


Re: SIGSEGV in Jetty

2018-04-25 Thread Joe Witt
Phil,

I'd definitely say this is not any kind of known issue.  One thing you
might want to check is whether you're using G1 GC. We have in most cases
moved away from it on Java 8 because of some long-standing G1 bugs that
our Lucene indexing of provenance data could trigger.
We've switched to using the default GC in many cases.

In your NiFi conf/bootstrap.conf, look for

java.arg.13=-XX:+UseG1GC

comment that line out by doing

#java.arg.13=-XX:+UseG1GC

And see if that helps.
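
For reference, after the change that section of bootstrap.conf would look
roughly like this (the arg number varies between installs, and the second
line is an optional assumption: on Java 8 the default is typically the
parallel collector, which you could pin explicitly instead):

#java.arg.13=-XX:+UseG1GC
#java.arg.13=-XX:+UseParallelGC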

Thanks

On Wed, Apr 25, 2018 at 10:43 PM, Phil H  wrote:
> Hi there,
>
> I am getting regular (maybe every ten minutes?) crashes in NiFi 1.3.0. This
> just started happening unrelated to any change to the software environment
> (i.e.: we haven't installed new code/processors).
>
> It roughly coincides with an increase in flow file throughput and my
> subsequent use of multiple concurrent tasks on some processors. However, I
> set those back to 1, and the problem has persisted.
>
> I can't get the error file off the customer network, but the problematic
> frame is:
>
> org.eclipse.jetty.io.IdleTimeout.deactivate()V+8
>
> Is this a known issue, and if so what are my options?
>
> TIA,
> Phil


SIGSEGV in Jetty

2018-04-25 Thread Phil H
Hi there,

I am getting regular (maybe every ten minutes?) crashes in NiFi 1.3.0. This
just started happening unrelated to any change to the software environment
(i.e.: we haven't installed new code/processors).

It roughly coincides with an increase in flow file throughput and my
subsequent use of multiple concurrent tasks on some processors. However, I
set those back to 1, and the problem has persisted.

I can't get the error file off the customer network, but the problematic
frame is:

org.eclipse.jetty.io.IdleTimeout.deactivate()V+8

Is this a known issue, and if so what are my options?

TIA,
Phil


Re: [DISCUSS] Support for accessing sensitive values safely

2018-04-25 Thread Bryan Bende
The policy model would need more thought, but the point would be that
a user can select only the variable references they have been given
permission to use.

In order to configure the processor that is referencing the variable,
they already need write permissions to that processor, or some parent
in the hierarchy if no specific policy exists.



On Wed, Apr 25, 2018 at 2:42 PM, Otto Fowler  wrote:
>
> "It would provide a list of variables that are readable to the current user
> and one can be selected, just like allowable values or controller services.”
>
> A person may have rights to configure NiFi without knowing the “value” of
> the secure db password (for example), but that doesn’t mean they
> don’t have the right to reference it.
>
>
>
> On April 25, 2018 at 14:15:16, Bryan Bende (bbe...@gmail.com) wrote:
>
> There is definitely room for improvement here.
>
> Keep in mind that often the sensitive information is specific to a
> given environment. For example you build a flow in dev with your
> db.password. You don't actually want your dev db password to be
> propagated to the next environment, but you do want to be able to set
> a variable placeholder like ${db.password} and leave that placeholder
> so you can just set that variable in the next environment. So to me
> the goal here is how to handle secure variables.
>
> Andy highlighted many of the issues, my proposal would be the following...
>
> First, we can introduce a concept of a sensitive variable. This would
> be something in the UI where a user can indicate a variable is
> sensitive, maybe a checkbox, and then the framework can store these
> values encrypted (currently all variable values are stored in plain
> text because they aren't meant to be sensitive).
>
> Second, we can introduce policies on sensitive variables so that we
> can restrict who can read them elsewhere, just like policies on
> controller services that determine which controller services show up
> in the drop down of a processor.
>
> Third, we introduce a new kind of PropertyDescriptor that allows
> selecting a variable from the variable registry rather than free-form
> expression language. It would provide a list of variables that are
> readable to the current user and one can be selected, just like
> allowable values or controller services. Ideally we can have a way to
> still allow free form values for people who don't want to use
> variables.
>
> Fourth, anytime variables are evaluated from expression language we
> would prevent evaluating any of these new sensitive variables since we
> have no way of knowing if a user should have access to it from
> free-form EL, so they can only be used from the special
> PropertyDescriptors above.
>
> If we put all this in place then when we save flows to the registry,
> we can leave the variable place-holders in the sensitive properties,
> and then when you import to the next environment you only need to edit
> the variables section and not go through individual processors setting
> sensitive properties.
>
> On Wed, Apr 25, 2018 at 1:06 PM, Andy LoPresto  wrote:
>> Hi Sivaprasanna,
>>
>> This was a topic that was briefly considered earlier in the lifecycle of
>> the
>> project, but was sidelined due to other developments. With the NiFi
>> Registry
>> project, there has been renewed interest in securing sensitive values in
>> the
>> flow and allowing for easier import/export/persistence. There is a
>> placeholder Jira [1] which doesn’t capture significant information about
>> the
>> problem. I think a larger conversation needs to occur which covers the
>> following points (at a minimum, there is plenty of room for additional
>> concerns and use cases):
>>
>> * How the sensitive values are secured (encryption, storage [HSM [2],
>> Hashicorp Vault [3], Square KeyWhiz [4], JCEKS, locally-encrypted file],
>> location)
>> * User access control (granularity, integration with UAC policies in NiFi,
>> Ranger, users/groups, etc.)
>> * Exporting/persistence behavior (should a sensitive value entered in
>> “dev”
>> be exported to “prod” (and more significantly, vice-versa), which
>> instance(s) of the Variable Registry are allowed to be referenced from
>> each
>> NiFi / Registry node, etc.)
>> * Variable references (how does the tool differentiate between
>> “${db.password}” meaning “load the variable db.password” and a literal
>> password like “myPass${word!&”?)
>>
>> The original Jira for encrypted configuration files / properties [5] also
>> referenced some of these concepts in the abstract, and there is a rough
>> security roadmap in the wiki [6]. The Variable Registry design document
>> [7]
>> specifically did not allow for sensitive values to be exposed via UI or
>> API.
>>
>> I think there is an appetite for a more complete solution to this problem
>> as
>> you outlined, but I think there needs to be an extensive collection of
>> actual use cases, user expectations, and then technical discussion on the
>> implementation to solve this successfully. 

Re: [DISCUSS] Support for accessing sensitive values safely

2018-04-25 Thread Otto Fowler
"It would provide a list of variables that are readable to the current user
and one can be selected, just like allowable values or controller services.”

A person may have rights to configure NiFi without knowing the “value” of
the secure db password (for example), but that doesn’t mean they
don’t have the right to reference it.


On April 25, 2018 at 14:15:16, Bryan Bende (bbe...@gmail.com) wrote:

There is definitely room for improvement here.

Keep in mind that often the sensitive information is specific to a
given environment. For example you build a flow in dev with your
db.password. You don't actually want your dev db password to be
propagated to the next environment, but you do want to be able to set
a variable placeholder like ${db.password} and leave that placeholder
so you can just set that variable in the next environment. So to me
the goal here is how to handle secure variables.

Andy highlighted many of the issues, my proposal would be the following...

First, we can introduce a concept of a sensitive variable. This would
be something in the UI where a user can indicate a variable is
sensitive, maybe a checkbox, and then the framework can store these
values encrypted (currently all variable values are stored in plain
text because they aren't meant to be sensitive).

Second, we can introduce policies on sensitive variables so that we
can restrict who can read them elsewhere, just like policies on
controller services that determine which controller services show up
in the drop down of a processor.

Third, we introduce a new kind of PropertyDescriptor that allows
selecting a variable from the variable registry rather than free-form
expression language. It would provide a list of variables that are
readable to the current user and one can be selected, just like
allowable values or controller services. Ideally we can have a way to
still allow free form values for people who don't want to use
variables.

Fourth, anytime variables are evaluated from expression language we
would prevent evaluating any of these new sensitive variables since we
have no way of knowing if a user should have access to it from
free-form EL, so they can only be used from the special
PropertyDescriptors above.

If we put all this in place then when we save flows to the registry,
we can leave the variable place-holders in the sensitive properties,
and then when you import to the next environment you only need to edit
the variables section and not go through individual processors setting
sensitive properties.

On Wed, Apr 25, 2018 at 1:06 PM, Andy LoPresto 
wrote:
> Hi Sivaprasanna,
>
> This was a topic that was briefly considered earlier in the lifecycle of
the
> project, but was sidelined due to other developments. With the NiFi
Registry
> project, there has been renewed interest in securing sensitive values in
the
> flow and allowing for easier import/export/persistence. There is a
> placeholder Jira [1] which doesn’t capture significant information about
the
> problem. I think a larger conversation needs to occur which covers the
> following points (at a minimum, there is plenty of room for additional
> concerns and use cases):
>
> * How the sensitive values are secured (encryption, storage [HSM [2],
> Hashicorp Vault [3], Square KeyWhiz [4], JCEKS, locally-encrypted file],
> location)
> * User access control (granularity, integration with UAC policies in
NiFi,
> Ranger, users/groups, etc.)
> * Exporting/persistence behavior (should a sensitive value entered in
“dev”
> be exported to “prod” (and more significantly, vice-versa), which
> instance(s) of the Variable Registry are allowed to be referenced from
each
> NiFi / Registry node, etc.)
> * Variable references (how does the tool differentiate between
> “${db.password}” meaning “load the variable db.password” and a literal
> password like “myPass${word!&”?)
>
> The original Jira for encrypted configuration files / properties [5] also
> referenced some of these concepts in the abstract, and there is a rough
> security roadmap in the wiki [6]. The Variable Registry design document
[7]
> specifically did not allow for sensitive values to be exposed via UI or
API.
>
> I think there is an appetite for a more complete solution to this problem
as
> you outlined, but I think there needs to be an extensive collection of
> actual use cases, user expectations, and then technical discussion on the
> implementation to solve this successfully. It’s a minefield where
half-steps
> will lead to user confusion, unmet expectations, and potentially severe
> security vulnerabilities.
>
> I changed the subject line to include [DISCUSS] to hopefully generate
some
> more interest here for other community members to weigh in. Thanks for
> getting the conversation started.
>
> [1] https://issues.apache.org/jira/browse/NIFI-2653
> [2] https://en.wikipedia.org/wiki/Hardware_security_module
> [3] https://www.vaultproject.io/
> [4] https://square.github.io/keywhiz/
> [5] https://issues.apache.org/

Re: [DISCUSS] Support for accessing sensitive values safely

2018-04-25 Thread Bryan Bende
There is definitely room for improvement here.

Keep in mind that often the sensitive information is specific to a
given environment. For example you build a flow in dev with your
db.password. You don't actually want your dev db password to be
propagated to the next environment, but you do want to be able to set
a variable placeholder like ${db.password} and leave that placeholder
so you can just set that variable in the next environment. So to me
the goal here is how to handle secure variables.

Andy highlighted many of the issues, my proposal would be the following...

First, we can introduce a concept of a sensitive variable. This would
be something in the UI where a user can indicate a variable is
sensitive, maybe a checkbox, and then the framework can store these
values encrypted (currently all variable values are stored in plain
text because they aren't meant to be sensitive).

Second, we can introduce policies on sensitive variables so that we
can restrict who can read them elsewhere, just like policies on
controller services that determine which controller services show up
in the drop down of a processor.

Third, we introduce a new kind of PropertyDescriptor that allows
selecting a variable from the variable registry rather than free-form
expression language. It would provide a list of variables that are
readable to the current user and one can be selected, just like
allowable values or controller services. Ideally we can have a way to
still allow free form values for people who don't want to use
variables.

Fourth, anytime variables are evaluated from expression language we
would prevent evaluating any of these new sensitive variables since we
have no way of knowing if a user should have access to it from
free-form EL, so they can only be used from the special
PropertyDescriptors above.

If we put all this in place then when we save flows to the registry,
we can leave the variable place-holders in the sensitive properties,
and then when you import to the next environment you only need to edit
the variables section and not go through individual processors setting
sensitive properties.
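
To make the third point concrete, here is a minimal sketch of what such
a descriptor might look like in a processor. Note that
identifiesSensitiveVariable() is a hypothetical builder method standing
in for the proposed behavior; it is not part of the current
PropertyDescriptor API:

import org.apache.nifi.components.PropertyDescriptor;

public static final PropertyDescriptor DB_PASSWORD = new PropertyDescriptor.Builder()
        .name("db-password")
        .displayName("Database Password")
        .description("Selects a sensitive variable that holds the password.")
        .required(true)
        // hypothetical: render a drop-down of the sensitive variables the
        // current user is allowed to read, instead of a free-form EL field
        .identifiesSensitiveVariable()
        .build();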

On Wed, Apr 25, 2018 at 1:06 PM, Andy LoPresto  wrote:
> Hi Sivaprasanna,
>
> This was a topic that was briefly considered earlier in the lifecycle of the
> project, but was sidelined due to other developments. With the NiFi Registry
> project, there has been renewed interest in securing sensitive values in the
> flow and allowing for easier import/export/persistence. There is a
> placeholder Jira [1] which doesn’t capture significant information about the
> problem. I think a larger conversation needs to occur which covers the
> following points (at a minimum, there is plenty of room for additional
> concerns and use cases):
>
> * How the sensitive values are secured (encryption, storage [HSM [2],
> Hashicorp Vault [3], Square KeyWhiz [4], JCEKS, locally-encrypted file],
> location)
> * User access control (granularity, integration with UAC policies in NiFi,
> Ranger, users/groups, etc.)
> * Exporting/persistence behavior (should a sensitive value entered in “dev”
> be exported to “prod” (and more significantly, vice-versa), which
> instance(s) of the Variable Registry are allowed to be referenced from each
> NiFi / Registry node, etc.)
> * Variable references (how does the tool differentiate between
> “${db.password}” meaning “load the variable db.password” and a literal
> password like “myPass${word!&”?)
>
> The original Jira for encrypted configuration files / properties [5] also
> referenced some of these concepts in the abstract, and there is a rough
> security roadmap in the wiki [6]. The Variable Registry design document [7]
> specifically did not allow for sensitive values to be exposed via UI or API.
>
> I think there is an appetite for a more complete solution to this problem as
> you outlined, but I think there needs to be an extensive collection of
> actual use cases, user expectations, and then technical discussion on the
> implementation to solve this successfully. It’s a minefield where half-steps
> will lead to user confusion, unmet expectations, and potentially severe
> security vulnerabilities.
>
> I changed the subject line to include [DISCUSS] to hopefully generate some
> more interest here for other community members to weigh in. Thanks for
> getting the conversation started.
>
> [1] https://issues.apache.org/jira/browse/NIFI-2653
> [2] https://en.wikipedia.org/wiki/Hardware_security_module
> [3] https://www.vaultproject.io/
> [4] https://square.github.io/keywhiz/
> [5] https://issues.apache.org/jira/browse/NIFI-1831
> [6]
> https://cwiki.apache.org/confluence/display/NIFI/Security+Feature+Roadmap
> [7] https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Apr 25, 2018, at 12:24 PM, Sivaprasanna 
> wrote:
>
> Hi
>
> Since flowfile attribut

Re: Is there a configuration to limit the size of nifi's flowfile repository

2018-04-25 Thread Brandon DeVries
All,

This is something I think we shouldn't dismiss so easily.  While the
FlowFile repo is lighter than the content repo, allowing it to grow too
large can cause major problems.

Specifically, an "overgrown" FlowFile repo may prevent a NiFi instance from
coming back up after a restart due to the way in which records are held in
memory.  If there is more memory available to give to the JVM, this can
sometimes be worked around... but if there isn't you may just be out of
luck.  For that matter, allowing the FlowFile repo to grow so large that it
consumes all the heap isn't going to be good for system health in general
(OOM is probably never where you want to be...).

To Pierre's point "you don't want to limit that repository in size since it
would prevent the workflows from creating new flow files"... that's exactly
why I would want to limit the size of the repo.  You do then get into
questions of how exactly to do this.  For example, you may not want to simply
block all transactions that create a FlowFile, because the transaction may
also remove even more (e.g. MergeContent).  Additionally, you have to be concerned about
deadlocks (e.g. a "Wait" that hangs forever because its "Notify" is being
starved).  Or, perhaps that's all you can do... freeze everything at some
threshold prior to actual damage being done, and alert operators that
manual intervention is necessary (e.g. bring up the graph with
autoResume=false, and bleed off data in a controlled fashion).

In summary, I believe this is a problem.  Even if it doesn't come up often,
when it does it is significant.  While the solution likely isn't simple,
it's worth putting some thought towards.

Brandon

On Wed, Apr 25, 2018 at 9:43 AM Sivaprasanna 
wrote:

> No, he actually had mentioned “like content repository”. The answer is,
> there aren’t any properties that support this, AFAIK. Pierre’s response
> pretty much sums up why there aren’t any properties.
>
> Thanks,
> Sivaprasanna
>
> On Wed, 25 Apr 2018 at 7:10 PM, Mike Thomsen 
> wrote:
>
> > I have a feeling that what Ben meant was how to limit the content
> > repository size.
> >
> > On Wed, Apr 25, 2018 at 8:26 AM Pierre Villard <
> > pierre.villard...@gmail.com>
> > wrote:
> >
> > > Hi Ben,
> > >
> > > Since the flow file repository contains the information of the flow
> files
> > > currently being processed by NiFi, you don't want to limit that
> > repository
> > > in size since it would prevent the workflows from creating new flow files.
> > >
> > > Besides, this repository is very lightweight, I'm not sure it'd need to
> be
> > > limited in size.
> > > Do you have a specific use case in mind?
> > >
> > > Pierre
> > >
> > >
> > > 2018-04-25 9:15 GMT+02:00 尹文才 :
> > >
> > > > Hi guys, I checked NIFI's system administrator guide trying to find a
> > > > configuration item so that the size of the flowfile repository could
> be
> > > > limited similar to the other repositories(e.g. content repository),
> > but I
> > > > didn't find such configuration items, is there currently any
> > > configuration
> > > > to limit the flowfile repository's size? thanks.
> > > >
> > > > Regards,
> > > > Ben
> > > >
> > >
> >
>


Re: [DISCUSS] Support for accessing sensitive values safely

2018-04-25 Thread Andy LoPresto
Hi Sivaprasanna,

This was a topic that was briefly considered earlier in the lifecycle of the 
project, but was sidelined due to other developments. With the NiFi Registry 
project, there has been renewed interest in securing sensitive values in the 
flow and allowing for easier import/export/persistence. There is a placeholder 
Jira [1] which doesn’t capture significant information about the problem. I 
think a larger conversation needs to occur which covers the following points 
(at a minimum, there is plenty of room for additional concerns and use cases):

* How the sensitive values are secured (encryption, storage [HSM [2], Hashicorp 
Vault [3], Square KeyWhiz [4], JCEKS, locally-encrypted file], location)
* User access control (granularity, integration with UAC policies in NiFi, 
Ranger, users/groups, etc.)
* Exporting/persistence behavior (should a sensitive value entered in “dev” be 
exported to “prod” (and more significantly, vice-versa), which instance(s) of 
the Variable Registry are allowed to be referenced from each NiFi / Registry 
node, etc.)
* Variable references (how does the tool differentiate between “${db.password}”
meaning “load the variable db.password” and a literal password like
“myPass${word!&”?)
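
(On that last point, one hedged observation: NiFi's expression language
already defines an escape where "$$" yields a literal "$", so the
literal value above could be written as "myPass$${word!&". Whether
sensitive-variable resolution would honor the same escaping is something
the design would need to specify.)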

The original Jira for encrypted configuration files / properties [5] also 
referenced some of these concepts in the abstract, and there is a rough 
security roadmap in the wiki [6]. The Variable Registry design document [7] 
specifically did not allow for sensitive values to be exposed via UI or API.

I think there is an appetite for a more complete solution to this problem as 
you outlined, but I think there needs to be an extensive collection of actual 
use cases, user expectations, and then technical discussion on the 
implementation to solve this successfully. It’s a minefield where half-steps 
will lead to user confusion, unmet expectations, and potentially severe 
security vulnerabilities.

I changed the subject line to include [DISCUSS] to hopefully generate some more 
interest here for other community members to weigh in. Thanks for getting the 
conversation started.

[1] https://issues.apache.org/jira/browse/NIFI-2653
[2] https://en.wikipedia.org/wiki/Hardware_security_module
[3] https://www.vaultproject.io/
[4] https://square.github.io/keywhiz/
[5] https://issues.apache.org/jira/browse/NIFI-1831
[6] https://cwiki.apache.org/confluence/display/NIFI/Security+Feature+Roadmap
[7] https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Apr 25, 2018, at 12:24 PM, Sivaprasanna  wrote:
> 
> Hi
> 
> Since flowfile attributes and the VariableRegistry are not suitable (not
> safe, to be specific), developers have to rely on manually configuring the
> sensitive values on the components (Processors & ControllerServices). And
> during CI/CD (using the flow registry), the sensitive information is dropped
> and once imported to the next environment (QA or Prod), the user is
> expected to configure the sensitive information again, although only for
> the first time. How about we introduce a sort of 'vault' that holds
> sensitive values, which could avoid this unnecessary step completely?
> 
> -
> Sivaprasanna





Support for accessing sensitive values safely

2018-04-25 Thread Sivaprasanna
Hi

Since flowfile attributes and the VariableRegistry are not suitable (not safe,
to be specific), developers have to rely on manually configuring the
sensitive values on the components (Processors & ControllerServices). And
during CI/CD (using the flow registry), the sensitive information is dropped
and once imported to the next environment (QA or Prod), the user is
expected to configure the sensitive information again, although only for the
first time. How about we introduce a sort of 'vault' that holds sensitive
values, which could avoid this unnecessary step completely?

-
Sivaprasanna


Re: Custom Controller Service

2018-04-25 Thread Bryan Bende
Yes, this was just one idea based on Charlie's solution.

I'm not saying that approach solves the original request in this
email; I was just saying it's another nice idea that could be easily
implemented once we make the changes in the JIRA.

There can be as many "dynamic" DBCPService implementations as we want,
and anyone can implement their own; the key is making the API changes
to allow for it.


On Wed, Apr 25, 2018 at 12:05 PM, Sivaprasanna
 wrote:
> Okay.. but two questions:
>
>
>1. We are passing the attribute 'db.id'; that means we'll be using
>'UpdateAttribute' to add that attribute to the flowfile?
>2. If we are to use 'UpdateAttribute' to set the value for 'db.id', we
>need to know the value beforehand, right?
>
> -
>
> Sivaprasanna
>
> On Wed, Apr 25, 2018 at 8:38 PM, Bryan Bende  wrote:
>
>> Charlie,
>>
>> That is a really nice solution, thanks for sharing.
>>
>> If we make the API changes in that JIRA I just sent, I could see
>> having a new implementation of DBCPService that does something very
>> similar.
>>
>> Basically there could be a "DelegatingDBCPService" which still
>> implements the same DBCPService interface but follows a convention
>> where it always looks in the attribute map for an attribute called
>> "db.id". It would itself have dynamic properties defining many
>> DBCPServices, where the property name is the db.id, and it would just
>> return a Connection from the one with the given id.
>>
>> There are definitely other options for how to implement the dynamic
>> connection service, but this would be a good one to have.
>>
>> -Bryan
>>
>>
>> On Wed, Apr 25, 2018 at 10:58 AM, Charlie Meyer
>>  wrote:
>> > Chiming in a bit late on this, but we faced this same issue and got
>> around
>> > it by implementing a custom controller service which acts as a "router"
>> to
>> > different dbcp services. It exposes a method which given a uuid, returns
>> > back the DBCPservice that corresponds with that uuid if it exists using
>> >
>> >  DBCPService dbcpService =
>> > (DBCPService)
>> > getControllerServiceLookup().getControllerService(uuid);
>> >
>> > From there, we created processors we needed based on the stock ones which
>> > relied on our "router" service rather than a single DBCP. Our custom
>> > processors read an attribute from incoming flow files, then send that to
>> > the router which returns back the connection pool.
>> >
>> > On Wed, Apr 25, 2018 at 9:48 AM, Bryan Bende  wrote:
>> >
>> >> Here is a proposal for how to modify the existing API to support both
>> >> scenarios:
>> >>
>> >> https://issues.apache.org/jira/browse/NIFI-5121
>> >>
>> >> The scope of that ticket would be to make the interface change, and
>> >> then update all of NiFi's DB processors to pass in the attribute map.
>> >>
>> >> Then a separate effort to provide a new service implementation that
>> >> used the attribute map to somehow manage multiple connection pools, or
>> >> create connections on the fly, or whatever the desired behavior is.
>> >>
>> >> On Wed, Apr 25, 2018 at 9:34 AM, Bryan Bende  wrote:
>> >> > To Otto's question...
>> >> >
>> >> > For simplicity's sake, there is a new implementation of
>> >> > DBCPConnectionPool that behind the scenes has two connection pools,
>> >> > one for DB A and one for DB B, it doesn't matter how these are
>> >> > configured.
>> >> >
>> >> > Now a flow file comes into the ExecuteSQL and it goes to
>> >> > connectionPoolService.getConnection()...
>> >> >
>> >> > How does it know which DB to return a connection for?
>> >> >
>> >> >
>> >> > On Wed, Apr 25, 2018 at 9:01 AM, Sivaprasanna <
>> sivaprasanna...@gmail.com>
>> >> wrote:
>> >> >> Options 2 and 3 seem to be probable approaches. However, creating
>> varying
>> >> >> number of connections based on *each* flowfile still sounds to be
>> >> >> suboptimal. If the requirement still demands to take that road, then
>> >> it’s
>> >> >> better to do some prep-work.. as in the list of probable connections
>> >> that
>> >> >> are required are taken and connection pools are created for them and
>> >> then
>> >> >> based on the flowfiles (which connection it needs), we use the
>> relevant
>> >> >> one.
>> >> >>
>> >> >> Thanks,
>> >> >> Sivaprasanna
>> >> >>
>> >> >> On Wed, 25 Apr 2018 at 6:07 PM, Bryan Bende 
>> wrote:
>> >> >>
>> >> >>> The issue here is more about the service API and not the
>> >> implementations.
>> >> >>>
>> >> >>> The current API has no way to pass information between the processor
>> >> and
>> >> >>> service.
>> >> >>>
>> >> >>> The options boil down to...
>> >> >>>
>> >> >>> - Make a new API, but then you need all new processors that use the
>> >> new API
>> >> >>>
>> >> >>> - Modify the current API to have a new method, but then we are
>> combing
>> >> two
>> >> >>> concepts into one API and some impls may not implement both
>> >> >>>
>> >> >>> - Modify the processors to use two different service APIs, but
>> enforce
>> >> that
>> >> >>> only one can be used at a time, so it can have 

Re: Custom Controller Service

2018-04-25 Thread Sivaprasanna
Okay.. but two questions:


   1. We are passing the attribute 'db.id'; that means we'll be using
   'UpdateAttribute' to add that attribute to the flowfile?
   2. If we are to use 'UpdateAttribute' to set the value for 'db.id', we
   need to know the value beforehand, right?

-

Sivaprasanna

On Wed, Apr 25, 2018 at 8:38 PM, Bryan Bende  wrote:

> Charlie,
>
> That is a really nice solution, thanks for sharing.
>
> If we make the API changes in that JIRA I just sent, I could see
> having a new implementation of DBCPService that does something very
> similar.
>
> Basically there could be a "DelegatingDBCPService" which still
> implements the same DBCPService interface but follows a convention
> where it always looks in the attribute map for an attribute called
> "db.id". It would itself have dynamic properties defining many
> DBCPServices, where the property name is the db.id, and it would just
> return a Connection from the one with the given id.
>
> There are definitely other options for how to implement the dynamic
> connection service, but this would be a good one to have.
>
> -Bryan
>
>
> On Wed, Apr 25, 2018 at 10:58 AM, Charlie Meyer
>  wrote:
> > Chiming in a bit late on this, but we faced this same issue and got
> around
> > it by implementing a custom controller service which acts as a "router"
> to
> > different dbcp services. It exposes a method which given a uuid, returns
> > back the DBCPservice that corresponds with that uuid if it exists using
> >
> >  DBCPService dbcpService =
> > (DBCPService)
> > getControllerServiceLookup().getControllerService(uuid);
> >
> > From there, we created processors we needed based on the stock ones which
> > relied on our "router" service rather than a single DBCP. Our custom
> > processors read an attribute from incoming flow files, then send that to
> > the router which returns back the connection pool.
> >
> > On Wed, Apr 25, 2018 at 9:48 AM, Bryan Bende  wrote:
> >
> >> Here is a proposal for how to modify the existing API to support both
> >> scenarios:
> >>
> >> https://issues.apache.org/jira/browse/NIFI-5121
> >>
> >> The scope of that ticket would be to make the interface change, and
> >> then update all of NiFi's DB processors to pass in the attribute map.
> >>
> >> Then a separate effort to provide a new service implementation that
> >> used the attribute map to somehow manage multiple connection pools, or
> >> create connections on the fly, or whatever the desired behavior is.
> >>
> >> On Wed, Apr 25, 2018 at 9:34 AM, Bryan Bende  wrote:
> >> > To Otto's question...
> >> >
> >> > For simplicity's sake, there is a new implementation of
> >> > DBCPConnectionPool that behind the scenes has two connection pools,
> >> > one for DB A and one for DB B, it doesn't matter how these are
> >> > configured.
> >> >
> >> > Now a flow file comes into the ExecuteSQL and it goes to
> >> > connectionPoolService.getConnection()...
> >> >
> >> > How does it know which DB to return a connection for?
> >> >
> >> >
> >> > On Wed, Apr 25, 2018 at 9:01 AM, Sivaprasanna <
> sivaprasanna...@gmail.com>
> >> wrote:
> >> >> Options 2 and 3 seem to be probable approaches. However, creating
> varying
> >> >> number of connections based on *each* flowfile still sounds to be
> >> >> suboptimal. If the requirement still demands to take that road, then
> >> it’s
> >> >> better to do some prep-work.. as in the list of probable connections
> >> that
> >> >> are required are taken and connection pools are created for them and
> >> then
> >> >> based on the flowfiles (which connection it needs), we use the
> relevant
> >> >> one.
> >> >>
> >> >> Thanks,
> >> >> Sivaprasanna
> >> >>
> >> >> On Wed, 25 Apr 2018 at 6:07 PM, Bryan Bende 
> wrote:
> >> >>
> >> >>> The issue here is more about the service API and not the
> >> implementations.
> >> >>>
> >> >>> The current API has no way to pass information between the processor
> >> and
> >> >>> service.
> >> >>>
> >> >>> The options boil down to...
> >> >>>
> >> >>> - Make a new API, but then you need all new processors that use the
> >> new API
> >> >>>
> >> >>> - Modify the current API to have a new method, but then we are
> combing
> >> two
> >> >>> concepts into one API and some impls may not implement both
> >> >>>
> >> >>> - Modify the processors to use two different service APIs, but
> enforce
> >> that
> >> >>> only one can be used at a time, so it can have either the original
> >> >>> connection pool service or can have some new dynamic connection
> >> factory,
> >> >>>  but not both, and then modify all DB processors to have logic to
> >> determine
> >> >>> which service to use.
> >> >>>
> >> >>> On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler <
> ottobackwa...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>> > Or you could just call every time you needed properties more
> likely.
> >> >>> > This would still be custom unless integrated….
> >> >>> >
> >> >>> >
> >> >>> > On April 25, 2018 at 08:26:57, Otto Fowler (
> ottobackwa...@gmail.com)

Re: Custom Controller Service

2018-04-25 Thread Bryan Bende
Charlie,

That is a really nice solution, thanks for sharing.

If we make the API changes in that JIRA I just sent, I could see
having a new implementation of DBCPService that does something very
similar.

Basically there could be a "DelegatingDBCPService" which still
implements the same DBCPService interface but follows a convention
where it always looks in the attribute map for an attribute called
"db.id". It would itself have dynamic properties defining many
DBCPServices, where the property name is the db.id, and it would just
return a Connection from the one with the given id.

There are definitely other options for how to implement the dynamic
connection service, but this would be a good one to have.
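
A minimal sketch of that idea, assuming the NIFI-5121 change lands in
roughly this shape (the attribute-taking getConnection() overload and
the dynamic-property wiring are assumptions, not current API):

import java.sql.Connection;
import java.util.Map;
import org.apache.nifi.controller.AbstractControllerService;
import org.apache.nifi.dbcp.DBCPService;
import org.apache.nifi.processor.exception.ProcessException;

public class DelegatingDBCPService extends AbstractControllerService implements DBCPService {

    // populated from dynamic properties at enable time: db.id value -> delegate service
    private volatile Map<String, DBCPService> delegates;

    @Override
    public Connection getConnection(Map<String, String> attributes) throws ProcessException {
        final String dbId = attributes.get("db.id");
        final DBCPService delegate = delegates.get(dbId);
        if (delegate == null) {
            throw new ProcessException("No DBCPService registered for db.id: " + dbId);
        }
        return delegate.getConnection(attributes);
    }

    @Override
    public Connection getConnection() throws ProcessException {
        throw new ProcessException("A db.id flow file attribute is required");
    }
}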

-Bryan


On Wed, Apr 25, 2018 at 10:58 AM, Charlie Meyer
 wrote:
> Chiming in a bit late on this, but we faced this same issue and got around
> it by implementing a custom controller service which acts as a "router" to
> different dbcp services. It exposes a method which given a uuid, returns
> back the DBCPservice that corresponds with that uuid if it exists using
>
>  DBCPService dbcpService =
> (DBCPService)
> getControllerServiceLookup().getControllerService(uuid);
>
> From there, we created processors we needed based on the stock ones which
> relied on our "router" service rather than a single DBCP. Our custom
> processors read an attribute from incoming flow files, then send that to
> the router which returns back the connection pool.
>
> On Wed, Apr 25, 2018 at 9:48 AM, Bryan Bende  wrote:
>
>> Here is a proposal for how to modify the existing API to support both
>> scenarios:
>>
>> https://issues.apache.org/jira/browse/NIFI-5121
>>
>> The scope of that ticket would be to make the interface change, and
>> then update all of NiFi's DB processors to pass in the attribute map.
>>
>> Then a separate effort to provide a new service implementation that
>> used the attribute map to somehow manage multiple connection pools, or
>> create connections on the fly, or whatever the desired behavior is.
>>
>> On Wed, Apr 25, 2018 at 9:34 AM, Bryan Bende  wrote:
>> > To Otto's question...
>> >
>> > For simplicity's sake, there is a new implementation of
>> > DBCPConnectionPool that behind the scenes has two connection pools,
>> > one for DB A and one for DB B, it doesn't matter how these are
>> > configured.
>> >
>> > Now a flow file comes into the ExecuteSQL and it goes to
>> > connectionPoolService.getConnection()...
>> >
>> > How does it know which DB to return a connection for?
>> >
>> >
>> > On Wed, Apr 25, 2018 at 9:01 AM, Sivaprasanna 
>> wrote:
>> >> Options 2 and 3 seem to be probable approaches. However, creating varying
>> >> number of connections based on *each* flowfile still sounds to be
>> >> suboptimal. If the requirement still demands to take that road, then
>> it’s
>> >> better to do some prep-work.. as in the list of probable connections
>> that
>> >> are required are taken and connection pools are created for them and
>> then
>> >> based on the flowfiles (which connection it needs), we use the relevant
>> >> one.
>> >>
>> >> Thanks,
>> >> Sivaprasanna
>> >>
>> >> On Wed, 25 Apr 2018 at 6:07 PM, Bryan Bende  wrote:
>> >>
>> >>> The issue here is more about the service API and not the
>> implementations.
>> >>>
>> >>> The current API has no way to pass information between the processor
>> and
>> >>> service.
>> >>>
>> >>> The options boil down to...
>> >>>
>> >>> - Make a new API, but then you need all new processors that use the
>> new API
>> >>>
>> >>> - Modify the current API to have a new method, but then we are combining
>> two
>> >>> concepts into one API and some impls may not implement both
>> >>>
>> >>> - Modify the processors to use two different service APIs, but enforce
>> that
>> >>> only one can be used at a time, so it can have either the original
>> >>> connection pool service or can have some new dynamic connection
>> factory,
>> >>>  but not both, and then modify all DB processors to have logic to
>> determine
>> >>> which service to use.
>> >>>
>> >>> On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler 
>> >>> wrote:
>> >>>
>> >>> > Or you could just call every time you needed properties more likely.
>> >>> > This would still be custom unless integrated….
>> >>> >
>> >>> >
>> >>> > On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com)
>> >>> > wrote:
>> >>> >
>> >>> > Can services work with other controller services?
>> >>> > Maybe a PropertiesControllerService, FilePropertiesControllerService
>> >>> could
>> >>> > work with your service?
>> >>> > the PCS could fire events on property changes etc.
>> >>> >
>> >>> >
>> >>> >
>> >>> > On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com)
>> >>> > wrote:
>> >>> >
>> >>> > Shot in the dark here, but what you could try to do is create a custom
>> >>> connection
>> >>> > pool service that uses dynamic properties to build a "pool of
>> connection
>> >>> > pools." You could then use the property names as hints fo

Re: Custom Controller Service

2018-04-25 Thread Charlie Meyer
Chiming in a bit late on this, but we faced this same issue and got around
it by implementing a custom controller service which acts as a "router" to
different DBCP services. It exposes a method which, given a UUID, returns
the DBCPService that corresponds to that UUID if it exists, using

DBCPService dbcpService =
        (DBCPService) getControllerServiceLookup().getControllerService(uuid);

From there, we created the processors we needed based on the stock ones, which
relied on our "router" service rather than a single DBCP. Our custom
processors read an attribute from incoming flow files, then send it to
the router, which returns the connection pool.

On Wed, Apr 25, 2018 at 9:48 AM, Bryan Bende  wrote:

> Here is a proposal for how to modify the existing API to support both
> scenarios:
>
> https://issues.apache.org/jira/browse/NIFI-5121
>
> The scope of that ticket would be to make the interface change, and
> then update all of NiFi's DB processors to pass in the attribute map.
>
> Then a separate effort to provide a new service implementation that
> used the attribute map to somehow manage multiple connection pools, or
> create connections on the fly, or whatever the desired behavior is.
>
> On Wed, Apr 25, 2018 at 9:34 AM, Bryan Bende  wrote:
> > To Otto's question...
> >
> > For simplicity's sake, there is a new implementation of
> > DBCPConnectionPool that behind the scenes has two connection pools,
> > one for DB A and one for DB B, it doesn't matter how these are
> > configured.
> >
> > Now a flow file comes into the ExecuteSQL and it goes to
> > connectionPoolService.getConnection()...
> >
> > How does it know which DB to return a connection for?
> >
> >
> > On Wed, Apr 25, 2018 at 9:01 AM, Sivaprasanna 
> wrote:
> >> Options 2 and 3 seem to be probable approaches. However, creating varying
> >> number of connections based on *each* flowfile still sounds to be
> >> suboptimal. If the requirement still demands to take that road, then
> it’s
> >> better to do some prep-work.. as in the list of probable connections
> that
> >> are required are taken and connection pools are created for them and
> then
> >> based on the flowfiles (which connection it needs), we use the relevant
> >> one.
> >>
> >> Thanks,
> >> Sivaprasanna
> >>
> >> On Wed, 25 Apr 2018 at 6:07 PM, Bryan Bende  wrote:
> >>
> >>> The issue here is more about the service API and not the
> implementations.
> >>>
> >>> The current API has no way to pass information between the processor
> and
> >>> service.
> >>>
> >>> The options boil down to...
> >>>
> >>> - Make a new API, but then you need all new processors that use the
> new API
> >>>
> >> >>> - Modify the current API to have a new method, but then we are combining
> two
> >>> concepts into one API and some impls may not implement both
> >>>
> >>> - Modify the processors to use two different service APIs, but enforce
> that
> >>> only one can be used at a time, so it can have either the original
> >>> connection pool service or can have some new dynamic connection
> factory,
> >>>  but not both, and then modify all DB processors to have logic to
> determine
> >>> which service to use.
> >>>
> >>> On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler 
> >>> wrote:
> >>>
> >>> > Or you could just call every time you needed properties more likely.
> >>> > This would still be custom unless integrated….
> >>> >
> >>> >
> >>> > On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com)
> >>> > wrote:
> >>> >
> >>> > Can services work with other controller services?
> >>> > Maybe a PropertiesControllerService, FilePropertiesControllerService
> >>> could
> >>> > work with your service?
> >>> > the PCS could fire events on property changes etc.
> >>> >
> >>> >
> >>> >
> >>> > On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com)
> >>> > wrote:
> >>> >
> >>> > Shot in the dark here, but what you could try to do is create a custom
> >>> connection
> >>> > pool service that uses dynamic properties to build a "pool of
> connection
> >>> > pools." You could then use the property names as hints for where to
> send
> >>> > the queries.
> >>> >
> >>> > On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad <
> rishabprasad...@gmail.com
> >>> >
> >>> > wrote:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > Basically, there are 'n' number of databases that we are dealing
> with.
> >>> We
> >>> > > need to fetch the data from the source database into HDFS. Now
> since we
> >>> > are
> >>> > > dealing with many databases, the source database is not static and
> >>> > changes
> >>> > > every now and then. And every time the source database changes we
> >>> > manually
> >>> > > need to change the value for the connection parameters in
> >>> > > DBCPConnectionPool. Now, people suggest that for 'n' databases
> create
> >>> 'n'
> >>> > > connections for each database, but that is not possible because
> 'n' is
> >>> a
> >>> > > big number and creating that many connections in
> DBCPConnectionPool is
> >>> > not
>

Re: Custom Controller Service

2018-04-25 Thread Bryan Bende
Here is a proposal for how to modify the existing API to support both scenarios:

https://issues.apache.org/jira/browse/NIFI-5121

The scope of that ticket would be to make the interface change, and
then update all of NiFi's DB processors to pass in the attribute map.

Then a separate effort to provide a new service implementation that
used the attribute map to somehow manage multiple connection pools, or
create connections on the fly, or whatever the desired behavior is.
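
For illustration, the interface change could look something like the
following. This is only a sketch of the proposal, not the final API; the
default method is one way to keep existing implementations
source-compatible:

import java.sql.Connection;
import java.util.Map;
import org.apache.nifi.controller.ControllerService;
import org.apache.nifi.processor.exception.ProcessException;

public interface DBCPService extends ControllerService {

    Connection getConnection() throws ProcessException;

    // proposed: processors pass the flow file's attributes so that
    // implementations can select or build a connection per flow file
    default Connection getConnection(Map<String, String> flowFileAttributes) throws ProcessException {
        return getConnection(); // existing implementations ignore attributes
    }
}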

On Wed, Apr 25, 2018 at 9:34 AM, Bryan Bende  wrote:
> To Otto's question...
>
> For simplicity's sake, there is a new implementation of
> DBCPConnectionPool that behind the scenes has two connection pools,
> one for DB A and one for DB B, it doesn't matter how these are
> configured.
>
> Now a flow file comes into the ExecuteSQL and it goes to
> connectionPoolService.getConnection()...
>
> How does it know which DB to return a connection for?
>
>
> On Wed, Apr 25, 2018 at 9:01 AM, Sivaprasanna  
> wrote:
>> Options 2 and 3 seem to be probable approaches. However, creating varying
>> number of connections based on *each* flowfile still sounds to be
>> suboptimal. If the requirement still demands to take that road, then it’s
>> better to do some prep-work.. as in the list of probable connections that
>> are required are taken and connection pools are created for them and then
>> based on the flowfiles (which connection it needs), we use the relevant
>> one.
>>
>> Thanks,
>> Sivaprasanna
>>
>> On Wed, 25 Apr 2018 at 6:07 PM, Bryan Bende  wrote:
>>
>>> The issue here is more about the service API and not the implementations.
>>>
>>> The current API has no way to pass information between the processor and
>>> service.
>>>
>>> The options boil down to...
>>>
>>> - Make a new API, but then you need all new processors that use the new API
>>>
>>> - Modify the current API to have a new method, but then we are combining two
>>> concepts into one API and some impls may not implement both
>>>
>>> - Modify the processors to use two different service APIs, but enforce that
>>> only one can be used at a time, so it can have either the original
>>> connection pool service or can have some new dynamic connection factory,
>>>  but not both, and then modify all DB processors to have logic to determine
>>> which service to use.
>>>
>>> On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler 
>>> wrote:
>>>
>>> > Or you could just call every time you needed properties more likely.
>>> > This would still be custom unless integrated….
>>> >
>>> >
>>> > On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com)
>>> > wrote:
>>> >
>>> > Can services work with other controller services?
>>> > Maybe a PropertiesControllerService, FilePropertiesControllerService
>>> could
>>> > work with your service?
>>> > the PCS could fire events on property changes etc.
>>> >
>>> >
>>> >
>>> > On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com)
>>> > wrote:
>>> >
>>> > Shot in the dark here, but what you could try to do is create a custom
>>> connection
>>> > pool service that uses dynamic properties to build a "pool of connection
>>> > pools." You could then use the property names as hints for where to send
>>> > the queries.
>>> >
>>> > On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad >> >
>>> > wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > Basically, there are 'n' number of databases that we are dealing with.
>>> We
>>> > > need to fetch the data from the source database into HDFS. Now since we
>>> > are
>>> > > dealing with many databases, the source database is not static and
>>> > changes
>>> > > every now and then. And every time the source database changes we
>>> > manually
>>> > > need to change the value for the connection parameters in
>>> > > DBCPConnectionPool. Now, people suggest that for 'n' databases create
>>> 'n'
>>> > > connections for each database, but that is not possible because 'n' is
>>> a
>>> > > big number and creating that many connections in DBCPConnectionPool is
>>> > not
>>> > > possible. So we were looking for a way where we can specify all the
>>> > > connection parameters in a file present in our local system and then
>>> make
>>> > > the DBCPConnectionPool controller service to read the values from the
>>> > file.
>>> > > In that way we can simply change the value in the file present in the
>>> > local
>>> > > system. No need to alter anything in the dataflow. But it turns out
>>> that
>>> > > FlowFile attributes are not available to the controller services as the
>>> > > expression language is evaluated at the time of service enable.
>>> > >
>>> > > So can you suggest a way where I can achieve my requirement (except
>>> > > 'variable.registry' ) ? I am looking to develop a custom controller
>>> > service
>>> > > that can serve the requirement but how do I make the flowfile
>>> attributes
>>> > > available to the service?
>>> > >
>>> >
>>> --
>>> Sent from Gmail Mobile
>>>


Re: Is there a configuration to limit the size of nifi's flowfile repository

2018-04-25 Thread Sivaprasanna
No, he actually mentioned “like the content repository”. The answer is that
there aren’t any properties that support this, AFAIK. Pierre’s response
pretty much sums up why there aren’t any properties.

Thanks,
Sivaprasanna

On Wed, 25 Apr 2018 at 7:10 PM, Mike Thomsen  wrote:

> I have a feeling that what Ben meant was how to limit the content
> repository size.
>
> On Wed, Apr 25, 2018 at 8:26 AM Pierre Villard <
> pierre.villard...@gmail.com>
> wrote:
>
> > Hi Ben,
> >
> > Since the flow file repository contains the information of the flow files
> > currently being processed by NiFi, you don't want to limit that
> repository
> > in size since it would prevent the workflows from creating new flow files.
> >
> > Besides, this repository is very lightweight, I'm not sure it'd need to be
> > limited in size.
> > Do you have a specific use case in mind?
> >
> > Pierre
> >
> >
> > 2018-04-25 9:15 GMT+02:00 尹文才 :
> >
> > > Hi guys, I checked NIFI's system administrator guide trying to find a
> > > configuration item so that the size of the flowfile repository could be
> > > limited similar to the other repositories(e.g. content repository),
> but I
> > > didn't find such configuration items, is there currently any
> > configuration
> > > to limit the flowfile repository's size? thanks.
> > >
> > > Regards,
> > > Ben
> > >
> >
>


Re: Is there a configuration to limit the size of nifi's flowfile repository

2018-04-25 Thread Mike Thomsen
I have a feeling that what Ben meant was how to limit the content
repository size.

On Wed, Apr 25, 2018 at 8:26 AM Pierre Villard 
wrote:

> Hi Ben,
>
> Since the flow file repository contains the information of the flow files
> currently being processed by NiFi, you don't want to limit that repository
> in size since it would prevent the workflows from creating new flow files.
>
> Besides, this repository is very lightweight, I'm not sure it'd need to be
> limited in size.
> Do you have a specific use case in mind?
>
> Pierre
>
>
> 2018-04-25 9:15 GMT+02:00 尹文才 :
>
> > Hi guys, I checked NIFI's system administrator guide trying to find a
> > configuration item so that the size of the flowfile repository could be
> > limited similar to the other repositories(e.g. content repository), but I
> > didn't find such configuration items, is there currently any
> configuration
> > to limit the flowfile repository's size? thanks.
> >
> > Regards,
> > Ben
> >
>


Re: Custom Controller Service

2018-04-25 Thread Bryan Bende
To Otto's question...

For simplicity's sake, there is a new implementation of
DBCPConnectionPool that behind the scenes has two connection pools,
one for DB A and one for DB B, it doesn't matter how these are
configured.

Now a flow file comes into the ExecuteSQL and it goes to
connectionPoolService.getConnection()...

How does it know which DB to return a connection for?


On Wed, Apr 25, 2018 at 9:01 AM, Sivaprasanna  wrote:
> Options 2 and 3 seem to be probable approaches. However, creating varying
> numbers of connections based on *each* flowfile still sounds
> suboptimal. If the requirement still demands taking that road, then it’s
> better to do some prep-work: the list of probable connections that
> are required is gathered, connection pools are created for them, and then,
> based on which connection each flowfile needs, we use the relevant
> one.
>
> Thanks,
> Sivaprasanna
>
> On Wed, 25 Apr 2018 at 6:07 PM, Bryan Bende  wrote:
>
>> The issue here is more about the service API and not the implementations.
>>
>> The current API has no way to pass information between the processor and
>> service.
>>
>> The options boil down to...
>>
>> - Make a new API, but then you need all new processors that use the new API
>>
>> - Modify the current API to have a new method, but then we are combining two
>> concepts into one API and some impls may not implement both
>>
>> - Modify the processors to use two different service APIs, but enforce that
>> only one can be used at a time, so it can have either the original
>> connection pool service or can have some new dynamic connection factory,
>>  but not both, and then modify all DB processors to have logic to determine
>> which service to use.
>>
>> On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler 
>> wrote:
>>
>> > Or, more likely, you could just call it every time you needed properties.
>> > This would still be custom unless integrated….
>> >
>> >
>> > On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com)
>> > wrote:
>> >
>> > Can services work with other controller services?
>> > Maybe a PropertiesControllerService, FilePropertiesControllerService
>> could
>> > work with your service?
>> > the PCS could fire events on property changes etc.
>> >
>> >
>> >
>> > On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com)
>> > wrote:
>> >
>> > Shot in the dark here, but what you could try to do is create a custom
>> connection
>> > pool service that uses dynamic properties to build a "pool of connection
>> > pools." You could then use the property names as hints for where to send
>> > the queries.
>> >
>> > On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad > >
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > Basically, there are 'n' number of databases that we are dealing with.
>> We
>> > > need to fetch the data from the source database into HDFS. Now since we
>> > are
>> > > dealing with many databases, the source database is not static and
>> > changes
>> > > every now and then. And every time the source database changes we
>> > manually
>> > > need to change the value for the connection parameters in
>> > > DBCPConnectionPool. Now, people suggest that for 'n' databases create
>> 'n'
>> > > connections for each database, but that is not possible because 'n' is
>> a
>> > > big number and creating that many connections in DBCPConnectionPool is
>> > not
>> > > possible. So we were looking for a way where we can specify all the
>> > > connection parameters in a file present in our local system and then
>> make
>> > > the DBCPConnectionPool controller service read the values from the
>> > file.
>> > > In that way we can simply change the value in the file present in the
>> > local
>> > > system. No need to alter anything in the dataflow. But it turns out
>> that
>> > > FlowFile attributes are not available to the controller services as the
>> > > expression language is evaluated at the time of service enable.
>> > >
>> > > So can you suggest a way where I can achieve my requirement (except
>> > > 'variable.registry' ) ? I am looking to develop a custom controller
>> > service
>> > > that can serve the requirement but how do I make the flowfile
>> attributes
>> > > available to the service?
>> > >
>> >
>> --
>> Sent from Gmail Mobile
>>


Re: status bar counts on a cluster

2018-04-25 Thread Mark Bean
This seems to have slipped through the cracks; I haven't seen a response.
Does anyone have input?

Thanks,
Mark

On Fri, Apr 20, 2018 at 10:40 AM, Mark Bean  wrote:

> On a cluster, the status bar reports 4 invalid processors. However, on
> some nodes there are actually 6 invalid processors. The extra two
> processors are invalid because a required configuration file (a property of
> the processor) does not exist on some nodes.
>
> The count on the status bar is not coming from either the Cluster Coordinator
> or the Primary Node. Both of those nodes are missing the configuration file
> and have a count of 6 invalid processors. Also, no matter which node the UI
> is connected to, the status bar count is always 4. So, the count is not coming
> from the UI's host.
>
> Question: how does the status bar derive its information?
>
> Thanks,
> Mark
>


Search for Controller Service UUID

2018-04-25 Thread Mark Bean
When I search for a Controller Service by UUID using the search on the
toolbar, only processors which reference the service are listed, not the
service itself. Similarly, when selecting the UUID from the Bulletin Board,
it reports "Error: Unable to find the specified component".

Is this by design?

Thanks,
Mark


Re: Custom Controller Service

2018-04-25 Thread Sivaprasanna
Options 2 and 3 seem like probable approaches. However, creating a varying
number of connections based on *each* flowfile still sounds suboptimal. If
the requirement still demands taking that road, then it's better to do some
prep work: take the list of probable connections that are required, create
connection pools for them ahead of time, and then, based on which
connection each flowfile needs, use the relevant one.

Thanks,
Sivaprasanna

On Wed, 25 Apr 2018 at 6:07 PM, Bryan Bende  wrote:

> The issue here is more about the service API and not the implementations.
>
> The current API has no way to pass information between the processor and
> service.
>
> The options boil down to...
>
> - Make a new API, but then you need all new processors that use the new API
>
> - Modify the current API to have a new method, but then we are combining two
> concepts into one API and some impls may not implement both
>
> - Modify the processors to use two different service APIs, but enforce that
> only one can be used at a time, so it can have either the original
> connection pool service or can have some new dynamic connection factory,
>  but not both, and then modify all DB processors to have logic to determine
> which service to use.
>
> On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler 
> wrote:
>
> > Or, more likely, you could just call it every time you needed properties.
> > This would still be custom unless integrated….
> >
> >
> > On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com)
> > wrote:
> >
> > Can services work with other controller services?
> > Maybe a PropertiesControllerService, FilePropertiesControllerService
> could
> > work with your service?
> > the PCS could fire events on property changes etc.
> >
> >
> >
> > On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com)
> > wrote:
> >
> > Shot in the dark here, but what you could try to do is create a custom
> connection
> > pool service that uses dynamic properties to build a "pool of connection
> > pools." You could then use the property names as hints for where to send
> > the queries.
> >
> > On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad  >
> > wrote:
> >
> > > Hi,
> > >
> > > Basically, there are 'n' number of databases that we are dealing with.
> We
> > > need to fetch the data from the source database into HDFS. Now since we
> > are
> > > dealing with many databases, the source database is not static and
> > changes
> > > every now and then. And every time the source database changes we
> > manually
> > > need to change the value for the connection parameters in
> > > DBCPConnectionPool. Now, people suggest that for 'n' databases create
> 'n'
> > > connections for each database, but that is not possible because 'n' is
> a
> > > big number and creating that many connections in DBCPConnectionPool is
> > not
> > > possible. So we were looking for a way where we can specify all the
> > > connection parameters in a file present in our local system and then
> make
> > > the DBCPConnectionPool controller service read the values from the
> > file.
> > > In that way we can simply change the value in the file present in the
> > local
> > > system. No need to alter anything in the dataflow. But it turns out
> that
> > > FlowFile attributes are not available to the controller services as the
> > > expression language is evaluated at the time of service enable.
> > >
> > > So can you suggest a way where I can achieve my requirement (except
> > > 'variable.registry' ) ? I am looking to develop a custom controller
> > service
> > > that can serve the requirement but how do I make the flowfile
> attributes
> > > available to the service?
> > >
> >
> --
> Sent from Gmail Mobile
>


Re: Custom Controller Service

2018-04-25 Thread Otto Fowler
If any controller service optionally supported this external service (like
the AWS processors optionally support the credentials service),
then there is no need for the processor to change, right?


On April 25, 2018 at 08:37:50, Bryan Bende (bbe...@gmail.com) wrote:

The issue here is more about the service API and not the implementations.

The current API has no way to pass information between the processor and
service.

The options boil down to...

- Make a new API, but then you need all new processors that use the new API

- Modify the current API to have a new method, but then we are combining two
concepts into one API and some impls may not implement both

- Modify the processors to use two different service APIs, but enforce that
only one can be used at a time, so it can have either the original
connection pool service or can have some new dynamic connection factory,
but not both, and then modify all DB processors to have logic to determine
which service to use.

On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler 
wrote:

> Or, more likely, you could just call it every time you needed properties.
> This would still be custom unless integrated….
>
>
> On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> Can services work with other controller services?
> Maybe a PropertiesControllerService, FilePropertiesControllerService
could
> work with your service?
> the PCS could fire events on property changes etc.
>
>
>
> On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com)
> wrote:
>
> Shot in the dark here, but what you could try to do is create a custom
connection
> pool service that uses dynamic properties to build a "pool of connection
> pools." You could then use the property names as hints for where to send
> the queries.
>
> On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad 
> wrote:
>
> > Hi,
> >
> > Basically, there are 'n' number of databases that we are dealing with.
We
> > need to fetch the data from the source database into HDFS. Now since we
> are
> > dealing with many databases, the source database is not static and
> changes
> > every now and then. And every time the source database changes we
> manually
> > need to change the value for the connection parameters in
> > DBCPConnectionPool. Now, people suggest that for 'n' databases create
'n'
> > connections for each database, but that is not possible because 'n' is
a
> > big number and creating that many connections in DBCPConnectionPool is
> not
> > possible. So we were looking for a way where we can specify all the
> > connection parameters in a file present in our local system and then
make
> > the DBCPConnectionPool controller service read the values from the
> file.
> > In that way we can simply change the value in the file present in the
> local
> > system. No need to alter anything in the dataflow. But it turns out
that
> > FlowFile attributes are not available to the controller services as the
> > expression language is evaluated at the time of service enable.
> >
> > So can you suggest a way where I can achieve my requirement (except
> > 'variable.registry' ) ? I am looking to develop a custom controller
> service
> > that can serve the requirement but how do I make the flowfile
attributes
> > available to the service?
> >
>
-- 
Sent from Gmail Mobile


Re: Custom Controller Service

2018-04-25 Thread Bryan Bende
The issue here is more about the service API and not the implementations.

The current API has no way to pass information between the processor and
service.

The options boil down to...

- Make a new API, but then you need all new processors that use the new API

- Modify the current API to have a new method, but then we are combining two
concepts into one API and some impls may not implement both

- Modify the processors to use two different service APIs, but enforce that
only one can be used at a time, so it can have either the original
connection pool service or can have some new dynamic connection factory,
 but not both, and then modify all DB processors to have logic to determine
which service to use.

On Wed, Apr 25, 2018 at 8:28 AM Otto Fowler  wrote:

> Or, more likely, you could just call it every time you needed properties.
> This would still be custom unless integrated….
>
>
> On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com)
> wrote:
>
> Can services work with other controller services?
> Maybe a PropertiesControllerService, FilePropertiesControllerService could
> work with your service?
> the PCS could fire events on property changes etc.
>
>
>
> On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com)
> wrote:
>
> Shot in the dark here, but what you could try to do is create a custom connection
> pool service that uses dynamic properties to build a "pool of connection
> pools." You could then use the property names as hints for where to send
> the queries.
>
> On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad 
> wrote:
>
> > Hi,
> >
> > Basically, there are 'n' number of databases that we are dealing with. We
> > need to fetch the data from the source database into HDFS. Now since we
> are
> > dealing with many databases, the source database is not static and
> changes
> > every now and then. And every time the source database changes we
> manually
> > need to change the value for the connection parameters in
> > DBCPConnectionPool. Now, people suggest that for 'n' databases create 'n'
> > connections for each database, but that is not possible because 'n' is a
> > big number and creating that many connections in DBCPConnectionPool is
> not
> > possible. So we were looking for a way where we can specify all the
> > connection parameters in a file present in our local system and then make
> > the DBCPConnectionPool controller service read the values from the
> file.
> > In that way we can simply change the value in the file present in the
> local
> > system. No need to alter anything in the dataflow. But it turns out that
> > FlowFile attributes are not available to the controller services as the
> > expression language is evaluated at the time of service enable.
> >
> > So can you suggest a way where I can achieve my requirement (except
> > 'variable.registry' ) ? I am looking to develop a custom controller
> service
> > that can serve the requirement but how do I make the flowfile attributes
> > available to the service?
> >
>
-- 
Sent from Gmail Mobile


Re: Custom Controller Service

2018-04-25 Thread Otto Fowler
Or, more likely, you could just call it every time you needed properties.
This would still be custom unless integrated….


On April 25, 2018 at 08:26:57, Otto Fowler (ottobackwa...@gmail.com) wrote:

Can services work with other controller services?
Maybe a PropertiesControllerService, FilePropertiesControllerService could
work with your service?
The PCS could fire events on property changes, etc.



On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com) wrote:

Shot in the dark here, but what you could try to do is create a custom connection
pool service that uses dynamic properties to build a "pool of connection
pools." You could then use the property names as hints for where to send
the queries.

On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad 
wrote:

> Hi,
>
> Basically, there are 'n' number of databases that we are dealing with. We
> need to fetch the data from the source database into HDFS. Now since we
are
> dealing with many databases, the source database is not static and changes
> every now and then. And every time the source database changes we manually
> need to change the value for the connection parameters in
> DBCPConnectionPool. Now, people suggest that for 'n' databases create 'n'
> connections for each database, but that is not possible because 'n' is a
> big number and creating that many connections in DBCPConnectionPool is not
> possible. So we were looking for a way where we can specify all the
> connection parameters in a file present in our local system and then make
> the DBCPConnectionPool controller service read the values from the
file.
> In that way we can simply change the value in the file present in the
local
> system. No need to alter anything in the dataflow. But it turns out that
> FlowFile attributes are not available to the controller services as the
> expression language is evaluated at the time of service enable.
>
> So can you suggest a way where I can achieve my requirement (except
> 'variable.registry' ) ? I am looking to develop a custom controller
service
> that can serve the requirement but how do I make the flowfile attributes
> available to the service?
>


Re: Custom Controller Service

2018-04-25 Thread Otto Fowler
Can services work with other controller services?
Maybe a PropertiesControllerService, FilePropertiesControllerService could
work with your service?
The PCS could fire events on property changes, etc.
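
As far as I know, a controller service can reference another controller
service the same way a processor does, via a property descriptor. A rough
sketch, with PropertiesService standing in for the hypothetical lookup
service above (nothing like it ships with NiFi):

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.controller.AbstractControllerService;
import org.apache.nifi.controller.ControllerService;

// Hypothetical lookup service from the suggestion above.
interface PropertiesService extends ControllerService {
    String getProperty(String key);
}

// Sketch of a connection service that declares a dependency on it.
public abstract class DynamicDbcpService extends AbstractControllerService {

    static final PropertyDescriptor PROPERTIES_SERVICE =
            new PropertyDescriptor.Builder()
                    .name("Properties Service")
                    .description("Supplies connection parameters by key")
                    .identifiesControllerService(PropertiesService.class)
                    .required(true)
                    .build();
}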



On April 25, 2018 at 08:05:27, Mike Thomsen (mikerthom...@gmail.com) wrote:

Shot in the dark here, but what you could try to do is create a custom connection
pool service that uses dynamic properties to build a "pool of connection
pools." You could then use the property names as hints for where to send
the queries.

On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad 
wrote:

> Hi,
>
> Basically, there are 'n' number of databases that we are dealing with. We
> need to fetch the data from the source database into HDFS. Now since we
are
> dealing with many databases, the source database is not static and
changes
> every now and then. And every time the source database changes we
manually
> need to change the value for the connection parameters in
> DBCPConnectionPool. Now, people suggest that for 'n' databases create 'n'
> connections for each database, but that is not possible because 'n' is a
> big number and creating that many connections in DBCPConnectionPool is
not
> possible. So we were looking for a way where we can specify all the
> connection parameters in a file present in our local system and then make
> the DBCPConnectionPool controller service read the values from the
file.
> In that way we can simply change the value in the file present in the
local
> system. No need to alter anything in the dataflow. But it turns out that
> FlowFile attributes are not available to the controller services as the
> expression language is evaluated at the time of service enable.
>
> So can you suggest a way where I can achieve my requirement (except
> 'variable.registry' ) ? I am looking to develop a custom controller
service
> that can serve the requirement but how do I make the flowfile attributes
> available to the service?
>


Re: Is there a configuration to limit the size of nifi's flowfile repository

2018-04-25 Thread Pierre Villard
Hi Ben,

Since the flow file repository contains the information about the flow files
currently being processed by NiFi, you don't want to limit that repository
in size, since doing so would prevent the workflows from creating new flow files.

Besides, this repository is very lightweight; I'm not sure it'd need to be
limited in size.
Do you have a specific use case in mind?
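
For completeness: in nifi.properties the content repository is the one with
size-related settings (via archiving), while the flowfile repository only
exposes location and checkpointing settings. A few of the relevant entries,
with the usual default values (check your own nifi.properties):

nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true

nifi.flowfile.repository.directory=./flowfile_repository
nifi.flowfile.repository.checkpoint.interval=2 mins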

Pierre


2018-04-25 9:15 GMT+02:00 尹文才 :

> Hi guys, I checked NiFi's system administrator guide trying to find a
> configuration item so that the size of the flowfile repository could be
> limited, similar to the other repositories (e.g. the content repository),
> but I didn't find such a configuration item. Is there currently any
> configuration to limit the flowfile repository's size? Thanks.
>
> Regards,
> Ben
>


Re: Custom Controller Service

2018-04-25 Thread Mike Thomsen
Shot in the dark here, but what you could try to do is create a custom connection
pool service that uses dynamic properties to build a "pool of connection
pools." You could then use the property names as hints for where to send
the queries.
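
Minus the NiFi plumbing (dynamic property handling, @OnEnabled wiring,
closing pools on disable), a rough sketch of the idea, with Apache Commons
DBCP2 assumed as the pooling library and all names illustrative:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.dbcp2.BasicDataSource;

// Illustrative "pool of connection pools": one DBCP pool per dynamic
// property name; the name doubles as the routing hint for queries.
public class PoolOfPools {

    private final Map<String, BasicDataSource> pools = new HashMap<>();

    // Would be called once per dynamic property when the service enables.
    public void addPool(String name, String url, String user, String password) {
        final BasicDataSource ds = new BasicDataSource();
        ds.setUrl(url);
        ds.setUsername(user);
        ds.setPassword(password);
        pools.put(name, ds);
    }

    // The caller passes the property name (e.g. from a flow file attribute).
    public Connection getConnection(String name) throws SQLException {
        final BasicDataSource ds = pools.get(name);
        if (ds == null) {
            throw new SQLException("No pool configured for: " + name);
        }
        return ds.getConnection();
    }
}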

On Wed, Apr 25, 2018 at 6:19 AM Rishab Prasad 
wrote:

> Hi,
>
> Basically, there are 'n' number of databases that we are dealing with. We
> need to fetch the data from the source database into HDFS. Now since we are
> dealing with many databases, the source database is not static and changes
> every now and then. And every time the source database changes we manually
> need to change the value for the connection parameters in
> DBCPConnectionPool. Now, people suggest that for 'n' databases create 'n'
> connections for each database, but that is not possible because 'n' is a
> big number and creating that many connections in DBCPConnectionPool is not
> possible. So we were looking for a way where we can specify all the
> connection parameters in a file present in our local system and then make
> the DBCPConnectionPool controller service read the values from the file.
> In that way we can simply change the value in the file present in the local
> system. No need to alter anything in the dataflow. But it turns out that
> FlowFile attributes are not available to the controller services as the
> expression language is evaluated at the time of service enable.
>
> So can you suggest a way where I can achieve my requirement (except
> 'variable.registry' ) ? I am looking to develop a custom controller service
> that can serve the requirement but how do I make the flowfile attributes
> available to the service?
>


Re: Custom Controller Service

2018-04-25 Thread Bryan Bende
Hello,

Others who have worked on the DB related services and processors can
correct me if I'm wrong here, but...

In general the idea of a connection pool is that creating connections
is somewhat expensive, and for a high volume of operations you don't
want to create a connection for each DB operation, so the connection
pool creates some number of connections ahead of time and makes them
re-usable across operations, thus being more efficient.

In your case you want dynamically created connections based on flow
file attributes, which means potentially the connection information
could be different for each flow file. At that point it starts to feel
like it isn't really a connection pool and is just a factory to obtain
a one-time-use connection, because otherwise it would end up needing to
maintain multiple connection pools behind the scenes.

A controller service has an API and then implementations of the API,
and the API is just a Java interface.

The Java interface (API) is the contract with processors... a
processor gets a reference to an object that implements the interface,
and the processor can call methods on the interface. So if you want to
pass information from flow files to a controller service, then the
methods in the interface need to somehow accept that information.

The DBCPConnectionPool interface [1] has just a single method,
"getConnection()", and it is designed to be a re-usable pool of
connections against a single DB (as I described in the first
paragraph).

You could define your own "DynamicDBCPConnectionPool" API with a
method like "getConnection(String dbIdentifier)" and have an
implementation that loads connection information from a properties file
and keeps a lookup from dbIdentifier to connection info, but then you
would also need your own set of DB processors, because none of the
existing DB processors work against your new API.
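
A minimal sketch of what that could look like; the interface is
hypothetical (nothing like it ships with NiFi today), and only the
getConnection(String) shape comes from the suggestion above:

import java.sql.Connection;
import org.apache.nifi.controller.ControllerService;
import org.apache.nifi.processor.exception.ProcessException;

// Hypothetical API, for illustration only.
public interface DynamicDBCPConnectionPool extends ControllerService {

    // dbIdentifier would typically come from a flow file attribute
    // resolved by the calling processor; an impl would map it to
    // connection info loaded from an external properties file.
    Connection getConnection(String dbIdentifier) throws ProcessException;
}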

Hope this helps.

-Bryan

[1] 
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-api/src/main/java/org/apache/nifi/dbcp/DBCPService.java#L33


On Wed, Apr 25, 2018 at 3:24 AM, Rishab Prasad
 wrote:
> Hi,
>
> Basically, there are 'n' number of databases that we are dealing with. We
> need to fetch the data from the source database into HDFS. Now since we are
> dealing with many databases, the source database is not static and changes
> every now and then. And every time the source database changes we manually
> need to change the value for the connection parameters in
> DBCPConnectionPool. Now, people suggest that for 'n' databases create 'n'
> connections for each database, but that is not possible because 'n' is a
> big number and creating that many connections in DBCPConnectionPool is not
> possible. So we were looking for a way where we can specify all the
> connection parameters in a file present in our local system and then make
> the DBCPConnectionPool controller service read the values from the file.
> In that way we can simply change the value in the file present in the local
> system. No need to alter anything in the dataflow. But it turns out that
> FlowFile attributes are not available to the controller services as the
> expression language is evaluated at the time of service enable.
>
> So can you suggest a way where I can achieve my requirement (except
> 'variable.registry' ) ? I am looking to develop a custom controller service
> that can serve the requirement but how do I make the flowfile attributes
> available to the service?


Re: Pushing flows to Registry with Sensitive Information

2018-04-25 Thread Bryan Bende
Jorge,

Currently variables are not meant to store sensitive information, the
reason has to do with how users access variables...

The way a user accesses a variable is via expression language, and
since EL is just free-form text entered into a property descriptor, it
is impossible to restrict which users can access a variable. Imagine a
multi-tenant environment with many teams; say there is a variable
"db.password" at the root group... anyone anywhere in the dataflow can
create an UpdateAttribute processor and set foo = ${db.password} and
now they can list the queue and look at the attribute foo and get the
password.

When a flow is saved to registry, all sensitive properties are cleared
out (they shouldn't be in variables anyway, based on the above). When the flow
is imported to the next environment, there is a one-time operation
required to go in and set those values specific to the given
environment. Setting these values will not trigger a local change for
version control, and they will also be retained across updates, so it
is really a one-time setup on import; you never have to worry about it
again when upgrading to new versions.

There is probably some room for improvement around the UX of how the
sensitive properties are set during first import. Right now you have to
manually go through and find them and set them, but this could be
presented in a better way to automatically show all the sensitive
properties that need to be filled in.

Hope this helps.

-Bryan


On Wed, Apr 25, 2018 at 4:44 AM, Jorge Machado  wrote:
> Hi Guys,
>
> So I was playing with the registry, and if I push a Processor that has
> sensitive information like a password, that information is discarded when
> pulling it from the Registry, which is fine.
>
> Now comes the but: if I put a variable there instead, IMHO it should be
> saved in the registry.
>
> What do you think ?
>
> Jorge
>
>
>
>
>


Custom Controller Service

2018-04-25 Thread Rishab Prasad
Hi,

Basically, there are 'n' databases that we are dealing with. We need to
fetch the data from the source database into HDFS. Now, since we are
dealing with many databases, the source database is not static and changes
every now and then, and every time the source database changes we manually
need to change the values of the connection parameters in
DBCPConnectionPool. People suggest creating 'n' connections, one for each
database, but that is not possible because 'n' is a big number and creating
that many connections in DBCPConnectionPool is not feasible. So we were
looking for a way to specify all the connection parameters in a file on our
local system and then make the DBCPConnectionPool controller service read
the values from the file. That way we can simply change the values in the
local file, with no need to alter anything in the dataflow. But it turns
out that FlowFile attributes are not available to controller services, as
the expression language is evaluated at the time the service is enabled.

So can you suggest a way to achieve this requirement (other than
'variable.registry')? I am looking to develop a custom controller service
that can serve the requirement, but how do I make the flowfile attributes
available to the service?
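
For concreteness, the kind of external file we have in mind would look
something like this (names and values purely illustrative):

sales.url=jdbc:mysql://host1:3306/sales
sales.user=etl_user
sales.password=********
hr.url=jdbc:oracle:thin:@host2:1521/hr
hr.user=etl_user
hr.password=********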


Pushing flows to Registry with Sensitive Information

2018-04-25 Thread Jorge Machado
Hi Guys, 

So I was playing with the registry, and if I push a Processor that has
sensitive information like a password, that information is discarded when
pulling it from the Registry, which is fine.

Now comes the but: if I put a variable there instead, IMHO it should be
saved in the registry.

What do you think ? 

Jorge 







Is there a configuration to limit the size of nifi's flowfile repository

2018-04-25 Thread 尹文才
Hi guys, I checked NiFi's system administrator guide trying to find a
configuration item so that the size of the flowfile repository could be
limited, similar to the other repositories (e.g. the content repository),
but I didn't find such a configuration item. Is there currently any
configuration to limit the flowfile repository's size? Thanks.

Regards,
Ben