Hello,

Others who have worked on the DB-related services and processors can
correct me if I'm wrong here, but...

In general, the idea of a connection pool is that creating connections
is somewhat expensive, and for a high volume of operations you don't
want to create a new connection for each DB operation, so the
connection pool creates some number of connections ahead of time and
makes them re-usable across operations, which is more efficient.
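
For illustration, the pattern looks roughly like this (sketched here
with Commons DBCP's BasicDataSource, which I believe is what the
standard DBCPConnectionPool uses under the hood; the URL and
credentials are just placeholders):

import java.sql.Connection;
import java.sql.SQLException;
import org.apache.commons.dbcp2.BasicDataSource;

public class PoolSketch {
    public static void main(String[] args) throws SQLException {
        // Create the pool once, up front.
        BasicDataSource pool = new BasicDataSource();
        pool.setUrl("jdbc:postgresql://localhost/mydb"); // placeholder connection info
        pool.setUsername("user");
        pool.setPassword("pass");
        pool.setInitialSize(4); // connections created ahead of time
        pool.setMaxTotal(8);

        // Each operation borrows a connection and returns it on close(),
        // instead of paying the cost of creating a new one every time.
        for (int i = 0; i < 100; i++) {
            try (Connection conn = pool.getConnection()) {
                // ... run a statement against the DB ...
            }
        }
    }
}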

In your case you want dynamically created connections based on flow
file attributes, which means the connection information could
potentially be different for each flow file. At that point it starts
to feel like it isn't really a connection pool anymore and is just a
factory for obtaining one-time-use connections, because otherwise it
would end up having to maintain multiple connection pools behind the
scenes.

A controller service has an API and then implementations of that API,
and the API is just a Java interface.

The Java interface (API) is the contract with processors... a
processor gets a reference to an object that implements the interface,
and the processor can call methods on the interface. So if you want to
pass information from flow files to a controller service, then the
methods in the interface need to somehow accept that information.
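
For example, this is (simplified, and with a made-up processor name)
roughly how the existing DB processors get a reference to the service
through its interface -- the processor only ever sees the API, never
the implementation:

import java.sql.Connection;
import java.sql.SQLException;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.dbcp.DBCPService;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

public class MyDbProcessor extends AbstractProcessor {

    // A property that points at a controller service implementing the DBCPService API
    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
            .name("Database Connection Pooling Service")
            .identifiesControllerService(DBCPService.class)
            .required(true)
            .build();

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session)
            throws ProcessException {
        // The processor gets a reference to whatever implementation was configured...
        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE)
                .asControllerService(DBCPService.class);

        // ...and can only call what the interface exposes, which takes no
        // arguments, so there is nowhere to pass flow file attributes through.
        try (final Connection conn = dbcpService.getConnection()) {
            // ... do work against the DB ...
        } catch (final SQLException e) {
            throw new ProcessException(e);
        }
    }
}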

The DBCPService interface [1] (which DBCPConnectionPool implements)
has just a single method, getConnection(), and it is designed to be a
re-usable pool of connections against a single DB (as I described in
the first paragraph).
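
Paraphrasing [1], the whole contract is essentially:

public interface DBCPService extends ControllerService {
    Connection getConnection() throws ProcessException;
}

There is nothing in that signature a processor could use to pass
per-flow-file connection information.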

You could define your own "DynamicDBCPConnectionPool" API with a
method like "getConnection(String dbIdentifier)", and have an
implementation that loads connection information from a properties
file and keeps a lookup from dbIdentifier to connection info
(something like the sketch below), but then you also need your own set
of DB processors, because none of the existing DB processors work
against your new API.
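
A minimal sketch of what that could look like (the class name, the
Connection Info File property, and the "<id>.url/.user/.password" key
convention are all made up here, and validation/error handling is left
out):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.commons.dbcp2.BasicDataSource;
import org.apache.nifi.annotation.lifecycle.OnEnabled;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.controller.AbstractControllerService;
import org.apache.nifi.controller.ConfigurationContext;
import org.apache.nifi.controller.ControllerService;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.util.StandardValidators;

// The new API that your processors would code against
interface DynamicDBCPConnectionPool extends ControllerService {
    Connection getConnection(String dbIdentifier) throws ProcessException;
}

// One possible implementation: read connection info from a properties file
// and keep one pool per database identifier behind the scenes
public class PropertiesFileDynamicDBCPConnectionPool
        extends AbstractControllerService implements DynamicDBCPConnectionPool {

    static final PropertyDescriptor CONNECTION_INFO_FILE = new PropertyDescriptor.Builder()
            .name("Connection Info File")
            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
            .required(true)
            .build();

    private final Map<String, BasicDataSource> pools = new ConcurrentHashMap<>();
    private volatile Properties connectionInfo;

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Collections.singletonList(CONNECTION_INFO_FILE);
    }

    @OnEnabled
    public void onEnabled(final ConfigurationContext context) throws IOException {
        final Properties props = new Properties();
        try (InputStream in = new FileInputStream(
                context.getProperty(CONNECTION_INFO_FILE).getValue())) {
            props.load(in);
        }
        connectionInfo = props;
    }

    @Override
    public Connection getConnection(final String dbIdentifier) throws ProcessException {
        try {
            // Lazily create one pool per identifier from the properties file,
            // keyed by e.g. "mydb.url", "mydb.user", "mydb.password"
            final BasicDataSource ds = pools.computeIfAbsent(dbIdentifier, id -> {
                final BasicDataSource bds = new BasicDataSource();
                bds.setUrl(connectionInfo.getProperty(id + ".url"));
                bds.setUsername(connectionInfo.getProperty(id + ".user"));
                bds.setPassword(connectionInfo.getProperty(id + ".password"));
                return bds;
            });
            return ds.getConnection();
        } catch (final SQLException e) {
            throw new ProcessException("Could not get a connection for " + dbIdentifier, e);
        }
    }
}

Your custom processor would then declare a property that identifies
the new service type and call something like
getConnection(flowFile.getAttribute("db.name")) in onTrigger().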

Hope this helps.

-Bryan

[1] 
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-api/src/main/java/org/apache/nifi/dbcp/DBCPService.java#L33


On Wed, Apr 25, 2018 at 3:24 AM, Rishab Prasad
<rishabprasad...@gmail.com> wrote:
> Hi,
>
> Basically, there are 'n' number of databases that we are dealing with. We
> need to fetch the data from the source database into HDFS. Now since we are
> dealing with many databases, the source database is not static and changes
> every now and then. And every time the source database changes we manually
> need to change the value for the connection parameters in
> DBCPConnectionPool. Now, people suggest that for 'n' databases create 'n'
> connections for each database, but that is not possible because 'n' is a
> big number and creating that many connections in DBCPConnectionPool is not
> possible. So we were looking for a way where we can specify all the
> connection parameters in a file present in our local system and then make
> the DBCPConnectionPool controller service read the values from the file.
> In that way we can simply change the value in the file present in the local
> system. No need to alter anything in the dataflow. But it turns out that
> FlowFile attributes are not available to the controller services as the
> expression language is evaluated at the time of service enable.
>
> So can you suggest a way where I can achieve my requirement (except
> 'variable.registry' ) ? I am looking to develop a custom controller service
> that can serve the requirement but how do I make the flowfile attributes
> available to the service?
