Unfortunately, Otto's ASCII art doesn't render nicely for me, but I agree with what I think he's saying. And it aligns nicely with what Joe said too.

Create a custom controller service that has the job of loading your configuration data set. This service can provide methods in its API to do lookups against that data.

Write your custom processors to do the needed lookup against that controller service.

We have made a custom ConfigProvider interface for our custom controller services, with implementations to find configurations on the local file system or RESTfully from a URL. This gives us the flexibility to configure controller services to work when NiFi is run in different environments. For example, with a NiFi cluster on bare metal, we might use other tools to distribute configuration files to all nodes of the cluster, and the controller service is configured to find those config files on the local file system. For NiFi running in a Docker container, we configure the controller services to retrieve config files from a URL (such as from an S3 bucket) on startup.

It's a bit like the new ResourceDefinition and ResourceType feature of the nifi-api PropertyDescriptor, but with built-in extra properties to do periodic reloads and some other things.

Regards,
-- Mike

On Wed, Sep 27, 2023 at 5:39 PM Otto Fowler <[email protected]> wrote:

> [ASCII diagram: four processor instances each connect to a shared
> configuration service, which in turn pulls from a configuration
> authority.]
>
> You can also make a shared service component that loads the configurations
> by some means and serves them to the processors.
> The service can get the configurations however makes sense for you (from
> a REST API like Joe is saying, to reading from disk or something).
>
> On September 27, 2023 at 4:04:56 PM, Joe Witt ([email protected]) wrote:
>
> Russ
>
> It sounds like what you have is a case of significant reference data you
> need made available to various instances of this processor that knows how
> to use that reference state to do its function.
>
> This is similar to cases like IP geo enrichment where the dataset on which
> you'd make the decision is larger and more importantly subject to change
> over time. In such cases the ideal state is:
> (A) The reference dataset(s) is hosted at a RESTful endpoint and can be
> periodically pulled and stored some place local/easily accessible.
> (B) The processor knows where to look for this reference dataset download
> and is able to hot reload it on the fly, including understanding that the
> needed datasets might not yet be available and that it should yield until
> it sees them and loads them.
>
> Thanks
>
> On Wed, Sep 27, 2023 at 11:51 AM Russell Bateman <[email protected]>
> wrote:
>
> > I'm posting this plea for suggestions as I'm short on imagination here.
> >
> > We have some custom processors that need extraordinary amounts of
> > configuration of the sort a flow writer would have to copy and paste
> > in--huge amounts of YAML, regular expressions, etc. This is what our
> > flow writers are already doing. It would be easier to insert a filename
> > or -path, but...
> >
> > ...asking a custom processor to perform filesystem I/O is icky because
> > of unpredictable filesystem access post installation. Thinking about how
> > installation is beyond my control, I don't want to make installation
> > messy, etc. Containers, Kubernetes deployment, etc. complicate this.
> >
> > I thought of wiring /GetFile/ to a subdirectory (problematic, but less
> > so?) and accepting files as input to pass on to needy processors who
> > would recognize, adopt and incorporate configuration based on
> > higher-level and simpler cues posted by flow writers as property values.
> >
> > Assuming you both grok and are interested in what I'm asking, do you
> > have thoughts, cautionary statements or even cat-calls to offer? Maybe
> > there are obvious answers I'm just not thinking of.
> >
> > Profuse thanks,
> >
> > Russ
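
P.S. To make the controller-service suggestion at the top of this message a little more concrete, here is a minimal sketch in plain Java. It is not our actual code: only the ConfigProvider name comes from the description above, while the load() method, the FileConfigProvider, UrlConfigProvider and ConfigLookupService classes, and the choice of java.util.Properties as the config format are all assumptions made to keep the example self-contained. In a real NiFi extension the lookup class would presumably be a controller service (for example extending AbstractControllerService), and a scheduler would call reload() periodically.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical abstraction over where the raw configuration text lives. */
interface ConfigProvider {
    String load() throws IOException;
}

/** Bare-metal case: the file was distributed to each node by other tooling. */
final class FileConfigProvider implements ConfigProvider {
    private final Path path;

    FileConfigProvider(Path path) {
        this.path = path;
    }

    @Override
    public String load() throws IOException {
        return Files.readString(path, StandardCharsets.UTF_8);
    }
}

/** Container case: fetch the configuration from a URL. */
final class UrlConfigProvider implements ConfigProvider {
    private final URI uri;

    UrlConfigProvider(URI uri) {
        this.uri = uri;
    }

    @Override
    public String load() throws IOException {
        try (InputStream in = uri.toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}

/** Stand-in for the controller service that processors query for lookups. */
final class ConfigLookupService {
    private final ConfigProvider provider;
    // Immutable snapshot, swapped atomically on each successful reload.
    private volatile Map<String, String> entries = Map.of();

    ConfigLookupService(ConfigProvider provider) {
        this.provider = provider;
    }

    /** Reloads from the provider; keeps the previous snapshot if loading fails. */
    boolean reload() {
        try {
            Properties props = new Properties();
            props.load(new StringReader(provider.load()));
            Map<String, String> fresh = new HashMap<>();
            for (String name : props.stringPropertyNames()) {
                fresh.put(name, props.getProperty(name));
            }
            entries = Map.copyOf(fresh);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Empty until the first successful reload, so a caller can yield and retry. */
    Optional<String> lookup(String key) {
        return Optional.ofNullable(entries.get(key));
    }
}
```

A processor would then hold a reference to the service, call lookup(...) for each flow file, and yield when the result is empty, which matches Joe's point (B) about datasets that might not be available yet. Swapping FileConfigProvider for UrlConfigProvider is the only change needed between the bare-metal and container deployments.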
