Re: Custom-processor configuration suggestions
```
                                  ┌────────────────────┐
                              ┌───│ processor instance │
┌─────────────────────────┐   │   └────────────────────┘
│ configuration authority │   │   ┌────────────────────┐
└────────────┬────────────┘   ├───│ processor instance │
             │                │   └────────────────────┘
 ┌───────────┴───────────┐    │   ┌────────────────────┐
 │ Configuration service ├────┼───│ processor instance │
 └───────────────────────┘    │   └────────────────────┘
                              │   ┌────────────────────┐
                              └───│ processor instance │
                                  └────────────────────┘
```

You can also make a shared service component that loads the configurations by some means and serves them to the processors. The service can get the configurations however makes sense for you (from a REST API like Joe is saying, to reading from disk or something).

On September 27, 2023 at 4:04:56 PM, Joe Witt (joe.w...@gmail.com) wrote:
> Russ,
>
> It sounds like what you have is a case of significant reference data that you need made available to various instances of this processor, which knows how to use that reference state to do its function. This is similar to cases like IP geo enrichment, where the dataset on which you'd make the decision is larger and, more importantly, subject to change over time.
>
> In such cases the ideal state is:
>
> (A) The reference dataset(s) are hosted at a RESTful endpoint and can be periodically pulled and stored somewhere local/easily accessible.
>
> (B) The processor knows where to look for this reference dataset download and is able to hot reload it on the fly. That includes understanding that the needed datasets might not be available yet, in which case it should yield until it sees them and loads them.
>
> Thanks
>
> On Wed, Sep 27, 2023 at 11:51 AM Russell Bateman wrote:
> > I'm posting this plea for suggestions as I'm short on imagination here.
> >
> > We have some custom processors that need extraordinary amounts of configuration of the sort a flow writer would have to copy and paste in: huge amounts of YAML, regular expressions, etc. This is what our flow writers are already doing. It would be easier to insert a filename or path, but...
> >
> > ...asking a custom processor to perform filesystem I/O is icky because filesystem access after installation is unpredictable. Since installation is beyond my control, I don't want to make it messy. Containers, Kubernetes deployments, etc. complicate this.
> >
> > I thought of wiring GetFile to a subdirectory (problematic, but less so?) and accepting files as input to pass on to the processors that need them, which would recognize, adopt and incorporate configuration based on higher-level, simpler cues set by flow writers as property values.
> >
> > Assuming you both grok and are interested in what I'm asking, do you have thoughts, cautionary statements or even cat-calls to offer? Maybe there are obvious answers I'm just not thinking of.
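To make the shared-service suggestion concrete, here is a minimal plain-Java sketch. In NiFi this role would usually be filled by a controller service that processors reference through a property; to keep the sketch runnable without NiFi on the classpath, `ConfigurationLookup` and `CachingConfigurationService` are hypothetical names, and the loader is an assumed `Function` standing in for whatever source you choose (REST call, disk read, ...).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SharedConfigDemo {
    public static void main(String[] args) {
        // Stand-in for the real source (REST endpoint, mounted volume, ...)
        Map<String, String> source = Map.of("claims-rules", "rules:\n  - pattern: '^A\\d+'");
        CachingConfigurationService service = new CachingConfigurationService(source::get);

        // Several processor instances share one service and only pass a short name
        ConfigurationLookup processorA = service;
        ConfigurationLookup processorB = service;
        System.out.println(processorA.getConfiguration("claims-rules").isPresent()); // true
        System.out.println(processorB.getConfiguration("missing").isPresent());      // false
    }
}

/** Hypothetical contract a shared configuration component could expose. */
interface ConfigurationLookup {
    Optional<String> getConfiguration(String name);
}

/** Loads each named configuration once via a pluggable loader and serves the cached copy. */
final class CachingConfigurationService implements ConfigurationLookup {
    private final Function<String, String> loader; // returns null for unknown names
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingConfigurationService(Function<String, String> loader) {
        this.loader = loader;
    }

    @Override
    public Optional<String> getConfiguration(String name) {
        // computeIfAbsent stores nothing when the loader returns null
        return Optional.ofNullable(cache.computeIfAbsent(name, loader));
    }

    /** Drops cached entries so the next lookup reloads them from the source. */
    void invalidateAll() {
        cache.clear();
    }
}
```

Because every processor instance holds a reference to the same service, the bulky configuration is loaded once, and the flow writer only has to enter a short name in each processor.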
Re: Custom-processor configuration suggestions
Russ,

It sounds like what you have is a case of significant reference data that you need made available to various instances of this processor, which knows how to use that reference state to do its function. This is similar to cases like IP geo enrichment, where the dataset on which you'd make the decision is larger and, more importantly, subject to change over time.

In such cases the ideal state is:

(A) The reference dataset(s) are hosted at a RESTful endpoint and can be periodically pulled and stored somewhere local/easily accessible.

(B) The processor knows where to look for this reference dataset download and is able to hot reload it on the fly. That includes understanding that the needed datasets might not be available yet, in which case it should yield until it sees them and loads them.

Thanks

On Wed, Sep 27, 2023 at 11:51 AM Russell Bateman wrote:
> I'm posting this plea for suggestions as I'm short on imagination here.
>
> We have some custom processors that need extraordinary amounts of configuration of the sort a flow writer would have to copy and paste in: huge amounts of YAML, regular expressions, etc. This is what our flow writers are already doing. It would be easier to insert a filename or path, but...
>
> ...asking a custom processor to perform filesystem I/O is icky because filesystem access after installation is unpredictable. Since installation is beyond my control, I don't want to make it messy. Containers, Kubernetes deployments, etc. complicate this.
>
> I thought of wiring GetFile to a subdirectory (problematic, but less so?) and accepting files as input to pass on to the processors that need them, which would recognize, adopt and incorporate configuration based on higher-level, simpler cues set by flow writers as property values.
>
> Assuming you both grok and are interested in what I'm asking, do you have thoughts, cautionary statements or even cat-calls to offer? Maybe there are obvious answers I'm just not thinking of.
>
> Profuse thanks,
>
> Russ
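Points (A) and (B) could be sketched roughly as below, assuming some separate job (not shown) already downloads the dataset from the REST endpoint into a local file. `ReloadingReferenceData` and the `geo.csv` file name are hypothetical. The holder rereads the file whenever its modification time changes and returns an empty `Optional` until the file first appears, which is the point where a NiFi processor would call `context.yield()` and try again later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.Optional;

class ReferenceDataDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("refdata");
        ReloadingReferenceData data = new ReloadingReferenceData(dir.resolve("geo.csv"));

        // Nothing downloaded yet: the processor would yield instead of failing
        System.out.println(data.current().isPresent()); // false

        // Simulates the periodic pull from step (A) landing a file
        Files.write(dir.resolve("geo.csv"), List.of("10.0.0.0/8,internal"));
        System.out.println(data.current().map(List::size).orElse(0)); // 1
    }
}

/** Hypothetical holder that hot-reloads a locally stored reference dataset. */
final class ReloadingReferenceData {
    private final Path file;
    private FileTime loadedAt;  // modification time of the copy held in memory
    private List<String> lines; // parsed dataset; null until the first load

    ReloadingReferenceData(Path file) {
        this.file = file;
    }

    /** Returns the latest dataset, or empty if it has never been delivered. */
    synchronized Optional<List<String>> current() throws IOException {
        if (!Files.exists(file)) {
            return Optional.ofNullable(lines); // keep serving the last good copy, if any
        }
        FileTime modified = Files.getLastModifiedTime(file);
        if (lines == null || !modified.equals(loadedAt)) {
            lines = List.copyOf(Files.readAllLines(file)); // reload on first use or change
            loadedAt = modified;
        }
        return Optional.of(lines);
    }
}
```

Modification-time resolution is coarse on some filesystems, so the download job would do well to write to a temporary file and rename it into place, which also helps keep readers from seeing a half-written dataset.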
Custom-processor configuration suggestions
I'm posting this plea for suggestions as I'm short on imagination here.

We have some custom processors that need extraordinary amounts of configuration of the sort a flow writer would have to copy and paste in: huge amounts of YAML, regular expressions, etc. This is what our flow writers are already doing. It would be easier to insert a filename or path, but...

...asking a custom processor to perform filesystem I/O is icky because filesystem access after installation is unpredictable. Since installation is beyond my control, I don't want to make it messy. Containers, Kubernetes deployments, etc. complicate this.

I thought of wiring GetFile to a subdirectory (problematic, but less so?) and accepting files as input to pass on to the processors that need them, which would recognize, adopt and incorporate configuration based on higher-level, simpler cues set by flow writers as property values.

Assuming you both grok and are interested in what I'm asking, do you have thoughts, cautionary statements or even cat-calls to offer? Maybe there are obvious answers I'm just not thinking of.

Profuse thanks,
Russ
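The GetFile idea could look roughly like this: configuration files arrive as ordinary input, the processor keeps them keyed by name, and the flow writer's property carries only the short cue. This is a plain-Java sketch under assumptions: `IncomingConfig` is a hypothetical stand-in for a FlowFile (GetFile does set the core `filename` attribute), and `ConfigurationRegistry` is an illustrative name, not a NiFi API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class ConfigAdoptionDemo {
    public static void main(String[] args) {
        ConfigurationRegistry registry = new ConfigurationRegistry();
        // A GetFile-style source delivers the bulky YAML as ordinary content
        registry.adopt(new IncomingConfig("claims.yaml", "patterns:\n  - '^A\\d+'"));

        // The flow writer only types the short cue into a processor property
        String profileProperty = "claims.yaml";
        System.out.println(registry.resolve(profileProperty).isPresent()); // true
    }
}

/** Hypothetical stand-in for a FlowFile: its filename attribute plus its content. */
record IncomingConfig(String filename, String content) {}

/** Hypothetical per-processor store of adopted configurations. */
final class ConfigurationRegistry {
    private final Map<String, String> byName = new ConcurrentHashMap<>();

    /** Adopts (or replaces) a configuration delivered on the processor's input. */
    void adopt(IncomingConfig config) {
        byName.put(config.filename(), config.content());
    }

    /** Looks up the configuration named by a property value. */
    Optional<String> resolve(String name) {
        return Optional.ofNullable(byName.get(name));
    }
}
```

One caution with this design: each processor instance only sees the FlowFiles routed to it, so in a cluster every node needs its own copy of the configuration files before it can resolve a cue.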
Re: Property management - reducing duplication
Hey Bence and team,

I'd definitely be in favor of a better approach here. When removing variables, I found myself needing to update a lot of copies of nifi.properties, as well as other configuration files, across many places in the codebase. I don't know what the best option/approach is here, but having a single source of truth somewhere and being able to reference it everywhere, with customization, definitely sounds nice.

Pierre

On Tue, Sep 26, 2023 at 09:19, Simon Bence wrote:
> Hi Team,
>
> I was touching some test-related code the other day, and it brought to my attention how many partially duplicated nifi.properties files we have in various places in the project.
>
> While I was searching for the value of a given property in these files, it got me thinking that when a property is changed (for example, as part of the 2.x efforts) or added, it requires changes in multiple places, which could lead to oversights and inconsistencies. Additionally, it seems to me that duplicating whole configuration files might make one reluctant to create specific properties files for specific tests, as in the case of the system tests.
>
> I would like to propose a discussion about this, as I'm curious whether the community sees any value in improving the configuration management. My initial thought is to maintain one single “source of truth” properties file and provide some kind of utility that could generate instances as needed, allowing properties to be overridden or extended when necessary.
>
> Looking forward to your insights and suggestions.
>
> Regards,
> Bence
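The single-source-of-truth idea could be sketched with plain `java.util.Properties`: load the shared template, overlay per-test overrides or additions, and write the result where the test expects it. `TestPropertiesFactory` is a hypothetical name, and the inline template is only a stand-in for the real shared nifi.properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

class PropertiesOverlayDemo {
    public static void main(String[] args) throws IOException {
        // Inline stand-in for the single shared nifi.properties template
        String template = "nifi.web.http.host=\nnifi.web.http.port=8080\n";
        Path generated = Files.createTempFile("nifi-test", ".properties");

        // The test only states what differs from the template
        TestPropertiesFactory.write(new StringReader(template),
                Map.of("nifi.web.http.port", "18080"), generated);

        Properties check = new Properties();
        try (Reader in = Files.newBufferedReader(generated)) {
            check.load(in);
        }
        System.out.println(check.getProperty("nifi.web.http.port")); // 18080
    }
}

/** Hypothetical test utility: shared template plus per-test overrides or additions. */
final class TestPropertiesFactory {
    static Properties overlay(Reader template, Map<String, String> overrides) throws IOException {
        Properties props = new Properties();
        props.load(template);                  // the single source of truth
        overrides.forEach(props::setProperty); // replace existing keys or add new ones
        return props;
    }

    static void write(Reader template, Map<String, String> overrides, Path target) throws IOException {
        try (Writer out = Files.newBufferedWriter(target)) {
            overlay(template, overrides).store(out, "generated from the shared template");
        }
    }
}
```

A test would then carry only the handful of properties it cares about, so a renamed or added property needs updating in one template. Note that `Properties.store` does not preserve the template's comments, which matters only if humans read the generated files.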