Long ago, along these lines, I explored the idea of contributing provenance/lineage as a top-level signal to OpenTelemetry. They supported the idea, but I didn’t have the cycles to see it through at the time.
https://github.com/open-telemetry/opentelemetry-specification/issues/3447

While NiFi does support visualizing basic telemetry within the app, I think most elect to externalize it to standard observability tooling. This was my motive for the idea and perhaps it would be a valid venture here as well.

Thanks,
Mike

On Wed, Mar 11, 2026 at 18:39 Pierre Villard <[email protected]> wrote:

> Hi David,
>
> I think this would be a great improvement for NiFi. I have considered a
> similar approach in the past but I didn't have the time to pursue it
> further, so I ended up using a reporting task instead. I do think that
> having extensibility of the provenance repository would be the best
> approach for anything production grade.
>
> I would be very interested in seeing this move forward and agree that
> this would need to follow the NIP process.
>
> Thanks,
> Pierre
>
>
> On Wed, Mar 11, 2026 at 18:25, David Young <[email protected]> wrote:
>
> > Hello Team!
> >
> > I've been working with NiFi for a bit now and am seeing a usage pattern
> > within my team that I think could be improved. We have thrown around the
> > idea of creating an additional provenance repository implementation that
> > would allow the storage and retrieval of `ProvenanceEventRecord`s in an
> > external database / service to support more cloud-centric deployments.
> >
> > Expanding where NiFi can store provenance would allow the
> > instance/cluster itself to offload the storage and management of
> > provenance events to an external tool, e.g. Elasticsearch / OpenSearch,
> > Solr, etc.
> >
> > When targeting cloud-based deployments of NiFi, resource constraints are
> > an important consideration. Externalizing some database-like features
> > would allow more resources to be allocated to data processing tasks.
> > Also, in the event that a container or VM needs to be replaced or scaled
> > down, having provenance stored in an external service would still allow
> > other nodes in the cluster to access those events.
> >
> > My goal is to refactor some of the existing implementations within the
> > nifi-data-provenance-utils module to decouple them from being
> > disk-centric. To go along with that, I'd like to create some new
> > interfaces that external services could be built against.
> >
> > In my research and prototyping for this, I've run into several
> > situations where, while trying to follow the existing patterns,
> > sub-typing some of the existing things doesn't make sense for an
> > external provider.
> >
> > I don't yet have any complete implementations due to the amount of work
> > I think would be involved. So far my research has primarily been with
> > using Elasticsearch as a backing store.
> >
> > I believe this would rise to the level of requiring a NIP and would like
> > to see how the larger dev team feels about this.
> > Thank you!
> >
> > --
> > -David Y.
