Hi David,

I also think omitting external content storage from this feature is for the best.
Surfacing all provenance events through the UI would add a third state that a provenance event could have for its data: "present but inaccessible". That's in addition to "deleted" and "present". That way a user can know they can get the content if they start up a given node.

This is reminding me of the AzureLogAnalyticsProvenanceReportingTask. I've considered using it for when we begin NiFi clustering at work. A NiFi-native solution would be my preference because of the seamless experience I would get.

-Eric

On Wed, Mar 11, 2026, 11:58 AM David Young <[email protected]> wrote:

> Hello Eric,
>
> Unless I'm mistaken, a provenance record itself doesn't have any content
> attached, it's all metadata and attributes.
> Now, that's not to say there isn't a linkage that would potentially be
> broken if the event were to be retrieved on a different cluster node.
> That could be handled with an external content store, but outside the scope
> of this particular bit of work.
>
> On Wed, Mar 11, 2026 at 2:49 PM Eric Secules <[email protected]> wrote:
>
> > Hi David,
> >
> > I'd really like this feature as well, especially for clustered NiFi that
> > changes size based on load.
> >
> > How do you envision dealing with flow file content attached to provenance
> > records?
> >
> > Thanks,
> > Eric Secules
> >
> >
> > On Wed, Mar 11, 2026, 11:30 AM Mike Hogue <[email protected]> wrote:
> >
> > > Long ago and along these lines, I’d explored the idea of contributing
> > > provenance/lineage as a top-level signal to OpenTelemetry. They supported
> > > the idea, but I didn’t have the cycles to see it through at the time.
> > >
> > > https://github.com/open-telemetry/opentelemetry-specification/issues/3447
> > >
> > > While NiFi does support visualizing basic telemetry within the app, I think
> > > most elect to externalize it to standard observability tooling. This was my
> > > motive for the idea and perhaps it would be a valid venture here as well.
> > >
> > > Thanks,
> > > Mike
> > >
> > > On Wed, Mar 11, 2026 at 18:39 Pierre Villard <[email protected]>
> > > wrote:
> > >
> > > > Hi David,
> > > >
> > > > I think this would be a great improvement for NiFi. I have considered a
> > > > similar approach in the past but I didn't have the time to pursue it
> > > > further, so I ended up using a reporting task instead. I do think that
> > > > having extensibility of the provenance repository would be the best
> > > > approach for anything production grade.
> > > >
> > > > I would be very interested in seeing this move forward and agree that
> > > > this would need to follow the NIP process.
> > > >
> > > > Thanks,
> > > > Pierre
> > > >
> > > >
> > > > On Wed, Mar 11, 2026 at 18:25, David Young <[email protected]> wrote:
> > > >
> > > > > Hello Team!
> > > > >
> > > > > I've been working with NiFi for a bit now and am seeing a usage pattern
> > > > > within my team that I think could be improved. We have thrown around
> > > > > the idea of creating an additional provenance repository implementation
> > > > > that would allow the storage and retrieval of `ProvenanceEventRecord`s
> > > > > in an external database / service to support more cloud-centric
> > > > > deployments.
> > > > >
> > > > > Expanding where NiFi can store provenance would allow the
> > > > > instance/cluster itself to offload the storage and management of
> > > > > provenance events to an external tool, e.g. Elasticsearch / OpenSearch,
> > > > > Solr, etc.
> > > > >
> > > > > When targeting cloud-based deployments of NiFi, resource constraints
> > > > > are an important consideration. Externalizing some database-like
> > > > > features would allow more resources to be allocated to data processing
> > > > > tasks.
> > > > > Also, in the event that a container or VM needs to be replaced or
> > > > > scaled down, having provenance stored in an external service would
> > > > > still allow other nodes in the cluster to access those events.
> > > > >
> > > > > My goal is to refactor some of the existing implementations within the
> > > > > nifi-data-provenance-utils module to decouple them from being
> > > > > disk-centric. To go along with that, I'd like to create some new
> > > > > interfaces that external services could be built against.
> > > > >
> > > > > In my research and prototyping for this, I've run into several
> > > > > situations where, while trying to follow the existing patterns,
> > > > > sub-typing some of the existing things doesn't make sense for an
> > > > > external provider.
> > > > >
> > > > > I don't yet have any complete implementations due to the amount of
> > > > > work I think would be involved. So far my research has primarily been
> > > > > with using Elasticsearch as a backing store.
> > > > >
> > > > > I believe this would rise to the level of requiring a NIP and would
> > > > > like to see how the larger dev team feels about this.
> > > > > Thank you!
> > > > >
> > > > > --
> > > > > -David Y.
> > > > >
> > > >
> > >
> >
>
> --
> -David
>
