Hi David,

I'd really like this feature as well, especially for a clustered NiFi that
changes size based on load.

How do you envision dealing with flow file content attached to provenance
records?

Thanks,
Eric Secules


On Wed, Mar 11, 2026, 11:30 AM Mike Hogue <[email protected]> wrote:

> Long ago and along these lines, I’d explored the idea of contributing
> provenance/lineage as a top-level signal to OpenTelemetry. They supported
> the idea, but I didn’t have the cycles to see it through at the time.
>
> https://github.com/open-telemetry/opentelemetry-specification/issues/3447
>
> While NiFi does support visualizing basic telemetry within the app, I think
> most elect to externalize it to standard observability tooling. This was my
> motive for the idea and perhaps it would be a valid venture here as well.
>
> Thanks,
> Mike
>
> On Wed, Mar 11, 2026 at 18:39 Pierre Villard <[email protected]>
> wrote:
>
> > Hi David,
> >
> > I think this would be a great improvement for NiFi. I have considered a
> > similar approach in the past but I didn't have the time to pursue it
> > further, so I ended up using a reporting task instead. I do think that
> > having extensibility of the provenance repository would be the best
> > approach for anything production grade.
> >
> > I would be very interested in seeing this move forward and agree that this
> > would need to follow the NIP process.
> >
> > Thanks,
> > Pierre
> >
> >
> > On Wed, Mar 11, 2026 at 18:25, David Young <[email protected]> wrote:
> >
> > > Hello Team!
> > >
> > > I've been working with NiFi for a bit now and am seeing a usage pattern
> > > within my team that I think could be improved. We have thrown around the
> > > idea of creating an additional provenance repository implementation that
> > > would allow the storage and retrieval of `ProvenanceEventRecord`s in an
> > > external database / service to support more cloud-centric deployments.
> > >
> > > Expanding where NiFi can store provenance would allow the instance/cluster
> > > itself to offload the storage and management of provenance events to an
> > > external tool, e.g. Elasticsearch / OpenSearch, Solr, etc.
> > >
> > > When targeting cloud-based deployments of NiFi, resource constraints are
> > > an important consideration. Externalizing some database-like features would
> > > allow more resources to be allocated to data processing tasks. Also, in the
> > > event that a container or VM needs to be replaced or scaled down, having
> > > provenance stored in an external service would still allow other nodes in
> > > the cluster to access those events.
> > >
> > > My goal is to refactor some of the existing implementations within the
> > > nifi-data-provenance-utils module to decouple them from being disk-centric.
> > > To go along with that, I'd like to create some new interfaces that external
> > > services could be built against.
> > >
> > > In my research and prototyping for this, I've run into several situations
> > > where, while trying to follow the existing patterns, sub-typing some of the
> > > existing things doesn't make sense for an external provider.
> > >
> > > I don't yet have any complete implementations due to the amount of work I
> > > think would be involved. So far my research has primarily used
> > > Elasticsearch as a backing store.
> > >
> > > I believe this would rise to the level of requiring a NIP and would like to
> > > see how the larger dev team feels about this.
> > > Thank you!
> > >
> > > --
> > > -David Y.
> > >
> >
>
