Related & complementary to Brian's tracing work would be modeling provenance/lineage in OTEL. I chatted with some OTEL maintainers about modeling provenance natively as a top-level Signal [1] at KubeCon and they advised raising an issue in the opentelemetry-specification repo for discussion: https://github.com/open-telemetry/opentelemetry-specification/issues/3447
Their in-person response was largely that OTEL events could be used to model provenance, but I argued that the model should be standardized for use across tooling. This way, the provenance model wouldn't need to be maintained separately. My original thought was that we could have a standard provenance model in OTEL and shift NiFi to emit these instead of its own provenance events. Then we could tie provenance together across heterogeneous tooling using log visualization through something like Loki & Grafana, though there are many options in this space.

Thanks,
Mike

[1] https://opentelemetry.io/docs/concepts/signals/

On Tue, May 23, 2023 at 3:56 AM Brian Putt <puttbr...@gmail.com> wrote:
> Hello Joe / All,
>
> Jaeger or Grafana (w/ Tempo) offer comparable tools to visualize the trace data. I believe additional tools will be needed to get the most out of the trace data. We've been experimenting with a number of open source products to see what works best for the amount of trace data that NiFi emits. So far, Grafana Tempo, VictoriaMetrics, and ClickHouse seem to offer a good set of features to cover searching / viewing the traces along with summarizing certain flowfile attributes. As long as the trace data is in OTEL's format, the collector offers flexibility in exporting the data to a number of services with ease.
>
> I would expect a PR to OTEL's Java auto instrumentation project over the next few months that adds NiFi to its list of instrumentations. If the NiFi committers would like a demo / tech exchange to go over the current state of the tracing agent, we'd be happy to accommodate. As it stands, the agent utilizes flowfile attributes to pass along the tracestate so trace propagation can occur across NiFi to NiFi boundaries.
>
> Thanks,
>
> Brian
>
> On Wed, May 17, 2023 at 1:05 PM Joe Witt <joe.w...@gmail.com> wrote:
>
> > Brian Putt, All
> >
> > Are you aware of any good tools/services that can ingest the traces and provide an interesting view/story/reporting on them?
> >
> > I could see us emitting OTEL events instead of our current provenance mechanism and using that both internally to do what we already do but also have a clear/spec friendly way of exporting it to others.
> >
> > Thanks
> >
> > On Sat, Jul 30, 2022 at 7:43 AM u...@moosheimer.com <u...@moosheimer.com> wrote:
> >
> > > Hello Brian, Bryan, Greg, NiFi devs,
> > >
> > > Integrating OpenTelemetry is a very good idea, especially since the major cloud providers also rely on it. This could also be interesting for Stateless NiFi.
> > >
> > > I have a suggestion that I would like to put up for discussion.
> > >
> > > Would it be useful to make a list of what extensions or new development would be helpful for a complete integration of OpenTelemetry?
> > >
> > > I'm thinking of ConsumeMQTT and PublishMQTT, for example. Currently these support at most MQTT version 3.1.1, but since version 5 there are User Properties, which are similar to HTTP header fields.
> > > Thus one could implement OpenTelemetry in the MQTT processors similarly to HTTP.
> > >
> > > With a list we could make an overview of the "necessary" adjustments and advertise for support.
> > >
> > > If what I write is nonsense, then I may not have understood something and I take it all back :)
> > >
> > > Best regards
> > > Kay-Uwe Moosheimer
> > >
> > > > On 29.07.2022 at 05:09, Brian Putt <puttbr...@gmail.com> wrote:
> > > >
> > > > Hello Bryan / Greg / NiFi devs,
> > > >
> > > > Distributed tracing (DT) is similar to provenance in that it shows the path a particular flowfile travels, but its core selling point is that it supports tracing across multiple systems/services regardless of what's receiving the data. Provenance is a fantastic feature, but there are instances where one might want to draw that bigger picture of identifying bottlenecks as data flows from one system to another, and that system may or may not be using NiFi.
> > > >
> > > > DT utilizes three ids: traceId, parentId, and spanId. While a tree can be built using two ids, the third id (traceId) helps bring all of the relevant information out of a datastore more easily.
> > > > DT is focused more on performance and identifying bottlenecks in one or more systems. Imagine if NiFi were receiving data from various sources (e.g. HTTP, Kafka, SQS) and NiFi egressed to other destinations (HTTP, Kafka, NiFi).
> > > > DT provides a spec that we'd be able to follow to correlate the data as it traverses from system to system. Each system that participates in the DT ecosystem would simply emit information (a trace is made up of one or more spans) and there'd be a collection system which would aggregate all of these spans, draw a bigger picture of the path that data went through, and help identify key bottlenecks.
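To make the role of the three ids concrete, here is a minimal sketch in plain Python, not tied to any OTEL or NiFi API. The `Span` class, the helper names, and the sample span names are all hypothetical and only illustrate the idea: traceId selects the spans that belong together, and parentId/spanId are enough to rebuild the tree.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; real OTEL spans carry many more fields."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    name: str


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Use traceId to pull all spans of one trace together in a single pass."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def render(spans: List[Span]) -> List[str]:
    """Rebuild the tree from parentId/spanId and return indented span names."""
    by_parent: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in spans:
        by_parent[span.parent_id].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Emit each child of parent_id, then recurse into its own children.
        for span in by_parent.get(parent_id, []):
            lines.append("  " * depth + span.name)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines
```

Given spans collected from several systems, `render(group_by_trace(spans)[some_trace_id])` yields one indented line per span, parents before children.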
> > > >
> > > > OpenTelemetry (OTEL) provides clients (across many languages, including Java) where developers can instrument their library's APIs and participate in a DT ecosystem as it adheres to the tracing spec. Egressing trace data is possible without using OTEL, but then we may find ourselves reinventing the wheel, though the result could be optimized for NiFi.
> > > >
> > > > Creating a reporting task could certainly be a path; I mainly have a few concerns with that:
> > > >
> > > > 1. If provenance is disabled, will provenance events still be emitted and be collected by a new reporting task?
> > > > 2. There'll be an impact on performance; how much is unknown. OTEL is gaining traction across industry and there are ways to mitigate the performance impact, mainly sampling and the fact that *tracing is best effort*. Spans would be emitted from NiFi via UDP to a collector on the same network.
> > > > 3. Would there be any issues with appending a flowfile attribute that is carried throughout the flow where it maintains the traceId, parentSpanId, and trace flags? See below for more details.
> > > >
> > > > There's a W3C spec (Trace Context) which includes a formatted string that would be propagated to services (HTTP, Kafka, etc...). So if NiFi were to put information onto Kafka, any consumers of that data would be able to continue the trace and help draw the bigger picture.
> > > >
> > > > W3C Spec: https://www.w3.org/TR/trace-context/#traceparent-header
> > > >
> > > > For #2, since DT is focused on performance, sampling can help alleviate chatter over the wire and ideally, 0.01% would draw the same picture as 1% or 10%+. This is certainly different from provenance as DT is focused on performance over quality of the data and should not be thought of as auditing.
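As an illustration of the formatted string and of sampling, here is a minimal sketch in plain Python. It parses the four-field `traceparent` layout from the W3C Trace Context spec (version, trace id, parent span id, trace flags) and makes a deterministic, ratio-based sampling decision from the trace id. The function names are hypothetical, and the sampling rule is only in the spirit of a trace-id-ratio sampler; it is an assumption for illustration, not OTEL's actual algorithm.

```python
import re
from typing import NamedTuple, Optional

# version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex), lowercase hex
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Parse a traceparent string; return None if it is malformed.

    Sketch only: handles the four-field layout and ignores tracestate.
    """
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace/parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def should_sample(trace_id: str, ratio: float) -> bool:
    """Deterministic decision: the same trace id and ratio always agree.

    Assumption for illustration: compare the low 64 bits of the trace id
    against ratio * 2**64, so roughly `ratio` of random trace ids pass.
    """
    bound = int(ratio * (1 << 64))
    return int(trace_id[-16:], 16) < bound
```

Because the decision depends only on the trace id, every hop that applies the same ratio keeps or drops the same traces, so a sampled trace stays complete end to end.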
> > > >
> > > > https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler
> > > >
> > > >> On Thu, Jul 28, 2022 at 5:01 PM Bryan Bende <bbe...@gmail.com> wrote:
> > > >>
> > > >> Hi Greg,
> > > >>
> > > >> I don't really know anything about OpenTelemetry, but from the perspective of integrating something into the framework, some things to consider...
> > > >>
> > > >> Is there some way to piggy-back on provenance and use a ReportingTask to process provenance events and report something to OpenTelemetry?
> > > >>
> > > >> If something new does need to be added, it should probably be an extension point where there is an interface in the framework-api and different implementations can be plugged in.
> > > >> Ideally the framework itself wouldn't have any knowledge of OpenTelemetry specifically; it would only be reporting some information, which could then be used in some way by the OpenTelemetry implementation.
> > > >>
> > > >> How does NiFi actually communicate with OpenTelemetry? Are you expecting to send data to OpenTelemetry in this new method you are suggesting?
> > > >> That would likely have a significant impact on the performance of the flow.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Bryan
> > > >>
> > > >>> On Thu, Jul 28, 2022 at 3:17 PM glma...@uwe.nsa.gov <glma...@uwe.nsa.gov> wrote:
> > > >>>
> > > >>> NiFi Devs,
> > > >>>
> > > >>> My team and I are looking for guidance on how we can extend Apache NiFi's capabilities. Specifically we're looking to include distributed tracing. We'll approach this effort as if we're the tracing experts and simply seeking implementation guidance. Our developers have good exposure to working with NiFi and creating custom processors.
> > > >>> We plan to fork the project to begin this effort but want to make sure we approach this with the best possible direction for community adoption.
> > > >>>
> > > >>> Our initial thoughts on this approach would be to piggyback on how Provenance was implemented. We essentially want to include a subroutine or method that gets implicitly invoked upon a processor's 'onTrigger' method. From there we would analyze the FlowFile's attributes to check for the existence of 'traceId' and/or propagate one if found.
> > > >>>
> > > >>> We can expound upon all of these tracing/observability details if that helps by any means. We're able to provide a more detailed scope of this task as well, but for now we just want to get feedback on our overall goal and proposed approach.
> > > >>>
> > > >>> Thanks,
> > > >>> Greg Marshall
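The attribute check Greg describes for the 'onTrigger' hook could look roughly like the sketch below. It is plain Python with a dict standing in for FlowFile attributes; it does not use the NiFi API. The `traceId` attribute name comes from the thread, while the `spanId` attribute, the `start_span` function, and the choice to start a new trace when no `traceId` is present are assumptions made for illustration.

```python
import secrets
from typing import Dict, Optional, Tuple


def start_span(attributes: Dict[str, str], processor: str) -> Tuple[Dict[str, Optional[str]], Dict[str, str]]:
    """Continue the trace found in the attributes, or start a new one.

    Returns the span to emit and the attributes to attach to the outgoing
    flowfile so the next processor (or system) can continue the trace.
    """
    # Reuse the incoming trace id if present; otherwise begin a new trace
    # (assumption: the thread only says to propagate an id when one is found).
    trace_id = attributes.get("traceId") or secrets.token_hex(16)
    # The previous hop's span id becomes this span's parent (None at the root).
    parent_id = attributes.get("spanId")
    span_id = secrets.token_hex(8)

    span = {
        "traceId": trace_id,
        "parentId": parent_id,
        "spanId": span_id,
        "name": processor,
    }
    # Copy rather than mutate, so the incoming attributes stay untouched.
    outgoing = {**attributes, "traceId": trace_id, "spanId": span_id}
    return span, outgoing
```

Calling `start_span` once per processor invocation and passing the returned attributes downstream chains the spans: each span's `parentId` is the previous span's `spanId`, and all share one `traceId`.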