Hello Bryan / Greg / NiFi devs,

Distributed tracing (DT) is similar to provenance in that it shows the path a particular flowfile travels, but its core selling point is that it supports tracing across multiple systems/services regardless of what's receiving the data. Provenance is a fantastic feature, but there are instances where one might want to draw the bigger picture and identify bottlenecks as data flows from one system to another, whether or not that other system is using NiFi.

DT utilizes three ids: traceId, parentId, and spanId. While a tree can be built using two ids, the third id (traceId) makes it easier to pull all of the relevant information out of a datastore. DT is focused more on performance and identifying bottlenecks in one or more systems. Imagine NiFi receiving data from various sources (e.g. HTTP, Kafka, SQS) and egressing to other destinations (HTTP, Kafka, NiFi). DT provides a spec that we'd be able to follow to correlate the data as it traverses from system to system. Each system that participates in the DT ecosystem simply emits spans (a trace is made up of one or more spans), and a collection system aggregates those spans, draws the bigger picture of the path the data took, and can help identify key bottlenecks.

OpenTelemetry (OTEL) provides clients (across many languages, including Java) that developers can use to instrument their library's APIs and participate in a DT ecosystem, as it adheres to the tracing spec. Egressing trace data is possible without using OTEL; we may then find ourselves reinventing the wheel, though the result could be optimized for NiFi.

Creating a reporting task could certainly be a path. I mainly have a few concerns with that:

1. If provenance is disabled, will provenance events still be emitted and collected by a new reporting task?
2. There'll be an impact on performance; how much is unknown. OTEL is gaining traction across industry and there are ways to mitigate the performance cost, mainly sampling and the fact that *tracing is best effort*. Spans would be emitted from NiFi via UDP to a collector on the same network.
3. Would there be any issues with appending a flowfile attribute that is carried throughout the flow and maintains the traceId, parentSpanId, and trace flags? See below for more details.

There's a W3C spec (Trace Context) which includes a formatted string that would be propagated to services (HTTP, Kafka, etc.).

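To make the propagated string concrete, here is a minimal sketch in Python of the version "00" `traceparent` layout from the W3C Trace Context spec (version, trace-id, parent-id, trace-flags, separated by dashes). The names `TraceContext`, `parse_traceparent`, and `continue_trace` are hypothetical and exist only for illustration; nothing here is NiFi or OTEL API, and marking a brand-new trace as sampled is an assumption of this sketch rather than a requirement.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version "00" layout from the W3C Trace Context spec:
#   version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")
SAMPLED_FLAG = 0x01  # lowest bit of trace-flags means "sampled"


class TraceContext(NamedTuple):
    """Hypothetical holder for the values a flowfile attribute would carry."""
    trace_id: str
    parent_id: str  # span id of the caller; becomes the parent of the next span
    flags: int

    @property
    def sampled(self) -> bool:
        return bool(self.flags & SAMPLED_FLAG)

    def to_header(self) -> str:
        return f"00-{self.trace_id}-{self.parent_id}-{self.flags:02x}"


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the string is not a valid v00 header."""
    m = _TRACEPARENT.fullmatch(value.strip())
    if m is None:
        return None
    trace_id, parent_id, flags = m.groups()
    # The spec treats all-zero trace-id and parent-id values as invalid.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceContext(trace_id, parent_id, int(flags, 16))


def continue_trace(incoming: Optional[str]) -> TraceContext:
    """Build the context to send downstream: same traceId, fresh span id."""
    ctx = parse_traceparent(incoming) if incoming else None
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    if ctx is None:
        # No usable upstream context: start a new trace (sampled by assumption).
        return TraceContext(secrets.token_hex(16), new_span_id, SAMPLED_FLAG)
    return TraceContext(ctx.trace_id, new_span_id, ctx.flags)
```

Calling `continue_trace(header).to_header()` on an incoming value yields the string to attach to the outgoing message; the traceId is unchanged and the new span id takes the parent-id position, which is what lets a downstream consumer attach its spans to the same trace.
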
So if NiFi were to put information onto Kafka, any consumers of that data would be able to continue the trace and help draw the bigger picture.

W3C Spec: https://www.w3.org/TR/trace-context/#traceparent-header

For #2, since DT is focused on performance, sampling can help reduce chatter over the wire; ideally, sampling 0.01% of traces would draw the same picture as 1% or 10%+. This is certainly different from provenance: DT prioritizes performance over quality of the data and should not be thought of as auditing.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler

On Thu, Jul 28, 2022 at 5:01 PM Bryan Bende <bbe...@gmail.com> wrote:

> Hi Greg,
>
> I don't really know anything about OpenTelemetry, but from the
> perspective of integrating something into the framework, some things
> to consider...
>
> Is there some way to piggy-back on provenance and use a ReportingTask
> to process provenance events and report something to OpenTelemetry?
>
> If something new does need to be added, it should probably be an
> extension point where there is an interface in the framework-api and
> different implementations can be plugged in.
> Ideally the framework itself wouldn't have any knowledge of
> OpenTelemetry specifically, it would only be reporting some
> information, which could then be used in some way by the OpenTelemetry
> implementation.
>
> How does NiFi actually communicate with OpenTelemetry? Are you
> expecting to send data to OpenTelemetry in this new method you are
> suggesting?
> That would likely have a significant impact on the performance of the flow.
>
> Thanks,
>
> Bryan
>
> On Thu, Jul 28, 2022 at 3:17 PM glma...@uwe.nsa.gov <glma...@uwe.nsa.gov> wrote:
> >
> > Nifi Devs,
> >
> > My team and I are looking for guidance on how we can extend Apache
> > Nifi's capabilities. Specifically we're looking to include distributed
> > tracing.
> > We'll approach this effort as if we're the tracing experts and
> > simply seeking implementation guidance. Our developers have good exposure
> > to working with Nifi and creating custom processors. We plan to fork the
> > project to begin this effort but want to make sure we approach this with
> > the best possible direction for community adoption.
> >
> > Our initial thoughts on this approach would be to piggyback on how
> > Provenance was implemented. We essentially want to include a subroutine or
> > method that gets implicitly invoked upon a processors 'onTrigger' method.
> > From there we would analyze the FlowFiles attributes to check for the
> > existence of 'traceId' and/or propagate one if found.
> >
> > We can expound upon all of these tracing/observability details if that
> > helps by any means. We're able to provide more detailed scope of this task
> > as well but for now we just want to get feed back for our overall goal and
> > proposed approach.
> >
> > Thanks,
> > Greg Marshall
>