Hello Brian, Bryan, Greg, NiFi devs,

Integrating OpenTelemetry is a very good idea, especially since the major cloud providers also rely on it. This could also be interesting for Stateless NiFi.

I have a suggestion that I would like to put up for discussion. Would it be useful to compile a list of the extensions or new development needed for a complete OpenTelemetry integration?

I'm thinking of ConsumeMQTT and PublishMQTT, for example. Currently these support MQTT only up to version 3.1.1, but version 5 introduced User Properties, which are similar to HTTP header fields. OpenTelemetry could therefore be implemented in the MQTT processors in much the same way as for HTTP.

With such a list we would have an overview of the "necessary" adjustments and could ask for support.

If what I write is nonsense, then I may have misunderstood something and I take it all back :)

Best regards
Kay-Uwe Moosheimer

> On 29.07.2022 at 05:09, Brian Putt <puttbr...@gmail.com> wrote:
>
> Hello Bryan / Greg / NiFi devs,
>
> Distributed tracing (DT) is similar to provenance in that it shows the path a particular flowfile travels, but its core selling point is that it supports tracing across multiple systems/services regardless of what's receiving the data. Provenance is a fantastic feature, but there are instances where one might want to draw the bigger picture and identify bottlenecks as data flows from one system to another, and that other system may or may not be using NiFi.
>
> DT utilizes three ids: traceId, parentId, and spanId. While a tree can be built using two ids, the third id (traceId) helps bring all of the relevant information out of a datastore more easily.
> DT is focused more on performance and identifying bottlenecks in one or more systems. Imagine NiFi receiving data from various sources (e.g. HTTP, Kafka, SQS) and egressing to other destinations (HTTP, Kafka, NiFi).
> DT provides a spec that we'd be able to follow to correlate the data as it traverses from system to system.
> Each system that participates in the DT ecosystem would simply emit information (a trace is made up of one or more spans), and a collection system would aggregate all of these spans, draw a bigger picture of the path the data went through, and help identify key bottlenecks.
>
> OpenTelemetry (OTEL) provides clients (across many languages, including Java) with which developers can instrument their library's APIs and participate in a DT ecosystem, as it adheres to the tracing spec. Egressing trace data is possible without using OTEL, but then we may find ourselves reinventing the wheel, although the result could be optimized for NiFi.
>
> Creating a reporting task could certainly be a path, but I have a few concerns with that:
>
> 1. If provenance is disabled, will provenance events still be emitted and be collected by a new reporting task?
> 2. There'll be an impact on performance; how much is unknown. OTEL is gaining traction across the industry and there are ways to mitigate the performance impact, mainly sampling and the fact that *tracing is best effort*. Spans would be emitted from NiFi via UDP to a collector on the same network.
> 3. Would there be any issues with appending a flowfile attribute that is carried throughout the flow and maintains the traceId, parentSpanId, and trace flags? See below for more details.
>
> There's a W3C spec (Trace Context) which includes a formatted string that would be propagated to services (HTTP, Kafka, etc.). So if NiFi were to put information onto Kafka, any consumers of that data would be able to continue the trace and help draw the bigger picture.
>
> W3C Spec: https://www.w3.org/TR/trace-context/#traceparent-header
>
> For #2, since DT is focused on performance, sampling can help alleviate chatter over the wire and ideally, 0.01% would draw the same picture as 1% or 10%+.
> This is certainly different from provenance, as DT is focused on performance over quality of the data and should not be thought of as auditing.
> https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler
>
>> On Thu, Jul 28, 2022 at 5:01 PM Bryan Bende <bbe...@gmail.com> wrote:
>>
>> Hi Greg,
>>
>> I don't really know anything about OpenTelemetry, but from the perspective of integrating something into the framework, some things to consider...
>>
>> Is there some way to piggy-back on provenance and use a ReportingTask to process provenance events and report something to OpenTelemetry?
>>
>> If something new does need to be added, it should probably be an extension point where there is an interface in the framework-api and different implementations can be plugged in.
>> Ideally the framework itself wouldn't have any knowledge of OpenTelemetry specifically; it would only be reporting some information, which could then be used in some way by the OpenTelemetry implementation.
>>
>> How does NiFi actually communicate with OpenTelemetry? Are you expecting to send data to OpenTelemetry in this new method you are suggesting?
>> That would likely have a significant impact on the performance of the flow.
>>
>> Thanks,
>>
>> Bryan
>>
>>> On Thu, Jul 28, 2022 at 3:17 PM glma...@uwe.nsa.gov <glma...@uwe.nsa.gov> wrote:
>>>
>>> NiFi Devs,
>>>
>>> My team and I are looking for guidance on how we can extend Apache NiFi's capabilities. Specifically, we're looking to include distributed tracing. We'll approach this effort as if we're the tracing experts and simply seeking implementation guidance. Our developers have good exposure to working with NiFi and creating custom processors. We plan to fork the project to begin this effort but want to make sure we approach this with the best possible direction for community adoption.
>>>
>>> Our initial thoughts on this approach would be to piggyback on how Provenance was implemented. We essentially want to include a subroutine or method that gets implicitly invoked upon a processor's 'onTrigger' method. From there we would analyze the FlowFile's attributes to check for the existence of 'traceId' and/or propagate one if found.
>>>
>>> We can expound upon all of these tracing/observability details if that helps. We're able to provide a more detailed scope of this task as well, but for now we just want to get feedback on our overall goal and proposed approach.
>>>
>>> Thanks,
>>> Greg Marshall
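
As a minimal sketch of the propagation idea raised in this thread (a flowfile attribute that carries the traceId, parent span id, and trace flags in the W3C Trace Context "traceparent" format), the following Python shows how one hop could continue an incoming trace or start a new one. The attribute name "traceparent", the helper names, and the choice to mark new root traces as sampled are assumptions made for illustration; they are not existing NiFi or OpenTelemetry APIs. Only the version 00 layout of the string is handled.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional, Tuple

# Hypothetical flowfile attribute name (an assumption, not a NiFi convention).
TRACEPARENT_ATTR = "traceparent"

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(r"(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, id of the span that wrote the value
    flags: int     # 8-bit trace flags; bit 0 is the "sampled" flag

    @property
    def sampled(self) -> bool:
        return bool(self.flags & 0x01)

    def to_header(self) -> str:
        """Serialize back to the traceparent string format."""
        return f"00-{self.trace_id}-{self.span_id}-{self.flags:02x}"


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the value is missing or invalid."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    _version, trace_id, span_id, flags = match.groups()
    # All-zero ids are defined as invalid by the Trace Context spec.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceContext(trace_id, span_id, int(flags, 16))


def start_span(attributes: Dict[str, str]) -> Tuple[TraceContext, Optional[str]]:
    """Continue the trace found in the attributes, or start a new root trace.

    Returns the context for the new span and the parent span id (None for a root).
    """
    incoming = parse_traceparent(attributes.get(TRACEPARENT_ATTR, ""))
    if incoming is None:
        # Assumption: new root traces are marked as sampled (flags = 0x01).
        root = TraceContext(secrets.token_hex(16), secrets.token_hex(8), 0x01)
        return root, None
    # Same trace id and flags, fresh span id; the incoming span becomes the parent.
    child = incoming._replace(span_id=secrets.token_hex(8))
    return child, incoming.span_id


if __name__ == "__main__":
    attrs = {TRACEPARENT_ATTR: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    ctx, parent_id = start_span(attrs)
    # Value to write back to the attribute (or to a Kafka/HTTP/MQTT 5 header) for the next hop.
    attrs[TRACEPARENT_ATTR] = ctx.to_header()
    print(parent_id, attrs[TRACEPARENT_ATTR])
```

Writing the serialized value back to the same attribute before the flowfile leaves a component is what would let a downstream consumer continue the trace; an invalid or absent value simply starts a new trace, in line with tracing being best effort.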