Hello Brian, Bryan, Greg, NiFi devs,

Integrating OpenTelemetry is a very good idea, especially since the major cloud 
providers also rely on it. This could also be interesting for Stateless NiFi.

I have a suggestion that I would like to put up for discussion.

Would it be useful to make a list of what extensions or new development would 
be helpful for a complete integration of OpenTelemetry?

I'm thinking of ConsumeMQTT and PublishMQTT, for example. Currently these 
support at most MQTT version 3.1.1, but MQTT 5 introduced User Properties, which 
are similar to HTTP header fields.
Thus one could implement OpenTelemetry in the MQTT processors similarly to 
HTTP.

With such a list we could get an overview of the "necessary" adjustments and 
ask for support.

If what I'm writing is nonsense, then I may have misunderstood something, and I 
take it all back :)

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> On 29.07.2022 at 05:09, Brian Putt <puttbr...@gmail.com> wrote:
> 
> Hello Bryan / Greg / NiFi devs,
> 
> Distributed tracing (DT) is similar to provenance in that it shows the path
> a particular flowfile travels, but its core selling point is that it
> supports tracing across multiple systems/services regardless of what's
> receiving the data. Provenance is a fantastic feature, but there are
> cases where one wants to draw the bigger picture and identify
> bottlenecks as data flows from one system to another, whether or not
> that system uses NiFi.
> 
> DT utilizes three ids: traceId, parentId, and spanId. While a tree can be
> built using two ids, the third id (traceId) helps bring all of the relevant
> information out of a datastore more easily.
> DT is focused more on performance and identifying bottlenecks in one or
> more systems. Imagine NiFi receiving data from various sources
> (e.g. HTTP, Kafka, SQS) and sending it on to other destinations (HTTP,
> Kafka, NiFi).
> DT provides a spec that we could follow to correlate the data as it
> traverses from system to system. Each system that participates in the DT
> ecosystem would simply emit spans (a trace is made up of one or more
> spans), and a collection system would aggregate all of these spans, draw
> the bigger picture of the path the data went through, and help identify
> key bottlenecks.
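A minimal sketch of what the collection side might do with the three ids, assuming a simplified in-memory span record (the `Span` and `TraceAssembler` classes are hypothetical, not OTEL or NiFi APIs): spans are first grouped by traceId, then the spanId/parentId pairs within one trace rebuild the tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span: only the ids discussed above plus a label.
final class Span {
    final String traceId;
    final String spanId;
    final String parentId; // null for the root span of a trace
    final String name;

    Span(String traceId, String spanId, String parentId, String name) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
        this.name = name;
    }
}

final class TraceAssembler {
    // traceId lets a datastore pull every span of one trace in a single lookup.
    static Map<String, List<Span>> groupByTrace(List<Span> spans) {
        Map<String, List<Span>> byTrace = new HashMap<>();
        for (Span s : spans) {
            byTrace.computeIfAbsent(s.traceId, k -> new ArrayList<>()).add(s);
        }
        return byTrace;
    }

    // Within one trace, spanId + parentId are enough to rebuild the tree:
    // map each parent's spanId to its direct children.
    static Map<String, List<Span>> childrenByParent(List<Span> traceSpans) {
        Map<String, List<Span>> children = new HashMap<>();
        for (Span s : traceSpans) {
            if (s.parentId != null) {
                children.computeIfAbsent(s.parentId, k -> new ArrayList<>()).add(s);
            }
        }
        return children;
    }

    // The root is the span without a parent (where the data first entered).
    static Span root(List<Span> traceSpans) {
        for (Span s : traceSpans) {
            if (s.parentId == null) {
                return s;
            }
        }
        return null;
    }
}
```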
> 
> OpenTelemetry (OTEL) provides clients in many languages, including
> Java, with which developers can instrument their library's APIs and
> participate in a DT ecosystem, since OTEL adheres to the tracing spec.
> Egressing trace data is possible without OTEL, but then we may find
> ourselves reinventing the wheel, although such an implementation could
> be optimized for NiFi.
> 
> Creating a reporting task could certainly be a path, but I have a few
> concerns with it:
> 
> 1. If provenance is disabled, will provenance events still be emitted and
> be collected by a new reporting task?
> 2. There'll be an impact on performance, though how much is unknown. OTEL is
> gaining traction across the industry and there are ways to mitigate the
> performance impact, mainly sampling and the fact that *tracing is best effort*.
> Spans would be emitted from NiFi via UDP to a collector on the same network.
> 3. Would there be any issues with appending a flowfile attribute that is
> carried throughout the flow where it maintains the traceId, parentSpanId,
> and trace flags? See below for more details.
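Concern #2 relies on spans being best effort. A minimal sketch of fire-and-forget UDP emission in plain Java, assuming the span is already encoded as a string (the `UdpSpanEmitter` class, the encoding, and the collector host/port are hypothetical): a send failure drops the span rather than failing the FlowFile, and UDP itself makes no delivery guarantee.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical best-effort emitter: never blocks on acknowledgement and
// never propagates a send failure into the flow.
final class UdpSpanEmitter implements AutoCloseable {
    private final DatagramSocket socket;
    private final InetSocketAddress collector;

    UdpSpanEmitter(String collectorHost, int collectorPort) throws IOException {
        this.socket = new DatagramSocket(); // bound to an ephemeral local port
        this.collector = new InetSocketAddress(collectorHost, collectorPort);
    }

    // Returns true if the datagram was handed to the OS, false if it was dropped.
    // Even "true" does not mean the collector received it.
    boolean emit(String encodedSpan) {
        byte[] payload = encodedSpan.getBytes(StandardCharsets.UTF_8);
        try {
            socket.send(new DatagramPacket(payload, payload.length, collector));
            return true;
        } catch (IOException e) {
            return false; // drop the span; tracing is best effort
        }
    }

    @Override
    public void close() {
        socket.close();
    }
}
```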
> 
> There's a W3C spec (Trace Context) which defines a formatted string that
> would be propagated to services (HTTP, Kafka, etc.). So if NiFi were to
> put information onto Kafka, any consumers of that data would be able to
> continue the trace and help draw the bigger picture.
> 
> W3C Spec: https://www.w3.org/TR/trace-context/#traceparent-header
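A minimal sketch of reading and re-emitting the `traceparent` value from the spec linked above, handling only version `00` (the `TraceParent` class and its method names are hypothetical). Each hop keeps the trace-id and flags and substitutes its own span id as the new parent-id, which is what would let a Kafka consumer continue the trace.

```java
import java.security.SecureRandom;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the W3C Trace Context 'traceparent' value, version 00 only:
//   00-<trace-id: 32 lowercase hex>-<parent-id: 16 lowercase hex>-<trace-flags: 2 hex>
final class TraceParent {
    private static final Pattern VERSION_00 =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId;
    final String parentId;
    final int flags;

    TraceParent(String traceId, String parentId, int flags) {
        this.traceId = traceId;
        this.parentId = parentId;
        this.flags = flags;
    }

    // Bit 0x01 of trace-flags is the 'sampled' flag.
    boolean sampled() {
        return (flags & 0x01) != 0;
    }

    // Empty for malformed values, including all-zero ids (invalid per the spec).
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = VERSION_00.matcher(header.trim());
        if (!m.matches() || isAllZeros(m.group(1)) || isAllZeros(m.group(2))) {
            return Optional.empty();
        }
        return Optional.of(new TraceParent(m.group(1), m.group(2), Integer.parseInt(m.group(3), 16)));
    }

    // Continue the trace on the next hop: same trace-id and flags, new parent-id.
    TraceParent nextHop() {
        String spanId;
        do {
            spanId = randomHex(8);
        } while (isAllZeros(spanId));
        return new TraceParent(traceId, spanId, flags);
    }

    String format() {
        return String.format("00-%s-%s-%02x", traceId, parentId, flags);
    }

    private static boolean isAllZeros(String hex) {
        return hex.chars().allMatch(c -> c == '0');
    }

    // Lowercase hex of 'bytes' random bytes.
    private static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

For example, a value such as `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` parses with `sampled()` true, and `format()` reproduces it unchanged.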
> 
> For #2: since DT is focused on performance, sampling can help reduce
> chatter over the wire, and ideally a 0.01% sample would draw the same
> picture as 1% or 10%+. This is certainly different from provenance, as DT
> favors performance over completeness of the data and should not be
> thought of as auditing.
> https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler
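A minimal sketch of deterministic ratio-based sampling, loosely following the idea of the OTEL spec's `TraceIdRatioBased` sampler but not its exact algorithm (the `RatioSampler` class is hypothetical): the decision depends only on the low 64 bits of the trace-id, so every system configured with the same ratio keeps or drops the same traces.

```java
// Hypothetical ratio sampler; ratio 0.0001 would correspond to 0.01%.
final class RatioSampler {
    private final double ratio;
    private final long upperBound;

    RatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
        this.upperBound = (long) (ratio * Long.MAX_VALUE);
    }

    // traceIdHex: 32 lowercase hex chars, e.g. the trace-id from a traceparent value.
    boolean shouldSample(String traceIdHex) {
        if (ratio >= 1.0) {
            return true;
        }
        if (ratio <= 0.0) {
            return false;
        }
        // Low 64 bits of the trace-id with the sign bit cleared: a value in
        // [0, Long.MAX_VALUE] that is uniform if trace-ids are random.
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16) & Long.MAX_VALUE;
        return low < upperBound;
    }
}
```

Deriving the decision from the trace-id rather than a fresh random number keeps a sampled trace complete across NiFi and the systems around it; downstream services can alternatively honor the sampled flag carried in `traceparent`.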
> 
>> On Thu, Jul 28, 2022 at 5:01 PM Bryan Bende <bbe...@gmail.com> wrote:
>> 
>> Hi Greg,
>> 
>> I don't really know anything about OpenTelemetry, but from the
>> perspective of integrating something into the framework, some things
>> to consider...
>> 
>> Is there some way to piggy-back on provenance and use a ReportingTask
>> to process provenance events and report something to OpenTelemetry?
>> 
>> If something new does need to be added, it should probably be an
>> extension point where there is an interface in the framework-api and
>> different implementations can be plugged in.
>> Ideally the framework itself wouldn't have any knowledge of
>> OpenTelemetry specifically, it would only be reporting some
>> information, which could then be used in some way by the OpenTelemetry
>> implementation.
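A minimal sketch of such an extension point, assuming hypothetical names throughout (`ComponentExecutionListener` and `ExecutionListenerRegistry` are not existing NiFi APIs): the framework only reports generic before/after events with attributes and timing, and an OpenTelemetry-backed implementation would live behind the interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical framework-api interface: generic execution events only,
// with no knowledge of OpenTelemetry.
interface ComponentExecutionListener {
    // Invoked before a component processes a FlowFile; returns attributes to add.
    Map<String, String> beforeTrigger(String componentId, Map<String, String> attributes);

    // Invoked after the component finishes processing that FlowFile.
    void afterTrigger(String componentId, Map<String, String> attributes, long durationNanos);
}

// Hypothetical framework-side dispatcher for the plugged-in implementations.
final class ExecutionListenerRegistry {
    private final List<ComponentExecutionListener> listeners = new CopyOnWriteArrayList<>();

    void register(ComponentExecutionListener listener) {
        listeners.add(listener);
    }

    Map<String, String> fireBefore(String componentId, Map<String, String> attributes) {
        Map<String, String> updates = new HashMap<>();
        for (ComponentExecutionListener listener : listeners) {
            try {
                Map<String, String> added = listener.beforeTrigger(componentId, attributes);
                if (added != null) {
                    updates.putAll(added);
                }
            } catch (RuntimeException e) {
                // A misbehaving listener must not break the flow.
            }
        }
        return updates;
    }

    void fireAfter(String componentId, Map<String, String> attributes, long durationNanos) {
        for (ComponentExecutionListener listener : listeners) {
            try {
                listener.afterTrigger(componentId, attributes, durationNanos);
            } catch (RuntimeException e) {
                // Same policy: reporting is best effort.
            }
        }
    }
}
```

Catching listener exceptions in the dispatcher keeps a faulty tracing implementation from breaking the flow, in line with tracing being best effort.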
>> 
>> How does NiFi actually communicate with OpenTelemetry? Are you
>> expecting to send data to OpenTelemetry in this new method you are
>> suggesting?
>> That would likely have a significant impact on the performance of the flow.
>> 
>> Thanks,
>> 
>> Bryan
>> 
>>> On Thu, Jul 28, 2022 at 3:17 PM glma...@uwe.nsa.gov <glma...@uwe.nsa.gov>
>>> wrote:
>>> 
>>> NiFi Devs,
>>> 
>>> My team and I are looking for guidance on how we can extend Apache
>>> NiFi's capabilities. Specifically, we're looking to include distributed
>>> tracing. We'll approach this effort as the tracing experts who are
>>> simply seeking implementation guidance. Our developers have good exposure
>>> to working with NiFi and creating custom processors. We plan to fork the
>>> project to begin this effort but want to make sure we take the direction
>>> most likely to be adopted by the community.
>>> 
>>> Our initial thought is to piggyback on how Provenance was
>>> implemented. We essentially want to add a subroutine or method that
>>> gets implicitly invoked with each processor's 'onTrigger' method.
>>> From there we would analyze the FlowFile's attributes to check for the
>>> existence of 'traceId' and propagate it if found.
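A minimal sketch of the implicit hook described above, assuming the framework passes the FlowFile's attributes as a map and applies the returned updates (`TraceAttributeHook` is hypothetical; only the `traceId` attribute name comes from the proposal, while `spanId` and `parentSpanId` are illustrative): a FlowFile without a trace starts one, and a FlowFile with a trace gets a new span linked to the upstream one.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hook invoked implicitly before each processor's onTrigger.
final class TraceAttributeHook {
    static final String TRACE_ID = "traceId";             // name from the proposal
    static final String SPAN_ID = "spanId";               // illustrative
    static final String PARENT_SPAN_ID = "parentSpanId";  // illustrative

    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns the attribute updates for the span this processor opens.
    static Map<String, String> beforeOnTrigger(Map<String, String> attributes) {
        Map<String, String> updates = new HashMap<>();
        if (attributes.get(TRACE_ID) == null) {
            // No incoming trace: start a new one; this span is the root.
            updates.put(TRACE_ID, randomHex(16));
        } else if (attributes.get(SPAN_ID) != null) {
            // Existing trace: keep traceId, link this span to the upstream one.
            updates.put(PARENT_SPAN_ID, attributes.get(SPAN_ID));
        }
        updates.put(SPAN_ID, randomHex(8));
        return updates;
    }

    // Lowercase hex of 'bytes' random bytes (16 -> 32 chars, 8 -> 16 chars).
    static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```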
>>> 
>>> We can expand on any of these tracing/observability details if that
>>> helps. We can also provide a more detailed scope of this task, but for
>>> now we just want to get feedback on our overall goal and proposed
>>> approach.
>>> 
>>> Thanks,
>>> Greg Marshall
>> 
