[ 
https://issues.apache.org/jira/browse/CAMEL-22385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017623#comment-18017623
 ] 

Pasquale Congiusti commented on CAMEL-22385:
--------------------------------------------

A further update on this: I managed to replicate the problem.

As I commented last time, the issue is related to the third-party client used 
by aws2-sqs (and likely any other client instrumented with OTel). These are 
HTTP-based calls, and they use the latest context available in Camel as their 
parent context. When we're in async mode, the call toward SQS to poll a message 
(but also to list or delete) can run on any other available thread in the 
Camel context.
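
To illustrate the mechanism, here is a minimal sketch. It does not use the 
OpenTelemetry or Camel APIs: a plain ThreadLocal stands in for the thread-bound 
tracing context, and the names (ContextLeakSketch, CURRENT, pollParent) are 
made up for this example. It only shows how a call that lands on a shared 
thread can pick up whatever context is current there, and how restoring the 
previous context avoids that.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContextLeakSketch {
    // Hypothetical stand-in for a thread-bound tracing context (not the OTel API).
    static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "none");

    // Simulated instrumented client call: reports the context it would use as parent.
    static String pollParent() {
        return CURRENT.get();
    }

    // The route sets the context on the shared thread and leaves it in place.
    static String runLeaky() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable route = () -> CURRENT.set("route-trace");
            pool.submit(route).get();
            // An unrelated poll is scheduled on the same thread afterwards.
            Callable<String> poll = ContextLeakSketch::pollParent;
            return pool.submit(poll).get();
        } finally {
            pool.shutdown();
        }
    }

    // The route restores the previous context before giving the thread back.
    static String runScoped() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable route = () -> {
                String previous = CURRENT.get();
                CURRENT.set("route-trace");
                try {
                    // route work would happen here
                } finally {
                    CURRENT.set(previous);
                }
            };
            pool.submit(route).get();
            Callable<String> poll = ContextLeakSketch::pollParent;
            return pool.submit(poll).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("leaky parent: " + runLeaky());   // leaky parent: route-trace
        System.out.println("scoped parent: " + runScoped()); // scoped parent: none
    }
}
```

In the leaky variant the simulated poll reports "route-trace" as its parent, 
which is the kind of noise seen in the traces. In the scoped variant it reports 
"none". Real Camel thread sharing is more involved than a single-thread pool, 
so this is only a model of the symptom.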

This also happens with the older opentelemetry implementation (though in that 
case I got other operations, i.e., delete). Although I understand it's an 
annoying situation, it does not directly affect the other traces, so, besides 
the noise, the Camel traces are consistent.

I'm continuing to work on this to try to find a solution. The ideal solution 
is to move those poll CLI calls outside the regular Camel processing tree, but 
that may be complicated because of the way Camel shares threads for 
optimization reasons.

> OpenTelemetry: Unclosed Span Scope with Parallel Multicast
> ----------------------------------------------------------
>
>                 Key: CAMEL-22385
>                 URL: https://issues.apache.org/jira/browse/CAMEL-22385
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-opentelemetry
>    Affects Versions: 4.14.0
>         Environment: Here's my configuration:
> ```
> @Override
> public void configure() {
>   // 1. Consume messages from a starting endpoint
>   from(RouteEndpoints.INBOUND_ENDPOINT.getValue())
>     .routeId("MyBusinessLogicRoute")
>     // 2. Validate the incoming message
>     .choice()
>       // A generic predicate checks for validity
>       .when(header("isValid").isEqualTo(false))
>         .log("Invalid message received. Routing to dead-letter queue.")
>         .to(RouteEndpoints.DLQ_ENDPOINT.getValue()) // Send to Dead Letter Queue
>       .otherwise()
>         // 3. Route the message based on its type
>         .choice()
>           // Check for a specific "type" field in the JSON body
>           .when().jsonpath("$[?(@.messageType == 'Confirmation')]")
>             .to(RouteEndpoints.STORE_ENDPOINT.getValue()) // Store confirmation events
>           .otherwise()
>             // 4. Process other events in parallel
>             .multicast()
>               .parallelProcessing()
>               // Send to two different endpoints simultaneously
>               .to(RouteEndpoints.SEND_ENDPOINT.getValue(),
>                   RouteEndpoints.STORE_ENDPOINT.getValue())
>             .end() // Closes the multicast
>         .end() // Closes the inner choice
>     .end(); // Closes the outer choice
> ```
>            Reporter: Sergi Mola
>            Assignee: Pasquale Congiusti
>            Priority: Major
>              Labels: opentelemetry, telemetry, trace, tracing
>         Attachments: simpler-multicast-otel2.png, 
> with_parallel_processing.png, without_parallel_processing.png
>
>
> I'm encountering a tracing issue with the {{camel-opentelemetry2}} component, 
> specifically when using a {{multicast}} EIP with {{parallelProcessing}}.
> *Problem:* The root {{Span}} for my SQS-triggered route is failing to close. 
> I believe this is because the {{Scope}} associated with this span is not 
> being properly closed and detached from the thread after the {{multicast}} 
> with {{parallelProcessing}} completes.
> *Observed Behavior:*
>  # *Leaked Trace Context:* Due to the unclosed scope, subsequent, unrelated 
> spans (in this case SQS reads) are incorrectly attached to the trace of the 
> completed route.
>  # *Cycle Repetition:* This behavior persists until a new message is consumed 
> from the SQS queue, which starts a new trace. However, this new trace then 
> suffers from the same issue.
> *Key Finding:* The tracing behaves as expected (the root span's scope is 
> closed correctly) if {{parallelProcessing}} is removed from the 
> {{multicast}}.
> This suggests an issue with how the OpenTelemetry tracer handles the thread 
> context lifecycle during asynchronous, parallel processing. I've attached 
> screenshots illustrating the difference.
>  
> See attached screenshots.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
