Alanxtl opened a new issue, #3338: URL: https://github.com/apache/dubbo-go/issues/3338
## Background

Dubbo-go already supports OpenTelemetry tracing through client/server filters, propagators, samplers, and multiple exporters such as stdout, Jaeger, Zipkin, OTLP HTTP, and OTLP gRPC. The current tracing path is usable, but production troubleshooting needs richer span names, attributes, events, and error information.

## Goals

Enhance OpenTelemetry tracing so Dubbo-go traces are not only connected end-to-end, but also useful for diagnosing latency, routing, retry, timeout, serialization, registry, and downstream failure problems.

## Proposed Scope

### 1. Standardize span names and semantic attributes

Review and standardize client/server span naming. Candidate format:

- Consumer span: `dubbo.consumer <service>/<method>`
- Provider span: `dubbo.provider <service>/<method>`

Review and enrich span attributes, including:

- `rpc.system=apache_dubbo`
- `rpc.service`
- `rpc.method`
- `rpc.grpc.status_code` or Dubbo-specific equivalent where applicable
- `network.peer.address` / `server.address` where available
- Dubbo-specific attributes such as side, protocol, group, version, and error type

The final attribute names should align with OpenTelemetry semantic conventions where available and use a stable `dubbo.*` namespace for Dubbo-specific data.

### 2. Add diagnostic span events

Add span events for key runtime decisions and failure points where the information is available:

- Retry attempts and final retry result.
- Load-balance selection result.
- Router or tag-route decision.
- Timeout and cancellation.
- Rate limit or circuit-breaking rejection.
- Serialization/codec failure.
- Registry lookup or provider-empty cases.

These events should be low-noise and should not expose sensitive request payloads.

### 3. Improve error recording

Current spans set error status from `result.Error()`. This can be made more useful by:

- Recording a stable error type/category, reusing the RPC metrics error taxonomy where possible.
- Adding error code attributes for Triple/gRPC and Dubbo protocol errors.
- Avoiding lossy conversion of structured errors into plain strings where a typed error is available.
- Ensuring span status, span events, and logs can be correlated.

### 4. Clarify propagation behavior

Document and test propagation behavior for:

- W3C trace context.
- B3 propagation.
- Dubbo attachments used as carriers.
- Interaction with business attachments and baggage.
- Consumer -> provider propagation across Triple and Dubbo protocol paths where supported.

### 5. Improve trace/log/metric correlation

Coordinate tracing with existing observability features:

- Correlate with `CtxLogger` trace fields (`trace_id`, `span_id`, `trace_flags`).
- Reuse metrics error classification where applicable.
- Document how a user moves from a Grafana panel to a trace and then to related logs.

## Acceptance Criteria

- Client and server spans have documented, stable names and attributes.
- Span events exist for at least retry/timeout/rejection/codec or equivalent diagnostic points where supported.
- Error spans include stable category/code attributes in addition to human-readable messages.
- Propagation behavior is covered by tests for supported propagators.
- Documentation and samples explain how tracing connects with metrics and context-aware logging.

## Related Context

- Existing OpenTelemetry implementation: `otel/trace/*`, `filter/otel/trace/*`, `config/otel_config.go`
- Logger trace correlation: https://github.com/apache/dubbo-go/pull/3195
- Logger sample: https://github.com/apache/dubbo-go-samples/pull/1030

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
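
For reference, the candidate span naming and attribute set from section 1 of the proposal could be sketched roughly as follows. This is only an illustration using the Go standard library, not the dubbo-go implementation: the `Invocation` type, the `SpanName` and `SpanAttributes` helpers, and the `dubbo.side`, `dubbo.protocol`, `dubbo.group`, and `dubbo.version` keys are all hypothetical names, since the issue leaves the final attribute names open.

```go
package main

import (
	"fmt"
	"sort"
)

// Side identifies which end of the RPC a span describes.
type Side string

const (
	SideConsumer Side = "consumer"
	SideProvider Side = "provider"
)

// Invocation holds the fields used to name and tag a span.
// Hypothetical type for illustration; it is not a dubbo-go struct.
type Invocation struct {
	Service  string
	Method   string
	Protocol string
	Group    string
	Version  string
}

// SpanName renders the candidate format "dubbo.<side> <service>/<method>".
func SpanName(side Side, inv Invocation) string {
	return fmt.Sprintf("dubbo.%s %s/%s", side, inv.Service, inv.Method)
}

// SpanAttributes returns the RPC attributes listed in the proposal plus
// Dubbo-specific ones under an assumed "dubbo.*" namespace.
// Empty optional fields are omitted rather than recorded as "".
func SpanAttributes(side Side, inv Invocation) map[string]string {
	attrs := map[string]string{
		"rpc.system":  "apache_dubbo",
		"rpc.service": inv.Service,
		"rpc.method":  inv.Method,
		"dubbo.side":  string(side), // assumed key name
	}
	optional := map[string]string{
		"dubbo.protocol": inv.Protocol, // assumed key name
		"dubbo.group":    inv.Group,    // assumed key name
		"dubbo.version":  inv.Version,  // assumed key name
	}
	for k, v := range optional {
		if v != "" {
			attrs[k] = v
		}
	}
	return attrs
}

func main() {
	inv := Invocation{
		Service:  "org.example.GreetService", // made-up service name
		Method:   "Greet",
		Protocol: "tri",
		Version:  "1.0.0",
	}
	fmt.Println(SpanName(SideConsumer, inv))

	// Print attributes in sorted key order so the output is deterministic.
	attrs := SpanAttributes(SideConsumer, inv)
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k + "=" + attrs[k])
	}
}
```

In a real filter these values would presumably be passed to the OpenTelemetry tracer when the span is started; keeping name and attribute construction in small pure functions like this makes the "documented, stable names and attributes" criterion straightforward to unit test.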
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
