corgy-w opened a new issue, #10305: URL: https://github.com/apache/seatunnel/issues/10305
### Search before asking

- [x] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

Currently, SeaTunnel Metrics are generally global. We cannot distinguish metrics for a specific subset of data (e.g., sampled traffic, heartbeat packets, or specific tenants) without aggressively modifying the connector/transform code to add tags manually.

This proposal aims to introduce a **System-Level Traffic Dyeing (Sampling)** mechanism. It allows:

1. Marking specific `SeaTunnelRow`s as "Sampled" or "Dyed" at the Source.
2. Propagating this "Color" context implicitly throughout the execution engine (Source -> Transform -> Sink).
3. Automatically routing metrics to different counters (e.g., `sink_write_count` vs `sink_write_count_sampled`) based on the context, without changing existing Connector code.

**Proposed Solution / Architecture:**

1. **Data Protocol**: Add a `long flags` field to `SeaTunnelRow` (Core Data Structure). This uses a bitmask to carry system signals (e.g., `IS_SAMPLED`, `IS_HEARTBEAT`) with minimal serialization overhead compared to the existing `options` Map.
2. **Context Propagation**: Introduce `MetricTraceContext` (based on `ThreadLocal`) in `seatunnel-engine`.
   - When `SeaTunnelSourceCollector` receives a row, it checks the `flags` and sets the `MetricTraceContext`.
   - The context is cleared after the row is processed to ensure safety.
   - For async operations, the context must be captured and replayed.
3. **Metrics Integration**: Enhance `MetricsContext` (or `AbstractMetricsContext`) to support "Context-Aware" or "Routing" metrics.
   - The Metric object (e.g., Counter) acts as a proxy.
   - It checks `MetricTraceContext` on every update (`inc()`) and routes the value to the appropriate underlying counter (e.g., standard vs. sampled).

### Usage Scenario

1. **Sampling Observation**: In high-throughput scenarios (e.g., 1M QPS), calculating precise latency/success-rate for every record is expensive. Users can configure the Source to "sample" 1% of traffic (set the flag). The Metrics system will then automatically track metrics for this 1% separately, providing health visibility with low overhead.
2. **Heartbeat Monitoring**: Distinguish "Heartbeat" rows (synthetic data for keep-alive) from actual business data in metrics.
3. **Trace/Debug**: "Dye" specific rows to trace their flow and performance through the pipeline without mixing them with normal traffic stats.

### Related issues

None.

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
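
### Illustrative sketch

To make the three pieces under "Proposed Solution / Architecture" concrete, here is a minimal, self-contained Java sketch of how they might fit together. It does not use any SeaTunnel classes: `Row` is a hypothetical stand-in for `SeaTunnelRow` with the proposed `long flags` field, `process` stands in for what `SeaTunnelSourceCollector` would do per row, and `RoutingCounter` and `wrap` are names invented here for illustration. Only the `MetricTraceContext` name, the `IS_SAMPLED` / `IS_HEARTBEAT` signals, and the `ThreadLocal` approach come from the proposal; the method signatures are assumptions.

```java
import java.util.concurrent.atomic.LongAdder;

final class TrafficDyeingSketch {

    // Hypothetical bit assignments for the proposed `long flags` bitmask.
    static final long IS_SAMPLED = 1L;        // bit 0
    static final long IS_HEARTBEAT = 1L << 1; // bit 1

    /** Hypothetical stand-in for SeaTunnelRow carrying the proposed flags field. */
    static final class Row {
        final long flags;
        final Object[] fields;

        Row(long flags, Object... fields) {
            this.flags = flags;
            this.fields = fields;
        }
    }

    /** ThreadLocal holder for the flags of the row currently being processed. */
    static final class MetricTraceContext {
        private static final ThreadLocal<Long> FLAGS = ThreadLocal.withInitial(() -> 0L);

        static void set(long flags) { FLAGS.set(flags); }
        static long get() { return FLAGS.get(); }
        static void clear() { FLAGS.remove(); }
        static boolean isSampled() { return (get() & IS_SAMPLED) != 0; }

        /** Captures the current flags and replays them around the task (for async hand-off). */
        static Runnable wrap(Runnable task) {
            final long captured = get();
            return () -> {
                long previous = get();
                set(captured);
                try {
                    task.run();
                } finally {
                    set(previous); // restore whatever the executing thread had before
                }
            };
        }
    }

    /** Proxy counter: each inc() is routed to the standard or the sampled counter. */
    static final class RoutingCounter {
        private final LongAdder standard = new LongAdder();
        private final LongAdder sampled = new LongAdder();

        void inc() {
            (MetricTraceContext.isSampled() ? sampled : standard).increment();
        }

        long standardCount() { return standard.sum(); }
        long sampledCount() { return sampled.sum(); }
    }

    /** Sets the context from the row, runs downstream work, and always clears the context. */
    static void process(Row row, Runnable downstream) {
        MetricTraceContext.set(row.flags);
        try {
            downstream.run();
        } finally {
            MetricTraceContext.clear();
        }
    }

    public static void main(String[] args) {
        RoutingCounter sinkWriteCount = new RoutingCounter();
        process(new Row(0L, "a"), sinkWriteCount::inc);           // normal row
        process(new Row(IS_SAMPLED, "b"), sinkWriteCount::inc);   // dyed row
        process(new Row(IS_HEARTBEAT, "c"), sinkWriteCount::inc); // heartbeat, not sampled
        System.out.println("standard=" + sinkWriteCount.standardCount()
                + " sampled=" + sinkWriteCount.sampledCount());
        // prints "standard=2 sampled=1"
    }
}
```

In this sketch the downstream code only ever calls `inc()`, which mirrors the goal of leaving existing Connector code unchanged. The `try`/`finally` in `process` corresponds to the "cleared after the row is processed" step, and `wrap` is one possible way to capture and replay the context for async operations.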
