corgy-w opened a new issue, #10305:
URL: https://github.com/apache/seatunnel/issues/10305

   ### Search before asking
   
   - [x] I have searched the 
[feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
 issues and found no similar feature request.
   
   ### Description
   
   Currently, SeaTunnel metrics are generally global. We cannot distinguish 
metrics for a specific subset of data (e.g., sampled traffic, heartbeat 
packets, or specific tenants) without invasively modifying the 
connector/transform code to add tags manually.
   
   This proposal aims to introduce a **System-Level Traffic Dyeing (Sampling)** 
mechanism. It allows:
   1.  Marking specific `SeaTunnelRow`s as "Sampled" or "Dyed" at the Source.
   2.  Propagating this "Color" context implicitly throughout the execution 
engine (Source -> Transform -> Sink).
   3.  Automatically routing metrics to different counters (e.g., 
`sink_write_count` vs `sink_write_count_sampled`) based on the context, without 
changing existing Connector code.
   
   **Proposed Solution / Architecture:**
   
   1.  **Data Protocol**: Add a `long flags` field to `SeaTunnelRow` (Core Data 
Structure). The field is a bitmask that carries system signals (e.g., 
`IS_SAMPLED`, `IS_HEARTBEAT`) with less serialization overhead than the 
existing `options` Map.
   2.  **Context Propagation**: Introduce `MetricTraceContext` (based on 
`ThreadLocal`) in `seatunnel-engine`. 
       - When `SeaTunnelSourceCollector` receives a row, it checks the `flags` 
and sets the `MetricTraceContext`.
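       - The context is cleared after the row is processed, so that a stale 
value cannot leak into the next row handled by the same thread.
       - For async operations, the context must be captured on the submitting 
thread and replayed on the thread that runs the task, because a `ThreadLocal` 
value is not visible from other threads.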
   3.  **Metrics Integration**: Enhance `MetricsContext` (or 
`AbstractMetricsContext`) to support "Context-Aware" or "Routing" metrics. 
       - The Metric object (e.g., Counter) acts as a proxy.
       - It checks `MetricTraceContext` on every update (`inc()`) and routes 
the value to the appropriate underlying counter (e.g., standard vs. sampled).
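
The three pieces above could fit together roughly as in the following minimal sketch. Every name in it (`TrafficDyeingSketch`, the bit positions, `RoutingCounter`, `wrap`, `collect`) is hypothetical and chosen for illustration; none of these classes exist in SeaTunnel today, and the real `SeaTunnelSourceCollector` and `MetricsContext` integration would look different. The sketch also assumes that a sampled row is counted only in the sampled counter, not in both.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical names throughout; this is not existing SeaTunnel API.
public class TrafficDyeingSketch {

    // Assumed bit positions for the proposed `long flags` field.
    static final long IS_SAMPLED = 1L << 0;
    static final long IS_HEARTBEAT = 1L << 1;

    /** Per-thread "color" of the row currently being processed. */
    static final class MetricTraceContext {
        private static final ThreadLocal<Long> FLAGS = ThreadLocal.withInitial(() -> 0L);

        static void set(long flags) { FLAGS.set(flags); }
        static long get() { return FLAGS.get(); }
        static void clear() { FLAGS.remove(); }

        /** Captures the caller's flags so a task can replay them on another thread. */
        static Runnable wrap(Runnable task) {
            final long captured = get();
            return () -> {
                final long previous = get();
                set(captured);
                try {
                    task.run();
                } finally {
                    set(previous); // restore whatever the worker thread had before
                }
            };
        }
    }

    /** Proxy counter: connector code only calls inc(); routing happens here. */
    static final class RoutingCounter {
        private final LongAdder standard = new LongAdder();
        private final LongAdder sampled = new LongAdder();

        void inc() {
            if ((MetricTraceContext.get() & IS_SAMPLED) != 0) {
                sampled.increment();
            } else {
                standard.increment();
            }
        }

        long standardCount() { return standard.sum(); }
        long sampledCount() { return sampled.sum(); }
    }

    /** Stand-in for the collector: set the context from the row flags, always clear it. */
    static void collect(long rowFlags, Runnable downstream) {
        MetricTraceContext.set(rowFlags);
        try {
            downstream.run();
        } finally {
            MetricTraceContext.clear();
        }
    }
}
```

In this sketch the `try`/`finally` in `collect` is what keeps one row's flags from leaking into the next row on the same thread, and `wrap` is one possible way to carry the flags into a task submitted to a thread pool.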
   
   ### Usage Scenario
   
   1.  **Sampling Observation**: In high-throughput scenarios (e.g., 1M QPS), 
calculating precise latency/success-rate for every record is expensive. Users 
can configure the Source to "sample" 1% of traffic (set the flag). The Metrics 
system will then automatically track metrics for this 1% separately, providing 
health visibility with low overhead.
   2.  **Heartbeat Monitoring**: Distinguish "Heartbeat" rows (synthetic data 
for keep-alive) from actual business data in metrics.
   3.  **Trace/Debug**: "Dye" specific rows to trace their flow and performance 
through the pipeline without mixing them with normal traffic stats.
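
For the sampling scenario, the decision at the Source could be as simple as marking one row out of every N. The sketch below is only an illustration under that assumption: `SourceSamplerSketch`, `mark`, and the `IS_SAMPLED` bit position are hypothetical names, and a real implementation might prefer random or key-based sampling instead of a fixed interval.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: neither this class nor the flag exists in SeaTunnel today.
public class SourceSamplerSketch {

    static final long IS_SAMPLED = 1L << 0; // assumed bit position

    private final long interval; // mark one row out of every `interval` rows
    private final AtomicLong seen = new AtomicLong();

    public SourceSamplerSketch(long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    /** Returns the given flags, with IS_SAMPLED set on every `interval`-th call. */
    public long mark(long flags) {
        long n = seen.getAndIncrement();
        // floorMod keeps the result non-negative even if the counter wraps around.
        return Math.floorMod(n, interval) == 0 ? (flags | IS_SAMPLED) : flags;
    }

    public static boolean isSampled(long flags) {
        return (flags & IS_SAMPLED) != 0;
    }
}
```

With an interval of 100, this marks 1% of rows, which matches the example ratio in the first scenario.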
   
   ### Related issues
   
   None.
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.