wu-sheng commented on issue #13634:
URL: https://github.com/apache/skywalking/issues/13634#issuecomment-3707851552

   ## 🧭 Additional Proposal: Metrics-Driven Sampling and Caching Design
   
   ### **Design Intent**
   To improve the precision and adaptability of trace sampling, BanyanDB should 
enable the trace pipeline to make decisions based on live system and service 
performance metrics.  
   By aligning sampling logic with real-time latency, error characteristics, 
and resource utilization, the system can intelligently balance observability 
quality and data volume.  
   
   This approach moves the system from rigid, threshold-based sampling toward a 
**context-aware, self-adjusting mechanism** that continuously reflects the 
actual operating conditions of distributed systems.
   
   ---
   
   ### **Core Design Tenets**
   
   1. **Metrics-Centric Sampling Logic**  
      Introduce a metrics-aware decision layer within the trace pipeline. 
Sampling will factor in telemetry data such as latency percentiles, throughput, 
and failure ratios.  
      This enables the database to prioritize traces that highlight critical 
performance states or anomalies rather than relying on static numeric 
thresholds.
   
   2. **Unified Metrics Access Layer**  
      Incorporate a unified layer for reading aggregated metrics that can serve 
multiple purposes within the pipeline.  
      This layer provides a consistent interface to access metric data, whether 
it originates from BanyanDB's internal metric storage or external observability 
systems, ensuring flexibility and coherence in decision-making.
   
   3. **Caching**  
      Introduce a caching mechanism above the metrics access layer to store 
recently used or recurrently needed metric values. This is different from the 
flagged
      The cache will maintain lightweight persistence across trace-processing 
batches, improving performance and stability while ensuring the data remains 
sufficiently fresh for high-throughput, streaming environments.
   
   4. **Adaptive Sampling Policy**  
      Shift from static threshold-based policies to **context-aware 
evaluators** that can respond to live performance data.  
      For instance, the policy might choose to:
      - Retain traces contributing to current P95 or P99 latency outliers.
      - Emphasize the slowest or most error-prone API endpoints.
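
To make the four tenets above concrete, here is a minimal sketch of how they could fit together. It is written in Python purely for illustration and is not BanyanDB's API: `MetricsReader`, `StaticReader`, `CachedReader`, `Trace`, `should_retain`, and the metric name `latency_p95_ms` are all hypothetical. Retaining every error trace and using a fixed time-to-live for cache freshness are assumptions of the sketch, not part of the proposal.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Protocol, Tuple


class MetricsReader(Protocol):
    """Hypothetical unified access layer: one read method for any backend."""

    def read(self, service: str, metric: str) -> float: ...


class StaticReader:
    """Stand-in backend with fixed values; counts how often it is queried."""

    def __init__(self, values: Dict[Tuple[str, str], float]) -> None:
        self.values = values
        self.calls = 0

    def read(self, service: str, metric: str) -> float:
        self.calls += 1
        return self.values[(service, metric)]


class CachedReader:
    """Cache above the access layer; entries expire after ttl_seconds (assumed policy)."""

    def __init__(
        self,
        inner: MetricsReader,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._inner = inner
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def read(self, service: str, metric: str) -> float:
        key = (service, metric)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the backend
        value = self._inner.read(service, metric)
        self._entries[key] = (now, value)
        return value


@dataclass(frozen=True)
class Trace:
    service: str
    duration_ms: float
    is_error: bool


def should_retain(trace: Trace, metrics: MetricsReader) -> bool:
    """Context-aware evaluator: keep errors (assumption) and current P95 outliers."""
    if trace.is_error:
        return True
    p95 = metrics.read(trace.service, "latency_p95_ms")
    return trace.duration_ms >= p95
```

In this sketch the threshold is read from live metrics on each decision instead of being configured as a constant, and the cache lets consecutive trace-processing batches reuse a value until it goes stale. A real implementation would also need to bound the cache size and decide how to behave when a metric is missing.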
   
     


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
