[ 
https://issues.apache.org/jira/browse/YUNIKORN-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200663#comment-17200663
 ] 

Weihao Zheng commented on YUNIKORN-387:
---------------------------------------

Hi [~wilfreds] and [~adam.antal]! Thanks for your further comments.

We agree that we don't need to implement a UI or a REST API for showing traces; 
existing tools are enough for our requirements. For now we only keep the REST 
API that controls on-demand tracing.

We have to use external analysis tools to aggregate the traces we collect and 
derive metrics from them. I think metrics derived from tracing are more flexible 
than Prometheus metrics because we can change the configuration of the analysis 
tools dynamically, for example the query we use to look at another aspect of 
the traces, and because we can aggregate over the historical traces we have 
stored. So they can be used in some on-demand monitoring situations.

We think tracing the scheduling process will provide a lot of detail about the 
core, so the current design focuses on it. Tracing objects and states in the 
core is also an important topic. We regard it as a natural continuation of 
tracing the shim's objects and states: all these resource requests and their 
corresponding traces begin in the shim, and the way we trace resources in the 
core will not differ much from the way we trace them in the shim. That is why 
the design does not mention a resource tracer in the core.

We can set the span's SamplingPriority to 1 to force Jaeger to collect the 
trace, and we can build the on-demand feature on top of that. Sampling conflicts 
with counter metrics: unless we use the const sampler, some traces are dropped 
and any count derived from the traces is incomplete, so we can use Prometheus to 
collect those counters instead. Sampling is still useful for average metrics and 
for metrics used to draw distribution graphs.

> Use Tracing to Improve YuniKorn's Observability
> -----------------------------------------------
>
>                 Key: YUNIKORN-387
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-387
>             Project: Apache YuniKorn
>          Issue Type: New Feature
>          Components: core - scheduler, shim - kubernetes
>            Reporter: Weihao Zheng
>            Priority: Major
>
> We can use existing tracing framework to collect tracing information in a 
> standardized format for scheduling and resource management. It will improve 
> YuniKorn's observability significantly with less work. Here are our design 
> ideas: 
> [https://docs.google.com/document/d/1MKL9SfTH8Pjw6kBM0vRnyv_ctnxBHAz-iuA7Zbux60E/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
