Re: Observability of actually cpu/memory usage

2021-12-20 Thread Wilfred Spiegelenburg
Hi Bowen,

Maybe a strange question, but what do you consider "actually
used" resources? Anything the scheduler sees as allocated is used. The scheduler
has no information on what the container really occupies: it asked for 100GB but
only uses 50GB of that, etc. If you need that, YuniKorn cannot help you. If it
is just about looking at allocation over time, YuniKorn is capable of giving
you the information.

A second point to make is that applications normally do not provide any
information on what they expect to use before they use it. Let's take a
Spark application. The driver creates pods as it needs new executors. The
Spark config drives those requests and their limits. The scheduler only
sees the pods that are really requested. It does not know, and should not
know, whether that is limited by what is configured, or whether the job uses
only part of, or more than, what is configured.

The only time the scheduler would have any idea about a "maximum" is when a
gang request is made. For gang scheduling we can track whether the gang
request is completely used or not. We could add metrics for it on an
application. We can also track the number of containers allocated for an
application or queue, the time from start to finish for containers, etc. We
could even track the maximum resource allocation for an application or a
queue over a time interval. Prometheus gives us a number of
possibilities; we just need to hook them into the scheduling cycle.
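
As a rough sketch of what such a hook could look like (the metric name, labels,
and helper functions below are made up for illustration and are not existing
YuniKorn code), a Prometheus gauge could be registered once and then updated
from the allocation and release paths:

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical application-level gauge; not an existing YuniKorn metric.
var appAllocatedVcore = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "yunikorn",
		Subsystem: "application",
		Name:      "allocated_vcore_milli",
		Help:      "Currently allocated vcores (milli) per application.",
	},
	[]string{"queue", "application_id"},
)

// recordAllocation would be called from the allocation path after a container
// is placed; recordRelease from the release path.
func recordAllocation(queue, appID string, vcoreMilli int64) {
	appAllocatedVcore.WithLabelValues(queue, appID).Add(float64(vcoreMilli))
}

func recordRelease(queue, appID string, vcoreMilli int64) {
	appAllocatedVcore.WithLabelValues(queue, appID).Sub(float64(vcoreMilli))
}

func main() {
	prometheus.MustRegister(appAllocatedVcore)

	// Simulate one allocation and a partial release so the metric is visible.
	recordAllocation("root.analytics", "application-0001", 4000)
	recordRelease("root.analytics", "application-0001", 1000)

	// Expose the metric on the usual Prometheus scrape endpoint.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}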

As far as I know we currently do not have application metrics, but those can
always be added. Some queue metrics are there already. I think one of those
is what you are looking for to fill a number of the gaps that you see. I
have added YUNIKORN-829 as a subtask of YUNIKORN-720 [1], which already
references a number of metrics to improve. With the release of v0.12.1 I
moved that jira to v1.0.0. A major improvement to the metrics would be a
nice addition for v1.0.0.

I do not see anything blocking metric enhancements: it is an area that can
be improved without a major impact on other functionality. We do need to
make sure that we measure the impact on performance and memory usage.

Wilfred

[1] https://issues.apache.org/jira/browse/YUNIKORN-720

On Tue, 21 Dec 2021 at 16:18, Bowen Li  wrote:

> Hi community,
>
> Reviving https://issues.apache.org/jira/browse/YUNIKORN-829 . We are
> running Spark on YuniKorn, and have a requirement to provide more
> observability of *actual* resource usage for our customers, data
> engineers/scientists who wrote Spark jobs who may not have deep expertise
> in Spark job optimization.
>
> - requirement:
>
> - have actual resource usage metrics at both the job level and the queue
> level (YK already has requested resource usage metrics)
>
> - key use case:
>
> - as indicators of job optimization for ICs like data engineers/scientists,
> to show users how many resources they requested vs. how many resources
> their jobs actually used
>
> - as an indicator for managers of their team's resource utilization. In our
> setup, or a typical YK setup, each customer team has their own YuniKorn
> queue in a shared, multi-tenant environment. Managers of the team would
> want high-level (queue) metrics rather than low-level (job) ones
>
> Currently we haven't found a good product on the market to do this, so it
> would be great if YuniKorn could support it. We would like your input here on
> feasibility (it seems feasible according to Weiwei's comment in the Jira),
> priority, and the timeline/complexity of the project.
>
> Thanks,
> Bowen
>


Re: Observability of actually cpu/memory usage

2021-12-21 Thread Weiwei Yang
Thank you Bowen for raising this up; this is an interesting topic. Bear with
me through this long reply : )

Like Wilfred mentioned, YK doesn't know about the actual used resources in
terms of CPU and memory for each pod, or application, at least not today. I
understand the requirement of tracking this info in order to give users
some feedback, or even recommendations, on how to tune their jobs more
properly. It would be good to have something in our view like "Allocated" vs
"Used" for each app/queue. We could further introduce some penalties if
people keep over-requesting resources.

However, most likely we will need to do this outside of YK. The major
reason is that all the data YK consumes comes from the api-server, backed by
etcd. None of these metrics will be stored in etcd, by design of the
metrics-server. Second, YK doesn't have any per-node agent running that we
could use to collect actual resource usage; we would still need to leverage a
3rd-party tool to do so. Maybe we can do some integration with the
metrics-server, aggregating app/queue usage info from those fragmented
metrics, and then plug that into our yunikorn-web UI. I believe we have the
flexibility to do this, which could be an option.
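
As a very rough sketch of that aggregation idea (the namespace and the queue
label below are assumptions for illustration, not something YK sets today), an
external component could list pod metrics through the metrics.k8s.io API and
sum them up per queue:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	mc, err := metricsclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// List current pod metrics for one (hypothetical) queue label.
	podMetrics, err := mc.MetricsV1beta1().PodMetricses("spark-jobs").List(
		context.TODO(),
		metav1.ListOptions{LabelSelector: "queue=root.analytics"},
	)
	if err != nil {
		panic(err)
	}

	usedCPU, usedMem := int64(0), int64(0) // milli-cores and bytes
	for _, pm := range podMetrics.Items {
		for _, c := range pm.Containers {
			usedCPU += c.Usage.Cpu().MilliValue()
			usedMem += c.Usage.Memory().Value()
		}
	}
	fmt.Printf("queue root.analytics: cpu=%dm memory=%d bytes\n", usedCPU, usedMem)
}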



Re: Observability of actually cpu/memory usage

2021-12-21 Thread Chaoran Yu
Previously, when doing research on this topic, I saw that the metrics-server
documentation says: "*Metrics Server is not meant for non-autoscaling
purposes. For example, don't use it to forward metrics to monitoring
solutions, or as a source of monitoring solution metrics. In such cases
please collect metrics from Kubelet /metrics/resource endpoint directly*."
But the Kubelet APIs that the statement refers to [
https://github.com/kubernetes/kubernetes/blob/v1.21.5/pkg/kubelet/server/server.go#L236
] are not documented, meaning they are hidden APIs that can change or be
deprecated in any future Kubernetes release. Integrating with these APIs
doesn't sound promising. But besides the Kubelet, the actual utilization info
of workloads is not readily available anywhere else. We'll need to explore
other ideas.
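
For reference, the endpoint can at least be reached through the API server's
node proxy, so something like the sketch below (the node name is a placeholder)
avoids talking to kubelets directly; the caveat that the endpoint itself is
effectively undocumented still applies:

package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Fetch the kubelet's Prometheus-format resource metrics for one node via
	// the API server proxy: /api/v1/nodes/<node>/proxy/metrics/resource
	raw, err := cs.CoreV1().RESTClient().
		Get().
		Resource("nodes").
		Name("worker-node-1"). // placeholder node name
		SubResource("proxy").
		Suffix("metrics", "resource").
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}

	// The response is Prometheus text format (container_cpu_usage_seconds_total,
	// container_memory_working_set_bytes, ...) and still needs parsing and
	// per-application aggregation.
	fmt.Println(string(raw))
}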


Re: Observability of actually cpu/memory usage

2021-12-21 Thread Weiwei Yang
The K8s dashboard did some integration with the metrics-server; maybe we can
investigate and see how that was done.
Essentially we just need to pull these metrics from somewhere.


Re: Observability of actually cpu/memory usage

2021-12-21 Thread Chenya Zhang
From the metrics server's documentation,

Don't use Metrics Server when you need:
- Non-Kubernetes clusters
- An accurate source of resource usage metrics
- Horizontal autoscaling based on other resources than CPU/Memory

I think they have some concerns about metrics accuracy. We may need to
understand what the possible risks are here.

For example, if a user is trying to tune an application but gets
conflicting information in different runs, it could be confusing for them.
If there is a good range of consistency or any potential areas of
inaccuracy that can be documented, it would be a helpful source of
information for application tuning.



Re: Observability of actually cpu/memory usage

2021-12-22 Thread Wilfred Spiegelenburg
We should be careful adding functionality to the scheduler that is not part
of the scheduling cycle. Monitoring the real usage of a pod is not part of
scheduling; it is part of the metrics of the node the pod runs on. YuniKorn
is a scheduler; it does not have a presence on the nodes, and we should not
create a presence on the nodes from this project. We have to rely on what
the current system can provide.

The metrics server readme [1] clearly states that it should *not* be used
as a source for monitoring solutions. Instead, monitoring solutions should
use the kubelet's /metrics/resource or /metrics/cadvisor endpoints. That
would mean that each node needs to be polled to get the metric details. That
kind of monitoring is outside of a scheduler's core tasks. Monitoring
nodes places a different set of requirements on the scheduler for
networking, etc. Monitoring solutions like Prometheus [2] already provide
this kind of functionality as an out-of-the-box option; adding that to
YuniKorn is not the correct solution.
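
To make that concrete: if a Prometheus deployment already scrapes the
kubelet/cAdvisor endpoints, an external tool (outside the scheduler) could pull
aggregated usage through the Prometheus HTTP API, roughly as in the sketch
below (the Prometheus address and the namespace are placeholders):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{
		Address: "http://prometheus.monitoring:9090", // placeholder address
	})
	if err != nil {
		panic(err)
	}
	v1api := promv1.NewAPI(client)

	// Sum cAdvisor working-set memory over all pods in one namespace; this
	// assumes Prometheus already scrapes the cAdvisor metrics.
	query := `sum(container_memory_working_set_bytes{namespace="spark-jobs"})`

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	result, warnings, err := v1api.Query(ctx, query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println("used memory (bytes):", result)
}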

I completely agree that we need to provide as many details and metrics
around scheduling as we can. Queues, Applications and Nodes should all
expose metrics from a scheduling point of view. We should provide enough
detail in the metrics to allow analysis of an application's life cycle.

Wilfred

[1]
https://github.com/kubernetes-sigs/metrics-server#kubernetes-metrics-server
[2]
https://github.com/prometheus/prometheus/blob/10e72596b95db8fa0fe5f7472691930a3393cf45/documentation/examples/prometheus-kubernetes.yml#L96

Re: Observability of actually cpu/memory usage

2022-01-05 Thread Bowen Li
Thanks all for your input. To clarify, the goal of this conversation is to
get some consensus and common understanding on the motivation and business
needs first, though it seems the discussion has started to diverge into
implementations.

Maybe let's take another look at the business needs first: the use case is for
data engineers/scientists to find room for optimization when CPU/memory is
over- or under-allocated. They therefore need a way to look into container
CPU/memory usage, both aggregated over all containers in a Spark job and
narrowed down to just one container that is part of a job. What's missing is
the association. Say a Spark job has 2 executor pods (a, b): users basically
want to 1) see aggregated CPU/memory usage, like sum(runtime cpu(a, b)) vs.
sum(requested cpu(a, b)), and 2) identify that the executor pods are a+b, not
a+c nor d+e, and quickly navigate to runtime-cpu(a) vs. requested-cpu(a).
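
Purely as an illustration of that association (the label keys, namespace, and
application ID below are assumptions, not something YK provides today), the
requested-vs-used comparison for one job could be built from the core and
metrics APIs roughly like this:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	core, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	mc, err := metricsclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Select the executor pods of a single Spark job by label (assumed labels).
	ns := "spark-jobs"
	sel := "spark-app-selector=spark-0123456789abcdef,spark-role=executor"

	pods, err := core.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{LabelSelector: sel})
	if err != nil {
		panic(err)
	}
	podMetrics, err := mc.MetricsV1beta1().PodMetricses(ns).List(context.TODO(), metav1.ListOptions{LabelSelector: sel})
	if err != nil {
		panic(err)
	}

	requested, used := int64(0), int64(0) // CPU in milli-cores
	for _, p := range pods.Items {
		for _, c := range p.Spec.Containers {
			requested += c.Resources.Requests.Cpu().MilliValue()
		}
	}
	for _, pm := range podMetrics.Items {
		for _, c := range pm.Containers {
			used += c.Usage.Cpu().MilliValue()
		}
	}
	fmt.Printf("executors: requested=%dm, used=%dm\n", requested, used)
}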

We are paying closed-source vendors big bills just to do this single thing,
while it would be much better to have an open source solution, and YK is in
the best position to connect scheduling info with metrics. It's a very common
requirement once the scale of the workload goes beyond hundreds of CPUs, not
just for us, and we'll see it coming up more as YK adoption grows.

There seem to be different ideas about the implementation, e.g. how to do it,
whether it should be a pluggable model, and where it should live, but I hope
to keep those for a later discussion.

Can you share your thoughts on the motivation? If it looks good, and the
YIP proposal (see another thread I just sent) passes, we can start a formal
design discussion as YIP-1.

Thanks,
Bowen

