[jira] [Commented] (YUNIKORN-1107) Make health check occur in the background

Craig Condit (Jira) Wed, 16 Mar 2022 08:46:05 -0700


    [ https://issues.apache.org/jira/browse/YUNIKORN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507689#comment-17507689 ]


Craig Condit commented on YUNIKORN-1107:
----------------------------------------

[~lowc1012], on a large cluster the health check can take a considerable 
amount of time, because it has to walk all the internal data structures and 
acquire locks along the way that can block scheduler progress. An attacker would 
only need to send a flood of health check requests in a short period of time to 
effectively stop the scheduler from making forward progress. We really only 
need to run the check every 30-60 seconds or so.

The liveness probe doesn't really make sense for YuniKorn: if the service is 
running, it is "live". Partly because it needs to acquire and release many 
locks, the health check can sometimes report incorrect information, depending 
on the timing of concurrent operations. It may also report issues that are more 
relevant to the health of the K8s cluster as a whole and do not indicate a 
problem with YK itself. This is useful for diagnostics, but it is not a reliable 
indicator that YK should be terminated and restarted.



> Make health check occur in the background
> -----------------------------------------
>
>                 Key: YUNIKORN-1107
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1107
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Craig Condit
>            Assignee: Ryan Lo
>            Priority: Major
>
> Currently, the health check endpoint in the REST API performs a lengthy 
> process that could be used as a denial-of-service vector. We should run the 
> health check periodically in the background and have the REST API simply 
> report the results of the latest check.


