FWIW, we use
irate(container_memory_failcnt{kubernetes_container_name!=""}[5m]).
I am not sure about the semantics of this when the OOM-killed process is not
the top process of the container; you'll have to check.
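For illustration, a minimal sketch of issuing that expression against Prometheus's standard instant-query HTTP API. The server address (localhost:9090) is an assumption; the endpoint /api/v1/query is Prometheus's stable query API:

```python
# Hypothetical sketch (not from the thread): evaluate the failcnt expression
# via Prometheus's instant-query HTTP API. The server address is an assumption.
import json
import urllib.parse
import urllib.request

PROMQL = 'irate(container_memory_failcnt{kubernetes_container_name!=""}[5m])'

def prometheus_query_url(base="http://localhost:9090"):
    # /api/v1/query evaluates an instant query; the expression is URL-encoded.
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": PROMQL})

def run_query(base="http://localhost:9090"):
    # Returns the decoded JSON response, e.g. {"status": "success", "data": ...}.
    with urllib.request.urlopen(prometheus_query_url(base)) as resp:
        return json.load(resp)
```

A nonzero rate here indicates the cgroup is hitting its memory limit, which correlates with (but is not identical to) OOM kills.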
/MR
On Fri, Aug 19, 2016 at 7:50 AM Romain Vrignaud <[email protected]> wrote:
> Thank you for your answers.
> I'll have a look at the cAdvisor API.
>
> 2016-08-18 20:01 GMT+02:00 'Vishnu Kannan' via Kubernetes user discussion
> and Q&A <[email protected]>:
>
>>
>>
>> On Thu, Aug 18, 2016 at 2:20 AM, Romain Vrignaud <[email protected]>
>> wrote:
>>
>>> Hi Vishnu,
>>>
>>> To be clearer, I'm talking about applications that try to allocate
>>> more memory than granted by their limits.
>>> I did manage to get the OOM status in the pod events with a simple
>>> container allocating more memory than available.
>>> Unfortunately, all my containers use an init manager (supervisord).
>>> When the application eats too much memory, the OOM killer kills the
>>> application, but supervisord stays alive and restarts it. So my container
>>> never ends, which seems to be mandatory to get OOM information from the
>>> kubelet.
>>>
>> To Kubernetes, this scenario is not considered a failure of the
>> container. Is supervisord necessary for your application?
>>
>>>
>>> Am-I missing something ?
>>>
>> Not really.
>>
>>> Would it be possible to get an OOM event even if the pod does not stop?
>>>
>> No. K8s purposefully ignores OOMs that are handled gracefully. Ideally,
>> supervisord itself should expose metrics reporting OOMs.
>>
>>
>>> Would it be possible for the kubelet to expose it as a Prometheus metric?
>>>
>> I don't think so.
>>
>> If you want to continue using supervisord, take a look at the cAdvisor
>> events API
>> <https://github.com/google/cadvisor/blob/master/docs/api.md#events>, which
>> should give you access to OOM events for your container. You can pass the
>> output of `cat /proc/self/cgroup | grep cpu: | cut -d ':' -f 3` to cAdvisor
>> as the container name.
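For illustration, a hedged sketch of that lookup: parse /proc/self/cgroup the way the pipe above does, then query cAdvisor's events endpoint with oom_events=true. The cAdvisor address (localhost:4194, the port the kubelet commonly exposed it on at the time) and the v1.3 API path are assumptions; adjust for your deployment.

```python
# Hypothetical sketch: find this container's cgroup name and ask the
# cAdvisor events API for its OOM events.
import json
import urllib.request

def container_name_from_cgroup(cgroup_text):
    # Mirrors `cat /proc/self/cgroup | grep cpu: | cut -d ':' -f 3`:
    # return the path of the hierarchy that includes the cpu controller
    # (listed either as "cpu" or combined as "cpu,cpuacct").
    for line in cgroup_text.splitlines():
        fields = line.split(":", 2)
        if len(fields) == 3 and "cpu" in fields[1].split(","):
            return fields[2]
    return None

def fetch_oom_events(container_name, cadvisor="http://localhost:4194"):
    # GET /api/v1.3/events/<container>?oom_events=true returns past OOM
    # events for that container (container_name already starts with "/").
    url = "%s/api/v1.3/events%s?oom_events=true" % (cadvisor, container_name)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Running this inside the pod sidesteps the "container never exits" problem: supervisord can keep restarting the application, while the events API still records each kernel OOM kill in the cgroup.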
>>
>>>
>>> Regards,
>>>
>>> 2016-08-17 18:39 GMT+02:00 'Vishnu Kannan' via Kubernetes user
>>> discussion and Q&A <[email protected]>:
>>>
>>>> The kubelet detects OOMs and surfaces them as part of container status.
>>>> `kubectl describe pods` should show OOM as the termination reason whenever
>>>> a container is OOM-killed.
>>>> The kubelet also detects system OOMs automatically and surfaces them as
>>>> node events.
>>>>
>>>> Does that meet your requirements?
>>>>
>>>> On Wed, Aug 17, 2016 at 9:03 AM, Romain Vrignaud <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Today I have a hacky grok pattern in my logging DaemonSet that parses
>>>>> kernel logs to detect OOMs inside pods due to memory limits.
>>>>>
>>>>> This is a bit hacky and I would like to improve it. I think cAdvisor
>>>>> already detects OOMs, but I'm not sure what the recommended way to
>>>>> monitor them is. I'm using Prometheus, so a Prometheus-exposed metric
>>>>> would be awesome.
>>>>>
>>>>> How do you monitor OOMs?
>>>>>
>>>>> Regards,
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Kubernetes user discussion and Q&A" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected]
>>>>> .
>>>>> Visit this group at https://groups.google.com/group/kubernetes-users.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>
>>
>
--
Matthias Rampke
Engineer
SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany | +49 173
6395215
Managing Director: Alexander Ljung | Incorporated in England & Wales with
Company No. 6343600 | Local Branch Office | AG Charlottenburg | HRB 110657B