[ 
https://issues.apache.org/jira/browse/FELIX-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451327#comment-17451327
 ] 

Georg Henzler commented on FELIX-6475:
--------------------------------------

[~cschneider] so in theory you can easily add the bundles check for both 
readiness and liveness - is the problem that reporting critical for liveness 
during startup kills the pod before it ever gets ready? (Reading up on it, 
the relatively new startup probes would also be an option; a HC tag 
"systemstartup" could work here, compare 
[kubeadm#2137|https://github.com/kubernetes/kubeadm/issues/2137] and [startup 
probes|https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes].)

bq. I wonder if we should track java errors separately and make sure they are 
reported as alive check failure. Can health check core report such an error? 

We could keep track of system errors like OOM in the HC executor and make it 
provide a HealthCheck with a customisable tag - but in a way that would be a 
duplicate, because the existing checks already return HEALTH_CHECK_ERROR in 
such cases.
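
The first option (tracking system errors such as OOM in the executor and 
exposing them through a dedicated check) could look roughly like the sketch 
below. It is plain Java and does not use the Felix HC API: FatalErrorTracker, 
Status, collect and jvmErrorCheck are hypothetical names chosen for 
illustration, and the customisable tag is left out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these types belong to the Felix Health Check API.
class FatalErrorTracker {
    enum Status { OK, CRITICAL }

    // First java.lang.Error seen while collecting check results, or null if none.
    private final AtomicReference<Error> firstError = new AtomicReference<>();

    // Waits for a check's future; remembers the cause if it is a java.lang.Error.
    Status collect(Future<Status> future) {
        try {
            return future.get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof Error) {
                firstError.compareAndSet(null, (Error) e.getCause());
            }
            return Status.CRITICAL;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Status.CRITICAL;
        }
    }

    // Result of the extra check that a liveness probe could be pointed at.
    Status jvmErrorCheck() {
        return firstError.get() == null ? Status.OK : Status.CRITICAL;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FatalErrorTracker tracker = new FatalErrorTracker();
        try {
            // Simulates a check whose execution fails with an OutOfMemoryError.
            Callable<Status> failingCheck = () -> { throw new OutOfMemoryError("simulated"); };
            System.out.println("check result: " + tracker.collect(pool.submit(failingCheck)));
            System.out.println("jvm error check: " + tracker.jvmErrorCheck());
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the error state is sticky: once an Error has been seen, the 
extra check stays critical until the JVM is restarted, which is the behaviour 
one would want from a liveness signal. Ordinary exceptions only fail the 
individual check and do not trip it.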

> How to handle OutOfMemoryError in health check
> ----------------------------------------------
>
>                 Key: FELIX-6475
>                 URL: https://issues.apache.org/jira/browse/FELIX-6475
>             Project: Felix
>          Issue Type: Bug
>          Components: Health Checks
>    Affects Versions: healthcheck.core 2.0.10
>            Reporter: Christian Schneider
>            Priority: Critical
>
> Currently a Java Error thrown during a health check results in a HealthCheck 
> ERROR state.
> This is especially problematic when the health check is a k8s readiness check, 
> as then the pod is taken out of the load balancer but not necessarily 
> restarted.
> After digging more the error happens inside the BundlesStartedCheck. I don't 
> think the check is causing the OutOfMemoryError but it repeatedly shows it.
> So the question is how should felix healthcheck code handle such a java error?
>  
> {code:java}
> 10.04.2021 08:00:10.181 *WARN* [hc-monitor-15-systemalive,systemready] 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl Unexpected 
> Exception during future.get(): java.util.concurrent.ExecutionException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC 
> overhead limit exceeded
>       at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultFromFuture(HealthCheckExecutorImpl.java:430)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultsFromFutures(HealthCheckExecutorImpl.java:408)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.createResultsForDescriptors(HealthCheckExecutorImpl.java:268)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:211)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:181)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:168)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthState.update(HealthState.java:123)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$null$2(HealthCheckMonitor.java:264)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>       at 
> java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
>       at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
>       at 
> java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
>       at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>       at 
> java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>       at 
> java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$run$3(HealthCheckMonitor.java:263)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.run(HealthCheckMonitor.java:259)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> 10.04.2021 08:00:10.231 *WARN* [hc-monitor-15-systemalive,systemready] 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl Unexpected 
> Exception during future.get(): java.util.concurrent.ExecutionException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC 
> overhead limit exceeded
>       at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultFromFuture(HealthCheckExecutorImpl.java:430)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultsFromFutures(HealthCheckExecutorImpl.java:408)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.createResultsForDescriptors(HealthCheckExecutorImpl.java:268)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:211)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:181)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:168)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthState.update(HealthState.java:123)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$null$2(HealthCheckMonitor.java:264)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>       at 
> java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
>       at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
>       at 
> java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
>       at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>       at 
> java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>       at 
> java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$run$3(HealthCheckMonitor.java:263)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.run(HealthCheckMonitor.java:259)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
