[ 
https://issues.apache.org/jira/browse/FELIX-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450286#comment-17450286
 ] 

Christian Schneider commented on FELIX-6475:
--------------------------------------------

[~henzlerg] The issue is that the check that fails with the OOM error is a 
readiness check. So failing this check only takes the load off the system 
but does not restart it. On top of that, without the load the pod survives 
longer before the error eventually leads to a restart.

I wonder if we should track Java errors separately and make sure they are 
reported as an alive check failure. Can health check core report such an error? 
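
To make the idea concrete, here is a minimal sketch of what tracking Java 
errors separately could look like. It is not the Felix implementation: 
FatalErrorTracker, Status, collect and isAlive are hypothetical names used 
only for illustration. It relies on the fact that an Error thrown inside a 
task surfaces as the cause of the ExecutionException from future.get(), as 
in the log below.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of the Felix health check API.
final class FatalErrorTracker {

    // Hypothetical status values, not the Felix status type.
    enum Status { OK, CHECK_FAILED, JVM_ERROR }

    // First java.lang.Error seen while collecting results, if any.
    private final AtomicReference<Error> firstError = new AtomicReference<>();

    // Waits for one check and classifies how it ended.
    Status collect(Future<?> future) throws InterruptedException {
        try {
            future.get();
            return Status.OK;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof Error) {
                // Remember JVM-level errors such as OutOfMemoryError separately.
                firstError.compareAndSet(null, (Error) cause);
                return Status.JVM_ERROR;
            }
            return Status.CHECK_FAILED;
        }
    }

    // An alive check could report failure once this returns false.
    boolean isAlive() {
        return firstError.get() == null;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FatalErrorTracker tracker = new FatalErrorTracker();
        try {
            // Simulates a check that dies with an OutOfMemoryError.
            Callable<Void> simulatedOom = () -> {
                throw new OutOfMemoryError("simulated");
            };
            System.out.println(tracker.collect(pool.submit(simulatedOom)));
            System.out.println("alive=" + tracker.isAlive());
        } finally {
            pool.shutdown();
        }
    }
}
```

With something like this, the readiness check could keep reporting its 
normal ERROR state, while a separate alive check fails once isAlive() 
returns false, so that k8s restarts the pod.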

> How to handle OutOfMemoryError in health check
> ----------------------------------------------
>
>                 Key: FELIX-6475
>                 URL: https://issues.apache.org/jira/browse/FELIX-6475
>             Project: Felix
>          Issue Type: Bug
>          Components: Health Checks
>    Affects Versions: healthcheck.core 2.0.10
>            Reporter: Christian Schneider
>            Priority: Critical
>
> Currently a Java Error thrown during a health check results in a HealthCheck 
> ERROR state.
> This is especially problematic when the health check is a k8s readiness check, 
> as the pod is then taken out of the load balancer but not necessarily 
> restarted.
> After digging more, I found that the error happens inside the 
> BundlesStartedCheck. I don't think the check is causing the OutOfMemoryError, 
> but it repeatedly shows it.
> So the question is: how should the Felix healthcheck code handle such a Java 
> error?
>  
> {code:java}
> 10.04.2021 08:00:10.181 *WARN* [hc-monitor-15-systemalive,systemready] org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl Unexpected Exception during future.get(): java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
>       at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultFromFuture(HealthCheckExecutorImpl.java:430) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultsFromFutures(HealthCheckExecutorImpl.java:408) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.createResultsForDescriptors(HealthCheckExecutorImpl.java:268) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:211) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:181) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:168) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthState.update(HealthState.java:123) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$null$2(HealthCheckMonitor.java:264) [org.apache.felix.healthcheck.core:2.0.8]
>       at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>       at java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
>       at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>       at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
>       at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
>       at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>       at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
>       at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
>       at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
>       at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
>       at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>       at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>       at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$run$3(HealthCheckMonitor.java:263) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.run(HealthCheckMonitor.java:259) [org.apache.felix.healthcheck.core:2.0.8]
>       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> 10.04.2021 08:00:10.231 *WARN* [hc-monitor-15-systemalive,systemready] org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl Unexpected Exception during future.get(): java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
>       at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultFromFuture(HealthCheckExecutorImpl.java:430) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultsFromFutures(HealthCheckExecutorImpl.java:408) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.createResultsForDescriptors(HealthCheckExecutorImpl.java:268) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:211) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:181) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:168) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthState.update(HealthState.java:123) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$null$2(HealthCheckMonitor.java:264) [org.apache.felix.healthcheck.core:2.0.8]
>       at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>       at java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
>       at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>       at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
>       at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
>       at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>       at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
>       at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
>       at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
>       at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
>       at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>       at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>       at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$run$3(HealthCheckMonitor.java:263) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321) [org.apache.felix.healthcheck.core:2.0.8]
>       at org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.run(HealthCheckMonitor.java:259) [org.apache.felix.healthcheck.core:2.0.8]
>       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
