[ 
https://issues.apache.org/jira/browse/FELIX-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450264#comment-17450264
 ] 

Georg Henzler commented on FELIX-6475:
--------------------------------------

I think we should stick to the semantics documented at 
[https://github.com/apache/felix-dev/tree/master/healthcheck#semantic-meaning-of-health-check-results].
Currently {{HEALTH_CHECK_ERROR}} is returned (see 
[HealthCheckExecutorImpl.java#L434|https://github.com/apache/felix-dev/blob/3e5671ae7e5107f4f849ef9d5f0a89b1ba9d7439/healthcheck/core/src/main/java/org/apache/felix/hc/core/impl/executor/HealthCheckExecutorImpl.java#L434]),
which means the instance cannot be used anymore.
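
To illustrate why an OutOfMemoryError surfaces as an ExecutionException from future.get() (as in the log quoted below), here is a minimal sketch. It is not the Felix implementation: the class name {{FutureErrorSketch}}, the two-value {{Status}} enum, and the {{collect}} method are hypothetical stand-ins, and the error is simulated rather than caused by real memory pressure.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureErrorSketch {

    // Hypothetical stand-in for the result status; only the two values needed here.
    enum Status { OK, HEALTH_CHECK_ERROR }

    // Runs the check on a worker thread and maps any failure to HEALTH_CHECK_ERROR.
    static Status collect(Callable<Status> check) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Status> future = pool.submit(check);
            return future.get();
        } catch (ExecutionException e) {
            // FutureTask captures every Throwable thrown by the task, so
            // e.getCause() may be an Error such as OutOfMemoryError,
            // not only an Exception.
            return Status.HEALTH_CHECK_ERROR;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the check as failed.
            Thread.currentThread().interrupt();
            return Status.HEALTH_CHECK_ERROR;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(() -> Status.OK));
        System.out.println(collect(() -> {
            // Simulated; a real OutOfMemoryError would be raised by the JVM.
            throw new OutOfMemoryError("simulated");
        }));
    }
}
```

In this sketch the caller only ever sees a status value, so a one-off exception and a JVM-wide memory problem look the same to whatever polls the check.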

For an OutOfMemoryError no "in-JVM handling" could reliably work (as anything 
you do might also fail due to the missing memory), so I think the handling / 
orchestration really should happen in k8s. 

bq. ...  a k8s readiness check as then the pod is taken out of the load balancer 
but not necessarily restarted.

Can this be configured on the k8s side to ensure a minimum number of pods?


> How to handle OutOfMemoryError in health check
> ----------------------------------------------
>
>                 Key: FELIX-6475
>                 URL: https://issues.apache.org/jira/browse/FELIX-6475
>             Project: Felix
>          Issue Type: Bug
>          Components: Health Checks
>    Affects Versions: healthcheck.core 2.0.10
>            Reporter: Christian Schneider
>            Priority: Critical
>
> Currently a Java Error thrown during a health check results in a HealthCheck 
> ERROR state.
> This is especially problematic when the health check is a k8s readiness check, 
> as then the pod is taken out of the load balancer but not necessarily 
> restarted.
> After digging further, the error happens inside the BundlesStartedCheck. I 
> don't think the check is causing the OutOfMemoryError, but it repeatedly shows it.
> So the question is: how should the Felix healthcheck code handle such a Java Error?
>  
> {code:java}
> 10.04.2021 08:00:10.181 *WARN* [hc-monitor-15-systemalive,systemready] 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl Unexpected 
> Exception during future.get(): java.util.concurrent.ExecutionException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC 
> overhead limit exceeded
>       at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultFromFuture(HealthCheckExecutorImpl.java:430)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultsFromFutures(HealthCheckExecutorImpl.java:408)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.createResultsForDescriptors(HealthCheckExecutorImpl.java:268)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:211)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:181)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:168)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthState.update(HealthState.java:123)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$null$2(HealthCheckMonitor.java:264)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>       at 
> java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
>       at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
>       at 
> java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
>       at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>       at 
> java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>       at 
> java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$run$3(HealthCheckMonitor.java:263)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.run(HealthCheckMonitor.java:259)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> 10.04.2021 08:00:10.231 *WARN* [hc-monitor-15-systemalive,systemready] 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl Unexpected 
> Exception during future.get(): java.util.concurrent.ExecutionException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC 
> overhead limit exceeded
>       at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultFromFuture(HealthCheckExecutorImpl.java:430)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.collectResultsFromFutures(HealthCheckExecutorImpl.java:408)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.createResultsForDescriptors(HealthCheckExecutorImpl.java:268)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:211)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:181)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.executor.HealthCheckExecutorImpl.execute(HealthCheckExecutorImpl.java:168)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthState.update(HealthState.java:123)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$null$2(HealthCheckMonitor.java:264)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>       at 
> java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
>       at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
>       at 
> java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
>       at 
> java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
>       at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
>       at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>       at 
> java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>       at 
> java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.lambda$run$3(HealthCheckMonitor.java:263)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.runWithThreadNameContext(HealthCheckMonitor.java:321)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> org.apache.felix.hc.core.impl.monitor.HealthCheckMonitor.run(HealthCheckMonitor.java:259)
>  [org.apache.felix.healthcheck.core:2.0.8]
>       at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded {code}
>  



