[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230243#comment-17230243 ]

Yifan Cai commented on CASSANDRA-15214:
---

Thanks [~jwest]. Addressed your comments and ran CI (unit, jvm dtest and dtest) after rebasing. There are a few test failures, but they do not look related to the change.

CI result: https://app.circleci.com/pipelines/github/yifan-c/cassandra/159/workflows/a37b8a85-b705-479e-b7ca-846bb71b36dc

> OOMs caught and not rethrown
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client, Messaging/Internode
> Reporter: Benedict Elliott Smith
> Assignee: Yifan Cai
> Priority: Normal
> Fix For: 4.0, 4.0-rc
> Attachments: oom-experiments.zip
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, so presently there is no way to ensure that an OOM reaches the JVM handler to trigger a crash/heapdump.
> It may be that the simplest most consistent way to do this would be to have a single thread spawned at startup that waits for any exceptions we must propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed future proof approach, it may be worth paying the cost of a single thread.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
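The single-thread idea in the issue description could look roughly like the sketch below. It is only an illustration, not the patch from this ticket: the class `FatalErrorPropagator` and its `start`, `report` and `demo` methods are hypothetical names, and a simulated `OutOfMemoryError` stands in for a real allocation failure. Code whose errors would otherwise be swallowed hands them to a queue, and one dedicated thread rethrows them outside of any catch-all block so that the thread's uncaught-exception handler sees them.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; none of these names come from Cassandra or Netty.
final class FatalErrorPropagator {
    private static final BlockingQueue<Error> PENDING = new LinkedBlockingQueue<>();

    /** Starts one daemon thread that waits for a reported error and rethrows it. */
    static Thread start(Thread.UncaughtExceptionHandler handler) {
        Thread t = new Thread(() -> {
            Error fatal;
            try {
                fatal = PENDING.take();   // park until some code reports a fatal error
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            throw fatal;                  // rethrown here, outside any catch-all block
        }, "fatal-error-propagator");
        t.setDaemon(true);
        t.setUncaughtExceptionHandler(handler);
        t.start();
        return t;
    }

    /** Called from code whose errors would otherwise only be logged (e.g. an event loop). */
    static void report(Error fatal) {
        PENDING.offer(fatal);
    }

    /** Reports a simulated OOM and returns what the uncaught-exception handler received. */
    static Throwable demo() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread t = start((thread, err) -> seen.set(err));
        report(new OutOfMemoryError("simulated, not a real allocation failure"));
        try {
            t.join(5_000);                // the handler runs before the thread terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + demo());
    }
}
```

In a real deployment the handler passed to `start` would be whatever the process uses to crash or exit; the lambda in `demo` only records the error so that the behaviour can be observed.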
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228865#comment-17228865 ]

David Capwell commented on CASSANDRA-15214:
---

+1

Need second reviewer, can merge after.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227604#comment-17227604 ]

David Capwell commented on CASSANDRA-15214:
---

+1 from me with a small comment; see the PR.

I tested this patch by breaking byte buffer allocation so that it runs out of direct memory. In doing so I found an edge case in the client (.transport package) code; once that is fixed, both client and internode shut down on OOM.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204348#comment-17204348 ]

Yifan Cai commented on CASSANDRA-15214:
---

Talked with Benedict on Slack and cleared up my confusion.

So the {{JVMStabilityInspector}} is able to inspect the OOM error, but after it re-throws, Netty catches all throwables and simply logs them. It happens [here|https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L303-L316]. That is why the {{propagateOutOfMemory}} parameter was added.

I submitted a PR that forcibly produces a heap space OOM error when a direct buffer OOM is caught. The PR also removes the {{propagateOutOfMemory}} parameter from {{JVMStabilityInspector}}, because the change makes sure the instance can crash/exit properly on OOM (see the gist below).

PR: https://github.com/apache/cassandra/pull/761
CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra/112/workflows/293a4334-d2df-43f9-b532-1d79876701c1

I have also created a separate demo showing that the JVM invokes the OOM handler even if the OOM error (not including the direct buffer one) is swallowed by a catch block. The code and the output can be found in the gist: https://gist.github.com/yifan-c/82ff4fd7fbe83fe41113f6f14cba4907.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198670#comment-17198670 ]

Benedict Elliott Smith commented on CASSANDRA-15214:

As I have said, they do not - unless you are confident I am wrong. That is the reason this ticket was filed, and I ascertained this at a time when I was intimately familiar with Netty’s workings.

The non-propagation of OOM by inspectThrowable is irrelevant.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198585#comment-17198585 ]

Yifan Cai commented on CASSANDRA-15214:
---

> where would that exception end up if it were rethrown

{{JVMStabilityInspector}} only re-throws {{OutOfMemoryError}}. Depending on which of the OOM-related JVM options ({{OnOutOfMemoryError}}, {{ExitOnOutOfMemoryError}} or {{HeapDumpOnOutOfMemoryError}}) are present, the JVM exits and triggers a heap dump if it is a heap space OOM error.

However, the call sites ask it not to re-throw the OOM error (e.g. [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/InboundMessageHandler.java#L647-L659]), and I'd like to learn why we do not let the JVM exit.

Netty by default just logs the exception when {{exceptionCaught()}} is _not_ implemented in any of the handlers in the inbound direction. For the outbound direction, client code handles exceptions by adding a listener to the {{ChannelFuture}} or {{ChannelPromise}}. We have the handling in both directions.

Besides the inbound/outbound paths, it looks like Netty does a lot of catch-{{Throwable}}-and-swallow in its code base, so it is possible that errors from Netty internals are not bubbled up. For example, see this [issue|https://github.com/netty/netty/issues/6096].
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198233#comment-17198233 ]

Benedict Elliott Smith commented on CASSANDRA-15214:

Zoom out a bit - where would that exception end up if it were rethrown? I can't remember precisely, but it is caught by Netty's default exception handling and iirc simply logged.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198034#comment-17198034 ]

Yifan Cai commented on CASSANDRA-15214:
---

> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, so presently there is no way to ensure that an OOM reaches the JVM handler to trigger a crash/heapdump.

From a code inspection, exceptions/throwables from Netty are already handled.

For inbound, the {{InboundMessageHandler}} implements {{exceptionCaught()}}, which invokes {{JVMStabilityInspector}}. The message handler is the last one in the inbound direction, and no earlier handler handles exceptions, so the message handler should handle all exceptions from that direction. However, the {{exceptionCaught()}} override in {{StreamingInboundHandler}} does not invoke {{JVMStabilityInspector}}, so it could swallow OOM errors.

For outbound, {{JVMStabilityInspector}} is invoked when the channel future fails, and in several other places.

All of the above call sites call {{JVMStabilityInspector}} with {{propagateOutOfMemory}} disabled, so the inspector just swallows the OOM errors and does not let the JVM handle them.

[~benedict], what is the reason for doing so in the inbound/outbound connections?
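The "perhaps elsewhere in Executors" part of the quoted description can be reproduced with the JDK alone. The sketch below is an illustration under assumptions: `SwallowedErrorDemo` is a hypothetical class, and the `OutOfMemoryError` is constructed by hand rather than caused by a real allocation failure. A task submitted through `ExecutorService.submit` has its throwable captured in the returned `Future`, so the worker thread's uncaught-exception handler is never invoked.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of Cassandra or Netty.
final class SwallowedErrorDemo {
    /** Returns { uncaughtHandlerInvoked, errorCapturedInFuture }. */
    static boolean[] run() {
        AtomicBoolean handlerInvoked = new AtomicBoolean(false);
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "demo-worker");
            t.setUncaughtExceptionHandler((thread, err) -> handlerInvoked.set(true));
            return t;
        });
        boolean captured = false;
        try {
            Runnable task = () -> {
                throw new OutOfMemoryError("simulated"); // stands in for a real OOM
            };
            Future<?> future = executor.submit(task);
            future.get();                                // the error only surfaces here
        } catch (ExecutionException e) {
            captured = e.getCause() instanceof OutOfMemoryError;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdown();
        }
        return new boolean[] { handlerInvoked.get(), captured };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("uncaught handler invoked: " + result[0]);
        System.out.println("error captured in Future: " + result[1]);
    }
}
```

If nobody calls `get()` on the returned future, the error is never observed at all, which is the situation the ticket is concerned with.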
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150869#comment-17150869 ]

Robert Stupp commented on CASSANDRA-15214:
--

Just read this ticket and the approach looks absolutely reasonable to me.

One thing though is that the (off-heap) row-cache isn't covered here - let me know whether it's reasonable to add some support for it as part of this ticket. IMHO, people shouldn't use the row-cache, but I'm not sure whether there are reasonable use cases out there in the wild. I don't want to start a discussion about the row-cache here; this is just a heads-up.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103019#comment-17103019 ]

Yifan Cai commented on CASSANDRA-15214:
---

Sounds good [~jolynch]. So for this ticket, the goal is to force the JVM to trigger a heap OOM upon receiving the direct buffer OOM. (I can work on it.)

Do you want the jvmquake integration to be addressed in a different ticket?
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100330#comment-17100330 ]

Joey Lynch commented on CASSANDRA-15214:

> Since one should be able to trigger the OOM by looping allocating large chunk of memory, e.g. array, in the java code. What is the benefit of doing it so using jvmquake? I can see that in the killer_thread callback function, it also does long array allocation once notified by the gc callback.

Ah sorry, I was not clear. I think the JVMStabilityInspector (which we call into via inspectThrowable all over the place) should allocate the long array if we see an OutOfMemoryError with message "Direct buffer memory", in turn triggering a heap OOM (which will trigger the normal resource exhausted mechanism). Since we're not out of _heap_ memory, we can trust that JVMStabilityInspector can run.

I guess my proposal is to include jvmquake by default for linux deployments (I can add more architectures if we want more; it is easy to opt out), and if JVMStabilityInspector sees a "Direct buffer memory" OOM it should force the JVM into a heap OOM, triggering jvmquake's resource exhausted handler. This setup would guarantee that C* dies (and produces a heap dump) if any of the following conditions hold:
* The JVM is out of heap memory
* The JVM has accumulated 30s of GC debt with 1:5 runtime weight (meaning that we had <85% throughput for at least 30s): aka "GC spirals of death"
* The JVM is out of metaspace memory
* The JVM is out of threads
* (best effort, likely true) The JVM is out of native memory (so basically C* is using 2x the heap size) -> triggers a heap oom -> triggers the first case

Unlike the built-in JVM options, jvmquake actually works in these edge cases. Not only is there a test suite showing that the built-in Java options don't work, but code that runs inside the heap fundamentally can't guarantee that it will run (which is why the kill -9 approach never really works).
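The escalation proposed above (detect a "Direct buffer memory" OOM, then deliberately exhaust the heap) might be sketched as follows. This is not the code from the PR: `DirectBufferOomEscalation` and its methods are hypothetical names, and the message match is an assumption taken from the wording in this thread. The comparison is case-insensitive because the exact wording may differ between JDK versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the real JVMStabilityInspector may look quite different.
final class DirectBufferOomEscalation {
    /** True when the throwable looks like the direct-memory flavour of OutOfMemoryError. */
    static boolean isDirectBufferOom(Throwable t) {
        return t instanceof OutOfMemoryError
               && t.getMessage() != null
               && t.getMessage().toLowerCase(Locale.ROOT).contains("direct buffer memory");
    }

    /** Exhausts the heap on purpose; always ends by throwing a heap OutOfMemoryError. */
    static void forceHeapOom() {
        List<long[]> sacrificial = new ArrayList<>();
        while (true) {
            // Keep the arrays reachable so the collector cannot reclaim them.
            sacrificial.add(new long[Integer.MAX_VALUE / 2]);
        }
    }

    /** Escalates a direct buffer OOM into a heap OOM; ignores everything else. */
    static void inspect(Throwable t) {
        if (isDirectBufferOom(t)) {
            System.err.println("Direct buffer OOM seen; forcing a heap OOM with sacrificial arrays");
            forceHeapOom();
        }
    }

    public static void main(String[] args) {
        Throwable simulated = new OutOfMemoryError("Direct buffer memory");
        System.out.println("direct buffer OOM? " + isDirectBufferOom(simulated));
        // inspect(simulated) is deliberately not called here: it would exhaust the heap.
    }
}
```

Logging before the allocation loop follows the suggestion in this thread to make it clear in the logs why the "sacrificial" arrays show up in the heap dump.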
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100193#comment-17100193 ]

Yifan Cai commented on CASSANDRA-15214:
---

Got it. I did not look closely enough at the discussions in CASSANDRA-13006. I agree that leaving it to the JVM is a cleaner and more general solution. Also, as you mentioned, "It's relatively easy to ignore the "sacrificial" long array in a heap dump and we could log clearly what is happening."

One should be able to trigger the OOM from Java code by allocating large chunks of memory (e.g. arrays) in a loop. What is the benefit of doing it with jvmquake instead?

I can see that the killer_thread callback function also allocates long arrays once notified by the GC callback. The comment on the callback says
{quote}the only way to reliably trigger OutOfMemory when we are not actually out of memory (e.g. due to GC behavior) that I could find was to make JNI calls that allocate large blobs of memory which can only be done from outside of the GC callbacks.
{quote}
Can you elaborate on why you prefer jvmquake?
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100063#comment-17100063 ]

Joey Lynch commented on CASSANDRA-15214:

> An alternative way could be programmatically grab the heap dump via [JMX|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java#L75] and exit.

I believe that was more or less what C* was doing before CASSANDRA-13006, if I'm reading the patch in [02aba73|https://github.com/apache/cassandra/commit/02aba73] correctly. Eric Evans pointed out that this approach in general can cause C*'s jmap heap dump to race with the JVM heap dump, and advocated for just letting the JVM handle it with built-in options. The nice thing about the jvmquake technique of just running the heap out of memory is that all the normal JVM options work as expected (mostly logging and dumping the heap to a particular location on disk).

That being said, I think that for the direct buffer issue in particular this won't be a problem, since as we've established the JVM's OOM hook (report_java_out_of_memory) isn't triggered on direct memory allocation failures.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099582#comment-17099582 ]

Yifan Cai commented on CASSANDRA-15214:
---

Thanks [~jolynch] for the update.
{quote}force the JVM into a "normal" OOM by [allocating large long arrays|https://github.com/Netflix-Skunkworks/jvmquake/blob/master/src/jvmquake.c#L103]
{quote}
An alternative would be to programmatically grab the heap dump via [JMX|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java#L75] and exit.
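The JMX alternative mentioned above could be sketched like this. It assumes a HotSpot-based JVM where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class `HeapDumpThenExit`, the file naming and the exit code are hypothetical choices for illustration, not anything Cassandra does.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of "grab the heap dump via JMX and exit".
final class HeapDumpThenExit {
    /** Writes a heap dump of live objects into the given directory and returns its path. */
    static Path dumpHeap(Path directory) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet; newer JDKs also expect the .hprof suffix.
        Path target = directory.resolve("heap-" + System.nanoTime() + ".hprof");
        bean.dumpHeap(target.toString(), true);
        return target;
    }

    /** Dumps the heap, then halts the JVM. The exit code 100 is an arbitrary choice. */
    static void dumpThenExit(Path directory) {
        try {
            dumpHeap(directory);
        } catch (IOException | RuntimeException e) {
            System.err.println("heap dump failed: " + e);
        } finally {
            Runtime.getRuntime().halt(100);
        }
    }

    /** Dumps into a temporary directory, checks the file is non-empty, then cleans up. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("heapdump-demo");
            Path dump = dumpHeap(dir);
            boolean nonEmpty = Files.size(dump) > 0;
            Files.delete(dump);
            Files.delete(dir);
            return nonEmpty;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("heap dump written: " + demo());
    }
}
```

As noted elsewhere in this thread, a dump taken this way can race with the JVM's own heap dump when the built-in OOM options are also enabled, so the sketch is mainly useful for comparing the two approaches.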
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099546#comment-17099546 ]

Joey Lynch commented on CASSANDRA-15214:

Quick update on this from the jvmquake side: we are now building [architecture specific artifacts|https://github.com/Netflix-Skunkworks/jvmquake/releases] that will work with any JVM newer than Java 8. They link only against the platform-specific libc (we're also now testing on Java 8 and 11, on both zulu and openjdk JVMs). I think this means it would be plausible to include the {{libjvmquake-linux-x86_64.so}} in {{libs}} and then have a switch on uname -s -m to decide whether to pick it up or not. Right now we're only building for linux amd64, but if there is interest I can generate more architectures (linux arm probably makes sense, and I could do osx). I also still like the idea of having agents/available and agents/enabled folders like apache does for modules: users can just symlink agents from one to the other to include them (and we can symlink jamm and jvmquake by default).

[~yifanc] I agree that the OutOfMemory conditions that do not result in a "true" JVM OOM (meaning one that would cause a heap dump via {{HeapDumpOnOutOfMemoryError}}) will not get caught by jvmquake. My testing confirms your findings, although the jvmquake GC instability algorithm will still trigger in various real-world scenarios I've run into.

I feel like the right move might be to walk back a small bit of CASSANDRA-13006, where we stopped forcibly killing the JVM ourselves and let the JVM do it. Specifically, if the OOM message contains "Direct buffer memory" we could do what jvmquake does and force the JVM into a "normal" OOM by [allocating large long arrays|https://github.com/Netflix-Skunkworks/jvmquake/blob/master/src/jvmquake.c#L103]. This will then trigger a proper OOM and get us heap dumping. It's relatively easy to ignore the "sacrificial" long array in a heap dump and we could log clearly what is happening.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077743#comment-17077743 ]

Manish Ghildiyal commented on CASSANDRA-15214:
--

Please let me know if I can contribute here.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059535#comment-17059535 ] Dinesh Joshi commented on CASSANDRA-15214:
Followed up with [~jolynch] regarding his original comment about including C JVMTI agents in C*. If we build the agent for the officially supported JVMs, we should be good. We need to detect the platform and JVM combination and load the matching agent. If the agent is unavailable for the specific VM/platform combination, it can be disabled with a warning in the logs, much like what we do with `NativeLibrary`, except that this will need to happen as part of the startup script.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901558#comment-16901558 ] Yifan Cai commented on CASSANDRA-15214:
[~jolynch], you are welcome. Please use them. The attached test cases focus on the `Unsafe.allocateMemory` path. As far as I can see, they differ from the test cases included in jvmquake, which only check heap OOM.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901555#comment-16901555 ] Joseph Lynch commented on CASSANDRA-15214:
[~yifanc] If you are ok with it I can add your test cases to [jvmquake|https://github.com/Netflix-Skunkworks/jvmquake/tree/master/tests] to ensure it handles all edge cases. For what it's worth jvmquake is a strict superset of jvmkill and I wouldn't advocate for using jvmkill (I'm biased though). In my production experience jvmquake actually works at detecting GC spirals of death that C* runs into while jvmkill simply doesn't work as C* doesn't actually go OOM, it just death spirals. See the "hard oom" [test cases|https://github.com/Netflix-Skunkworks/jvmquake/blob/master/tests/test_hard_ooms.py] for example where jvmkill won't work while jvmquake will work.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901497#comment-16901497 ] Yifan Cai commented on CASSANDRA-15214:
I ran several experiments on the OOM scenario to check whether the HotSpot handlers work as expected, namely that they kill the process. The results show that the handlers, OnOutOfMemoryError and ExitOnOutOfMemoryError, are only effective for heap OOM.

*Experiments*

The experiments are designed to emulate what happens in C* while staying minimal. Each installs a handler via Thread.setDefaultUncaughtExceptionHandler that simply re-throws the OOM error, in the hope that the HotSpot handlers then take over. OpenJDK 8 was used. All 5 experiments are in the attached [^oom-experiments.zip].

{code:java}
├── OomExperimentExceedsDirectBuffer.java
├── OomExperimentExceedsDirectBufferRapidAlloc.java
├── OomExperimentExceedsHeap.java
├── OomExperimentSimple.java
└── OomExperimentSimpleJustExit.java
{code}

Only one of the experiments (OomExperimentExceedsHeap) successfully triggers the handlers. The rest do throw OutOfMemoryError, but the handlers are not triggered.

*Some Research*

The likely cause is that the JVM allocates heap memory and direct-buffer memory through different code paths (OpenJDK 8 is the reference).

Heap memory allocation happens at [collectedHeap.inline.hpp#CollectedHeap::common_mem_allocate_noinit|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp#L149]. When it fails, it calls [report_java_out_of_memory|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/utilities/debug.cpp#L287], which is responsible for creating a heap dump on OOM and running the handlers.

Allocating a direct buffer takes a different path. In java.nio.DirectByteBuffer, OOM can happen in 2 places.
1. Bits.reserveMemory finds that there is not enough direct memory and throws OOM. In this case, I do not think the OOM is caught and handled inside the JVM so as to trigger report_java_out_of_memory.
2. unsafe.allocateMemory, which calls malloc directly, [fails to allocate and throws OOM|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/prims/unsafe.cpp#L606]. Again, this OOM is thrown for the application to handle.

Further evidence is that [report_java_out_of_memory|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/utilities/debug.cpp#L287], the only place that triggers the handlers, is not invoked during unsafe.allocateMemory. Here are [all the references of the method invocation|https://github.com/AdoptOpenJDK/openjdk-jdk8u/search?q=report_java_out_of_memory_q=report_java_out_of_memory].

Because of that, the jvmkill or jvmquake tools mentioned in the ticket might not work. They rely on the notification from [JvmtiExport::post_resource_exhausted|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp#L153], which is not present in the 2 places where direct buffer OOM can happen. Here is the implementation of [jvmkill|https://github.com/airlift/jvmkill/blob/master/jvmkill.c#L24] (less than 100 lines).
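The distinction drawn in this comment, between an OOM raised by HotSpot's heap allocation path and one thrown by ordinary Java code such as Bits.reserveMemory, can be illustrated with a small simulation. This is only a sketch and is not one of the files in oom-experiments.zip: the class name, `runAndCapture`, and the stand-in `reserveMemory` are hypothetical, and a per-thread handler is used in place of Thread.setDefaultUncaughtExceptionHandler to avoid touching global state. It shows that such an error travels like any other Throwable up to the uncaught exception handler, with no JVM-side allocation failure involved.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; it is not one of the files in oom-experiments.zip.
final class SimulatedDirectOomExperiment {
    // Runs a task on a fresh thread and returns whatever reached the
    // thread's uncaught exception handler (null if nothing did).
    static Throwable runAndCapture(Runnable task) throws InterruptedException {
        AtomicReference<Throwable> captured = new AtomicReference<>();
        Thread worker = new Thread(task, "oom-experiment");
        worker.setUncaughtExceptionHandler((t, e) -> captured.set(e));
        worker.start();
        worker.join();
        return captured.get();
    }

    // Stand-in for Bits.reserveMemory: the error is constructed and thrown by
    // ordinary Java code, so HotSpot's allocation-failure path is never involved.
    static void reserveMemory(long requested, long limit) {
        if (requested > limit) {
            throw new OutOfMemoryError("Direct buffer memory");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable seen = runAndCapture(() -> reserveMemory(2L << 30, 1L << 30));
        System.out.println("handler saw: " + seen);
    }
}
```

Because the error here is created by Java code, running this sketch with -XX:+ExitOnOutOfMemoryError would not be expected to change its behaviour, which is consistent with the experiment results reported above.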
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900410#comment-16900410 ] Dinesh Joshi commented on CASSANDRA-15214:
Sounds great. [~benedict] who would be able to take up the audit? Is this something I can help with?
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900394#comment-16900394 ] Benedict commented on CASSANDRA-15214:
Sorry, I completely forgot to respond to this ticket so thanks for bumping it [~djoshi3]

From my POV, including a C JVMTI agent is absolutely fine. We'd have to take a closer look at jvmkill and jvmquake, and do our own brief audit of the version we include to ensure it seems to behave reasonably. But I don't see any problem with utilising non-Java functionality.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900379#comment-16900379 ] Dinesh Joshi commented on CASSANDRA-15214:
I think this issue might be related to https://bugs.openjdk.java.net/browse/JDK-8027434. Other projects that use the JVM have run into a similar issue and the usual solution is to use [jvmkill|https://github.com/airlift/jvmkill]. The issue at hand is that when a JVM runs out of memory (heap or otherwise), it enters an undefined state. In this situation, I would not expect the handlers to work as expected either. I think we should use either jvmkill or [jvmquake|https://github.com/Netflix-Skunkworks/jvmquake] to solve this issue, as this approach has proven to be reliable and Netflix, Facebook and other large JVM users are actively using it.
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889767#comment-16889767 ] Tomas Shestakov commented on CASSANDRA-15214:
There are two options for handling *OOM* as of Java 8u92 [https://www.oracle.com/technetwork/java/javase/8u92-relnotes-2949471.html]:
-XX:+ExitOnOutOfMemoryError
-XX:+CrashOnOutOfMemoryError
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886475#comment-16886475 ] Joseph Lynch commented on CASSANDRA-15214:
We've (Netflix) found handling OOMs to be generally hard to do correctly in all the various Java codebases we have, so we built an agent solution which attaches to the JVM in [https://github.com/Netflix-Skunkworks/jvmquake]. I think the only reason that we couldn't just directly include that in C* is because it's a C JVMTI agent instead of a Java one, but perhaps we could just solve this with some documentation and making it really easy to include agents (which is useful regardless)?

The following is the patch for supporting easy pluggable agents for C*:

{noformat}
diff --git a/conf/cassandra-env.sh b/conf/cassandra-env.sh
index d6c48be0a3..92061db3ab 100644
--- a/conf/cassandra-env.sh
+++ b/conf/cassandra-env.sh
@@ -134,6 +134,29 @@ do
   JVM_OPTS="$JVM_OPTS $opt"
 done
 
+# Pull in any agents present in CASSANDRA_HOME
+for agent_file in ${CASSANDRA_HOME}/agents/*.jar; do
+  if [ -e "${agent_file}" ]; then
+    base_file="${agent_file%.jar}"
+    if [ -s "${base_file}.options" ]; then
+      options=`cat ${base_file}.options`
+      agent_file="${agent_file}=${options}"
+    fi
+    JVM_OPTS="$JVM_OPTS -javaagent:${agent_file}"
+  fi
+done
+
+for agent_file in ${CASSANDRA_HOME}/agents/*.so; do
+  if [ -e "${agent_file}" ]; then
+    base_file="${agent_file%.so}"
+    if [ -s "${base_file}.options" ]; then
+      options=`cat ${base_file}.options`
+      agent_file="${agent_file}=${options}"
+    fi
+    JVM_OPTS="$JVM_OPTS -agentpath:${agent_file}"
+  fi
+done
{noformat}

Then we can just drop agents into the {{CASSANDRA_HOME/agents}} folder and they are loaded automatically by Cassandra. From a security perspective this is identical to "drop a jar".