[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204348#comment-17204348 ]

Yifan Cai commented on CASSANDRA-15214:
---------------------------------------

Talked with Benedict on Slack and cleared up my confusion. The
{{JVMStabilityInspector}} is able to inspect the OOM error, but after it
re-throws the error, Netty catches all throwables and simply logs them. This happens
[here|https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L303-L316].
That is why the {{propagateOutOfMemory}} parameter was added.
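A minimal sketch of the problem in plain Java rather than Netty. {{CatchAllInvoker}} and {{RethrowingHandler}} are hypothetical names: the first stands in for a framework call site that catches every throwable and only logs it, the second for a handler that inspects an OOM and re-throws it. The re-thrown error lands in the invoker's log and never reaches the thread's uncaught exception handler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a framework call site that, like the Netty code
// linked above, catches every Throwable thrown by a handler and only logs it.
final class CatchAllInvoker {
    final List<Throwable> logged = new ArrayList<>();

    void invoke(Runnable handler) {
        try {
            handler.run();
        } catch (Throwable t) {
            // Nothing is re-thrown, so t never reaches the thread's
            // uncaught exception handler.
            logged.add(t);
        }
    }
}

// Hypothetical handler: inspects the error, then re-throws it. The re-throw
// only hands the error back to CatchAllInvoker, which swallows it.
final class RethrowingHandler implements Runnable {
    int inspected = 0;

    @Override
    public void run() {
        try {
            // A manually constructed OOM; the JVM's own OOM options do not fire for it.
            throw new OutOfMemoryError("simulated");
        } catch (OutOfMemoryError oom) {
            inspected++; // stand-in for the stability inspection
            throw oom;
        }
    }
}
```

Calling {{new CatchAllInvoker().invoke(new RethrowingHandler())}} returns normally; the OOM only shows up in {{logged}}.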

I submitted a PR that forcefully produces a heap space OOM error when a direct
buffer OOM is caught.
The PR also removes the {{propagateOutOfMemory}} parameter from
{{JVMStabilityInspector}}, because forcing a heap space OOM makes sure the
instance can crash/exit properly on OOM (see the gist below).

PR: https://github.com/apache/cassandra/pull/761
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra/112/workflows/293a4334-d2df-43f9-b532-1d79876701c1
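A rough sketch of the idea, not the code in the PR: {{DirectOomSketch}} and its methods are hypothetical names. It assumes a direct buffer OOM can be recognized by its message, which comes from {{java.nio.Bits}} and, as far as I know, differs across JDKs ("Direct buffer memory" in JDK 8/11, "Cannot reserve ... bytes of direct buffer memory ..." in newer releases), so the check is only a heuristic. On a match it keeps allocating heap until the JVM itself throws a "Java heap space" OOM, which is the kind that the JVM OOM options react to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class DirectOomSketch {
    // Heuristic (assumption): matches the JDK's direct-memory OOM messages.
    // The message text is not a stable API and may change between releases.
    static boolean isDirectBufferOom(Throwable t) {
        if (!(t instanceof OutOfMemoryError)) return false;
        String msg = t.getMessage();
        return msg != null && msg.toLowerCase(Locale.ROOT).contains("direct buffer memory");
    }

    // Allocates heap in 8 MiB chunks until the JVM throws "Java heap space".
    // Never returns normally.
    static void forceHeapSpaceOom() {
        List<long[]> hog = new ArrayList<>();
        while (true) {
            hog.add(new long[1 << 20]);
        }
    }

    // Hypothetical hook for a catch block that is about to swallow t.
    static void forceHeapSpaceOomMaybe(Throwable t) {
        if (isDirectBufferOom(t)) {
            forceHeapSpaceOom();
        }
    }
}
```

Calling {{forceHeapSpaceOomMaybe(t)}} at the top of a catch block that would otherwise swallow {{t}} turns a direct buffer OOM into a heap OOM that {{-XX:+ExitOnOutOfMemoryError}} or {{-XX:+HeapDumpOnOutOfMemoryError}} can act on; any other throwable falls through unchanged.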

I have also created a separate demo showing that the JVM invokes the OOM handler
even if the OOM error (other than a direct buffer OOM) is subsequently swallowed
by a catch block.
The code and its output are in this gist:
https://gist.github.com/yifan-c/82ff4fd7fbe83fe41113f6f14cba4907

> OOMs caught and not rethrown
> ----------------------------
>
>                 Key: CASSANDRA-15214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client, Messaging/Internode
>            Reporter: Benedict Elliott Smith
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 4.0, 4.0-rc
>
>         Attachments: oom-experiments.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.
> It may be that the simplest most consistent way to do this would be to have a 
> single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed 
> future proof approach, it may be worth paying the cost of a single thread.
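The single-thread idea from the description could look roughly like this. {{ThrowablePropagator}} is a hypothetical name, not an existing Cassandra class. A catch block hands the throwable to a dedicated thread, which re-throws it on its own stack so it reaches that thread's uncaught exception handler, out of reach of any framework catch-all. One caveat: my understanding is that re-throwing an existing OOM object does not by itself fire JVM options such as {{-XX:OnOutOfMemoryError}} or {{-XX:+HeapDumpOnOutOfMemoryError}}, which react to allocation failures inside the VM, so the handler would have to crash or dump the heap itself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper: one daemon thread, started once at startup, that waits
// for a Throwable and re-throws it where no framework catch-all can intercept it.
final class ThrowablePropagator {
    private final BlockingQueue<Throwable> pending = new LinkedBlockingQueue<>();
    private final Thread thread;

    ThrowablePropagator(Thread.UncaughtExceptionHandler handler) {
        thread = new Thread(this::awaitAndRethrow, "throwable-propagator");
        thread.setDaemon(true);
        // A null handler falls back to the thread group / default handler.
        thread.setUncaughtExceptionHandler(handler);
    }

    void start() {
        thread.start();
    }

    // Called from a catch block that would otherwise swallow t.
    void propagate(Throwable t) {
        pending.add(t);
    }

    private void awaitAndRethrow() {
        Throwable t;
        try {
            t = pending.take();
        } catch (InterruptedException e) {
            return; // shutting down
        }
        // Re-throw unchanged where possible; wrap checked throwables.
        if (t instanceof Error) throw (Error) t;
        if (t instanceof RuntimeException) throw (RuntimeException) t;
        throw new RuntimeException(t);
    }
}
```

The thread ends after re-throwing the first throwable, so this sketch assumes the handler takes the process down; a longer-lived version would need to restart the thread.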



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
