[jira] [Comment Edited] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

David Capwell (Jira) Tue, 03 Mar 2020 19:23:22 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050728#comment-17050728
 ]


David Capwell edited comment on CASSANDRA-15358 at 3/4/20 3:22 AM:
-------------------------------------------------------------------

Sorry it's been so long...

I put this under load several times without the patch and reproduced the issue 
with ease; with the patch I am no longer able to reproduce it.  I want to do one 
more test where I set the pool to 0 or 1 MB, just to see if I can flush out any 
more issues.

As far as I can tell, the core change that appears to fix the issue is 
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:15358#diff-613d97c28af63fff3cf1f52baa7f6caaR647];
 when out of space we now switch to direct buffers instead of heap buffers, and 
org.apache.cassandra.utils.memory.BufferPool.LocalPool#put(java.nio.ByteBuffer) 
will just release them right away.
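
For anyone following along, here is a minimal sketch of the behavior described above. It is not the code from the patch: TinyDirectPool, its fields, and its single-threaded design are all hypothetical. It only illustrates that an exhausted pool can keep handing out direct buffers (which Netty's UnpooledDirectByteBuf requires, per the stack trace in the description) while put simply drops any buffer the pool does not own.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical, single-threaded illustration; NOT Cassandra's BufferPool.
final class TinyDirectPool {
    private final int bufferSize;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    // Identity-based set: ByteBuffer.equals compares contents, not identity.
    private final Set<ByteBuffer> owned =
        Collections.newSetFromMap(new IdentityHashMap<>());

    TinyDirectPool(int pooledBuffers, int bufferSize) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < pooledBuffers; i++) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);
            owned.add(buffer);
            free.push(buffer);
        }
    }

    // Always returns a direct buffer, even when the pool is exhausted.
    ByteBuffer get() {
        ByteBuffer pooled = free.poll();
        if (pooled != null)
            return pooled;
        // Out of space: fall back to an unpooled *direct* buffer, not a heap one.
        return ByteBuffer.allocateDirect(bufferSize);
    }

    // Returns true if the buffer went back to the pool; an overflow buffer is
    // just dropped so the JVM can reclaim it.
    boolean put(ByteBuffer buffer) {
        if (!owned.contains(buffer))
            return false;
        buffer.clear();
        free.push(buffer);
        return true;
    }

    public static void main(String[] args) {
        TinyDirectPool pool = new TinyDirectPool(1, 64);
        ByteBuffer pooled = pool.get();
        ByteBuffer overflow = pool.get(); // pool is empty at this point
        System.out.println("overflow is direct: " + overflow.isDirect());
        System.out.println("overflow re-pooled: " + pool.put(overflow));
        System.out.println("pooled re-pooled:   " + pool.put(pooled));
    }
}
```

The sketch does not guard against returning the same buffer twice and makes no attempt at thread safety; the real pool has to handle both.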

General comments:
* +1 to removing ALLOCATE_ON_HEAP_WHEN_EXAHUSTED and DISABLED
* I don't think it makes sense for the get methods to take a BufferType. If 
you write 

{code}
get(42, BufferType.ON_HEAP) instanceof HeapByteBuffer
{code}

you expect that to return true (it does) AND that it's pooled (it is not).  I 
feel that every call to BufferPool with BufferType.ON_HEAP should instead just 
allocate the ByteBuffer directly. I do know that ChunkCache relies on this 
(it figures out which type to use based on 
org.apache.cassandra.io.util.ChunkReader#preferredBufferType), but I don't feel 
it should in these cases. This is not a blocking comment, and I am 100% fine if 
a different JIRA addresses it.
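
To illustrate the BufferType point: java.nio.HeapByteBuffer is package-private, so the instanceof check above is pseudocode; outside java.nio you would ask the buffer itself via isDirect() or hasArray(). Below is a minimal sketch of what "just allocate the ByteBuffer directly" could look like at a call site. The Kind enum and the allocate helper are hypothetical stand-ins for BufferType and a caller; neither is Cassandra's API.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; Kind stands in for BufferType and is not Cassandra's API.
final class HeapOrDirect {
    enum Kind { ON_HEAP, OFF_HEAP }

    // On-heap requests bypass any pool and go straight to the JDK allocator.
    // In this sketch the OFF_HEAP branch also allocates directly; in a real
    // caller this is where a pool lookup would go.
    static ByteBuffer allocate(int size, Kind kind) {
        return kind == Kind.ON_HEAP
             ? ByteBuffer.allocate(size)
             : ByteBuffer.allocateDirect(size);
    }

    public static void main(String[] args) {
        ByteBuffer heap = allocate(42, Kind.ON_HEAP);
        ByteBuffer direct = allocate(42, Kind.OFF_HEAP);
        // Public-API equivalents of the instanceof check.
        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("heap.hasArray()   = " + heap.hasArray());
        System.out.println("direct.isDirect() = " + direct.isDirect());
    }
}
```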


To be clear, I have not +1'd yet because I want to test more; my goal is to sign 
off this week.


was (Author: dcapwell):
Sorry it's been so long...

* +1 to removing ALLOCATE_ON_HEAP_WHEN_EXAHUSTED and DISABLED
* The core change which appears to fix the issue is 
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:15358#diff-613d97c28af63fff3cf1f52baa7f6caaR647];
 we switch from heap buffers to direct buffers and 
org.apache.cassandra.utils.memory.BufferPool.LocalPool#put(java.nio.ByteBuffer) 
will just release right away.

General comments:
1) I don't think it makes sense for the get methods to take a BufferType. If 
you write 

{code}
get(42, BufferType.ON_HEAP) instanceof HeapByteBuffer
{code}

you expect that to return true (it does) AND that it's pooled (it is not).  I 
feel that every call to BufferPool with BufferType.ON_HEAP should instead just 
allocate the ByteBuffer directly.

The BufferType is my only real feedback.  I put this under load several times 
without the patch and reproduced the issue with ease; with the patch I am no 
longer able to reproduce it.  I want to do one more test where I set the pool to 
0 or 1 MB, just to see if I can flush out any more issues.

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15358
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/benchmark
>            Reporter: Santhosh Kumar Ramalingam
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>              Labels: 4.0, alpha
>             Fix For: 4.0, 4.0-beta
>
>         Attachments: all_errors.txt, debug_logs_during_repair.txt, 
> repair_1_trace.txt, verbose_logs.diff, verbose_logs.txt
>
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug reproduces with 
> different versions of Java (8, 11 & 12) [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 
> InboundMessageHandler.java:657 - 
> 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception 
> caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
> at io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:87)
> at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:59)
> at 
> org.apache.cassandra.net.BufferPoolAllocator$Wrapped.<init>(BufferPoolAllocator.java:95)
> at 
> org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
> at 
> io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
> at 
> io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
> at 
> io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:777)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:424)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:835)
> {code}



