[ https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050728#comment-17050728 ]
David Capwell edited comment on CASSANDRA-15358 at 3/4/20 3:22 AM:
-------------------------------------------------------------------

Sorry it's been so long...

I put this under load several times without the patch and replicated the issue with ease; with the patch I am no longer able to replicate it. I want to do one more test where I set the pool to 0 or 1mb, just to see if I can flush out any more issues.

As far as I can tell, the core change which appears to fix the issue is [here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:15358#diff-613d97c28af63fff3cf1f52baa7f6caaR647]; we switch from heap buffers to direct buffers when out of space, and org.apache.cassandra.utils.memory.BufferPool.LocalPool#put(java.nio.ByteBuffer) will just release right away.

General comments:

* +1 to removing ALLOCATE_ON_HEAP_WHEN_EXAHUSTED and DISABLED
* I don't think it makes sense to have the get methods take a BufferType. If you write
{code}
get(42, BufferType.ON_HEAP) instanceof HeapByteBuffer
{code}
you expect that to return true (it does) AND that it's buffered (it is not). I feel that every call to BufferPool with BufferType.ON_HEAP should instead just allocate the ByteBuffer directly. I do know that ChunkCache relies on this (it figures out which type to use based on org.apache.cassandra.io.util.ChunkReader#preferredBufferType), but I don't feel it should in these cases. This is not a blocking comment, and I am 100% fine if a different JIRA addresses it.

To be clear, I have not given a +1 yet because I want to test more; my goal is to sign off this week.

was (Author: dcapwell):
Sorry it's been so long...
* +1 to removing ALLOCATE_ON_HEAP_WHEN_EXAHUSTED and DISABLED
* The core change which appears to fix the issue is [here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:15358#diff-613d97c28af63fff3cf1f52baa7f6caaR647]; we switch from heap buffers to direct buffers, and org.apache.cassandra.utils.memory.BufferPool.LocalPool#put(java.nio.ByteBuffer) will just release right away.

General comments:

1) I don't think it makes sense to have the get methods take a BufferType. If you write
{code}
get(42, BufferType.ON_HEAP) instanceof HeapByteBuffer
{code}
you expect that to return true (it does) AND that it's buffered (it is not). I feel that every call to BufferPool with BufferType.ON_HEAP should instead just allocate the ByteBuffer directly.

The BufferType is my only real feedback. I put this under load several times without the patch and replicated the issue with ease; with the patch I am no longer able to replicate it. I want to do one more test where I set the pool to 0 or 1mb, just to see if I can flush out any more issues.

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
> Issue Type: Bug
> Components: Test/benchmark
> Reporter: Santhosh Kumar Ramalingam
> Assignee: Benedict Elliott Smith
> Priority: Normal
> Labels: 4.0, alpha
> Fix For: 4.0, 4.0-beta
>
> Attachments: all_errors.txt, debug_logs_during_repair.txt, repair_1_trace.txt, verbose_logs.diff, verbose_logs.txt
>
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is repeated with different versions of Java (8, 11 & 12) [~benedict]
>
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
>     at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
>     at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 InboundMessageHandler.java:657 - 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
>     at io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:87)
>     at io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:59)
>     at org.apache.cassandra.net.BufferPoolAllocator$Wrapped.<init>(BufferPoolAllocator.java:95)
>     at org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56)
>     at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
>     at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
>     at io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
>     at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
>     at io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:777)
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:424)
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:835)
> {code}
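
For illustration only: both the BufferType comment and the "initialBuffer is not a direct buffer" failure in the trace come down to the heap/direct distinction in java.nio. The sketch below uses only the JDK. `BufferTypeSketch`, `Kind`, `allocate` and `requireDirect` are hypothetical names made up for this example; they are not Cassandra's BufferPool or Netty's API, and no pooling is modelled. It shows a direct allocation by type (roughly what the comment suggests for the ON_HEAP case) and a consumer that rejects heap buffers in the same way the trace reports.

```java
import java.nio.ByteBuffer;

public class BufferTypeSketch {
    // Hypothetical stand-in for a buffer-type switch; not Cassandra's BufferType.
    enum Kind { ON_HEAP, OFF_HEAP }

    // Allocates a fresh buffer of the requested kind, bypassing any pool.
    static ByteBuffer allocate(int size, Kind kind) {
        return kind == Kind.ON_HEAP
             ? ByteBuffer.allocate(size)        // backed by a byte[] on the Java heap
             : ByteBuffer.allocateDirect(size); // backed by native memory
    }

    // Hypothetical consumer that only accepts direct buffers; the message
    // mirrors the one in the stack trace.
    static ByteBuffer requireDirect(ByteBuffer buffer) {
        if (!buffer.isDirect())
            throw new IllegalArgumentException("initialBuffer is not a direct buffer.");
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer heap = allocate(42, Kind.ON_HEAP);
        ByteBuffer direct = allocate(42, Kind.OFF_HEAP);

        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("direct.isDirect() = " + direct.isDirect());

        requireDirect(direct); // accepted
        try {
            requireDirect(heap); // rejected: a heap buffer reached a direct-only path
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, any path that can hand a heap buffer to a direct-only consumer fails in this way, which is consistent with the fix described in the comment (falling back to direct rather than heap buffers when the pool is out of space).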