[jira] [Commented] (CASSANDRA-11838) dtest failure in largecolumn_test:TestLargeColumn.cleanup_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346499#comment-15346499 ] Philip Thompson commented on CASSANDRA-11838: - So this test doesn't appear to have ever stopped failing. You want a new ticket? > dtest failure in largecolumn_test:TestLargeColumn.cleanup_test > -- > > Key: CASSANDRA-11838 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11838 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Alex Petrov > Labels: dtest > Fix For: 3.8 > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log > > > Example failure at: > http://cassci.datastax.com/job/trunk_offheap_dtest/200/testReport/largecolumn_test/TestLargeColumn/cleanup_test/ > node 1 contains the following OOM in its log: > {code} > ERROR [SharedPool-Worker-1] 2016-05-16 22:54:10,112 Message.java:611 - > Unexpected exception during request; channel = [id: 0xb97f2640, > L:/127.0.0.1:9042 - R:/127.0.0.1:48190] > java.lang.OutOfMemoryError: Java heap space > at org.apache.cassandra.transport.CBUtil.readRawBytes(CBUtil.java:533) > ~[main/:na] > at > org.apache.cassandra.transport.CBUtil.readBoundValue(CBUtil.java:407) > ~[main/:na] > at org.apache.cassandra.transport.CBUtil.readValueList(CBUtil.java:462) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:417) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:365) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:45) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:41) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:280) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:261) > ~[main/:na] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:277) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:264) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:962) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:879) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:360) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:276) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] > ERROR [SharedPool-Worker-1] 2016-05-16 22:54:10,756 Message.java:611 - > Unexpected exception during request; channel = [id: 0xba0be401, > L:/127.0.0.1:9042 - R:/127.0.0.1:48191] >
[jira] [Commented] (CASSANDRA-11838) dtest failure in largecolumn_test:TestLargeColumn.cleanup_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309434#comment-15309434 ] Alex Petrov commented on CASSANDRA-11838: - I've been trying to get rid of the `DataOutputBuffer` instances, although there always was a reference hanging somewhere. [~tjake] pointed out that it may be happening when the instance is recycled once but not again. I've opened a [PR|https://github.com/netty/netty/pull/5333] (already merged thanks to [~norman]). In short: large / grown instances were not recycled by us, but were hanging in the array backing the {{Stack}}. I'd suggest let's trim buffers to {128} bytes (default initial value) when the instances are > 1mb until we upgrade to the version where we can throw away the recycled instances. And create a follow-up ticket to make sure we remove custom code that trims buffers and can throw the instances away. The second point (with {{fill}}) is addressed, too. I've triggered tests (including one with running the failing test many times): |[trunk|https://github.com/ifesdjeen/cassandra/tree/11838-trunk] |[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11838-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11838-trunk-dtest/]|[multiplexed|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11838-trunk-multiplexed/]| > dtest failure in largecolumn_test:TestLargeColumn.cleanup_test > -- > > Key: CASSANDRA-11838 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11838 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Alex Petrov > Labels: dtest > Fix For: 3.x > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log > > > Example failure at: > http://cassci.datastax.com/job/trunk_offheap_dtest/200/testReport/largecolumn_test/TestLargeColumn/cleanup_test/ > node 1 contains the following OOM in its log: > {code} > ERROR [SharedPool-Worker-1] 2016-05-16 22:54:10,112 Message.java:611 - > Unexpected exception during request; channel = [id: 0xb97f2640, > L:/127.0.0.1:9042 - R:/127.0.0.1:48190] > java.lang.OutOfMemoryError: Java heap space > at org.apache.cassandra.transport.CBUtil.readRawBytes(CBUtil.java:533) > ~[main/:na] > at > org.apache.cassandra.transport.CBUtil.readBoundValue(CBUtil.java:407) > ~[main/:na] > at org.apache.cassandra.transport.CBUtil.readValueList(CBUtil.java:462) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:417) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:365) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:45) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:41) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:280) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:261) > ~[main/:na] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:277) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:264) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at >
[jira] [Commented] (CASSANDRA-11838) dtest failure in largecolumn_test:TestLargeColumn.cleanup_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307918#comment-15307918 ] T Jake Luciani commented on CASSANDRA-11838: Nice detective work. Nits: * We should probably drop the buffer on the floor vs re-allocate then recycle. My reasoning is if there are rows >1Mb there is probably some strange workload where it's going to need to re-allocate to > 1Mb anyway. So rather than allocating twice just don't recycle the ones >1Mb. * Prefer you use Arrays.fill vs for loop for a simple clear > dtest failure in largecolumn_test:TestLargeColumn.cleanup_test > -- > > Key: CASSANDRA-11838 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11838 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Alex Petrov > Labels: dtest > Fix For: 3.x > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log > > > Example failure at: > http://cassci.datastax.com/job/trunk_offheap_dtest/200/testReport/largecolumn_test/TestLargeColumn/cleanup_test/ > node 1 contains the following OOM in its log: > {code} > ERROR [SharedPool-Worker-1] 2016-05-16 22:54:10,112 Message.java:611 - > Unexpected exception during request; channel = [id: 0xb97f2640, > L:/127.0.0.1:9042 - R:/127.0.0.1:48190] > java.lang.OutOfMemoryError: Java heap space > at org.apache.cassandra.transport.CBUtil.readRawBytes(CBUtil.java:533) > ~[main/:na] > at > org.apache.cassandra.transport.CBUtil.readBoundValue(CBUtil.java:407) > ~[main/:na] > at org.apache.cassandra.transport.CBUtil.readValueList(CBUtil.java:462) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:417) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:365) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:45) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:41) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:280) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:261) > ~[main/:na] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:277) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:264) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:962) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:879) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:360) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:276) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) >
[jira] [Commented] (CASSANDRA-11838) dtest failure in largecolumn_test:TestLargeColumn.cleanup_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306065#comment-15306065 ] Alex Petrov commented on CASSANDRA-11838: - I've ran a git bisect and it confirms the commit [1e92ce|https://github.com/apache/cassandra/commit/1e92ce43a5a730f81d3f6cfd72e7f4b126db788a] same mentioned [here|http://cassci.datastax.com/job/trunk_offheap_dtest/185/]. The test is failing also without {{offheap_memtables}}. The patch is cleaning up the references to values within the reused {{BTree}} and "trims" the large values within {{DataOutputBuffer}}. Since the buffer is growing and there's a setup under which the value may grow large enough, even if it's just read once, it'll remain there, so we need to clean it up. Unfortunately, we can't just skip recycling for such objects, as there's no way to pass constructor options (which may be a good thing as {{Recycler}} assumes that all instances are the same, so it can pick whichever is available), so we have to force-trim (in this case, re-allocate the buffer to some threshold value). > dtest failure in largecolumn_test:TestLargeColumn.cleanup_test > -- > > Key: CASSANDRA-11838 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11838 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Alex Petrov > Labels: dtest > Fix For: 3.x > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log > > > Example failure at: > http://cassci.datastax.com/job/trunk_offheap_dtest/200/testReport/largecolumn_test/TestLargeColumn/cleanup_test/ > node 1 contains the following OOM in its log: > {code} > ERROR [SharedPool-Worker-1] 2016-05-16 22:54:10,112 Message.java:611 - > Unexpected exception during request; channel = [id: 0xb97f2640, > L:/127.0.0.1:9042 - R:/127.0.0.1:48190] > java.lang.OutOfMemoryError: Java heap space > at org.apache.cassandra.transport.CBUtil.readRawBytes(CBUtil.java:533) > ~[main/:na] > at > org.apache.cassandra.transport.CBUtil.readBoundValue(CBUtil.java:407) > ~[main/:na] > at org.apache.cassandra.transport.CBUtil.readValueList(CBUtil.java:462) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:417) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryOptions$Codec.decode(QueryOptions.java:365) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:45) > ~[main/:na] > at > org.apache.cassandra.transport.messages.ExecuteMessage$1.decode(ExecuteMessage.java:41) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:280) > ~[main/:na] > at > org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:261) > ~[main/:na] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:277) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:264) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:278) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:962) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:879) > ~[netty-all-4.0.36.Final.jar:4.0.36.Final] > at >