[ https://issues.apache.org/jira/browse/CASSANDRA-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826879#comment-15826879 ]
Tom van der Woerdt edited comment on CASSANDRA-13126 at 1/17/17 9:43 PM:
-------------------------------------------------------------------------

Apparently I do!

{code}
ERROR [SharedPool-Worker-1] 2017-01-11 15:26:59,533 Message.java:617 - Unexpected exception during request; channel = [id: 0xc259e8df, /1.2.3.4:45232 => /5.6.7.8:9042]
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_112]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_112]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_112]
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:277) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:108) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:146) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    ... 9 common frames omitted
{code}


was (Author: tvdw):
Apparently I do!
{code}
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
ERROR [SharedPool-Worker-1] 2017-01-11 15:26:59,533 Message.java:617 - Unexpected exception during request; channel = [id: 0xc259e8df, /1.2.3.4:45232 => /5.6.7.8:9042]
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_112]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_112]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_112]
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:277) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:108) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:146) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    ... 9 common frames omitted
{code}

> native transport protocol corruption when using SSL
> ---------------------------------------------------
>
> Key: CASSANDRA-13126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13126
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Tom van der Woerdt
> Priority: Critical
>
> This is a series of conditions that can result in client connections becoming
> unusable.
> 1) Cassandra GC must be well-tuned, to have short GC pauses every minute or so
> 2) *client* SSL must be enabled and transmitting a significant amount of data
> 3) Cassandra must run with the default library versions
> 4) disableexplicitgc must be set (this is the default in the current
> cassandra-env.sh)
> This ticket relates to CASSANDRA-13114 which is a possible workaround (but
> not a fix) for the SSL requirement to trigger this bug.
> * Netty allocates nio.ByteBuffers for every outgoing SSL message.
> * ByteBuffers consist of two parts, the JVM object and the off-heap object.
> The JVM object is small and goes with regular GC cycles; the off-heap object
> gets freed only when the small JVM object is freed. To avoid exploding the
> native memory use, the JVM defaults to limiting its allocation to the max
> heap size. Allocating beyond that limit triggers a System.gc(), a retry, and
> potentially an exception.
> * System.gc() is a no-op under disableexplicitgc
> * This means ByteBuffer allocations are likely to throw an exception when too
> many objects are being allocated
> * The Netty version shipped in Cassandra is broken when using SSL (see
> CASSANDRA-13114) and causes significantly too many ByteBuffers to be
> allocated.
> This gets more complicated though.
> When /some/ clients use SSL, and others don't, the clients not using SSL can
> still be affected by this bug, as ByteBuffer starvation caused by SSL will
> leak to other users.
> ByteBuffers are used very early on in the native protocol as well. Before
> even being able to decode the network protocol, this error can be thrown:
> {noformat}
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
> {noformat}
> Note that this comes back with stream_id 0, so clients end up waiting for the
> client timeout before the query is considered failed and retried.
> A few frames later on the same connection, this appears:
> {noformat}
> Provided frame does not appear to be Snappy compressed
> {noformat}
> And after that everything errors out with:
> {noformat}
> Invalid or unsupported protocol version (54); the lowest supported version is 3 and the greatest is 4
> {noformat}
> So this bug ultimately affects the binary protocol and the connection becomes
> useless if not downright dangerous.
> I think there are several things that need to be done here.
> * CASSANDRA-13114 should be fixed (easy, and probably needs to land in 3.0.11
> anyway)
> * Connections should be closed after a DecoderException
> * DisableExplicitGC should be removed from the default JVM arguments
> Any of these three would limit the impact to clients.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
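
The description above attributes the failure to direct ByteBuffer memory running into a JVM-imposed cap. As a rough illustration only, the following sketch shows one way to observe how much direct buffer memory a JVM currently holds, using the standard BufferPoolMXBean. The class name DirectBufferUsage and the helper directMemoryUsed are hypothetical names chosen for this example and are not part of Cassandra or Netty. It assumes the buffers of interest are allocated through ByteBuffer.allocateDirect, as the stack trace above suggests for this Netty version.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not Cassandra or Netty code.
class DirectBufferUsage {

    // Returns the bytes currently accounted to the JVM's "direct" buffer pool,
    // or -1 if the platform does not report such a pool.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();

        // The small on-heap ByteBuffer object owns a 1 MiB off-heap region.
        // That region is released only after the on-heap object is collected.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);

        long after = directMemoryUsed();
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");

        // Per the ticket, the default cap on direct memory is tied to the
        // maximum heap size unless it is overridden on the command line.
        System.out.println("max heap: " + Runtime.getRuntime().maxMemory() + " bytes");

        // Touch the buffer so it is clearly still reachable at this point.
        buffer.put(0, (byte) 1);
    }
}
```

Sampling this value periodically on a node (for example over JMX) could show whether direct memory climbs toward the cap between GC cycles, which is the condition the ticket describes.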