[jira] [Updated] (CASSANDRA-15186) InternodeOutboundMetrics overloaded bytes/count mixup
[ https://issues.apache.org/jira/browse/CASSANDRA-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-15186:
-------------------------------------
    Complexity: Low Hanging Fruit

> InternodeOutboundMetrics overloaded bytes/count mixup
> -----------------------------------------------------
>
>                 Key: CASSANDRA-15186
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15186
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Marcus Olsson
>            Priority: Normal
>
> In
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/InternodeOutboundMetrics.java]
> there is a small mixup between overloaded count and bytes, in
> [LargeMessageDroppedTasksDueToOverload|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/InternodeOutboundMetrics.java#L129]
> and
> [UrgentMessageDroppedTasksDueToOverload|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/InternodeOutboundMetrics.java#L151].

--
This message was sent by Atlassian JIRA (v7.6.14#76016)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-15097:
----------------------------------------
    Status: Changes Suggested  (was: Review In Progress)

Thanks [~jay.zhuang], this looks like a reasonable change to me. It does need rebasing though, as a couple of other changes touching {{Gossiper}} have landed recently. If you take care of that I'll re-run the CI with the HIRES config.

> Avoid updating unchanged gossip state
> -------------------------------------
>
>                 Key: CASSANDRA-15097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Normal
>
> A node may receive gossip states that are unchanged: if its local state was
> updated just after it sent a GOSSIP_SYN, the reply carries state that is
> already up to date. If the heartbeat in the GOSSIP_ACK message has been
> updated, the node unnecessarily re-applies the same state, which can be
> costly (for example, a token change update).
> This is very likely to happen in a large cluster when a node starts up: the
> first gossip message syncs the tokens of all endpoints, which can take some
> time (in our case about 200 seconds). During that time the node keeps
> gossiping with other nodes and receives the full token states again, which
> causes lots of pending gossip tasks.
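The idea in the issue description above can be sketched as follows. This is only an illustration in Python, not Cassandra's Gossiper code and not the patch under review; the names `VersionedValue`, `EndpointState`, and `apply_remote_states` are hypothetical, and the sketch assumes each state value carries a monotonically increasing version.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VersionedValue:
    """A gossip state value tagged with a version (hypothetical model)."""
    value: str
    version: int


@dataclass
class EndpointState:
    """Local view of one endpoint: a heartbeat plus per-key states."""
    heartbeat: int = 0
    states: dict = field(default_factory=dict)


def apply_remote_states(local, remote_heartbeat, remote_states):
    """Merge remote states into the local view and return the keys that changed.

    The heartbeat is always advanced, but a state is re-applied only when the
    remote version is strictly newer than the one already held locally.
    """
    local.heartbeat = max(local.heartbeat, remote_heartbeat)
    changed = []
    for key, remote_value in remote_states.items():
        current = local.states.get(key)
        if current is not None and current.version >= remote_value.version:
            continue  # already up to date: skip the potentially costly re-apply
        local.states[key] = remote_value
        changed.append(key)
    return changed
```

With this shape, a GOSSIP_ACK that only carries a newer heartbeat advances the heartbeat and returns an empty list, so no token update work is scheduled.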
[jira] [Comment Edited] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close
[ https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883755#comment-16883755 ]

Alex Petrov edited comment on CASSANDRA-15170 at 7/12/19 12:21 PM:
-------------------------------------------------------------------

[~jmeredithco] thank you for the patch. I have several minor nits:

  * {{numClusterNodes}} seems to be unused in {{ResourceLeakTest}}
  * I'm not 100% sure why we need the logging changes that remove instance IDs from some log messages and add an {{INSTANCE}} prefix to logger names.
  * We have a [shutdown hook|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L641], which should be using the instance class loader, but because we're running it after the instance class loader is already shut down, we get the exception [1]. The error message it throws is unclear, and I would probably override {{InstanceClassLoader#close}} to make it more obvious what's going on: if the class loader is already closed, we should throw with a message saying that it has already been shut down. In addition to this, I'd probably avoid adding a JVM shutdown hook, and close this explicitly. I think this existed prior to this patch.
  * On multiple runs, I've also seen the exceptions [2], [3], and [4]. I'm not claiming that this patch has caused them.
  * We're seemingly logging each log message twice right now. I think this is also pre-existing, and it can be resolved by using only one of the two console appenders.

[1] {code} java.lang.NoClassDefFoundError: org/apache/cassandra/utils/logging/LoggingSupportFactory at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:638) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.utils.logging.LoggingSupportFactory at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClassInternal(InstanceClassLoader.java:95) at org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClass(InstanceClassLoader.java:84) ... 4 more {code} [2] {code} java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:58) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:162) at org.apache.cassandra.db.ColumnFamilyStore.waitForFlushes(ColumnFamilyStore.java:907) at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:873) at org.apache.cassandra.schema.SchemaKeyspace.lambda$flush$19(SchemaKeyspace.java:348) at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407) at org.apache.cassandra.schema.SchemaKeyspace.flush(SchemaKeyspace.java:348) at org.apache.cassandra.schema.SchemaKeyspace.applyChanges(SchemaKeyspace.java:1282) at org.apache.cassandra.schema.Schema.merge(Schema.java:653) at org.apache.cassandra.schema.Schema.mergeAndAnnounceVersion(Schema.java:586) at org.apache.cassandra.schema.MigrationTask.lambda$runMayThrow$0(MigrationTask.java:91) at 
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:58) at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77) at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93) at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44) at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:885) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) {code} [3] {code} SEVERE: RuntimeException while executing runnable org.apache.cassandra.db.ColumnFamilyStore$Flush$1@46975039 with executor org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor@7616fb13[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 21]
[jira] [Commented] (CASSANDRA-15206) cassandra 4.0 cqlsh not working with jdk 11
[ https://issues.apache.org/jira/browse/CASSANDRA-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883754#comment-16883754 ]

Andy Tolbert commented on CASSANDRA-15206:
------------------------------------------

I'm somewhat confident this isn't related to building with JDK11. Just to verify, I built the latest trunk (149caf01) and everything seems to be in working order.

Rather, I suspect a compatibility issue between cqlsh and the python-driver being used. cqlsh scans your lib directory for a file name starting with 'cassandra-driver-internal-only-' and adds it to the beginning of the python lib path. In the case of trunk, there should be a file named 'cassandra-driver-internal-only-3.12.0.post0-5838e2fd.zip'. Can you verify whether or not that is the case?

What could be happening is that the file is not found and python is falling back on your installed libraries, and maybe you have a different version of the python driver installed than expected. Or maybe there are multiple files prefixed with 'cassandra-driver-internal-only-' in your path and an older one is being used.

Another possibility is that you may have moved cqlsh/cqlsh.py out of the bin directory, so the zip file can't be found via its relative path and cqlsh is falling back on your installed libraries, which may contain a different version of the python driver. Although I see in your stack trace that the script is running out of cassandra/bin, so maybe that isn't the case.

> cassandra 4.0 cqlsh not working with jdk 11
> -------------------------------------------
>
>                 Key: CASSANDRA-15206
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15206
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tool/cqlsh
>            Reporter: RamyaK
>            Priority: Urgent
>
> I'm able to start cassandra by compiling the latest git code with OpenJDK11,
> but am facing the error below with cqlsh. Please suggest.
>
> Traceback (most recent call last):
>   File "/home/id/cassandra/bin/cqlsh.py", line 2520, in
>     main(*read_options(sys.argv[1:], os.environ))
>   File "/home/id/cassandra/bin/cqlsh.py", line 2498, in main
>     allow_server_port_discovery=options.allow_server_port_discovery)
>   File "/home/id/cassandra/bin/cqlsh.py", line 491, in __init__
>     **kwargs)
>   File "cassandra/cluster.py", line 802, in cassandra.cluster.Cluster.__init__
> TypeError: __init__() got an unexpected keyword argument
> 'allow_server_port_discovery'
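The lookup described in the comment above (cqlsh scanning the lib directory for a file whose name starts with 'cassandra-driver-internal-only-') can be approximated with a few lines of Python to see which bundled driver zips are present. This is a rough diagnostic sketch, not the code from cqlsh itself; the function name `find_driver_zips` and the `lib_dir` argument are hypothetical, and the assumption that the bundled file ends in `.zip` is taken from the file name quoted in the comment.

```python
import glob
import os

# Prefix quoted in the comment above.
DRIVER_PREFIX = "cassandra-driver-internal-only-"


def find_driver_zips(lib_dir):
    """Return the sorted paths of bundled driver zips found directly in lib_dir."""
    # glob.escape protects against special characters in the directory name.
    pattern = os.path.join(glob.escape(lib_dir), DRIVER_PREFIX + "*.zip")
    return sorted(glob.glob(pattern))
```

An empty result would match the "file is not found" case, and more than one entry would match the "multiple files" case. Independently of this, printing `cassandra.__file__` from the interpreter that runs cqlsh shows which copy of the driver was actually imported.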
[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close
[ https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883755#comment-16883755 ]

Alex Petrov commented on CASSANDRA-15170:
-----------------------------------------

[~jmeredithco] thank you for the patch. I have several minor nits:

  * {{numClusterNodes}} seems to be unused in {{ResourceLeakTest}}
  * I'm not 100% sure why we need the logging changes that remove instance IDs from some log messages and add an {{INSTANCE}} prefix to logger names.
  * We have a [shutdown hook|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L641], which should be using the instance class loader, but because we're running it after the instance class loader is already shut down, we get the exception [1]. The error message it throws is unclear, and I would probably override {{InstanceClassLoader#close}} to make it more obvious what's going on: if the class loader is already closed, we should throw with a message saying that it has already been shut down. In addition to this, I'd probably avoid adding a JVM shutdown hook, and close this explicitly. I think this existed prior to this patch.
  * On multiple runs, I've also seen the exceptions [2] and [3]. I'm not claiming that this patch has caused them.
  * We're seemingly logging each log message twice right now. I think this is also pre-existing, and it can be resolved by using only one of the two console appenders.

[1] {code} java.lang.NoClassDefFoundError: org/apache/cassandra/utils/logging/LoggingSupportFactory at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:638) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.utils.logging.LoggingSupportFactory at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClassInternal(InstanceClassLoader.java:95) at org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClass(InstanceClassLoader.java:84) ... 4 more {code} [2] {code} java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:58) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:162) at org.apache.cassandra.db.ColumnFamilyStore.waitForFlushes(ColumnFamilyStore.java:907) at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:873) at org.apache.cassandra.schema.SchemaKeyspace.lambda$flush$19(SchemaKeyspace.java:348) at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407) at org.apache.cassandra.schema.SchemaKeyspace.flush(SchemaKeyspace.java:348) at org.apache.cassandra.schema.SchemaKeyspace.applyChanges(SchemaKeyspace.java:1282) at org.apache.cassandra.schema.Schema.merge(Schema.java:653) at org.apache.cassandra.schema.Schema.mergeAndAnnounceVersion(Schema.java:586) at org.apache.cassandra.schema.MigrationTask.lambda$runMayThrow$0(MigrationTask.java:91) at 
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:58) at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77) at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93) at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44) at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:885) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) {code} [3] {code} SEVERE: RuntimeException while executing runnable org.apache.cassandra.db.ColumnFamilyStore$Flush$1@46975039 with executor org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor@7616fb13[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 21] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
[jira] [Updated] (CASSANDRA-15206) cassandra 4.0 cqlsh not working with jdk 11
[ https://issues.apache.org/jira/browse/CASSANDRA-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] RamyaK updated CASSANDRA-15206: --- Severity: Critical Complexity: Challenging Platform: Java11,OpenJDK (was: All) Impacts: (was: None) Description: Im able to start cassandra by compiling the latest git code with OpenJDK11, but facing below error with cqlsh. please suggest. Traceback (most recent call last): File "/home/id/cassandra/bin/cqlsh.py", line 2520, in main(*read_options(sys.argv[1:], os.environ)) File "/home/id/cassandra/bin/cqlsh.py", line 2498, in main allow_server_port_discovery=options.allow_server_port_discovery) File "/home/id/cassandra/bin/cqlsh.py", line 491, in __init__ **kwargs) File "cassandra/cluster.py", line 802, in cassandra.cluster.Cluster.__init__ TypeError: __init__() got an unexpected keyword argument 'allow_server_port_discovery' was: > cassandra 4.0 cqlsh not working with jdk 11 > --- > > Key: CASSANDRA-15206 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15206 > Project: Cassandra > Issue Type: Bug > Components: Tool/cqlsh >Reporter: RamyaK >Priority: Urgent > > Im able to start cassandra by compiling the latest git code with OpenJDK11, > but facing below error with cqlsh. please suggest. > > Traceback (most recent call last): > File "/home/id/cassandra/bin/cqlsh.py", line 2520, in > main(*read_options(sys.argv[1:], os.environ)) > File "/home/id/cassandra/bin/cqlsh.py", line 2498, in main > allow_server_port_discovery=options.allow_server_port_discovery) > File "/home/id/cassandra/bin/cqlsh.py", line 491, in __init__ > **kwargs) > File "cassandra/cluster.py", line 802, in cassandra.cluster.Cluster.__init__ > TypeError: __init__() got an unexpected keyword argument > 'allow_server_port_discovery' -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-15097:
----------------------------------------
    Reviewers: Sam Tunnicliffe
       Status: Review In Progress  (was: Patch Available)

> Avoid updating unchanged gossip state
> -------------------------------------
>
>                 Key: CASSANDRA-15097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Normal
>
> A node may receive gossip states that are unchanged: if its local state was
> updated just after it sent a GOSSIP_SYN, the reply carries state that is
> already up to date. If the heartbeat in the GOSSIP_ACK message has been
> updated, the node unnecessarily re-applies the same state, which can be
> costly (for example, a token change update).
> This is very likely to happen in a large cluster when a node starts up: the
> first gossip message syncs the tokens of all endpoints, which can take some
> time (in our case about 200 seconds). During that time the node keeps
> gossiping with other nodes and receives the full token states again, which
> causes lots of pending gossip tasks.
[jira] [Created] (CASSANDRA-15206) cassandra 4.0 cqlsh not working with jdk 11
RamyaK created CASSANDRA-15206:
----------------------------------

             Summary: cassandra 4.0 cqlsh not working with jdk 11
                 Key: CASSANDRA-15206
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15206
             Project: Cassandra
          Issue Type: Bug
          Components: Tool/cqlsh
            Reporter: RamyaK
[jira] [Comment Edited] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883581#comment-16883581 ] Sumanth Pasupuleti edited comment on CASSANDRA-15013 at 7/12/19 6:57 AM: - Sure [~benedict]. Here are the patches: *3.0* Patch: [^15013-3.0.txt] Passing UTs and DTests https://circleci.com/workflow-run/c7889003-9c58-4099-9530-0439bf241238 Github: https://github.com/apache/cassandra/compare/cassandra-3.0...sumanth-pasupuleti:15013_3.0?expand=1 *3.11* Patch: [^15013-3.11.txt] Passing UTs and DTests https://circleci.com/workflow-run/46de0958-850a-4531-a15f-fd1df0c65aac Github: https://github.com/apache/cassandra/compare/cassandra-3.11...sumanth-pasupuleti:15013_3.11?expand=1 *trunk* Patch: [^15013-trunk.txt] Passing UTs and DTests https://circleci.com/workflow-run/67e43b0b-7f13-4de2-8fbd-7cab3d72b607 Github: https://github.com/apache/cassandra/compare/trunk...sumanth-pasupuleti:15013_trunk?expand=1 was (Author: sumanth.pasupuleti): Sure [~benedict]. 
Here are the patches: *3.0* Patch: [^15013-3.0.txt] Passing UTs and DTests https://circleci.com/workflow-run/c7889003-9c58-4099-9530-0439bf241238 Github branch: https://github.com/apache/cassandra/compare/cassandra-3.0...sumanth-pasupuleti:15013_3.0?expand=1 *3.11* Patch: [^15013-3.11.txt] Passing UTs and DTests https://circleci.com/workflow-run/46de0958-850a-4531-a15f-fd1df0c65aac Github branch: https://github.com/apache/cassandra/compare/cassandra-3.11...sumanth-pasupuleti:15013_3.11?expand=1 *trunk* Patch: [^15013-trunk.txt] Passing UTs and DTests https://circleci.com/workflow-run/67e43b0b-7f13-4de2-8fbd-7cab3d72b607 Github branch: https://github.com/apache/cassandra/compare/trunk...sumanth-pasupuleti:15013_trunk?expand=1 > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: 15013-3.0.txt, 15013-3.11.txt, 15013-trunk.txt, > BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png, > perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg, > perftest2_blocked_threadpool.png, perftest2_cpu_usage.png, > perftest2_heap.png, perftest2_read_latency_99th.png, > perftest2_read_latency_avg.png, perftest2_readops.png, > perftest2_write_latency_99th.png, perftest2_write_latency_avg.png, > perftest2_writeops.png, perftest_blockedthreads.png, > perftest_connections_count.png, perftest_cpu_usage.png, > perftest_heap_usage.png, perftest_readlatency_99th.png, > perftest_readlatency_avg.png, perftest_readops.png, > perftest_writelatency_99th.png, perftest_writelatency_avg.png, > perftest_writeops.png 
> > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
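The issue description above amounts to adding a capacity check (and a writability check) before anything is enqueued. A minimal sketch of that idea follows, in Python rather than the Java/Netty code the ticket concerns. It is not the approach taken in the attached patches, which this message does not show; `BoundedFlusher`, `max_pending`, and the `is_writable` callback are hypothetical names introduced only for illustration.

```python
from collections import deque


class BoundedFlusher:
    """Queue of pending items that refuses new ones when saturated."""

    def __init__(self, max_pending, is_writable):
        self.max_pending = max_pending  # hypothetical cap on queued items
        self.is_writable = is_writable  # callable standing in for a channel writability check
        self.pending = deque()

    def offer(self, item):
        """Enqueue item and return True, or return False to signal backpressure."""
        if len(self.pending) >= self.max_pending or not self.is_writable():
            return False
        self.pending.append(item)
        return True

    def flush(self):
        """Drain and return everything queued so far, in insertion order."""
        drained = list(self.pending)
        self.pending.clear()
        return drained
```

Returning `False` instead of growing the queue leaves the caller to decide how to apply backpressure, for example by pausing reads until a later `flush` has made room.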
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883581#comment-16883581 ] Sumanth Pasupuleti commented on CASSANDRA-15013: Sure [~benedict]. Here are the patches: *3.0* Patch: [^15013-3.0.txt] Passing UTs and DTests https://circleci.com/workflow-run/c7889003-9c58-4099-9530-0439bf241238 Github branch: https://github.com/apache/cassandra/compare/cassandra-3.0...sumanth-pasupuleti:15013_3.0?expand=1 *3.11* Patch: [^15013-3.11.txt] Passing UTs and DTests https://circleci.com/workflow-run/46de0958-850a-4531-a15f-fd1df0c65aac Github branch: https://github.com/apache/cassandra/compare/cassandra-3.11...sumanth-pasupuleti:15013_3.11?expand=1 *trunk* Patch: [^15013-trunk.txt] Passing UTs and DTests https://circleci.com/workflow-run/67e43b0b-7f13-4de2-8fbd-7cab3d72b607 Github branch: https://github.com/apache/cassandra/compare/trunk...sumanth-pasupuleti:15013_trunk?expand=1 > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: 15013-3.0.txt, 15013-3.11.txt, 15013-trunk.txt, > BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png, > perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg, > perftest2_blocked_threadpool.png, perftest2_cpu_usage.png, > perftest2_heap.png, perftest2_read_latency_99th.png, > perftest2_read_latency_avg.png, perftest2_readops.png, > perftest2_write_latency_99th.png, perftest2_write_latency_avg.png, > perftest2_writeops.png, perftest_blockedthreads.png, > perftest_connections_count.png, 
perftest_cpu_usage.png, > perftest_heap_usage.png, perftest_readlatency_99th.png, > perftest_readlatency_avg.png, perftest_readops.png, > perftest_writelatency_99th.png, perftest_writelatency_avg.png, > perftest_writeops.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanth Pasupuleti updated CASSANDRA-15013: --- Attachment: 15013-trunk.txt 15013-3.0.txt 15013-3.11.txt > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: 15013-3.0.txt, 15013-3.11.txt, 15013-trunk.txt, > BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png, > perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg, > perftest2_blocked_threadpool.png, perftest2_cpu_usage.png, > perftest2_heap.png, perftest2_read_latency_99th.png, > perftest2_read_latency_avg.png, perftest2_readops.png, > perftest2_write_latency_99th.png, perftest2_write_latency_avg.png, > perftest2_writeops.png, perftest_blockedthreads.png, > perftest_connections_count.png, perftest_cpu_usage.png, > perftest_heap_usage.png, perftest_readlatency_99th.png, > perftest_readlatency_avg.png, perftest_readops.png, > perftest_writelatency_99th.png, perftest_writelatency_avg.png, > perftest_writeops.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. 