[jira] [Updated] (CASSANDRA-15186) InternodeOutboundMetrics overloaded bytes/count mixup

2019-07-12 Thread Jeremy Hanna (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-15186:
-
Complexity: Low Hanging Fruit

> InternodeOutboundMetrics overloaded bytes/count mixup
> -
>
> Key: CASSANDRA-15186
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15186
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Marcus Olsson
>Priority: Normal
>
> In 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/InternodeOutboundMetrics.java]
>  there is a small mixup between overloaded count and bytes, in 
> [LargeMessageDroppedTasksDueToOverload|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/InternodeOutboundMetrics.java#L129]
>  and 
> [UrgentMessageDroppedTasksDueToOverload|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/InternodeOutboundMetrics.java#L151].



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-07-12 Thread Sam Tunnicliffe (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-15097:

Status: Changes Suggested  (was: Review In Progress)

Thanks [~jay.zhuang], this looks like a reasonable change to me. It does need 
rebasing though as a couple of other changes touching {{Gossiper}} have landed 
recently. If you take care of that I'll re-run the CI with the HIRES config.


> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node can receive gossip state that has not changed: if its local state is 
> updated just after it sends a GOSSIP_SYN, the reply carries state that is 
> already up to date. If the heartbeat in the GOSSIP_ACK message has been 
> updated, the node unnecessarily re-applies the same state, which can be 
> costly, for example when applying a token change.
> This is very likely to happen in a large cluster when a node starts up. The 
> first gossip message syncs the tokens of all endpoints, which can take some 
> time (in our case about 200 seconds). During that time the node keeps 
> gossiping with other nodes and receives the full token states, which causes 
> many pending gossip tasks.
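The idea in the description above can be illustrated with a minimal sketch. It is not the Gossiper code and not the patch attached to the ticket; the function name and the representation of state as a mapping from key to (version, value) are assumptions made only for illustration.

```python
def states_to_apply(local, remote):
    """Return only the remote entries that are newer than the local ones.

    Both arguments map a state key to a (version, value) tuple.
    This layout is hypothetical and chosen for illustration only.
    """
    changed = {}
    for key, (version, value) in remote.items():
        local_entry = local.get(key)
        # Skip entries we already hold at the same or a newer version,
        # so an unchanged state is never re-applied.
        if local_entry is None or version > local_entry[0]:
            changed[key] = (version, value)
    return changed
```

With a check of this kind, a reply that only repeats already-known token state yields an empty result, so no expensive update is scheduled for it.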






[jira] [Comment Edited] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-12 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883755#comment-16883755
 ] 

Alex Petrov edited comment on CASSANDRA-15170 at 7/12/19 12:21 PM:
---

[~jmeredithco] thank you for the patch. I have several minor nits:

  * {{numClusterNodes}} seems to be unused in {{ResourceLeakTest}}
  * I'm not 100% sure why we need changes to logging that remove instance IDs 
from some log messages and add an {{INSTANCE}} prefix to logger names.
  * We have a [shutdown 
hook|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L641],
 which should be using the instance class loader, but because we're running it 
after the instance class loader is already shut down, we get the exception [1]. 
The error message it throws is unclear, and I would probably override 
{{InstanceClassLoader#close}} to make it more obvious what's going on: if the 
class loader is already closed, we should throw with a message that it has 
already been shut down. In addition to this, I'd probably avoid adding a JVM 
shutdown hook, and close this explicitly. I think this existed prior to 
this patch. 
 * On multiple runs, I've also seen the exceptions [2], [3], and [4]. I'm not 
claiming that this patch has caused them.  
 * We're seemingly logging each log message twice right now. I think this is 
also pre-existing, and this can be resolved by using only one of the two 
console appenders.
 
[1]
{code}
java.lang.NoClassDefFoundError: 
org/apache/cassandra/utils/logging/LoggingSupportFactory
at 
org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:638)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: 
org.apache.cassandra.utils.logging.LoggingSupportFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at 
org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClassInternal(InstanceClassLoader.java:95)
at 
org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClass(InstanceClassLoader.java:84)
... 4 more
{code}

[2]
{code}
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
down
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:58)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:162)
at 
org.apache.cassandra.db.ColumnFamilyStore.waitForFlushes(ColumnFamilyStore.java:907)
at 
org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:873)
at 
org.apache.cassandra.schema.SchemaKeyspace.lambda$flush$19(SchemaKeyspace.java:348)
at 
com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
at 
org.apache.cassandra.schema.SchemaKeyspace.flush(SchemaKeyspace.java:348)
at 
org.apache.cassandra.schema.SchemaKeyspace.applyChanges(SchemaKeyspace.java:1282)
at org.apache.cassandra.schema.Schema.merge(Schema.java:653)
at 
org.apache.cassandra.schema.Schema.mergeAndAnnounceVersion(Schema.java:586)
at 
org.apache.cassandra.schema.MigrationTask.lambda$runMayThrow$0(MigrationTask.java:91)
at 
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:58)
at 
org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
at 
org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:885)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{code}

[3]
{code}
SEVERE: RuntimeException while executing runnable 
org.apache.cassandra.db.ColumnFamilyStore$Flush$1@46975039 with executor 
org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor@7616fb13[Terminated,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 21]

[jira] [Commented] (CASSANDRA-15206) cassandra 4.0 cqlsh not working with jdk 11

2019-07-12 Thread Andy Tolbert (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883754#comment-16883754
 ] 

Andy Tolbert commented on CASSANDRA-15206:
--

I'm somewhat confident this isn't related to building with JDK11.  Just to 
verify I built latest trunk (149caf01) and everything seems in working order.  
Rather I suspect a compatibility issue with cqlsh and the python-driver being 
used.

cqlsh scans your lib directory for a file name starting with 
'cassandra-driver-internal-only-' and adds it to the beginning of the python 
lib path.  In the case of trunk, there should be a file named 
'cassandra-driver-internal-only-3.12.0.post0-5838e2fd.zip'.   Can you verify 
whether or not that is the case?  What could be happening is the file is not 
found, and python is falling back on your installed libraries, and maybe you 
have a different version of the python driver installed than expected.  Or 
maybe there are multiple files prefixed with 'cassandra-driver-internal-only-' 
in your path and an older one is being used.

Another possibility is that you may have moved cqlsh/cqlsh.py out of the bin 
directory and therefore the zip file can't be found in a relative way, and 
cqlsh is falling back on your installed libraries, which may contain a 
different version of the python driver.  Although I see in your stack trace 
that the script is running out of cassandra/bin, so maybe that isn't the case.
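
The checks suggested above can be scripted. The following is a minimal sketch, not the lookup code from cqlsh itself: the function names are hypothetical, and it assumes the bundled driver is a .zip file in the lib directory whose name starts with the prefix mentioned above.

```python
import glob
import os

# Prefix taken from the comment above; everything else is illustrative.
DRIVER_PREFIX = "cassandra-driver-internal-only-"


def find_bundled_driver_zips(lib_dir):
    """Return the sorted paths of bundled driver zips found in lib_dir."""
    pattern = os.path.join(lib_dir, DRIVER_PREFIX + "*.zip")
    return sorted(glob.glob(pattern))


def describe_driver_lookup(lib_dir):
    """Summarise which of the scenarios discussed above applies to lib_dir."""
    zips = find_bundled_driver_zips(lib_dir)
    if not zips:
        return "no bundled driver found; Python would fall back to installed libraries"
    names = ", ".join(os.path.basename(z) for z in zips)
    if len(zips) > 1:
        return "multiple bundled drivers found: " + names
    return "one bundled driver found: " + names
```

Calling describe_driver_lookup on the lib directory of the checkout reports whether the zip is missing, unique, or duplicated. To see which installed driver Python would fall back on, the driver package exposes its version and location as cassandra.__version__ and cassandra.__file__.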

> cassandra 4.0 cqlsh not working with jdk 11
> ---
>
> Key: CASSANDRA-15206
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15206
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: RamyaK
>Priority: Urgent
>
> I'm able to start Cassandra by compiling the latest git code with OpenJDK 11, 
> but I'm facing the error below with cqlsh. Please suggest a fix.
>  
> Traceback (most recent call last):
>   File "/home/id/cassandra/bin/cqlsh.py", line 2520, in 
>     main(*read_options(sys.argv[1:], os.environ))
>   File "/home/id/cassandra/bin/cqlsh.py", line 2498, in main
>     allow_server_port_discovery=options.allow_server_port_discovery)
>   File "/home/id/cassandra/bin/cqlsh.py", line 491, in __init__
>     **kwargs)
>   File "cassandra/cluster.py", line 802, in cassandra.cluster.Cluster.__init__
> TypeError: __init__() got an unexpected keyword argument 
> 'allow_server_port_discovery'






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-12 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883755#comment-16883755
 ] 

Alex Petrov commented on CASSANDRA-15170:
-

[~jmeredithco] thank you for the patch. I have several minor nits:

  * {{numClusterNodes}} seems to be unused in {{ResourceLeakTest}}
  * I'm not 100% sure why we need changes to logging that remove instance IDs 
from some log messages and add an {{INSTANCE}} prefix to logger names.
  * We have a [shutdown 
hook|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L641],
 which should be using the instance class loader, but because we're running it 
after the instance class loader is already shut down, we get the exception [1]. 
The error message it throws is unclear, and I would probably override 
{{InstanceClassLoader#close}} to make it more obvious what's going on: if the 
class loader is already closed, we should throw with a message that it has 
already been shut down. In addition to this, I'd probably avoid adding a JVM 
shutdown hook, and close this explicitly. I think this existed prior to 
this patch. 
 * On multiple runs, I've also seen the exceptions [2] and [3]. I'm not 
claiming that this patch has caused them.  
 * We're seemingly logging each log message twice right now. I think this is 
also pre-existing, and this can be resolved by using only one of the two 
console appenders.
 
[1]
{code}
java.lang.NoClassDefFoundError: 
org/apache/cassandra/utils/logging/LoggingSupportFactory
at 
org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:638)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: 
org.apache.cassandra.utils.logging.LoggingSupportFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at 
org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClassInternal(InstanceClassLoader.java:95)
at 
org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClass(InstanceClassLoader.java:84)
... 4 more
{code}

[2]
{code}
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
down
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:58)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:162)
at 
org.apache.cassandra.db.ColumnFamilyStore.waitForFlushes(ColumnFamilyStore.java:907)
at 
org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:873)
at 
org.apache.cassandra.schema.SchemaKeyspace.lambda$flush$19(SchemaKeyspace.java:348)
at 
com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
at 
org.apache.cassandra.schema.SchemaKeyspace.flush(SchemaKeyspace.java:348)
at 
org.apache.cassandra.schema.SchemaKeyspace.applyChanges(SchemaKeyspace.java:1282)
at org.apache.cassandra.schema.Schema.merge(Schema.java:653)
at 
org.apache.cassandra.schema.Schema.mergeAndAnnounceVersion(Schema.java:586)
at 
org.apache.cassandra.schema.MigrationTask.lambda$runMayThrow$0(MigrationTask.java:91)
at 
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:58)
at 
org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
at 
org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:885)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{code}

[3]
{code}
SEVERE: RuntimeException while executing runnable 
org.apache.cassandra.db.ColumnFamilyStore$Flush$1@46975039 with executor 
org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor@7616fb13[Terminated,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 21]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 

[jira] [Updated] (CASSANDRA-15206) cassandra 4.0 cqlsh not working with jdk 11

2019-07-12 Thread RamyaK (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

RamyaK updated CASSANDRA-15206:
---
   Severity: Critical
 Complexity: Challenging
   Platform: Java11,OpenJDK  (was: All)
Impacts:   (was: None)
Description: 
I'm able to start Cassandra by compiling the latest git code with OpenJDK 11, but 
I'm facing the error below with cqlsh. Please suggest a fix.

 

Traceback (most recent call last):
  File "/home/id/cassandra/bin/cqlsh.py", line 2520, in 
    main(*read_options(sys.argv[1:], os.environ))
  File "/home/id/cassandra/bin/cqlsh.py", line 2498, in main
    allow_server_port_discovery=options.allow_server_port_discovery)
  File "/home/id/cassandra/bin/cqlsh.py", line 491, in __init__
    **kwargs)
  File "cassandra/cluster.py", line 802, in cassandra.cluster.Cluster.__init__
TypeError: __init__() got an unexpected keyword argument 
'allow_server_port_discovery'

  was:
 

 


> cassandra 4.0 cqlsh not working with jdk 11
> ---
>
> Key: CASSANDRA-15206
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15206
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: RamyaK
>Priority: Urgent
>
> I'm able to start Cassandra by compiling the latest git code with OpenJDK 11, 
> but I'm facing the error below with cqlsh. Please suggest a fix.
>  
> Traceback (most recent call last):
>   File "/home/id/cassandra/bin/cqlsh.py", line 2520, in 
>     main(*read_options(sys.argv[1:], os.environ))
>   File "/home/id/cassandra/bin/cqlsh.py", line 2498, in main
>     allow_server_port_discovery=options.allow_server_port_discovery)
>   File "/home/id/cassandra/bin/cqlsh.py", line 491, in __init__
>     **kwargs)
>   File "cassandra/cluster.py", line 802, in cassandra.cluster.Cluster.__init__
> TypeError: __init__() got an unexpected keyword argument 
> 'allow_server_port_discovery'






[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-07-12 Thread Sam Tunnicliffe (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-15097:

Reviewers: Sam Tunnicliffe
   Status: Review In Progress  (was: Patch Available)

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node can receive gossip state that has not changed: if its local state is 
> updated just after it sends a GOSSIP_SYN, the reply carries state that is 
> already up to date. If the heartbeat in the GOSSIP_ACK message has been 
> updated, the node unnecessarily re-applies the same state, which can be 
> costly, for example when applying a token change.
> This is very likely to happen in a large cluster when a node starts up. The 
> first gossip message syncs the tokens of all endpoints, which can take some 
> time (in our case about 200 seconds). During that time the node keeps 
> gossiping with other nodes and receives the full token states, which causes 
> many pending gossip tasks.






[jira] [Created] (CASSANDRA-15206) cassandra 4.0 cqlsh not working with jdk 11

2019-07-12 Thread RamyaK (JIRA)
RamyaK created CASSANDRA-15206:
--

 Summary: cassandra 4.0 cqlsh not working with jdk 11
 Key: CASSANDRA-15206
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15206
 Project: Cassandra
  Issue Type: Bug
  Components: Tool/cqlsh
Reporter: RamyaK


 

 






[jira] [Comment Edited] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-07-12 Thread Sumanth Pasupuleti (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883581#comment-16883581
 ] 

Sumanth Pasupuleti edited comment on CASSANDRA-15013 at 7/12/19 6:57 AM:
-

Sure [~benedict]. Here are the patches:

*3.0*
Patch:  [^15013-3.0.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/c7889003-9c58-4099-9530-0439bf241238
Github: 
https://github.com/apache/cassandra/compare/cassandra-3.0...sumanth-pasupuleti:15013_3.0?expand=1

*3.11*
Patch:  [^15013-3.11.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/46de0958-850a-4531-a15f-fd1df0c65aac
Github: 
https://github.com/apache/cassandra/compare/cassandra-3.11...sumanth-pasupuleti:15013_3.11?expand=1

*trunk*
Patch:  [^15013-trunk.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/67e43b0b-7f13-4de2-8fbd-7cab3d72b607
Github: 
https://github.com/apache/cassandra/compare/trunk...sumanth-pasupuleti:15013_trunk?expand=1



was (Author: sumanth.pasupuleti):
Sure [~benedict]. Here are the patches:

*3.0*
Patch:  [^15013-3.0.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/c7889003-9c58-4099-9530-0439bf241238
Github branch: 
https://github.com/apache/cassandra/compare/cassandra-3.0...sumanth-pasupuleti:15013_3.0?expand=1

*3.11*
Patch:  [^15013-3.11.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/46de0958-850a-4531-a15f-fd1df0c65aac
Github branch: 
https://github.com/apache/cassandra/compare/cassandra-3.11...sumanth-pasupuleti:15013_3.11?expand=1

*trunk*
Patch:  [^15013-trunk.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/67e43b0b-7f13-4de2-8fbd-7cab3d72b607
Github branch: 
https://github.com/apache/cassandra/compare/trunk...sumanth-pasupuleti:15013_trunk?expand=1


> Message Flusher queue can grow unbounded, potentially running JVM out of 
> memory
> ---
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: 15013-3.0.txt, 15013-3.11.txt, 15013-trunk.txt, 
> BlockedEpollEventLoopFromHeapDump.png, 
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap 
> dump showing each ImmediateFlusher taking upto 600MB.png, 
> perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg, 
> perftest2_blocked_threadpool.png, perftest2_cpu_usage.png, 
> perftest2_heap.png, perftest2_read_latency_99th.png, 
> perftest2_read_latency_avg.png, perftest2_readops.png, 
> perftest2_write_latency_99th.png, perftest2_write_latency_avg.png, 
> perftest2_writeops.png, perftest_blockedthreads.png, 
> perftest_connections_count.png, perftest_cpu_usage.png, 
> perftest_heap_usage.png, perftest_readlatency_99th.png, 
> perftest_readlatency_avg.png, perftest_readops.png, 
> perftest_writelatency_99th.png, perftest_writelatency_avg.png, 
> perftest_writeops.png
>
>
> This is a follow-up ticket to CASSANDRA-14855, to make the Flusher queue 
> bounded. In the current state, items are added to the queue without 
> any check on the queue size and without checking the isWritable state of 
> the netty outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.
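
As a rough illustration of the two checks the description says are missing, here is a minimal sketch of a bounded queue. It is not the attached patch and not Cassandra's Flusher: the class name, the offer and drain methods, and the is_writable callback standing in for the netty channel's isWritable state are all assumptions.

```python
import queue


class BoundedFlusher:
    """Hypothetical bounded queue that refuses items instead of growing."""

    def __init__(self, max_items, is_writable):
        # max_items caps the queue size; is_writable is a caller-supplied
        # callable that stands in for the outbound buffer's writable state.
        self._items = queue.Queue(maxsize=max_items)
        self._is_writable = is_writable

    def offer(self, item):
        """Enqueue item if allowed; return False when it is rejected."""
        if not self._is_writable():
            return False
        try:
            self._items.put_nowait(item)
        except queue.Full:
            return False
        return True

    def drain(self):
        """Remove and return every queued item, in insertion order."""
        drained = []
        while True:
            try:
                drained.append(self._items.get_nowait())
            except queue.Empty:
                return drained
```

Rejecting at the point of insertion pushes the overload back to the caller, which can then decide whether to drop, delay, or fail the request, rather than letting memory use grow with the queue.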






[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-07-12 Thread Sumanth Pasupuleti (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883581#comment-16883581
 ] 

Sumanth Pasupuleti commented on CASSANDRA-15013:


Sure [~benedict]. Here are the patches:

*3.0*
Patch:  [^15013-3.0.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/c7889003-9c58-4099-9530-0439bf241238
Github branch: 
https://github.com/apache/cassandra/compare/cassandra-3.0...sumanth-pasupuleti:15013_3.0?expand=1

*3.11*
Patch:  [^15013-3.11.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/46de0958-850a-4531-a15f-fd1df0c65aac
Github branch: 
https://github.com/apache/cassandra/compare/cassandra-3.11...sumanth-pasupuleti:15013_3.11?expand=1

*trunk*
Patch:  [^15013-trunk.txt] 
Passing UTs and DTests 
https://circleci.com/workflow-run/67e43b0b-7f13-4de2-8fbd-7cab3d72b607
Github branch: 
https://github.com/apache/cassandra/compare/trunk...sumanth-pasupuleti:15013_trunk?expand=1


> Message Flusher queue can grow unbounded, potentially running JVM out of 
> memory
> ---
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: 15013-3.0.txt, 15013-3.11.txt, 15013-trunk.txt, 
> BlockedEpollEventLoopFromHeapDump.png, 
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap 
> dump showing each ImmediateFlusher taking upto 600MB.png, 
> perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg, 
> perftest2_blocked_threadpool.png, perftest2_cpu_usage.png, 
> perftest2_heap.png, perftest2_read_latency_99th.png, 
> perftest2_read_latency_avg.png, perftest2_readops.png, 
> perftest2_write_latency_99th.png, perftest2_write_latency_avg.png, 
> perftest2_writeops.png, perftest_blockedthreads.png, 
> perftest_connections_count.png, perftest_cpu_usage.png, 
> perftest_heap_usage.png, perftest_readlatency_99th.png, 
> perftest_readlatency_avg.png, perftest_readops.png, 
> perftest_writelatency_99th.png, perftest_writelatency_avg.png, 
> perftest_writeops.png
>
>
> This is a follow-up ticket to CASSANDRA-14855, to make the Flusher queue 
> bounded. In the current state, items are added to the queue without 
> any check on the queue size and without checking the isWritable state of 
> the netty outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.






[jira] [Updated] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

2019-07-12 Thread Sumanth Pasupuleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumanth Pasupuleti updated CASSANDRA-15013:
---
Attachment: 15013-trunk.txt
15013-3.0.txt
15013-3.11.txt

> Message Flusher queue can grow unbounded, potentially running JVM out of 
> memory
> ---
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: 15013-3.0.txt, 15013-3.11.txt, 15013-trunk.txt, 
> BlockedEpollEventLoopFromHeapDump.png, 
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap 
> dump showing each ImmediateFlusher taking upto 600MB.png, 
> perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg, 
> perftest2_blocked_threadpool.png, perftest2_cpu_usage.png, 
> perftest2_heap.png, perftest2_read_latency_99th.png, 
> perftest2_read_latency_avg.png, perftest2_readops.png, 
> perftest2_write_latency_99th.png, perftest2_write_latency_avg.png, 
> perftest2_writeops.png, perftest_blockedthreads.png, 
> perftest_connections_count.png, perftest_cpu_usage.png, 
> perftest_heap_usage.png, perftest_readlatency_99th.png, 
> perftest_readlatency_avg.png, perftest_readops.png, 
> perftest_writelatency_99th.png, perftest_writelatency_avg.png, 
> perftest_writeops.png
>
>
> This is a follow-up ticket to CASSANDRA-14855, to make the Flusher queue 
> bounded. In the current state, items are added to the queue without 
> any check on the queue size and without checking the isWritable state of 
> the netty outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.


