[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-02 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188831#comment-16188831
 ] 

Dan Kinder commented on CASSANDRA-13923:


>From my own poking around, 
>https://issues.apache.org/jira/browse/CASSANDRA-9882 seemed related.

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189361#comment-16189361
 ] 

Marcus Eriksson commented on CASSANDRA-13923:
-

Looking at the stack traces, we should clearly avoid grabbing the read lock on 
CSM when checking shouldDefragment() - if we race with swapping out the 
compaction strategies, just return false. Maybe we should finally remove the 
defragment code altogether 

Also, there is 
{{org.apache.cassandra.db.lifecycle.Helpers.prepareForObsoletion}} in the other 
stack trace, this was optimised for TRUNCATE in CASSANDRA-13909 - we might need 
it something here as well. 

I'll keep looking, any chance you could provide some jstack traces with {{-l}} 
[~dkinder]?

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189378#comment-16189378
 ] 

Marcus Eriksson commented on CASSANDRA-13923:
-

also it is not impossible you are suffering from CASSANDRA-13215 - could you 
test run a patch [~dkinder]?

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189576#comment-16189576
 ] 

Marcus Eriksson commented on CASSANDRA-13923:
-

created https://issues.apache.org/jira/browse/CASSANDRA-13930 for the defrag 
issue

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-03 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189886#comment-16189886
 ] 

Dan Kinder commented on CASSANDRA-13923:


I did notice that startup time was significantly slower now on 3.11, similar to 
the symptom reported in https://issues.apache.org/jira/browse/CASSANDRA-13215

I'll try to get you the other info shortly and apply that patch. Unfortunately 
jstack sometimes NPEs, so once I get lucky you'll get one with -l

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-03 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189997#comment-16189997
 ] 

Dan Kinder commented on CASSANDRA-13923:


Brought up one node with [the 
patch|https://issues.apache.org/jira/secure/attachment/12852352/simple-cache.patch]
 -- wow, it starts up WAY faster. It also seemed to properly complete flushes 
on that node so I'm pushing out to other nodes now to see how it does.

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-04 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191699#comment-16191699
 ] 

Dan Kinder commented on CASSANDRA-13923:


So far the nodes seem to be performing better, they are doing a lot of 
compactions. So, that patch seemed to do the trick.

Sadly I was not able to get a jstack with -l.

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13923) Flushers blocked due to many SSTables

2017-10-16 Thread Dan Kinder (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206598#comment-16206598
 ] 

Dan Kinder commented on CASSANDRA-13923:


This seems just as related to CASSANDRA-13948

> Flushers blocked due to many SSTables
> -
>
> Key: CASSANDRA-13923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Local Write-Read Paths
> Environment: Cassandra 3.11.0
> Centos 6 (downgraded JNA)
> 64GB RAM
> 12-disk JBOD
>Reporter: Dan Kinder
>Assignee: Marcus Eriksson
> Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
>
>
> This started on the mailing list and I'm not 100% sure of the root cause, 
> feel free to re-title if needed.
> I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving 
> traffic, thread pools begin to back up and grow pending tasks indefinitely. 
> This happens to multiple different stages (Read, Mutation) and consistently 
> builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
> Using jstack shows that there is blocking going on when trying to call 
> getCompactionCandidates, which seems to happen on flush. We have fairly large 
> nodes that have ~15,000 SSTables per node, all LCS.
> I seems like this can cause reads to get blocked because they try to acquire 
> a read lock when calling shouldDefragment.
> And writes, of course, block once we can't allocate anymore memtables, 
> because flushes are backed up.
> We did not have this problem in 2.2.6, so it seems like there is some 
> regression causing it to be incredibly slow trying to do calls like 
> getCompactionCandidates that list out the SSTables.
> In our case this causes nodes to build up pending tasks and simply stop 
> responding to requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org