[jira] [Updated] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joey Lynch updated CASSANDRA-14898:
-----------------------------------
          Fix Version/s: 3.0.26, 3.11.12, 4.0.2
          Since Version: 3.0.0
    Source Control Link: https://github.com/apache/cassandra/commit/97b47c3b5f845097181130125752bd6efc1e1e47
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Key cache loading is very slow when there are many SSTables
>
>                 Key: CASSANDRA-14898
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14898
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>         Environment: AWS i3.2xlarge, 4 physical cores (8 threads), 60GB of RAM, loading about 8MB of KeyCache with 10k keys in it.
>            Reporter: Joey Lynch
>            Assignee: Venkata Harikrishna Nukala
>            Priority: Low
>              Labels: Performance, low-hanging-fruit
>             Fix For: 3.0.26, 3.11.12, 4.0.2
>         Attachments: key_cache_load_slow.svg
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> While dealing with a production issue today where some 3.0.17 nodes had close to ~8k sstables on disk due to excessive write pressure, a few nodes crashed due to OOM and then took close to 17 minutes to load the key cache and recover. This slow key cache load significantly extended the outage (to mitigate, we simply removed the saved key cache files). Here is one example taking 17 minutes to load 10k keys, or about 10 keys per second (which is ... very slow):
> {noformat}
> INFO  [pool-3-thread-1] 2018-11-15 21:50:21,885 AutoSavingCache.java:190 - reading saved cache /mnt/data/cassandra/saved_caches/KeyCache-d.db
> INFO  [pool-3-thread-1] 2018-11-15 22:07:16,490 AutoSavingCache.java:166 - Completed loading (1014606 ms; 10103 keys) KeyCache cache
> {noformat}
> I've witnessed similar behavior in the past with large LCS clusters, and indeed it appears that any time the number of sstables is large, KeyCache loading takes a _really_ long time.
> Today I got a flame graph, and I believe I found the issue and that it's reasonably easy to fix. From what I can tell, the {{KeyCacheSerializer::deserialize}} [method|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L445], which is called for every key, is linear in the number of sstables due to the [call|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L459] to {{ColumnFamilyStore::getSSTables}}, which ends up calling {{View::select}} [here|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139]. The {{View::select}} call is linear in the number of sstables and causes a _lot_ of {{HashSet}} [resizing|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139] when the number of sstables is much greater than 16 (the default capacity of the backing {{HashMap}}).
> As we see in the attached flamegraph, we spend 50% of our CPU time in these {{getSSTables}} calls, of which 36% is spent adding sstables to the HashSet in {{View::select}} and 17% is spent just iterating the sstables in the first place. A full 16% of CPU time is spent _just resizing the HashMap_. Another 4% is spent calling {{CacheService::findDesc}}, which does [a linear search|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L475] for the sstable generation.
> I believe this affects at least Cassandra 3.0.17 and trunk, and could be fixed fairly easily by either caching the getSSTables call or, at the very least, pre-sizing the {{HashSet}} in {{View::select}} to the size of the sstables map.
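The two mitigations proposed above, pre-sizing the set and hoisting the per-key sstable lookup out of the loop, can be sketched in isolation. This is a minimal illustration only: the class, method, and data below are hypothetical stand-ins, not Cassandra's actual View or cache-loading code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PresizedSelect {
    // Stand-in for View::select building a set of live sstables.
    // Pre-sizing the HashSet so the backing HashMap never rehashes:
    // capacity = ceil(n / 0.75) keeps the load factor below the
    // resize threshold even when n is far above the default 16.
    static Set<String> select(List<String> sstables) {
        Set<String> result = new HashSet<>((int) Math.ceil(sstables.size() / 0.75));
        result.addAll(sstables);
        return result;
    }

    public static void main(String[] args) {
        List<String> sstables = new ArrayList<>();
        for (int i = 0; i < 8000; i++)
            sstables.add("sstable-" + i);

        // Hoist the invariant lookup out of the per-key loop instead of
        // re-materializing the live-sstable set once per cache key,
        // turning O(keys * sstables) work into O(keys + sstables).
        Set<String> live = select(sstables);
        int hits = 0;
        for (int key = 0; key < 10_000; key++)
            if (live.contains("sstable-" + (key % 8000)))
                hits++;
        System.out.println(hits); // prints 10000
    }
}
```

The caching variant that was ultimately committed is conceptually the same hoisting move: compute the sstable collection once, then reuse it for every deserialized key.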
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Joey Lynch updated CASSANDRA-14898:
-----------------------------------
    Test and Documentation Plan: Pre-commit tests on 3.0, 3.11, 4.0 and trunk. The included microbench shows key cache loading with 1000 sstables dropping from ~140 ms to ~20 ms per load.  (was: Pre-commit tests on 3.0, 3.11, 4.0 and trunk.)
Joey Lynch updated CASSANDRA-14898:
-----------------------------------
    Status: Ready to Commit  (was: Review In Progress)
Joey Lynch updated CASSANDRA-14898:
-----------------------------------
    Test and Documentation Plan: Pre-commit tests on 3.0, 3.11, 4.0 and trunk.
    Status: Patch Available  (was: Open)
Joey Lynch updated CASSANDRA-14898:
-----------------------------------
    Reviewers: Joey Lynch, Marcus Eriksson, Joey Lynch  (was: Joey Lynch, Marcus Eriksson)
    Status: Review In Progress  (was: Patch Available)
Joey Lynch updated CASSANDRA-14898:
-----------------------------------
    Reviewers: Joey Lynch, Marcus Eriksson  (was: Marcus Eriksson)
Marcus Eriksson updated CASSANDRA-14898:
----------------------------------------
    Reviewers: Marcus Eriksson
Joseph Lynch updated CASSANDRA-14898:
-------------------------------------
    Labels: Performance low-hanging-fruit  (was: )
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org