[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456521#comment-17456521 ] Joey Lynch edited comment on CASSANDRA-14898 at 12/9/21, 4:04 PM: -- Committed to 3.0, 3.11, 4.0 and trunk. Thank you [~n.v.harikrishna] for your contribution! trunk: [97b47c3b|https://github.com/apache/cassandra/commit/97b47c3b5f845097181130125752bd6efc1e1e47] 4.0: [e73d05bf|https://github.com/apache/cassandra/commit/e73d05bf858fa195ac2f2778027ef0d2ebcd3abd] 3.11: [1fce84f9|https://github.com/apache/cassandra/commit/1fce84f9833bd62227dbf8f5d063935457dbc18e] 3.0: [1911a887|https://github.com/apache/cassandra/commit/1911a887e8316d343c9bfe3aca3f9d143e9f4a61] was (Author: jolynch): Committed to 3.0, 3.11, 4.0 and trunk. Thank you [~n.v.harikrishna] for your contribution! > Key cache loading is very slow when there are many SSTables > --- > > Key: CASSANDRA-14898 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14898 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths > Environment: AWS i3.2xlarge, 4 physical cores (8 threads), 60GB of > RAM, loading about 8MB of KeyCache with 10k keys in it. >Reporter: Joey Lynch >Assignee: Venkata Harikrishna Nukala >Priority: Low > Labels: Performance, low-hanging-fruit > Fix For: 3.0.26, 3.11.12, 4.0.2 > > Attachments: key_cache_load_slow.svg > > Time Spent: 1h > Remaining Estimate: 0h > > While dealing with a production issue today where some 3.0.17 nodes had close > to ~8k sstables on disk due to excessive write pressure, we had a few nodes > crash due to OOM and then they took close to 17 minutes to load the key cache > and recover. This excessive key cache load significantly increased the > duration of the outage (to mitigate we just removed the saved key cache > files). For example here is one example taking 17 minutes to load 10k keys, > or about 10 keys per second (which is ... 
very slow): > {noformat} > INFO [pool-3-thread-1] 2018-11-15 21:50:21,885 AutoSavingCache.java:190 - > reading saved cache /mnt/data/cassandra/saved_caches/KeyCache-d.db > INFO [pool-3-thread-1] 2018-11-15 22:07:16,490 AutoSavingCache.java:166 - > Completed loading (1014606 ms; 10103 keys) KeyCache cache > {noformat} > I've witnessed similar behavior in the past with large LCS clusters, and > indeed it appears that any time the number of sstables is large, KeyCache > loading takes a _really_ long time. Today I got a flame graph and I believe > that I found the issue and I think it's reasonably easy to fix. From what I > can tell the {{KeyCacheSerializer::deserialize}} [method > |https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L445] > which is called for every key is linear in the number of sstables due to the > [call|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L459] > to {{ColumnFamilyStore::getSSTables}} which ends up calling {{View::select}} > [here|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139]. > The {{View::select}} call is linear in the number of sstables and causes a > _lot_ of {{HashSet}} > [resizing|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139] > when the number of sstables is much greater than 16 (the default size of the > backing {{HashMap}}). > As we see in the attached flamegraph we spend 50% of our CPU time in these > {{getSSTable}} calls, of which 36% is spent adding sstables to the HashSet in > {{View::select}} and 17% is spent just iterating the sstables in the first > place. A full 16% of CPU time is spent _just resizing the HashMap_. 
Then > another 4% is spent calling {{CacheService::findDesc}} which does [a linear > search|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L475] > for the sstable generation. > I believe that this affects at least Cassandra 3.0.17 and trunk, and could be > pretty easily fixed by either caching the getSSTables call or at the very > least pre-sizing the {{HashSet}} in {{View::select}} to be the size of the > sstables map. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
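The pre-sizing fix proposed in the description can be illustrated with a small standalone sketch. This is illustrative Java only, not the actual {{View::select}} code; the class and method names are invented, and the sizing arithmetic simply follows {{HashMap}}'s default 0.75 load factor:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "pre-size the HashSet" idea from the
// description above; not the real Cassandra implementation.
class PresizeSketch {
    // Collect all "sstables" (modeled here as strings) into a set.
    static Set<String> selectPresized(List<String> sstables) {
        // A HashSet backed by HashMap resizes once size exceeds
        // capacity * loadFactor (0.75 by default). Choosing an initial
        // capacity of size/0.75 + 1 guarantees the threshold is at least
        // the element count, so no rehashing happens during the addAll.
        Set<String> result = new HashSet<>((int) (sstables.size() / 0.75f) + 1);
        result.addAll(sstables);
        return result;
    }
}
```

With thousands of sstables, a default-sized set (capacity 16) rehashes many times as it grows, which is where the flamegraph's resizing cost comes from; pre-sizing pays the allocation cost once up front.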
[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456489#comment-17456489 ] Joey Lynch edited comment on CASSANDRA-14898 at 12/9/21, 3:22 PM: -- *trunk* - [Java 11 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/67/workflows/6a98bc38-3021-4763-906b-f84099157ba0], [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/67/workflows/06b843ac-a0f9-4123-b14a-965dc8f22755] 4 dtest failures in: * (J8) test_bootstrap_with_reset_bootstrap_state - bootstrap_test.TestBootstrap * (J8) test_compression_cql_options - compression_test.TestCompression * (J11) test_multiple_repair - repair_tests.incremental_repair_test.TestIncRepair * (J11, marked flakey) test_bootstrap_with_reset_bootstrap_state - bootstrap_test.TestBootstrap 1 JVM dtest failure: * (J8): readWriteDuringBootstrapTest - org.apache.cassandra.distributed.test.ring.BootstrapTest *4.0* - [Java 11 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/68/workflows/2fe689eb-1938-42fd-b5cb-abab32cf0d1f] [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/68/workflows/7929a6c9-cc9c-4c5e-b271-cc276fc1d9a1] 1 JVM dtest failure: * (J8, J11) readWriteDuringBootstrapTest - org.apache.cassandra.distributed.test.ring.BootstrapTest *3.11* [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/65/workflows/6e86d386-4057-4d83-aab1-4d3731023f5d] 1 JVM dtest failure: * (J8) bootstrapTest - org.apache.cassandra.distributed.test.BootstrapTest *3.0* [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/64/workflows/5f7095bf-5031-465f-9d82-d574991e21ac] 1 JVM dtest failure: * (J8) bootstrapTest - org.apache.cassandra.distributed.test.BootstrapTest Looking at the tests that failed I don't think any of these look related to the change. 
A local run of readWriteDuringBootstrapTest passes on 4.0 and bootstrapTest passed on 3.0, plus I'm running a trunk [CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/69/workflows/58cdf28a-8e13-4920-94a2-870d0fe00790] run to see if readWriteDuringBootstrapTest flakes on trunk as well without the patch. I believe this is ready to commit. was (Author: jolynch): *trunk* - [Java 11 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/67/workflows/6a98bc38-3021-4763-906b-f84099157ba0], [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/67/workflows/06b843ac-a0f9-4123-b14a-965dc8f22755] 4 dtest failures in: * (J8) test_bootstrap_with_reset_bootstrap_state - bootstrap_test.TestBootstrap * (J8) test_compression_cql_options - compression_test.TestCompression * (J11) test_multiple_repair - repair_tests.incremental_repair_test.TestIncRepair * (J11, marked flakey) test_bootstrap_with_reset_bootstrap_state - bootstrap_test.TestBootstrap 1 JVM dtest failure: * (J8): readWriteDuringBootstrapTest - org.apache.cassandra.distributed.test.ring.BootstrapTest *4.0* - [Java 11 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/68/workflows/2fe689eb-1938-42fd-b5cb-abab32cf0d1f] [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/68/workflows/7929a6c9-cc9c-4c5e-b271-cc276fc1d9a1] 1 JVM dtest failure: * (J8, J11) readWriteDuringBootstrapTest - org.apache.cassandra.distributed.test.ring.BootstrapTest *3.11* [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/65/workflows/6e86d386-4057-4d83-aab1-4d3731023f5d] 1 JVM dtest failure: * (J8) bootstrapTest - org.apache.cassandra.distributed.test.BootstrapTest *3.0* [Java 8 CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/64/workflows/5f7095bf-5031-465f-9d82-d574991e21ac] 1 JVM dtest failure: * (J8) bootstrapTest - org.apache.cassandra.distributed.test.BootstrapTest Looking at the tests that failed I don't think any of these look related to 
the change. A local run of readWriteDuringBootstrapTest passes on 4.0 and I'm running a trunk [CI|https://app.circleci.com/pipelines/github/jolynch/cassandra/69/workflows/58cdf28a-8e13-4920-94a2-870d0fe00790] run to see if readWriteDuringBootstrapTest flakes on trunk as well without the patch. I believe this is ready to commit.
[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17455898#comment-17455898 ] Joey Lynch edited comment on CASSANDRA-14898 at 12/8/21, 5:38 PM: -- +1. Left a few minor suggestions on the 3.0 PR (apply to other branches as well). When you're ready please rebase to a single commit for each branch with a commit that looks something like
{noformat}
$ git log -1
commit ...
Author: Venkata Harikrishna Nukala
Date: ...

    Fix slow keycache load which blocks startup for tables with many sstables.

    Patch by Venkata Harikrishna Nukala; reviewed by Marcus Eriksson and Joseph Lynch for CASSANDRA-14898
{noformat}
The important part is that your name is attributed as author (if you want it to be), and that we note the patch author, reviewers and ticket in the last line. was (Author: jolynch): +1 . Left a few minor suggestions on the 3.0 PR (apply to other branches as well), +1. When you're ready please rebase to a single commit for each branch with a commit that looks something like {noformat} $ git log -1 commit ... Author: Venkata Harikrishna Nukala Date: ... Fix slow keycache load which blocks startup for tables with many sstables. Patch by Venkata Harikrishna Nukala; reviewed by Marcus Eriksson and Joseph Lynch for CASSANDRA-14898 {noformat} Important part is your name is attributed as author (if you want it to be), and we note the patch author, reviewers and ticket in the last line.
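One way to produce the single squashed commit requested above is a soft reset against the target branch. This is a self-contained demo in a throwaway repository (branch and commit names are invented; adapt the target branch, e.g. cassandra-3.0, to your setup; `git init -b` needs git 2.28+):

```shell
set -e
# Throwaway repo standing in for a real checkout.
git init -q -b main demo && cd demo
git config user.email dev@example.com
git config user.name "Dev"
git commit -q --allow-empty -m "base"

# A feature branch with several work-in-progress commits.
git checkout -q -b feature
git commit -q --allow-empty -m "wip: presize HashSet"
git commit -q --allow-empty -m "wip: review feedback"

# Squash: move the branch pointer back to the target branch while keeping
# the working tree and index, then make one commit with the final message.
git reset --soft main
git commit -q --allow-empty \
    -m "Fix slow keycache load which blocks startup for tables with many sstables." \
    -m "Patch by Venkata Harikrishna Nukala; reviewed by Marcus Eriksson and Joseph Lynch for CASSANDRA-14898"

git log --oneline main..HEAD  # a single squashed commit on top of the base
```

An interactive `git rebase -i` reaches the same end state; the soft-reset form is just the shortest way to collapse everything since the branch point into one commit.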
[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454128#comment-17454128 ] Joey Lynch edited comment on CASSANDRA-14898 at 12/6/21, 4:48 PM: -- Also lg2m, let's iron out the configuration name since that's a public interface but I think the rest of the feedback is straightforward. in-jvm dtest flakes seem unrelated to me was (Author: jolynch): Also lg2m, let's iron out the configuration name since that's a public interface but I think the rest of the feedback is straightforward. Test flakes seem unrelated to me
[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454128#comment-17454128 ] Joey Lynch edited comment on CASSANDRA-14898 at 12/6/21, 4:47 PM: -- Also lg2m, let's iron out the configuration name since that's a public interface but I think the rest of the feedback is straightforward. Test flakes seem unrelated to me was (Author: jolynch): Also lg2m, let's iron out the configuration name since that's a public interface but I think the rest of the feedback is straightforward.
[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434487#comment-17434487 ] Venkata Harikrishna Nukala edited comment on CASSANDRA-14898 at 11/30/21, 4:22 AM: --- Here are the changes:
|[trunk|https://github.com/apache/cassandra/pull/1287]|[CI|https://app.circleci.com/pipelines/github/nvharikrishna/cassandra/41/workflows/7fb33758-2a6d-464a-bcc5-8b5709d1d997]|
|[3.11|https://github.com/apache/cassandra/pull/1288]|[CI|https://app.circleci.com/pipelines/github/nvharikrishna/cassandra/42/workflows/e7cdd28d-9d2e-4445-9157-e2b68ca0ef47]|
|[3.0|https://github.com/apache/cassandra/pull/1290]|[CI|https://app.circleci.com/pipelines/github/nvharikrishna/cassandra/46/workflows/a2baf042-657d-44c2-8890-b52215f9f0cf]|
was (Author: n.v.harikrishna): Here are the changes: |[trunk|https://github.com/apache/cassandra/pull/1287 ]|[CI|https://app.circleci.com/pipelines/github/nvharikrishna/cassandra/41/workflows/7fb33758-2a6d-464a-bcc5-8b5709d1d997]| |[3.11|https://github.com/apache/cassandra/pull/1288 ]|[CI|https://app.circleci.com/pipelines/github/nvharikrishna/cassandra/42/workflows/e7cdd28d-9d2e-4445-9157-e2b68ca0ef47]| |[3.0|https://github.com/apache/cassandra/pull/1290 ]|[CI|https://app.circleci.com/pipelines/github/nvharikrishna/cassandra/46/workflows/a2baf042-657d-44c2-8890-b52215f9f0cf]|
[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434495#comment-17434495 ] Joey Lynch edited comment on CASSANDRA-14898 at 10/26/21, 5:54 PM: --- Thank you for the patches, will review asap! was (Author: jolynch): Thank you for the patches, will review asap.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417675#comment-17417675 ] Joey Lynch edited comment on CASSANDRA-14898 at 9/20/21, 2:14 PM: -- [~n.v.harikrishna] Thank you for the patch! Would you like to get this into 3.0/3.11/4.0 and trunk? If so I think we need patches against each branch (the metadata variable names have changed slightly, so separate patches are probably easier than one patch that we merge up); I am happy to do the minor changes on each branch if you like as well.

Review of patch:
* I think introducing a little bit of state here is wise; if we want to be extra careful, do you think we could use [AutoCloseable|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html] somehow to ensure the state is cleaned up? Seems reasonable for {{close}} to just clean up the state used during load or dump.
* Did you mean to remove {{RowIndexEntry.Serializer.skipForCache(input);}} on line 453?
* Do you think we could improve the tests in AutoSavingCacheTest to have a case where a few sstables are created across different ks/table, so we can confirm the key cache is loaded properly with more than one keyspace/table?
* Regarding the deadline approach: that is an interesting idea and I like it. I'd recommend you implement it in the while loop in AutoSavingCache.loadSaved (the while in.available loop) instead of by cancelling the future at the high level.
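The AutoCloseable suggestion in the review could look roughly like this. This is a hypothetical sketch: the class name {{CachedKeyDeserializer}} and its methods are made up for illustration and are not the real {{CacheSerializer}} contract.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the AutoCloseable idea: the per-load state
// lives in an object whose close() discards it, so try-with-resources
// guarantees cleanup even if the cache load fails midway.
public class CachedKeyDeserializer implements AutoCloseable
{
    private final Map<Integer, String> sstableByGeneration = new HashMap<>();

    // Stand-in for the per-key deserialize step: resolves a generation,
    // memoizing the result so repeated keys avoid recomputing the lookup.
    public String resolve(int generation)
    {
        return sstableByGeneration.computeIfAbsent(generation, g -> "sstable-" + g);
    }

    @Override
    public void close()
    {
        // close() just cleans up the state used during load or dump
        sstableByGeneration.clear();
    }
}
```

A caller would then wrap the load in {{try (CachedKeyDeserializer d = new CachedKeyDeserializer()) { ... }}} so the cached state cannot outlive the load, regardless of how the load exits.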
> For example here is one example taking 17 minutes to load 10k keys, or about 10 keys per second (which is ... very slow):
> {noformat}
> INFO [pool-3-thread-1] 2018-11-15 21:50:21,885 AutoSavingCache.java:190 - reading saved cache /mnt/data/cassandra/saved_caches/KeyCache-d.db
> INFO [pool-3-thread-1] 2018-11-15 22:07:16,490 AutoSavingCache.java:166 - Completed loading (1014606 ms; 10103 keys) KeyCache cache
> {noformat}
> I've witnessed similar behavior in the past with large LCS clusters, and indeed it appears that any time the number of sstables is large, KeyCache loading takes a _really_ long time.
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268097#comment-17268097 ] Venkata Harikrishna Nukala edited comment on CASSANDRA-14898 at 1/19/21, 6:04 PM: -- [~jolynch]
{quote}is the gist of the patch that we skip copying the sstables into a new hash set (by just pushing the filter down to the View) but still perform a {{O(n)}} scan over those tables just within the View?
{quote}
Yes, I wanted to avoid copying lots of SSTables into a hash set when we need only one. The View's _find_ method doesn't do an O(n) scan every time: in the normal case it stops at the first entry (all SSTable generations would be the same). I think it only requires an O(n) scan per entry in the upgrade case, where the SSTable with the required generation is at the end of the list. I also added a 60s timeout to _get_ to avoid waiting forever.
{quote}I guess the tradeoff is making the CacheService implementations potentially stateful (introducing a new contract that AutoSavingCache will call a function at the end)?
{quote}
I thought of avoiding state across calls. The _CacheService deserialize_ method is called for each entry. Things would have been easier if the _deserialize_ method returned all keys instead of a single key-value pair (no need to maintain state across calls).
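The early-exit behaviour described above can be sketched as follows. This is a standalone illustration of the idea, not the actual {{View}} code; names and types are made up.

```java
import java.util.List;
import java.util.function.Predicate;

public class ViewFindSketch
{
    // Sketch of a View-style find: walk the live sstable list and return
    // the first element matching the predicate. When every cached key
    // belongs to the current generation this stops at the first entry,
    // so the scan is effectively O(1); only when the wanted generation
    // sits at the end of the list (e.g. during an upgrade) does it
    // degrade to a full O(n) scan.
    public static <T> T find(List<T> sstables, Predicate<T> filter)
    {
        for (T sstable : sstables)
            if (filter.test(sstable))
                return sstable;
        return null; // no live sstable matches the cached generation
    }
}
```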
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267378#comment-17267378 ] Joey Lynch edited comment on CASSANDRA-14898 at 1/18/21, 3:52 PM: -- [~n.v.harikrishna] thank you for the patch! I just took a quick look at it (it's a holiday here but I will review more closely tomorrow), but is the gist of the patch that we skip copying the sstables into a new hash set (by just pushing the filter down to the View) but still perform a {{O(n)}} scan over those tables just within the View? I think the API that's been introduced into View isn't problematic in that it seems generic and useful, but I'm curious: is there a reason you preferred that to the generation cache approach in the initial proof of concept patch? I guess the tradeoff is making the CacheService implementations potentially stateful (introducing a new contract that AutoSavingCache will call a function at the end)?
[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693413#comment-16693413 ] Joseph Lynch edited comment on CASSANDRA-14898 at 11/20/18 4:18 PM: Started exploring caching the {{getSSTables}} call in [a 3.0 branch|https://github.com/apache/cassandra/compare/cassandra-3.0...jolynch:CASSANDRA-14898_30x]. The strategy I ended up using was to add an optional {{cleanupAfterDeserialize}} method to the {{CacheSerializer}} interface that cleans up any cached state after deserialization, because I felt it was the least invasive change that allowed us to fix this problem. I also considered splitting the {{CacheSerializer}} interface into a {{CacheSerializer}} and a {{CacheDeserializer}} class, since that way the deserializers wouldn't have to be static and could just throw away their state naturally at the end of a deserialization (since we only use them once afaik, at startup), but I felt that was a more invasive change.

I also wrote a quick [microbenchmark|https://github.com/apache/cassandra/compare/cassandra-3.0...jolynch:CASSANDRA-14898_30x#diff-51c5d5cb6842a1aedbff457ba77dd424] that creates 1000 small sstables and then does a keycache load, and as expected my branch is about two orders of magnitude faster. I'd expect the speedup to scale with the number of sstables, which it did, so that's good.
Benchmark results:

Status quo:
{noformat}
Benchmark                                                     Mode  Cnt    Score    Error  Units
CacheLoaderBench.keyCacheLoadTest                           sample   25  167.458 ± 24.430  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00    sample       125.436           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50    sample       164.364           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90    sample       216.583           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95    sample       231.106           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99    sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999   sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999  sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00    sample       236.454           ms/op
{noformat}

Pre-allocating the HashMap:
{noformat}
Benchmark                                                     Mode  Cnt    Score    Error  Units
CacheLoaderBench.keyCacheLoadTest                           sample   28  149.801 ± 11.483  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00    sample       114.426           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50    sample       147.980           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90    sample       168.401           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95    sample       181.941           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99    sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999   sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999  sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00    sample       190.317           ms/op
{noformat}

Caching the entire {{getSSTables}} call and using a generation map for O(1) lookup:
{noformat}
Benchmark                                                     Mode   Cnt  Score   Error  Units
CacheLoaderBench.keyCacheLoadTest                           sample  1101  3.190 ± 0.108  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00    sample        1.876          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50    sample        2.863          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90    sample        4.615          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95    sample        5.161          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99    sample        6.750          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999   sample       14.178          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999  sample       14.205          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00    sample       14.205          ms/op
{noformat}

So this would seem to indicate that the production traces are correct and the {{O(n)}} traversal in {{getSSTables}} is the dominating factor.
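The winning variant above builds a generation-keyed index once per load. A minimal standalone sketch of that idea (hypothetical names; the real patch works with SSTableReader instances, not strings):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class GenerationIndex<T>
{
    private final Map<Integer, T> byGeneration = new HashMap<>();

    // Build the index once per cache load: a single O(n) pass over the
    // live sstables.
    public GenerationIndex(List<T> sstables, Function<T, Integer> generationOf)
    {
        for (T sstable : sstables)
            byGeneration.put(generationOf.apply(sstable), sstable);
    }

    // Each per-key lookup is then an O(1) map hit instead of an O(n)
    // scan through View::select, which is where the two orders of
    // magnitude in the benchmark come from.
    public T find(int generation)
    {
        return byGeneration.get(generation);
    }
}
```

With n sstables and k cached keys this turns the load from O(n·k) into O(n + k), matching the drop from ~167 ms/op to ~3 ms/op in the benchmark above.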