[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693413#comment-16693413 ]
Joseph Lynch edited comment on CASSANDRA-14898 at 11/20/18 4:18 PM:
--------------------------------------------------------------------

Started exploring caching the {{getSSTables}} call in [a 3.0 branch|https://github.com/apache/cassandra/compare/cassandra-3.0...jolynch:CASSANDRA-14898_30x]. The strategy I ended up using was to add an optional {{cleanupAfterDeserialize}} method to the {{CacheSerializer}} interface that cleans up any cached state after deserialization, because I felt it was the least invasive change that fixes this problem. I also considered splitting the {{CacheSerializer}} interface into separate {{CacheSerializer}} and {{CacheDeserializer}} interfaces; that way the deserializers wouldn't have to be static and could naturally throw away their state at the end of a deserialization (since, as far as I know, we only use them once at startup), but that felt like a more invasive change.

I also wrote a quick [microbenchmark|https://github.com/apache/cassandra/compare/cassandra-3.0...jolynch:CASSANDRA-14898_30x#diff-51c5d5cb6842a1aedbff457ba77dd424] that creates 1000 small sstables and then does a key cache load. As expected, my branch is about two orders of magnitude faster; I'd also expect the improvement to scale with the number of sstables, which it did, so that's good.

Benchmark results:

Status quo:
{noformat}
Benchmark                                                    Mode  Cnt    Score    Error  Units
CacheLoaderBench.keyCacheLoadTest                          sample   25  167.458 ± 24.430  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00   sample       125.436           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50   sample       164.364           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90   sample       216.583           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95   sample       231.106           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99   sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999  sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999 sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00   sample       236.454           ms/op
{noformat}

Pre-allocating the HashMap:
{noformat}
Benchmark                                                    Mode  Cnt    Score    Error  Units
CacheLoaderBench.keyCacheLoadTest                          sample   28  149.801 ± 11.483  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00   sample       114.426           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50   sample       147.980           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90   sample       168.401           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95   sample       181.941           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99   sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999  sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999 sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00   sample       190.317           ms/op
{noformat}

Caching the entire {{getSSTables}} call and using a generation map for O(1) lookup:
{noformat}
Benchmark                                                    Mode   Cnt  Score   Error  Units
CacheLoaderBench.keyCacheLoadTest                          sample  1101  3.190 ± 0.108  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00   sample        1.876          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50   sample        2.863          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90   sample        4.615          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95   sample        5.161          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99   sample        6.750          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999  sample       14.178          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999 sample       14.205          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00   sample       14.205          ms/op
{noformat}

So this would seem to indicate that the production traces are correct and that the {{O(n)}} traversal in {{getSSTables}} is the dominating factor.
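For reference, here is a rough, self-contained sketch of the shape of that change: an optional {{cleanupAfterDeserialize}} hook on {{CacheSerializer}}, plus a memoized generation map so each key cache entry becomes an O(1) lookup. The types below ({{SSTable}}, {{KeyCacheSerializerSketch}}, and the {{Function}} standing in for {{ColumnFamilyStore::getSSTables}}) are simplified stand-ins for illustration, not the code in the branch:
{noformat}
// Sketch only: simplified stand-in types, not the actual Cassandra classes.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class SSTable                              // stand-in for SSTableReader
{
    final int generation;
    SSTable(int generation) { this.generation = generation; }
}

interface CacheSerializer<T>
{
    T deserialize(String table, int generation);

    // Optional hook called once after the saved cache has been fully read,
    // so a serializer can drop any state it memoized during deserialization.
    default void cleanupAfterDeserialize() {}
}

final class KeyCacheSerializerSketch implements CacheSerializer<SSTable>
{
    private final Function<String, List<SSTable>> liveSSTables;  // stand-in for ColumnFamilyStore::getSSTables
    // Memoized table -> (generation -> sstable) lookup, built once per table
    // instead of re-walking the live sstable view for every key cache entry.
    private final Map<String, Map<Integer, SSTable>> byGeneration = new HashMap<>();

    KeyCacheSerializerSketch(Function<String, List<SSTable>> liveSSTables)
    {
        this.liveSSTables = liveSSTables;
    }

    @Override
    public SSTable deserialize(String table, int generation)
    {
        Map<Integer, SSTable> generations = byGeneration.computeIfAbsent(table, t -> {
            Map<Integer, SSTable> m = new HashMap<>();
            for (SSTable sstable : liveSSTables.apply(t))   // O(n), done once per table
                m.put(sstable.generation, sstable);
            return m;
        });
        return generations.get(generation);                 // O(1) per key thereafter
    }

    @Override
    public void cleanupAfterDeserialize()
    {
        byGeneration.clear();   // don't keep sstable references around past startup
    }
}
{noformat}
The intent, as described above, is that whatever drives the cache load invokes {{cleanupAfterDeserialize}} once the saved cache file has been read, so the memoized map (and the sstable references it holds) is not retained after startup.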
> Key cache loading is very slow when there are many SSTables
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-14898
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14898
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: AWS i3.2xlarge, 4 physical cores (8 threads), 60GB of RAM, loading about 8MB of KeyCache with 10k keys in it.
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Minor
>              Labels: Performance, low-hanging-fruit
>         Attachments: key_cache_load_slow.svg
>
>
> While dealing with a production issue today where some 3.0.17 nodes had close to ~8k sstables on disk due to excessive write pressure, we had a few nodes crash due to OOM and then take close to 17 minutes to load the key cache and recover. This excessive key cache load significantly increased the duration of the outage (to mitigate, we just removed the saved key cache files). For example, here is one node taking 17 minutes to load 10k keys, or about 10 keys per second (which is ... very slow):
> {noformat}
> INFO  [pool-3-thread-1] 2018-11-15 21:50:21,885 AutoSavingCache.java:190 - reading saved cache /mnt/data/cassandra/saved_caches/KeyCache-d.db
> INFO  [pool-3-thread-1] 2018-11-15 22:07:16,490 AutoSavingCache.java:166 - Completed loading (1014606 ms; 10103 keys) KeyCache cache
> {noformat}
> I've witnessed similar behavior in the past with large LCS clusters, and indeed it appears that any time the number of sstables is large, KeyCache loading takes a _really_ long time. Today I got a flame graph, and I believe I found the issue; I think it's reasonably easy to fix. From what I can tell, the {{KeyCacheSerializer::deserialize}} [method|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L445], which is called for every key, is linear in the number of sstables due to the [call|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L459] to {{ColumnFamilyStore::getSSTables}}, which ends up calling {{View::select}} [here|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139]. The {{View::select}} call is linear in the number of sstables and causes a _lot_ of {{HashSet}} [resizing|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139] when the number of sstables is much greater than 16 (the default size of the backing {{HashMap}}).
> As we see in the attached flame graph, we spend 50% of our CPU time in these {{getSSTables}} calls, of which 36% is spent adding sstables to the HashSet in {{View::select}} and 17% is spent just iterating the sstables in the first place. A full 16% of CPU time is spent _just resizing the HashMap_.
> Then another 4% is spent calling {{CacheService::findDesc}}, which does [a linear search|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L475] for the sstable generation.
>
> I believe that this affects at least Cassandra 3.0.17 and trunk, and could be pretty easily fixed by either caching the {{getSSTables}} call or, at the very least, pre-sizing the {{HashSet}} in {{View::select}} to be the size of the sstables map.
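To illustrate the second, smaller mitigation mentioned in the description (pre-sizing the {{HashSet}} in {{View::select}}), here is a minimal sketch of the idea against a simplified stand-in; {{ViewSelectSketch}} and the capacity math are illustrative, not the real {{org.apache.cassandra.db.lifecycle.View}} code:
{noformat}
// Sketch only: not the real View::select, just the pre-sizing idea.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class ViewSelectSketch
{
    // Pre-size the HashSet for the number of live sstables so the backing
    // HashMap never has to grow and rehash, instead of starting at 16 buckets.
    static <S> Set<S> select(List<S> liveSSTables, Predicate<S> filter)
    {
        int initialCapacity = (int) Math.ceil(liveSSTables.size() / 0.75) + 1; // account for the 0.75 load factor
        Set<S> selected = new HashSet<>(initialCapacity);
        for (S sstable : liveSSTables)
            if (filter.test(sstable))
                selected.add(sstable);
        return selected;
    }

    public static void main(String[] args)
    {
        List<String> sstables = new ArrayList<>();
        for (int gen = 0; gen < 8000; gen++)
            sstables.add("sstable-" + gen);
        // With ~8k sstables, a default-sized HashSet would otherwise rehash repeatedly.
        System.out.println(select(sstables, s -> true).size());
    }
}
{noformat}
Pre-sizing only removes the rehashing cost, though; every key cache entry still pays an O(n) iteration over the live sstables, which is why caching the {{getSSTables}} result is the much bigger win in the benchmark above (~150 ms/op versus ~3 ms/op).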