[ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693413#comment-16693413 ]

Joseph Lynch edited comment on CASSANDRA-14898 at 11/20/18 4:18 PM:
--------------------------------------------------------------------

Started exploring caching the {{getSSTables}} call in [a 3.0 
branch|https://github.com/apache/cassandra/compare/cassandra-3.0...jolynch:CASSANDRA-14898_30x].
 The strategy I ended up using was to add an optional 
{{cleanupAfterDeserialize}} method to the {{CacheSerializer}} interface that 
cleans up any cached state once deserialization is finished, because that felt 
like the least invasive change that fixes the problem. I also considered 
splitting the {{CacheSerializer}} interface into separate {{CacheSerializer}} 
and {{CacheDeserializer}} classes; that way the deserializers wouldn't have to 
be static and could naturally throw away their state at the end of a 
deserialization (as far as I know we only use them once, at startup), but that 
seemed like a more invasive change.
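For concreteness, the shape of the interface change is roughly the following (a 
simplified sketch, not the actual Cassandra {{CacheSerializer}}; the 
serialize/deserialize signatures here are illustrative only):
{noformat}
// Simplified sketch (illustrative signatures, not Cassandra's actual interface):
// CacheSerializer gains an optional, default no-op hook that implementations
// can override to drop any state they memoized while deserializing.
public interface CacheSerializer<K, V>
{
    void serialize(K key, java.io.DataOutput out) throws java.io.IOException;

    V deserialize(java.io.DataInput in) throws java.io.IOException;

    // Called once after the whole saved cache has been read, so a deserializer
    // that cached per-load state (sstable snapshots, generation maps, ...) can
    // release it instead of holding it for the lifetime of the process.
    default void cleanupAfterDeserialize()
    {
    }
}
{noformat}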

I also wrote a quick 
[microbenchmark|https://github.com/apache/cassandra/compare/cassandra-3.0...jolynch:CASSANDRA-14898_30x#diff-51c5d5cb6842a1aedbff457ba77dd424]
 that creates 1000 small sstables and then does a key cache load; as expected, 
my branch is about two orders of magnitude faster. I'd expect the improvement 
to scale with the number of sstables, which it does, so that's good.
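The harness is shaped roughly like this (a sketch of the JMH skeleton only; the 
setup and load helpers are hypothetical placeholders for the Cassandra-internal 
work the real benchmark does):
{noformat}
// JMH skeleton sketch (placeholder helpers, not the actual CacheLoaderBench).
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class KeyCacheLoadSketch
{
    @Setup(Level.Trial)
    public void setup() throws Exception
    {
        // Placeholder: flush ~1000 tiny sstables for one table, populate the
        // key cache, and save it to disk so each invocation measures only the load.
        createSmallSSTablesAndSaveKeyCache(1000);
    }

    @Benchmark
    public int keyCacheLoadTest() throws Exception
    {
        // Placeholder: read the saved KeyCache file back in; return the key
        // count so the result is not dead-code eliminated.
        return loadSavedKeyCache();
    }

    // Hypothetical stand-ins; the real benchmark calls into Cassandra here.
    private void createSmallSSTablesAndSaveKeyCache(int count) { /* ... */ }
    private int loadSavedKeyCache() { return 0; }
}
{noformat}
Benchmark results: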

Status quo:
{noformat}
Benchmark                                                     Mode  Cnt    Score    Error  Units
CacheLoaderBench.keyCacheLoadTest                           sample   25  167.458 ± 24.430  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00    sample       125.436           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50    sample       164.364           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90    sample       216.583           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95    sample       231.106           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99    sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999   sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999  sample       236.454           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00    sample       236.454           ms/op
{noformat}
Pre-allocating the HashMap:
{noformat}
Benchmark                                                     Mode  Cnt    Score    Error  Units
CacheLoaderBench.keyCacheLoadTest                           sample   28  149.801 ± 11.483  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00    sample       114.426           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50    sample       147.980           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90    sample       168.401           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95    sample       181.941           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99    sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999   sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999  sample       190.317           ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00    sample       190.317           ms/op
{noformat}
Caching the entire {{getSSTables}} call and using a generation map for O(1) 
lookup:
{noformat}
Benchmark                                                     Mode   Cnt   Score   Error  Units
CacheLoaderBench.keyCacheLoadTest                           sample  1101   3.190 ± 0.108  ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.00    sample         1.876          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.50    sample         2.863          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.90    sample         4.615          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.95    sample         5.161          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.99    sample         6.750          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.999   sample        14.178          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p0.9999  sample        14.205          ms/op
CacheLoaderBench.keyCacheLoadTest:keyCacheLoadTest·p1.00    sample        14.205          ms/op
{noformat}
So this would seem to confirm that the production traces are correct and that 
the {{O\(n\)}} traversal of the {{getSSTables}} result is the dominating factor.
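The core of that change is just building the generation-to-sstable index once 
per load instead of scanning the sstable list for every key; a simplified 
illustration with hypothetical generic types (not the actual patch):
{noformat}
// Simplified illustration (hypothetical types, not the actual patch): index the
// sstables by generation once, then answer every per-key lookup from the map
// instead of re-reading and linearly scanning the sstable list.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

final class GenerationIndex<T>
{
    private final Map<Integer, T> byGeneration;

    GenerationIndex(List<T> sstables, ToIntFunction<T> generationOf)
    {
        // Pre-size so indexing thousands of sstables never triggers a resize.
        byGeneration = new HashMap<>((int) (sstables.size() / 0.75f) + 1);
        for (T sstable : sstables)
            byGeneration.put(generationOf.applyAsInt(sstable), sstable);
    }

    // O(1) per cached key, versus the O(n) findDesc()-style linear search.
    T find(int generation)
    {
        return byGeneration.get(generation);
    }
}
{noformat}
Combined with {{cleanupAfterDeserialize}}, the index can be dropped as soon as 
the load completes so nothing extra is retained afterwards.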


> Key cache loading is very slow when there are many SSTables
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-14898
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14898
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: AWS i3.2xlarge, 4 physical cores (8 threads), 60GB of 
> RAM, loading about 8MB of KeyCache with 10k keys in it.
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Minor
>              Labels: Performance, low-hanging-fruit
>         Attachments: key_cache_load_slow.svg
>
>
> While dealing with a production issue today where some 3.0.17 nodes had close 
> to 8k sstables on disk due to excessive write pressure, we had a few nodes 
> crash due to OOM, and they then took close to 17 minutes to load the key cache 
> and recover. This excessive key cache load significantly increased the 
> duration of the outage (to mitigate, we just removed the saved key cache 
> files). For example, here is one node taking 17 minutes to load 10k keys, or 
> about 10 keys per second (which is ... very slow):
> {noformat}
> INFO  [pool-3-thread-1] 2018-11-15 21:50:21,885 AutoSavingCache.java:190 - reading saved cache /mnt/data/cassandra/saved_caches/KeyCache-d.db
> INFO  [pool-3-thread-1] 2018-11-15 22:07:16,490 AutoSavingCache.java:166 - Completed loading (1014606 ms; 10103 keys) KeyCache cache
> {noformat}
> I've witnessed similar behavior in the past with large LCS clusters, and 
> indeed it appears that any time the number of sstables is large, KeyCache 
> loading takes a _really_ long time. Today I got a flame graph and I believe 
> that I found the issue and I think it's reasonably easy to fix. From what I 
> can tell the {{KeyCacheSerializer::deserialize}} [method 
> |https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L445]
>  which is called for every key is linear in the number of sstables due to the 
> [call|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L459]
>  to {{ColumnFamilyStore::getSSTables}} which ends up calling {{View::select}} 
> [here|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139].
>  The {{View::select}} call is linear in the number of sstables and causes a 
> _lot_ of {{HashSet}} 
> [resizing|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139]
> when the number of sstables is much greater than 16 (the default initial 
> capacity of the backing {{HashMap}}).
> As we see in the attached flame graph, we spend 50% of our CPU time in these 
> {{getSSTables}} calls, of which 36% is spent adding sstables to the {{HashSet}} in 
> {{View::select}} and 17% is spent just iterating the sstables in the first 
> place. A full 16% of CPU time is spent _just resizing the HashMap_. Then 
> another 4% is spent calling {{CacheService::findDesc}} which does [a linear 
> search|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L475]
>  for the sstable generation.
> I believe that this affects at least Cassandra 3.0.17 and trunk, and could be 
> pretty easily fixed by either caching the {{getSSTables}} call or at the very 
> least pre-sizing the {{HashSet}} in {{View::select}} to be the size of the 
> sstables map.


