[ 
https://issues.apache.org/jira/browse/CASSANDRA-19703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936514#comment-17936514
 ] 

Andy Tolbert commented on CASSANDRA-19703:
------------------------------------------

I think that's a good idea [~bschoeni]; I'll add something to the docs to note 
this in my PR. I think your point about Spring Data is a good one, as most 
users I've worked with whose clusters have had this issue were using a library 
that prepared statements for them without them realizing what was going on. 
Another example of this is gocql, which implicitly prepares everything (which 
I've come around to actually being a good idea, because it simplifies things 
quite a bit when you have all the query metadata).

I have deployed this to a couple hundred clusters now, and it does seem to 
resolve the prepared statement leak and prevent instances from OOMing on 
startup, but there are a couple of things I have to acknowledge:

1. The fix keeps Caffeine's evict-before-insert behavior from leaking rows 
into system.prepared_statements by ensuring the client timestamp on the insert 
precedes the delete's, but the early eviction itself remains, which causes a 
fair amount of client re-prepare behavior. However, in extreme cases like 
this, where clients are overwhelming the prepared statement cache, you are 
still going to see a ton of re-prepares regardless of how the cache chooses to 
evict. I've seen some clusters where the cache essentially gets reset every 2 
seconds.
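To make the timestamp ordering concrete, here is a minimal sketch of the idea, 
assuming last-write-wins cell reconciliation like Cassandra's (the names and 
structure are illustrative, not from the actual patch): take the insert's 
client timestamp *before* the entry goes into the cache, so any 
eviction-triggered delete, which stamps itself afterwards, always carries the 
newer timestamp and wins.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FixSketch {
    // Stand-in for a client-side microsecond timestamp source.
    static final AtomicLong CLOCK = new AtomicLong();

    // Last-write-wins: the row survives only if the insert's timestamp is
    // strictly newer than the delete's (on a tie the tombstone wins).
    static boolean rowSurvives(long insertTs, long deleteTs) {
        return insertTs > deleteTs;
    }

    public static void main(String[] args) {
        // Buggy ordering: the eviction's delete is stamped first, the
        // in-flight insert is stamped later, so the row survives (the leak).
        long earlyDelete = CLOCK.incrementAndGet();
        long lateInsert = CLOCK.incrementAndGet();
        System.out.println(rowSurvives(lateInsert, earlyDelete)); // true -> leaked row

        // Fixed ordering: take the insert's timestamp before the cache put,
        // so any subsequent eviction delete is newer and the tombstone wins.
        long insertTs = CLOCK.incrementAndGet(); // taken before cache.put(...)
        long deleteTs = CLOCK.incrementAndGet(); // eviction fires afterwards
        System.out.println(rowSurvives(insertTs, deleteTs)); // false -> removed
    }
}
```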

2. The fix addresses the OOM on startup by paginating reads of the 
system.prepared_statements table, but if you have hundreds of thousands of 
leaked statements, it can still take hours for a node to start up. We could 
consider adding some logic (like truncating the table after X000 prepared 
statements), but I think it's probably best not to add something as drastic as 
a truncate to startup, and instead provide good guidance for folks who need to 
dig themselves out of this. Since we were able to detect the problem before 
rolling the change out to more than a couple hundred clusters, it was pretty 
easy for us to work around; so I expect that problem could be solved by 
documentation.
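As an operational sketch of that guidance (not official documentation; verify 
against your version's docs and try it on a canary node first), the usual way 
to dig out is to clear the local, node-private prepared-statements store on 
each affected node before restarting it, so startup doesn't have to page 
through hundreds of thousands of leaked rows; clients simply re-prepare 
afterwards:

```
# Run on each affected node before restarting it, assuming your version
# permits truncating this local system table:
cqlsh -e "TRUNCATE system.prepared_statements;"
```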

> Newly inserted prepared statements got evicted too early from cache that 
> leads to race condition
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19703
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19703
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Yuqi Yan
>            Assignee: Cameron Zemek
>            Priority: Normal
>             Fix For: 4.1.x
>
>         Attachments: ci_summary.html
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We're upgrading from Cassandra 4.0 to Cassandra 4.1.3, and the 
> system.prepared_statements table starts growing to GB size after the 
> upgrade. This slows down node startup significantly when it's doing 
> preloadPreparedStatements.
> I can't share the exact log but it's a race condition like this:
>  # [Thread 1] Receives a prepared request for S1. Attempts to get S1 in cache
>  # [Thread 1] Cache miss, put this S1 into cache
>  # [Thread 1] Attempts to write S1 into local table
>  # [Thread 2] Receives a prepared request for S2. Attempts to get S2 in cache
>  # [Thread 2] Cache miss, put this S2 into cache
>  # [Thread 2] Cache is full, evicting S1 from cache
>  # [Thread 2] Attempts to delete S1 from local table
>  # [Thread 2] Tombstone inserted for S1, delete finished
>  # [Thread 1] Record inserted for S1, write finished
> Thread 2 inserted a tombstone for S1 earlier than Thread 1 was able to 
> insert the record into the table. Hence the data will not be removed, 
> because the later insert has a newer write time than the tombstone.
> Whether this happens or not depends on how the cache decides which entry to 
> evict next when it's full. We noticed that in 4.1.3 Caffeine was upgraded to 
> 2.9.2 (CASSANDRA-15153).
>  
> I did some research into the Caffeine commits. It seems this commit caused 
> entries to be evicted too early: "Eagerly evict an entry if it is too large 
> to fit in the cache" (Feb 2021), available after 2.9.0: 
> [https://github.com/ben-manes/caffeine/commit/464bc1914368c47a0203517fda2151fbedaf568b]
> It was later fixed in: "Improve eviction when overflow or the weight is 
> oversized" (Aug 2022), available after 3.1.2: 
> [https://github.com/ben-manes/caffeine/commit/25b7d17b1a246a63e4991d4902a2ecf24e86d234]
> {quote}Previously an attempt to centralize evictions into one code path led 
> to a suboptimal approach 
> ([{{464bc19}}|https://github.com/ben-manes/caffeine/commit/464bc1914368c47a0203517fda2151fbedaf568b]
> ). This tried to move those entries into the LRU position for early eviction, 
> but was confusing and could too aggressively evict something that is 
> desirable to keep.
> {quote}
>  
> I upgraded Caffeine to 3.1.8 (same as the 5.0 trunk) and this issue is gone, 
> but I think that version is not compatible with Java 8.
> I'm not 100% sure whether this is the root cause or what the correct fix is 
> here. Would appreciate it if anyone could have a look, thanks.
>  
>  
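For illustration, the tail of the interleaving quoted above (steps 8 and 9) 
can be replayed deterministically. This is a sketch, not Cassandra code: a 
plain map stands in for the table, and a timestamp-compare merge stands in 
for last-write-wins reconciliation.

```java
import java.util.HashMap;
import java.util.Map;

public class RaceReplay {
    // A stored cell: its write timestamp and whether it is a tombstone.
    record Row(long ts, boolean tombstone) {}

    // Last-write-wins merge: the newer write replaces the older one
    // (on a timestamp tie, the incoming write wins here for simplicity).
    static void apply(Map<String, Row> table, String key, Row op) {
        table.merge(key, op, (old, nu) -> nu.ts() >= old.ts() ? nu : old);
    }

    public static void main(String[] args) {
        Map<String, Row> table = new HashMap<>();
        long t = 0;
        // Step 8: thread 2's eviction writes the tombstone for S1 first...
        apply(table, "S1", new Row(++t, true));
        // Step 9: ...then thread 1's in-flight INSERT lands with a newer
        // timestamp, so the tombstone never removes the row.
        apply(table, "S1", new Row(++t, false));
        System.out.println(table.get("S1").tombstone()); // false: S1 is leaked
    }
}
```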



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
