[jira] [Commented] (CASSANDRA-15949) NPE thrown while updating speculative execution time if table is removed during task execution

Caleb Rackliffe (Jira) Thu, 10 Sep 2020 21:06:10 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193979#comment-17193979
 ]


Caleb Rackliffe commented on CASSANDRA-15949:
---------------------------------------------

[~dcapwell] [~jmeredithco] I think I can see a sequence that produces the error 
above now. Here goes...

1.) We drop a keyspace, which hits {{Schema#dropKeyspace()}}.
2.) This winds around a bit, but finally clears the keyspace name from the 
{{keyspaceInstances}} field of {{Schema}}. However, the {{keyspaces}} field 
still thinks the keyspace is present right before {{dropKeyspace()}} proceeds 
to {{unload()}}.
3.) At this point, the speculative retry threshold task decides it's a good 
time to run. It hits {{Keyspace.open()}} and sees that, according to 
{{Schema#getKeyspaceInstance()}}, the keyspace doesn't exist!
4.) We move into the {{Keyspace}} constructor just in time to get a reference 
to a {{KeyspaceMetadata}} from {{keyspaces}} in {{Schema}} that thinks the 
table in question still exists.
5.) Immediately after this happens, the original thread continues into 
{{unload()}} and drops the hammer on everything, including {{metadataRefs}}.
6.) The task thread wakes up and proceeds in the {{Keyspace}} constructor to 
try to get a {{TableMetadataRef}}, but of course, it's gone.
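
The interleaving above can be replayed deterministically in a small single-threaded model. This is only a sketch under stated assumptions: the class {{DropRaceReplay}} and its three maps are hypothetical stand-ins named after the fields mentioned in the steps, not Cassandra's actual code, and the two "threads" are simply interleaved by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DropRaceReplay {
    // Hypothetical stand-ins for the Schema state named in the steps above.
    static final Map<String, Object> keyspaceInstances = new ConcurrentHashMap<>();
    static final Map<String, String> keyspaces = new ConcurrentHashMap<>();    // keyspace -> table
    static final Map<String, Object> metadataRefs = new ConcurrentHashMap<>(); // table -> ref

    /** Replays steps 1-6 in order; returns true if the task thread ends up holding a null ref. */
    static boolean replay() {
        keyspaceInstances.put("ks", new Object());
        keyspaces.put("ks", "tbl");
        metadataRefs.put("tbl", new Object());

        // Steps 1-2 (drop thread): the instance is cleared, the metadata is still visible.
        keyspaceInstances.remove("ks");

        // Step 3 (task thread): no instance is found, so the open path decides to construct one.
        boolean mustConstruct = keyspaceInstances.get("ks") == null;

        // Step 4 (task thread): the constructor reads metadata that still lists the table.
        String table = keyspaces.get("ks");

        // Step 5 (drop thread): unload clears everything, including the table ref.
        keyspaces.remove("ks");
        metadataRefs.remove(table);

        // Step 6 (task thread): the ref lookup returns null; dereferencing it would throw an NPE.
        Object ref = metadataRefs.get(table);
        return mustConstruct && ref == null;
    }

    public static void main(String[] args) {
        System.out.println("task thread sees null ref: " + replay());
    }
}
```

In this model the only thing that lets the task thread reach step 6 is that it was allowed to start constructing a keyspace in step 3.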

If the sequence above is coherent, I think it means the [current 
patch|https://github.com/apache/cassandra/pull/733/files] is at least an 
improvement, given that it stops the {{Keyspace}} constructor before it 
proceeds to {{initCf()}} and also doesn't kill all future executions of the 
threshold update task. My only concern is that we might still be able to hit 
{{initCf()}} if we get a {{TableMetadataRef}} _just_ before {{unload()}} blows 
it away, which seems like it would create a new {{ColumnFamilyStore}}.

So naïve question...why do we allow the speculative retry threshold updater 
task to create keyspaces at all, ever? It seems like we could approach this in 
an entirely different way...by having something like 
{{Keyspace.allExisting()}} that uses only the non-null results of 
{{Schema#getKeyspaceInstance()}} instead of {{Keyspace.open()}}. Even if one of 
those instances is in the process of being removed, updating the thresholds on 
a doomed CFS is harmless. (Also, we can avoid a new esoteric bit of logging 
trying to explain how our schema updates work.)
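
A minimal sketch of what that could look like, assuming a lookup that never constructs anything. Everything here is hypothetical: {{AllExistingSketch}}, {{KeyspaceInstance}}, and the map are toy stand-ins, and the real signatures of {{Schema#getKeyspaceInstance()}} and any eventual {{Keyspace.allExisting()}} may well differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AllExistingSketch {
    /** Hypothetical stand-in for an already-initialized Keyspace instance. */
    static final class KeyspaceInstance {
        final String name;
        KeyspaceInstance(String name) { this.name = name; }
    }

    // Toy model of the instance map held by the schema.
    static final Map<String, KeyspaceInstance> keyspaceInstances = new ConcurrentHashMap<>();

    /** Models a pure lookup: returns null for unknown or already-dropped keyspaces. */
    static KeyspaceInstance getKeyspaceInstance(String name) {
        return keyspaceInstances.get(name);
    }

    /** Returns only the instances that already exist; it never creates one as a side effect. */
    static List<KeyspaceInstance> allExisting(Iterable<String> keyspaceNames) {
        List<KeyspaceInstance> existing = new ArrayList<>();
        for (String name : keyspaceNames) {
            KeyspaceInstance instance = getKeyspaceInstance(name);
            if (instance != null) {
                existing.add(instance);
            }
        }
        return existing;
    }

    public static void main(String[] args) {
        keyspaceInstances.put("ks1", new KeyspaceInstance("ks1"));
        // "dropped" has no instance, so it is skipped rather than opened.
        System.out.println(allExisting(List.of("ks1", "dropped")).size());
    }
}
```

The point of the design is that the periodic task becomes a read-only consumer of schema state, so it can no longer race with a drop by starting a constructor.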

> NPE thrown while updating speculative execution time if table is removed 
> during task execution
> ----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15949
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15949
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Jon Meredith
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.0-beta
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> CASSANDRA-14338 fixed the scheduling of the speculation retry threshold 
> calculation, but if the task happens to be scheduled while a table is being 
> dropped, it triggers an NPE. 
> ERROR 2020-07-14T11:34:55,762 [OptionalTasks:1] org.apache.cassandra.service.CassandraDaemon:446 - Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.NullPointerException: null
>        at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:444) ~[cassandra-4.0.0.jar:4.0.0]
>        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:346) ~[cassandra-4.0.0.jar:4.0.0]
>        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:139) ~[cassandra-4.0.0.jar:4.0.0]
>        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:116) ~[cassandra-4.0.0.jar:4.0.0]
>        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:102) ~[cassandra-4.0.0.jar:4.0.0]
>        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:99) ~[cassandra-4.0.0.jar:4.0.0]
>        at com.google.common.collect.Iterables$5.lambda$forEach$0(Iterables.java:704) ~[guava-27.0-jre.jar:?]
>        at com.google.common.collect.IndexedImmutableSet.forEach(IndexedImmutableSet.java:45) ~[guava-27.0-jre.jar:?]
>        at com.google.common.collect.Iterables$5.forEach(Iterables.java:704) ~[guava-27.0-jre.jar:?]
>        at org.apache.cassandra.service.CassandraDaemon.lambda$setup$2(CassandraDaemon.java:412) ~[cassandra-4.0.0.jar:4.0.0]
>        at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) [cassandra-4.0.0.jar:4.0.0]
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.37.Final.jar:4.1.37.Final]
>        at java.lang.Thread.run(Thread.java:834) [?:?]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
