[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972662#comment-16972662 ]
Jordan West commented on CASSANDRA-15295: ----------------------------------------- Hi [~djoshi] , I like the approach to removing the potential of re-introducing the bug. However, it seems like {{CommitLog}} was changed only partially to be thread-safe. For example, concurrent calls to {{#start()}} and {{#shutdownBlocking()}} could leave the {{CommitLog}} in an invalid state: If the thread executing {{#start()}} pauses before calling {{executor.start()}}, and resumes only after a separate thread executing {{#shutdownBlocking}} calls {{executor.shutdown()}} and {{executor.awaitTermination}} (which will immediately exit since {{thread == null}}), the {{segmentManager}} will be shutdown but the {{executor}} will still be running. Is it necessary to make {{CommitLog}} thread-safe? Removing the singleton doesn’t really change the odds of it being used from multiple threads and the original version wasn’t thread-safe either w.r.t to these functions. Some other comments/minor nits: * I like moving the factory code to a function to reduce the amount of new code * Is it necessary to change AbstractCommitLogSegmentManager#shutdown() to no longer use an assert? That seems like a semantic change that is stylistic but I may be missing a further motivation for this change. * Minor style nit: In CommitLog#start, {{if (started) return true;}} should be on two lines per the Cassandra style guides > Running into deadlock when do CommitLog initialization > ------------------------------------------------------ > > Key: CASSANDRA-15295 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15295 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log > Reporter: Zephyr Guo > Assignee: Zephyr Guo > Priority: Normal > Attachments: image.png, jstack.log, pstack.log, screenshot-1.png, > screenshot-2.png, screenshot-3.png > > > Recently, I found a cassandra(3.11.4) node stuck in STARTING status for a > long time. > I used jstack to saw what happened. The main thread stuck in > *AbstractCommitLogSegmentManager.awaitAvailableSegment* > !screenshot-1.png! > The strange thing is COMMIT-LOG-ALLOCATOR thread state was runnable but it > was not actually running. > !screenshot-2.png! > And then I used pstack to troubleshoot. I found COMMIT-LOG-ALLOCATOR block on > java class initialization. > !screenshot-3.png! > This is a deadlock obviously. CommitLog waits for a CommitLogSegment when > initializing. In this moment, the CommitLog class is not initialized and the > main thread holds the class lock. After that, COMMIT-LOG-ALLOCATOR creates a > CommitLogSegment with exception and call *CommitLog.handleCommitError*(static > method). COMMIT-LOG-ALLOCATOR will block on this line because CommitLog > class is still initializing. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org