[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164051#comment-17164051 ]
David Capwell commented on CASSANDRA-15191:
-------------------------------------------

CI 3.11 - https://app.circleci.com/pipelines/github/dcapwell/cassandra/309/workflows/b4cbed8d-868f-4640-a697-471fa03fd4bf
CI trunk - https://app.circleci.com/pipelines/github/dcapwell/cassandra/310/workflows/62969c9b-9c65-4558-9ec0-3fcc3f17d79e

Looks like this patch doesn't play nicely with the commit log; it breaks the following tests:

commitlog_test.py
- test_ignore_failure_policy
- test_stop_commit_failure_policy

Here is the log from the ignore policy test:
https://1573-209217594-gh.circle-artifacts.com/62/dtest_j8_without_vnodes_logs/1595547611103_test_ignore_failure_policy/node1.log

A sample that stands out:

{code}
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:08,735 CommitLog.java:499 - Failed managing commit log segments
org.apache.cassandra.io.FSWriteError: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:180)
    at org.apache.cassandra.db.commitlog.MemoryMappedSegment.<init>(MemoryMappedSegment.java:45)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.createSegment(CommitLogSegment.java:137)
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerStandard.createSegment(CommitLogSegmentManagerStandard.java:66)
    at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager$1.runMayThrow(AbstractCommitLogSegmentManager.java:114)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:335)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:175)
    ... 7 common frames omitted
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:09,736 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
{code}

Looks like the commit failure policy isn't respected and we instead fall back to the normal disk failure policy.

[~stefan.miklosovic] can you look into this?

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after
> node is up
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Config
>            Reporter: Vincent White
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.11.x, 4.0-beta
>
>         Attachments: log.txt
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> There is a bug when disk_failure_policy is set to stop_paranoid and a
> CorruptSSTableException is thrown after the server is up: the setting is
> ignored. Normally the node should stop gossip and transport, but instead it
> continues to serve requests and the exception is only logged.
>
> This patch unifies the exception handling in JVMStabilityInspector, and the
> code is reworked so that this inspector acts as the central place where
> such exceptions are inspected.
>
> The core reason the exception is ignored is that the exception thrown in
> AbstractLocalAwareExecutorService is not a CorruptSSTableException but a
> RuntimeException that has the CorruptSSTableException as its cause. It is
> therefore better to handle this in JVMStabilityInspector, which can examine
> the cause chain recursively and act accordingly.
>
> Behaviour before:
> stop_paranoid of disk_failure_policy is ignored when a CorruptSSTableException
> is thrown, e.g. on a regular select statement.
>
> Behaviour after:
> Gossip and transport (CQL) are turned off; the JVM stays up for further
> investigation, e.g. via JMX.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
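
The wrapping problem described in the issue (a RuntimeException whose cause is the CorruptSSTableException) can be illustrated with a minimal sketch. This is not the code from the patch or from JVMStabilityInspector: CorruptDataException is a hypothetical stand-in for CorruptSSTableException, and CauseChainDemo and findCause are names invented here for illustration. It only shows why an instanceof check on the outer exception misses the failure while a walk over the cause chain finds it.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for Cassandra's CorruptSSTableException (not the real class).
class CorruptDataException extends RuntimeException {
    CorruptDataException(String message) {
        super(message);
    }
}

// Illustrative helper; the name and API are assumptions, not Cassandra code.
final class CauseChainDemo {
    private CauseChainDemo() {}

    // Walks t and its chain of causes and returns the first throwable that is
    // an instance of the requested type. The identity set guards against
    // cyclic cause chains.
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An executor rethrowing the failure wrapped in a RuntimeException.
        Throwable wrapped = new RuntimeException(new CorruptDataException("bad sstable"));

        // A check on the outer exception alone does not see the corruption.
        System.out.println(wrapped instanceof CorruptDataException);

        // Walking the cause chain does.
        System.out.println(findCause(wrapped, CorruptDataException.class).isPresent());
    }
}
```

A handler built this way would apply the failure policy whenever findCause returns a value, regardless of how many wrapper exceptions sit on top of the original one.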