[ https://issues.apache.org/jira/browse/KUDU-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-3191.
-------------------------------
    Fix Version/s: 1.14.0
       Resolution: Fixed

> Fail tablet replicas that suffer from KUDU-2233 instead of crashing
> -------------------------------------------------------------------
>
>                 Key: KUDU-3191
>                 URL: https://issues.apache.org/jira/browse/KUDU-3191
>             Project: Kudu
>          Issue Type: Task
>          Components: compaction
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Major
>             Fix For: 1.14.0
>
>
> KUDU-2233 results in persisted corruption that causes a broken invariant,
> leading to a server crash. The recovery process for this corruption is
> arduous, especially if there are multiple tablet replicas in a given server
> that suffer from it -- users typically start the server, see the crash,
> remove the affected replica manually via tooling, and restart, repeatedly
> until the server comes up healthily.
> Instead, we should consider treating this as we do CFile block-level
> corruption[1] and fail the tablet replica. At best, we end up recovering from
> a non-corrupted replica. At worst, we'd end up with multiple corrupted
> replicas, which is still better than what we have today, which is multiple
> corrupted replicas and unavailable servers that lead to excessive
> re-replication.
> [1]
> https://github.com/apache/kudu/commit/cf6927cb153f384afb649b664de1d4276bd6d83f

--
This message was sent by Atlassian Jira
(v8.3.4#803005)