[ https://issues.apache.org/jira/browse/PHOENIX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417753#comment-15417753 ]
James Taylor commented on PHOENIX-3165: --------------------------------------- I agree that mucking with the system catalog in the HBase shell is not the right approach if it becomes corrupted. It's safest to rely on a backup/restore mechanism to put the system catalog back into a known/good state IMHO. For particular scenarios in which the SYSTEM.CATALOG is being updated across many rows (such as during upgrade), I think PHOENIX-808 will be a good, simple, and quick to implement safeguard. Corruption can take many forms, though. I think it's important to understand the root cause of the corruption, as IMHO prevention is the best medicine. If a system interacts with Phoenix at the HBase level, this is very dangerous as that system will not know if it's changing the table in an invalid manner. Best to have Phoenix-level APIs instead that can guarantee that the system catalog is kept in a valid state with commits being performed transactionally. Another approach would be to have an RDBMS-style referential integrity check (https://en.wikipedia.org/wiki/Referential_integrity) to prevent invalid states from being entered. This would require, of course, that tools mucking with the SYSTEM.CATALOG go through APIs that check integrity. This would be a pretty big undertaking and it's typically the first thing that a real installation disables because it adds too much overhead. It also wouldn't provide all the integrity checks we need with the denormalization we do. In theory, we could enhance our integrity checks to be able to express these and include them in the check. This would be a very big undertaking. bq. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM table), run the tool, and have a completely correct recreation of SYSTEM.CATALOG available at the end of its execution. Unfortunately, that's not possible across all the features of Phoenix: - The SYSTEM.CATALOG has Phoenix table definitions for all tenants in the form of views. These views are essentially unbounded - for example, a time-series metric system such as Argus may have 10M of them. Other use cases may have multiple per user of a system. There's no other place this information can be retrieved or derived from. - The SYSTEM.CATALOG may vary over time. A client can connect at an earlier time stamp with our CURRENT_SCN capability and see the version that was in place at that time which may be different than the latest. - Updating HBase metadata with every change to the SYSTEM.CATALOG would put a huge drag on the system. If we're going to do something like that, better to change the design and keep the system-of-record in zookeeper instead. - Because we need updates to the system catalog to have all or none commit behavior (i.e. a DDL operation should succeed completely or on failure have made no change), we store both column and table information in the same table (in contiguous rows). We also store view and index metadata in the table. If the table becomes corrupt, it'd be potentially ambiguous on how to fix it. In theory, I suppose, a tool could let the user choose between the possible choices it'd make to fix it. - Since the SYSTEM.CATALOG table is essentially data, corruption may mean data loss. You can't recover from this (other than by restoring from a backup). I don't think guessing or default values that are loss would be viable. In theory, the tool could ask they user what value they'd like to use, but if even a small percentage of 10M rows are corrupt, I don't think this is feasible. For SYSTEM.SEQUENCE corruption, we're in a similar, but more precarious situation. If any attempts to fix sequences cause sequences to no longer be monotonically increasing, then user data can start to be corrupted. It'd be a bit scary to have an automated system drive this. Might need to fallback to a manual approach here, as you might need to look at user data (and Phoenix wouldn't know which data) to know what to reset the current value of a sequence to. > System table integrity check and repair tool > -------------------------------------------- > > Key: PHOENIX-3165 > URL: https://issues.apache.org/jira/browse/PHOENIX-3165 > Project: Phoenix > Issue Type: New Feature > Reporter: Andrew Purtell > Priority: Critical > > When the Phoenix system tables become corrupt recovery is a painstaking > process of low level examination of table contents and manipulation of same > with the HBase shell. This is very difficult work providing no margin of > safety, and is a critical gap in terms of usability. > At the OS level, we have fsck. > At the HDFS level, we have fsck (integrity checking only, though) > At the HBase level, we have hbck. > At the Phoenix level, we lack a system table repair tool. > Implement a tool that: > - Does not depend on the Phoenix client. > - Supports integrity checking of SYSTEM tables. Check for the existence of > all required columns in entries. Check that entries exist for all Phoenix > managed tables (implies Phoenix should add supporting advisory-only metadata > to the HBase table schemas). Check that serializations are valid. > - Supports complete repair of SYSTEM.CATALOG and recreation, if necessary, of > other tables like SYSTEM.STATS which can be dropped to recover from an > emergency. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM > table), run the tool, and have a completely correct recreation of > SYSTEM.CATALOG available at the end of its execution. > - To the extent we have or introduce cross-system-table invariants, check > them and offer a repair or reconstruction option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)