[ 
https://issues.apache.org/jira/browse/PHOENIX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417753#comment-15417753
 ] 

James Taylor commented on PHOENIX-3165:
---------------------------------------

I agree that mucking with the system catalog in the HBase shell is not the 
right approach if it becomes corrupted. It's safest to rely on a backup/restore 
mechanism to put the system catalog back into a known-good state IMHO. For 
particular scenarios in which the SYSTEM.CATALOG is being updated across many 
rows (such as during upgrade), I think PHOENIX-808 will be a good, simple, and 
quick-to-implement safeguard.

Corruption can take many forms, though. I think it's important to understand 
the root cause of the corruption, as IMHO prevention is the best medicine. If a 
system interacts with Phoenix's metadata at the HBase level, that's very 
dangerous, because that system has no way of knowing whether it's changing the 
table in an invalid manner. Best to have Phoenix-level APIs instead that can 
guarantee the system catalog is kept in a valid state, with commits performed 
transactionally.
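
To make this concrete, here's a minimal sketch (illustrative ZK quorum and 
table names) of what "going through Phoenix" looks like: all DDL is issued 
through the Phoenix JDBC driver, so the server-side metadata code on 
SYSTEM.CATALOG updates the catalog rows itself and the client never 
hand-crafts catalog cells.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DdlThroughPhoenix {
        public static void main(String[] args) throws Exception {
            // Illustrative quorum and table names. The DDL below is carried out
            // by Phoenix's server-side metadata code, which writes the
            // corresponding SYSTEM.CATALOG rows rather than the client doing so.
            try (Connection conn =
                     DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
                 Statement stmt = conn.createStatement()) {
                stmt.execute("ALTER TABLE MY_SCHEMA.MY_TABLE ADD NEW_COL VARCHAR");
            }
        }
    }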

Another approach would be an RDBMS-style referential integrity check 
(https://en.wikipedia.org/wiki/Referential_integrity) to prevent invalid states 
from being entered in the first place. This would require, of course, that any 
tool mucking with the SYSTEM.CATALOG go through APIs that check integrity. 
That's a pretty big undertaking, and referential integrity is typically the 
first thing a real installation disables because it adds too much overhead. It 
also wouldn't provide all the integrity checks we need, given the 
denormalization we do. In theory we could enhance our integrity checks to 
express those invariants too and include them in the check, but that would be a 
very big undertaking.
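
To give a flavor of the kind of cross-row invariant such a check would have to 
express given that denormalization (a rough sketch only; it glosses over view 
and index link rows, tenant-specific views, and other subtleties): the 
COLUMN_COUNT recorded on a table's header row should agree with the number of 
column rows stored alongside it.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CatalogInvariantSketch {
        public static void main(String[] args) throws Exception {
            // Sketch of a single invariant query; a real checker would need many
            // more rules and would have to special-case views, links, etc.
            String sql =
                "SELECT TENANT_ID, TABLE_SCHEM, TABLE_NAME, "
              + "       MAX(COLUMN_COUNT) AS DECLARED_COLUMNS, "
              + "       COUNT(COLUMN_NAME) AS COLUMN_ROWS "
              + "FROM SYSTEM.CATALOG "
              + "GROUP BY TENANT_ID, TABLE_SCHEM, TABLE_NAME "
              + "HAVING MAX(COLUMN_COUNT) != COUNT(COLUMN_NAME)";
            try (Connection conn =
                     DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println("Possible inconsistency: "
                        + rs.getString("TABLE_SCHEM") + "."
                        + rs.getString("TABLE_NAME"));
                }
            }
        }
    }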

bq. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM table), run 
the tool, and have a completely correct recreation of SYSTEM.CATALOG available 
at the end of its execution.
Unfortunately, that's not possible across all the features of Phoenix:
- The SYSTEM.CATALOG has Phoenix table definitions for all tenants in the form 
of views. These views are essentially unbounded in number - for example, a 
time-series metric system such as Argus may have 10M of them, and other use 
cases may have several per user of a system. There's no other place this 
information can be retrieved or derived from.
- The SYSTEM.CATALOG may vary over time. A client can connect at an earlier 
time stamp with our CURRENT_SCN capability and see the version of the metadata 
that was in place at that time, which may be different from the latest (see the 
sketch after this list).
- Updating HBase metadata with every change to the SYSTEM.CATALOG would put a 
huge drag on the system. If we're going to do something like that, it'd be 
better to change the design and keep the system of record in ZooKeeper instead.
- Because we need updates to the system catalog to have all-or-nothing commit 
behavior (i.e. a DDL operation should either succeed completely or, on failure, 
make no change), we store both column and table information in the same table 
(in contiguous rows). We also store view and index metadata in that table. If 
the table becomes corrupt, it could be ambiguous how to fix it. In theory, I 
suppose, a tool could let the user choose among the possible repairs it could 
make.
- Since the SYSTEM.CATALOG table is essentially data, corruption may mean data 
loss. You can't recover from that (other than by restoring from a backup). I 
don't think guessing at, or substituting defaults for, values that were lost 
would be viable. In theory the tool could ask the user what value they'd like 
to use, but if even a small percentage of 10M rows are corrupt, I don't think 
that's feasible.
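
To illustrate the CURRENT_SCN point above (property name per the Phoenix docs; 
the quorum and timestamp are placeholders): a client can pin its connection to 
an earlier timestamp, so "the" catalog is really a family of versions over 
time, not a single latest state a tool could regenerate.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class ConnectAtEarlierScn {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Pin the connection to the metadata (and data) as of this HBase
            // timestamp; later catalog changes are invisible to this client.
            props.setProperty("CurrentSCN", Long.toString(1470700800000L));
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:phoenix:zk-host:2181", props)) {
                // Queries here resolve tables and columns as they existed then.
            }
        }
    }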

For SYSTEM.SEQUENCE corruption, we're in a similar but more precarious 
situation. If an attempt to fix a sequence causes it to no longer be 
monotonically increasing, then user data can start to be corrupted. It'd be a 
bit scary to have an automated system drive this. We might need to fall back to 
a manual approach here, as you might need to look at user data (and Phoenix 
wouldn't know which data) to know what to reset the current value of a sequence 
to.
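
For what that manual path might look like (a sketch with hypothetical table, 
column, and sequence names; only an operator who knows the data can pick a 
safe restart value): inspect the user data that consumed the sequence, then 
recreate the sequence starting safely above the highest value observed.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ManualSequenceReset {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:phoenix:zk-host:2181");
                 Statement stmt = conn.createStatement()) {
                // Find the highest value the application actually persisted.
                long maxUsed;
                try (ResultSet rs =
                         stmt.executeQuery("SELECT MAX(ID) FROM MY_SCHEMA.EVENTS")) {
                    rs.next();
                    maxUsed = rs.getLong(1);
                }
                // Recreate the sequence above that value, with headroom for
                // values that were handed out but never written.
                stmt.execute("DROP SEQUENCE MY_SCHEMA.EVENT_ID_SEQ");
                stmt.execute("CREATE SEQUENCE MY_SCHEMA.EVENT_ID_SEQ START WITH "
                    + (maxUsed + 1000) + " INCREMENT BY 1 CACHE 100");
            }
        }
    }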



> System table integrity check and repair tool
> --------------------------------------------
>
>                 Key: PHOENIX-3165
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3165
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Priority: Critical
>
> When the Phoenix system tables become corrupt, recovery is a painstaking 
> process of low-level examination and manipulation of table contents with the 
> HBase shell. This is very difficult work providing no margin of safety, and 
> is a critical gap in terms of usability.
> At the OS level, we have fsck.
> At the HDFS level, we have fsck (integrity checking only, though).
> At the HBase level, we have hbck. 
> At the Phoenix level, we lack a system table repair tool. 
> Implement a tool that:
> - Does not depend on the Phoenix client.
> - Supports integrity checking of SYSTEM tables. Check for the existence of 
> all required columns in entries. Check that entries exist for all 
> Phoenix-managed tables (implies Phoenix should add supporting advisory-only 
> metadata to the HBase table schemas). Check that serializations are valid. 
> - Supports complete repair of SYSTEM.CATALOG and recreation, if necessary, of 
> other tables like SYSTEM.STATS which can be dropped to recover from an 
> emergency. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM 
> table), run the tool, and have a completely correct recreation of 
> SYSTEM.CATALOG available at the end of its execution.
> - To the extent we have or introduce cross-system-table invariants, check 
> them and offer a repair or reconstruction option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
