[ 
https://issues.apache.org/jira/browse/PHOENIX-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236598#comment-17236598
 ] 

Viraj Jasani commented on PHOENIX-6086:
---------------------------------------

[~ckulkarni] Thanks for bringing this up, all nice thoughts worth considering!
{quote} # Any DDLs issued since the upgrade began would be lost when we restore 
the snapshot of SYSTEM.CATALOG{quote}
I agree, we don't have any other way of temporarily storing ongoing DDLs 
during metadata upgrade, which makes it quite painful to lose the only 
reliable source of truth for DDLs: SYSTEM.CATALOG. Similar reasoning 
applies to SYSTEM.SEQUENCE, the only reliable source for sequence operations.
{quote}I think for now, maybe it is safer to just keep the snapshots around and 
allow the operator to decide how to handle the upgrade failure rather than 
blindly forcing a restore from snapshots.
{quote}
This sounds better than systematically handling snapshot restores. To make it 
simpler for the operator to identify all snapshots of interest, perhaps we 
can emit one prominent WARN log if the metadata upgrade was not successful, 
listing all snapshots that have been created so far. We could also document 
this warning log somewhere (a doc that operators are likely to follow) so that 
when a metadata upgrade fails, the first thing they can do is search for this 
specific WARN log and determine next actions accordingly. I hope a warning log 
is good enough for the operator, rather than something fancier like creating 
another system table (or maybe some in-memory data structure and a GET API?) 
to store snapshot details during upgrade (which might bring its own 
complexities).
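
To illustrate the single-warning idea, here is a rough sketch only, not the 
patch itself. The class name, the marker string, and the snapshot names are all 
hypothetical, and it uses java.util.logging just to stay self-contained (the 
real code would presumably use the logger Phoenix already has). It records each 
snapshot as it is taken and, on failure, emits one WARN entry carrying a fixed 
marker that operators can search for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper for illustration; nothing here is existing Phoenix API.
class UpgradeSnapshotLog {
    private static final Logger LOGGER =
        Logger.getLogger(UpgradeSnapshotLog.class.getName());

    // Fixed, documented marker so operators can grep for one known string.
    static final String MARKER = "SYSTEM_TABLE_UPGRADE_FAILED";

    // "table -> snapshot" entries, in the order the snapshots were taken.
    private final List<String> snapshots = new ArrayList<>();

    // Call right after a snapshot of a system table has been created.
    void recordSnapshot(String tableName, String snapshotName) {
        snapshots.add(tableName + " -> " + snapshotName);
    }

    // Builds the single message that lists every snapshot taken so far.
    String buildFailureMessage() {
        StringBuilder sb = new StringBuilder(MARKER)
            .append(": metadata upgrade did not complete.")
            .append(" No automatic restore was attempted.")
            .append(" Snapshots taken so far (")
            .append(snapshots.size())
            .append("):");
        for (String entry : snapshots) {
            sb.append(System.lineSeparator()).append("  ").append(entry);
        }
        return sb.toString();
    }

    // Call once from the upgrade failure path.
    void warnOnFailure() {
        LOGGER.warning(buildFailureMessage());
    }

    public static void main(String[] args) {
        UpgradeSnapshotLog log = new UpgradeSnapshotLog();
        // Snapshot names below are made up for the example.
        log.recordSnapshot("SYSTEM.CATALOG", "example_snapshot_catalog");
        log.recordSnapshot("SYSTEM.SEQUENCE", "example_snapshot_sequence");
        log.warnOnFailure();
    }
}
```

Keeping everything in one log entry means the operator needs to find only one 
line to learn which snapshots exist, and no extra state has to be persisted 
during the upgrade.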

Prior to this Jira, we used to restore only SYSCAT, and so far this Jira has 
just extended that to all other important system tables. However, it seems the 
cons of auto-restoring all system tables clearly outweigh the pros. If the 
consensus remains the same, let me reopen this Jira very soon.

Thanks

> Take a snapshot of all SYSTEM tables before attempting to upgrade them
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-6086
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6086
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Assignee: Viraj Jasani
>            Priority: Critical
>             Fix For: 5.1.0, 4.16.0
>
>         Attachments: PHOENIX-6086.4.x.000.patch, 
> PHOENIX-6086.master.000.patch, PHOENIX-6086.master.002.patch, 
> PHOENIX-6086.master.003.patch
>
>
> Currently we only take a snapshot of SYSTEM.CATALOG before attempting to 
> upgrade it (see 
> [this|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3718]).
>  From 4.15 onwards we also store critical metadata information in other 
> SYSTEM tables like SYSTEM.CHILD_LINK, so it is beneficial to also snapshot 
> those tables before upgrading them henceforth.
> We also currently don't take a snapshot of SYSTEM.CATALOG on receiving an 
> [UpgradeRequiredException|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3685-L3707]
>  which we should do.
> In case of any errors during the upgrade, we restore SYSTEM.CATALOG from this 
> snapshot and we should extend this to all tables. In cases where the table 
> didn't exist before the upgrade, we need to ensure it is dropped so that a 
> subsequent upgrade attempt can start afresh.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
