[
https://issues.apache.org/jira/browse/PHOENIX-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236598#comment-17236598
]
Viraj Jasani commented on PHOENIX-6086:
---------------------------------------
[~ckulkarni] Thanks for bringing this up, all nice thoughts worth considering!
{quote} # Any DDLs issued since the upgrade began would be lost when we restore
the snapshot of SYSTEM.CATALOG{quote}
I agree. We don't have any other way of temporarily storing ongoing DDLs
during a metadata upgrade, so we can hardly afford to lose the only reliable
source of truth for DDLs: SYSTEM.CATALOG. The same reasoning applies to
SYSTEM.SEQUENCE, the only reliable source for sequence operations.
{quote}I think for now, maybe it is safer to just keep the snapshots around and
allow the operator to decide how to handle the upgrade failure rather than
blindly forcing a restore from snapshots.
{quote}
This sounds better than handling snapshot restores automatically. To make it
simpler for the operator to identify all the snapshots of interest, perhaps we
can emit one fat WARN log if the metadata upgrade was not successful, listing
all snapshots that have been created so far. We could also document this
warning log somewhere close to what operators already follow, so that when a
metadata upgrade fails, the first thing they do is search for this specific
WARN log and determine next actions accordingly. I hope a warning log is good
enough for the operator, rather than something fancier like creating another
system table (or maybe some in-memory data structure and a GET API?) to store
snapshot details during the upgrade, which might bring its own complexities.
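A minimal sketch of the single-warning idea, in plain Java with no Phoenix or
HBase dependencies. Everything here is hypothetical and only for illustration:
the class name UpgradeSnapshotReport, the searchable marker string, and the
snapshot names are assumptions, not existing Phoenix code or log output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper (not part of Phoenix): remembers each snapshot taken
// during a metadata upgrade and builds one consolidated warning on failure.
class UpgradeSnapshotReport {
    private static final Logger LOGGER =
        Logger.getLogger(UpgradeSnapshotReport.class.getName());

    // Assumed fixed marker that operators could search for in the logs.
    static final String MARKER = "SYSTEM_TABLE_UPGRADE_FAILED";

    private final List<String> snapshots = new ArrayList<>();

    // Call once per system table, right after its snapshot has been taken.
    void recordSnapshot(String tableName, String snapshotName) {
        snapshots.add(tableName + " -> " + snapshotName);
    }

    // Builds the single warning message listing every snapshot recorded so far.
    String buildFailureWarning() {
        StringBuilder sb = new StringBuilder(MARKER)
            .append(": metadata upgrade did not complete. ")
            .append("Snapshots created so far (")
            .append(snapshots.size())
            .append("):");
        for (String entry : snapshots) {
            sb.append(System.lineSeparator()).append("  ").append(entry);
        }
        sb.append(System.lineSeparator())
          .append("No automatic restore was attempted; ")
          .append("the operator decides whether to restore.");
        return sb.toString();
    }

    // Emits the warning at WARNING level in one log statement.
    void logFailure() {
        LOGGER.warning(buildFailureWarning());
    }

    public static void main(String[] args) {
        UpgradeSnapshotReport report = new UpgradeSnapshotReport();
        // Placeholder snapshot names, not the naming scheme Phoenix uses.
        report.recordSnapshot("SYSTEM.CATALOG", "example_snapshot_catalog");
        report.recordSnapshot("SYSTEM.SEQUENCE", "example_snapshot_sequence");
        report.logFailure();
    }
}
```

Keeping the list in one log statement with a fixed marker means the operator
needs only one search to find every snapshot, and nothing has to be persisted
in a new system table.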
Prior to this Jira, we used to restore only SYSCAT, and so far this Jira has
just extended that to all other important system tables. However, it seems the
cons of auto-restoring all system tables clearly outweigh the pros. If the
consensus remains the same, let me reopen this Jira very soon.
Thanks
> Take a snapshot of all SYSTEM tables before attempting to upgrade them
> ----------------------------------------------------------------------
>
> Key: PHOENIX-6086
> URL: https://issues.apache.org/jira/browse/PHOENIX-6086
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.15.0
> Reporter: Chinmay Kulkarni
> Assignee: Viraj Jasani
> Priority: Critical
> Fix For: 5.1.0, 4.16.0
>
> Attachments: PHOENIX-6086.4.x.000.patch,
> PHOENIX-6086.master.000.patch, PHOENIX-6086.master.002.patch,
> PHOENIX-6086.master.003.patch
>
>
> Currently we only take a snapshot of SYSTEM.CATALOG before attempting to
> upgrade it (see
> [this|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3718]).
> From 4.15 onwards we also store critical metadata information in other
> SYSTEM tables like SYSTEM.CHILD_LINK, so it is beneficial to also snapshot
> those tables before upgrading them henceforth.
> We also currently don't take a snapshot of SYSTEM.CATALOG on receiving an
> [UpgradeRequiredException|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3685-L3707]
> which we should do.
> In case of any errors during the upgrade, we restore SYSTEM.CATALOG from this
> snapshot and we should extend this to all tables. In cases where the table
> didn't exist before the upgrade, we need to ensure it is dropped so that a
> subsequent upgrade attempt can start afresh.