[ 
https://issues.apache.org/jira/browse/PHOENIX-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200324#comment-17200324
 ] 

Chinmay Kulkarni commented on PHOENIX-6141:
-------------------------------------------

[~larsh] Good point about SYSCAT not having the source of truth in this case. 
One downside of having a 2PC is an additional RPC for marking the SYSCAT row as 
not tentative. 
We can piggyback the first "mark as tentative" along with the RPC to write 
other metadata rows in SYSCAT. We might have to change the order of RPCs here 
since as of now we send the RPC to SYSTEM.CHILD_LINK first.
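
To make the reordering concrete, here is a minimal sketch of the write path as I
read the proposal above. It is only an illustration: the class, the in-memory
maps standing in for the two system tables, the "parent->view" key format and
the failChildLinkRpc switch are all hypothetical, not Phoenix code or APIs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-ins for SYSTEM.CATALOG and SYSTEM.CHILD_LINK.
class TentativeLinkSketch {
    enum LinkState { TENTATIVE, COMMITTED }

    // SYSCAT: view metadata rows and parent->child linking rows with a state marker.
    final Map<String, String> syscatViewMetadata = new HashMap<>();
    final Map<String, LinkState> syscatLinks = new HashMap<>();
    // SYSTEM.CHILD_LINK: the "committed" linking rows.
    final Set<String> childLinks = new HashSet<>();

    // Test hook (assumption): simulates a failure of the SYSTEM.CHILD_LINK RPC.
    boolean failChildLinkRpc = false;

    static String linkKey(String parent, String view) {
        return parent + "->" + view;
    }

    // Returns true only if all three steps went through.
    boolean createView(String parent, String view) {
        String key = linkKey(parent, view);

        // RPC 1 (SYSCAT): the tentative linking row is piggybacked on the
        // same batch that writes the other metadata rows for the view.
        syscatViewMetadata.put(view, "metadata for " + view);
        syscatLinks.put(key, LinkState.TENTATIVE);

        // RPC 2 (SYSTEM.CHILD_LINK): now sent second instead of first.
        if (failChildLinkRpc) {
            return false; // the SYSCAT row stays TENTATIVE
        }
        childLinks.add(key);

        // RPC 3 (SYSCAT): the additional RPC that clears the tentative marker.
        syscatLinks.put(key, LinkState.COMMITTED);
        return true;
    }
}
```

If the second RPC fails, the only leftover is a SYSCAT linking row still marked
tentative, so the inconsistency stays detectable instead of appearing as a
silent orphan row in SYSTEM.CHILD_LINK.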

In addition to that, we will now keep both the tentative SYSCAT linking row as 
well as the "committed" SYSTEM.CHILD_LINK linking row. Maybe we can have a 
background job that removes old enough SYSCAT linking rows which are already 
present in SYSTEM.CHILD_LINK to prevent double storage. On top of this, a read 
repair step, as you mentioned, to remove orphan SYSTEM.CHILD_LINK rows also 
sounds good.
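
A rough sketch of both cleanup ideas, again with hypothetical names and
in-memory collections instead of the real tables. It assumes a linking row is
keyed as "parent->view", that "old enough" means a minimum age in milliseconds,
and that an orphan is a SYSTEM.CHILD_LINK row whose view has no metadata in
SYSCAT. It ignores in-flight CREATE/DROP VIEW calls, which a real read repair
would have to account for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for the linking rows in both system tables.
class LinkCleanupSketch {
    // SYSCAT linking rows: link key -> write timestamp in milliseconds.
    final Map<String, Long> syscatLinkWriteTimes = new HashMap<>();
    // SYSTEM.CHILD_LINK linking rows, keyed as "parent->view".
    final Set<String> childLinks = new HashSet<>();
    // Views that have metadata rows in SYSCAT.
    final Set<String> syscatViews = new HashSet<>();

    // Background job: drop SYSCAT linking rows that are old enough and
    // already present in SYSTEM.CHILD_LINK. Returns the number removed.
    int removeDuplicatedSyscatLinks(long nowMillis, long minAgeMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it =
                syscatLinkWriteTimes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> row = it.next();
            boolean oldEnough = nowMillis - row.getValue() >= minAgeMillis;
            if (oldEnough && childLinks.contains(row.getKey())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Read repair: while listing the child views of a parent, delete any
    // SYSTEM.CHILD_LINK row whose view has no metadata in SYSCAT.
    List<String> readChildViews(String parent) {
        String prefix = parent + "->";
        List<String> views = new ArrayList<>();
        Iterator<String> it = childLinks.iterator();
        while (it.hasNext()) {
            String key = it.next();
            if (!key.startsWith(prefix)) {
                continue;
            }
            String view = key.substring(prefix.length());
            if (syscatViews.contains(view)) {
                views.add(view);
            } else {
                it.remove(); // orphan linking row
            }
        }
        return views;
    }
}
```

The age threshold keeps the background job away from rows that belong to a
CREATE VIEW that may still be in progress.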

> Ensure consistency between SYSTEM.CATALOG and SYSTEM.CHILD_LINK
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-6141
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6141
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Priority: Blocker
>             Fix For: 4.17.0
>
>
> Before 4.15, "CREATE/DROP VIEW" was an atomic operation since we were issuing 
> batch mutations on just the 1 SYSTEM.CATALOG region. In 4.15 we introduced 
> SYSTEM.CHILD_LINK to store the parent->child links and so a CREATE VIEW is no 
> longer atomic since it consists of 2 separate RPCs (1 to SYSTEM.CHILD_LINK 
> to add the linking row and another to SYSTEM.CATALOG to write metadata for 
> the new view). 
> If the second RPC i.e. the RPC to write metadata to SYSTEM.CATALOG fails 
> after the 1st RPC has already gone through, there will be an inconsistency 
> between both metadata tables. We will see orphan parent->child linking rows 
> in SYSTEM.CHILD_LINK in this case. This can cause the following issues:
> # ALTER TABLE calls on the base table will fail
> # DROP TABLE without CASCADE will fail
> # The upgrade path has calls like UpgradeUtil.upgradeTable() which will fail
> # Any metadata consistency checks can be thrown off
> # Unnecessary extra storage of orphan links
> The first 3 issues happen because we wrongly deduce that a base table has 
> child views due to the orphan linking rows.
> This Jira aims to find a way to make mutations across SYSTEM.CATALOG and 
> SYSTEM.CHILD_LINK an atomic transaction. We can use a 2-phase commit 
> approach like in global indexing, or potentially explore using a 
> transaction manager. 


