[jira] [Reopened] (PHOENIX-5527) Unverified index rows should not be deleted due to replication lag

Kadir OZDEMIR (Jira) Thu, 17 Oct 2019 19:56:17 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kadir OZDEMIR reopened PHOENIX-5527:
------------------------------------

In the original design for consistent indexes, we do a three-phase write. In the 
first phase, we write full index rows with unverified status, then we write 
data table rows, and finally we overwrite the index row status and set it to 
verified. All these writes get the same timestamp so that index and data 
table entries are consistent. This timestamp is the wall clock time of the 
server at the time the data table row is read to prepare index mutations.

Now if an index row is replicated before its data row and is scanned at the 
destination, this row can be deleted by read repair. The delete timestamp will 
be the same as the existing row timestamp. Since deletes always trump puts when 
the timestamps are the same, the index row will not be visible even if the 
data row is replicated later. To reduce the occurrences of this event, we set 
the delete time for unverified index rows to 7 days as a stopgap solution for 
now. However, the side effect of this would be an increase in the number of 
unverified rows and unnecessary read repairs.
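
The masking described above can be illustrated with a toy model of timestamped 
cells. This is only a sketch, not HBase or Phoenix code: the ToyRow class, its 
methods, and the timestamp value are hypothetical, and the model assumes just 
the rule stated above, that a delete marker hides every put with the same or 
an older timestamp.

```python
class ToyRow:
    """Toy model of one index row: puts and delete markers keyed by timestamp."""

    def __init__(self):
        self.put_timestamps = []
        self.delete_timestamps = []

    def put(self, ts):
        self.put_timestamps.append(ts)

    def delete(self, ts):
        # A delete marker at ts hides every put with timestamp <= ts.
        self.delete_timestamps.append(ts)

    def is_visible(self):
        newest_delete = max(self.delete_timestamps, default=-1)
        return any(ts > newest_delete for ts in self.put_timestamps)


T = 1000  # hypothetical write timestamp shared by all three phases

row = ToyRow()
row.put(T)      # phase 1 reaches the destination: full index row, unverified
row.delete(T)   # read repair deletes it, reusing the row's own timestamp
row.put(T)      # phase 3 arrives later: verified status, same timestamp

masked = not row.is_visible()  # the late put stays hidden behind the delete
```

In this model the late put can never win, because nothing written at 
timestamp T is newer than the delete marker at T.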

There is a better solution for this replication lag problem as follows:

1. Instead of writing the full index row in the first phase, write it in the 
last phase. So, in the first phase, we just write the unverified status for 
the index row, and in the last phase we write the full index row.

2. The timestamp of the unverified row is the timestamp of the full index row 
(and also the data table row) minus 1. This will make sure that if the 
unverified row is deleted by read repair, it will not mask the verified row.
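
The effect of the two steps above can be sketched with a small visibility 
check. This is an illustrative model only, not Phoenix code: the visible() 
helper and the timestamp value are hypothetical, and it assumes that a delete 
marker hides puts at the same or an older timestamp and nothing newer.

```python
def visible(puts, deletes):
    """Return True if any put timestamp is newer than every delete marker."""
    newest_delete = max(deletes, default=-1)
    return any(ts > newest_delete for ts in puts)


T = 1000  # hypothetical timestamp of the data row and the full index row

# Destination cluster under the proposed scheme: the unverified marker
# (timestamp T - 1) arrives first and read repair deletes it at T - 1.
puts = [T - 1]
deletes = [T - 1]
visible_before_full_row = visible(puts, deletes)

# The full index row is replicated later with timestamp T.
puts.append(T)
visible_after_full_row = visible(puts, deletes)
```

Because the delete marker sits at T - 1, the full index row at T is newer 
than it and remains readable.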

This change does not impact correctness of the design. Now, if the index row is 
replicated before the data table row and is scanned, it can be deleted safely 
as this will only delete the unverified status. When the full index row is 
replicated, it will be visible to scans. 

This also improves the overall design in terms of efficiency. In the presence 
of concurrent writes, we skip the last write phase. These writes leave the 
index rows in unverified status. Similarly, if the first or second phase write 
fails, we do not proceed with the third phase. 

Since with this change we will be writing only the empty column for index 
tables in these failure cases, the storage usage will be improved as we will 
write less index data.
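
As a rough illustration of the storage argument, the sketch below counts the 
index cells written per row under both designs. The function name and the 
figure of 10 cells per full index row (counting the empty column) are 
assumptions made for the example, not measurements.

```python
def index_cells_written(cells_per_full_row, third_phase_ran, new_design):
    """Count index cells written for one row (toy accounting, not Phoenix code)."""
    marker_cells = 1  # the empty column that carries the verified/unverified status
    if new_design:
        first_phase = marker_cells          # only the unverified status
        third_phase = cells_per_full_row    # full index row, written last
    else:
        first_phase = cells_per_full_row    # full index row, unverified
        third_phase = marker_cells          # status overwritten to verified
    return first_phase + (third_phase if third_phase_ran else 0)


CELLS = 10  # hypothetical number of cells in a full index row

old_failed = index_cells_written(CELLS, third_phase_ran=False, new_design=False)
new_failed = index_cells_written(CELLS, third_phase_ran=False, new_design=True)
old_ok = index_cells_written(CELLS, third_phase_ran=True, new_design=False)
new_ok = index_cells_written(CELLS, third_phase_ran=True, new_design=True)
```

Under these assumptions a completed write costs the same in both designs, 
while a write that stops before the third phase leaves one cell instead of a 
full row.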

The actual fix for the replication lag should be not to replicate index tables 
in the first place, and to derive them from the data table writes as we do on 
the local cluster. When we have the actual fix, we may remove the subtraction 
of 1 from the unverified row timestamp (although we may also want to keep it, 
as it can protect the index rows against deletions caused by unexpected race 
conditions). 

The patch for this is attached. I ran the tests locally and all passed except 
for one failure in a newly introduced IT (EmptyColumnIT). The patch is quite 
small and straightforward. I am hoping to get a +1 quickly from one of you, 
[~gjacoby], [~vincentpoon], [~abhishek.chouhan], [~larsh].

> Unverified index rows should not be deleted due to replication lag 
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-5527
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5527
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5527.master.001.patch
>
>
> The current default delete time for unverified index rows is 10 minutes. If 
> an index table row is replicated before its data table row and the 
> replication row is unverified at the time of replication, it can be deleted 
> when it is scanned on the destination cluster. To prevent these deletes due 
> to replication lag issues, we should increase the default time to 7 days. 
> This value is configurable using the configuration parameter,  
> phoenix.global.index.row.age.threshold.to.delete.ms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
