[ 
https://issues.apache.org/jira/browse/PHOENIX-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir OZDEMIR updated PHOENIX-5743:
-----------------------------------
    Attachment: PHOENIX-5743.4.x-HBase-1.3.001.patch

> Concurrent read repairs on the same index row should be idempotent
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-5743
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5743
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Critical
>         Attachments: PHOENIX-5743.4.x-HBase-1.3.001.patch, 
> PHOENIX-5743.master.001.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> It is possible that two or more read repairs can work on the same row. 
> Regardless of how many read repairs concurrently happen on this row, the end 
> result should be the same.  The current implementation does not satisfy this 
> property in one case. This can happen with the following steps:
>  # An update on a data table row fails because the data table row write 
> (the phase 2 write) fails. Since phase 1 (the unverified index write) has 
> already completed, this leaves an unverified row in the index table.
>  # Two (or more) concurrent queries on this table scan this unverified index 
> row.
>  # Each query triggers a separate read repair activity.
>  # The first one deletes the unverified row correctly.
>  # The subsequent ones may leave a wrong delete marker which corrupts this 
> index row.
> Step 5 can happen because of two bugs in deleteRowIfAgedEnough() in 
> GlobalIndexChecker.GlobalIndexScanner:
>  # "deleteRowScan.setTimeRange(0, ts + 1);" should read 
> "deleteRowScan.setTimeRange(ts, ts + 1);". This ensures that the first 
> read repair retrieves the cells of the unverified row with timestamp ts, 
> and that each subsequent read repair gets either the same set of cells the 
> first one got or no cells (i.e., an empty row).
>  # If the unverified row has already been deleted, deleteRowIfAgedEnough() 
> should do nothing and return. However, in the current implementation the 
> read repair either retrieves the previous row version (i.e., the one before 
> the unverified row) and leaves DeleteColumn markers for the wrong cells, or 
> gets no cells (if no previous row version exists) and leaves a DeleteFamily 
> marker, which will delete all previous versions of the row if such rows are 
> inserted back by an index rebuild.
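
The two fixes described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Phoenix code or the attached patch: IndexRow, scan, deleteCell, repairFixed, repairBuggy and sample are hypothetical names, the HBase API is not used, delete markers are reduced to log strings, and the "concurrent" repairs are simply run one after another.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReadRepairSketch {

    /** Toy index row (hypothetical): column -> (timestamp -> value), plus a marker log. */
    static final class IndexRow {
        final Map<String, TreeMap<Long, String>> cells = new HashMap<>();
        final List<String> markers = new ArrayList<>();

        void put(String column, long ts, String value) {
            cells.computeIfAbsent(column, c -> new TreeMap<>()).put(ts, value);
        }

        /** Newest remaining cell timestamp per column within [minTs, maxTs). */
        Map<String, Long> scan(long minTs, long maxTs) {
            Map<String, Long> result = new TreeMap<>();
            for (Map.Entry<String, TreeMap<Long, String>> e : cells.entrySet()) {
                Long newest = e.getValue().lowerKey(maxTs); // strictly below maxTs
                if (newest != null && newest >= minTs) {
                    result.put(e.getKey(), newest);
                }
            }
            return result;
        }

        /** Simplification: removes one cell version and logs a marker for it. */
        void deleteCell(String column, long ts) {
            cells.get(column).remove(ts);
            markers.add("column marker " + column + "@" + ts);
        }
    }

    /** Proposed behavior: scan only [ts, ts + 1) and return if nothing is found. */
    static void repairFixed(IndexRow row, long ts) {
        Map<String, Long> found = row.scan(ts, ts + 1);
        if (found.isEmpty()) {
            return; // an earlier repair already removed the unverified row
        }
        found.forEach(row::deleteCell);
    }

    /** Behavior with both bugs: scan [0, ts + 1) and mark the family when empty. */
    static void repairBuggy(IndexRow row, long ts) {
        Map<String, Long> found = row.scan(0, ts + 1);
        if (found.isEmpty()) {
            row.markers.add("family marker @" + ts);
            return;
        }
        found.forEach(row::deleteCell); // may hit the previous, valid version
    }

    /** One column with a valid version at ts=10 and an unverified one at ts=20. */
    static IndexRow sample() {
        IndexRow row = new IndexRow();
        row.put("0:V", 10L, "verified");
        row.put("0:V", 20L, "unverified");
        return row;
    }

    public static void main(String[] args) {
        IndexRow fixed = sample();
        repairFixed(fixed, 20L);
        repairFixed(fixed, 20L); // second repair finds nothing and returns
        System.out.println("fixed: " + fixed.cells + " " + fixed.markers);

        IndexRow buggy = sample();
        repairBuggy(buggy, 20L);
        repairBuggy(buggy, 20L); // second repair removes the version at ts=10
        System.out.println("buggy: " + buggy.cells + " " + buggy.markers);
    }
}
```

In this model, repeating repairFixed leaves the row in the same state as running it once, which is the idempotence the issue asks for, while repeating repairBuggy keeps changing the row.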



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
