Kadir OZDEMIR created PHOENIX-5813:
--------------------------------------
Summary: Index read repair should not interfere with concurrent
updates
Key: PHOENIX-5813
URL: https://issues.apache.org/jira/browse/PHOENIX-5813
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.14.3, 5.0.0
Reporter: Kadir OZDEMIR
Let {1, a, x, y} be a row in the data table. Let the first column be the only
pk column, the second column be the only indexed column, and the fourth column
be the only column covered by the index for this table. The corresponding row
in the index table would be {a, 1, y}.
Now, let the same data table row be mutated so that its new state is
{1, b, x, y}. The index row {a, 1, y} is no longer valid in the index table
and needs to be deleted. Thus, the prepared index mutations will include a
delete mutation for the row key {a, 1} and a put mutation for the new index
row, that is, put {b, 1, y}.
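The derivation above can be sketched as follows. This is a hypothetical illustration, not Phoenix's actual API: the index row key is modeled as the indexed column value followed by the data row key, and `prepare` is an invented helper.

```java
// Hypothetical sketch (not Phoenix code): derive index mutations from the
// old and new state of a data table row. The index row key is modeled as
// the indexed column value concatenated with the data row key.
public class IndexMutationSketch {
    record Put(String rowKey, String covered) {}
    record Delete(String rowKey) {}
    record Mutations(Delete delete, Put put) {}

    // dataPk: data row key; oldIndexed/newIndexed: indexed column values
    // before and after the update; covered: covered column value.
    static Mutations prepare(String dataPk, String oldIndexed,
                             String newIndexed, String covered) {
        Delete d = new Delete(oldIndexed + dataPk);    // delete stale index row
        Put p = new Put(newIndexed + dataPk, covered); // put new index row
        return new Mutations(d, p);
    }

    public static void main(String[] args) {
        // First update: {1, a, x, y} -> {1, b, x, y}
        Mutations m1 = prepare("1", "a", "b", "y");
        System.out.println(m1.delete().rowKey() + " " + m1.put().rowKey());
        // The concurrent second update read the row before the first one
        // committed, so it still sees 'a' as the indexed value and prepares
        // a delete for {a, 1} instead of the correct {b, 1}.
        Mutations m2 = prepare("1", "a", "c", "y");
        System.out.println(m2.delete().rowKey());
    }
}
```

Both updates delete {a, 1} because both read the same pre-update row state, which is exactly the hazard the issue describes.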
Let {1, c, x, y} be another mutation on the same row that arrives before the
previous mutation has updated the data table. This means that the prepared
index mutations will include a delete mutation for the row key {a, 1} and a
put mutation, that is, put {c, 1, y}. However, this last update should have
deleted the index row {b, 1} instead of {a, 1}. To prevent this,
IndexRegionObserver maintains a collection of data table row keys, one for
each pending data table row update, in order to detect concurrent updates, and
skips the third write phase for them. In the first update phase, index rows
are made unverified; in the third update phase, they are made verified or
deleted. The read-repair operation on these unverified rows then resolves the
concurrent updates correctly.
Therefore, two or more pending updates from different batches on the same data
table row are concurrent if and only if each of them has read the data table
row state from HBase under a Phoenix-level row lock and none of them has yet
acquired the row lock a second time to update the data table. In other words,
all of them are in the first update phase concurrently. For concurrent
updates, the first two update phases are completed but the last update phase
is skipped. This means the data table row will be updated by these updates,
but the corresponding index table rows will be left in the unverified state.
The read-repair process then repairs these unverified index rows during scans.
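The concurrent-update bookkeeping can be sketched as below. The names here are illustrative, not IndexRegionObserver's actual fields; in the real design every batch participating in a concurrent update skips the third phase, while this minimal sketch only flags batches that find an earlier pending claim on the same row.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the pending-row bookkeeping described above: each
// batch registers its data table row keys in phase one; a row key that is
// already pending signals a concurrent update, so phase three is skipped
// and the index rows stay unverified for read repair to resolve.
public class PendingRowTracker {
    private final Map<String, Integer> pending = new ConcurrentHashMap<>();

    // Phase one: register a pending update on the row. Returns true if
    // another batch already holds a pending update on the same row key.
    boolean beginUpdate(String rowKey) {
        return pending.merge(rowKey, 1, Integer::sum) > 1;
    }

    // After the batch finishes (or skips) its remaining phases, drop its claim.
    void endUpdate(String rowKey) {
        pending.compute(rowKey, (k, v) -> v == null || v <= 1 ? null : v - 1);
    }

    public static void main(String[] args) {
        PendingRowTracker tracker = new PendingRowTracker();
        boolean first = tracker.beginUpdate("1");  // batch for {1, b, x, y}
        boolean second = tracker.beginUpdate("1"); // batch for {1, c, x, y}
        System.out.println(first + " " + second);  // false true
        tracker.endUpdate("1");
        tracker.endUpdate("1");
    }
}
```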
For the example given above, {1, b, x, y} and {1, c, x, y} are concurrent
updates (on the same data table row). As explained above, the index rows
generated for these updates should be left unverified. Now assume that a scan
on the index table detects that the index row {b, 1, y} is unverified while
the concurrent updates are in progress, and the index row is repaired from the
data table. It is possible that the read repair gets the row {1, b, x, y} from
the data table. It will then rebuild the corresponding index row, which is
{b, 1, y}, and make the row verified. This rebuild may happen just after the
row {b, 1, y} is made unverified by the concurrent updates, which means the
repair will overwrite the result of the concurrent updates.
The scan will return {b, 1, y} to the client. The same scan may then also
detect that {c, 1, y} is unverified. By the time this row is repaired, the
data table row could be {1, c, x, y}. This means the corresponding index row
{c, 1, y} will be made verified by the read repair and also returned to the
client within the same scan. However, only one of these index rows should have
been returned to the client.
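The interleaving can be replayed as a toy simulation. This is not Phoenix code; the index is modeled as a map from index row key to a verified flag, and the invariant being violated is that one data row should yield at most one verified index row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of the race described above: read repair verifies
// both the stale and the fresh index row while the concurrent updates are
// still in flight, so one data row ends up with two verified index rows
// visible to the same scan.
public class ReadRepairRace {
    static long run() {
        // Index row key -> verified flag. Both concurrent updates have
        // completed phase one, leaving their index rows unverified.
        Map<String, Boolean> index = new LinkedHashMap<>();
        index.put("b1", false); // from update {1, b, x, y}
        index.put("c1", false); // from update {1, c, x, y}

        // The scan sees unverified "b1"; repair reads the data table, which
        // at this instant still holds {1, b, x, y}, so "b1" is verified.
        index.put("b1", true);

        // Later in the same scan, unverified "c1" is repaired; by then the
        // data table row is {1, c, x, y}, so "c1" is verified as well.
        index.put("c1", true);

        // Count verified index rows for this one data row: should be at
        // most one, but the race produces two.
        return index.values().stream().filter(v -> v).count();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```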
--
This message was sent by Atlassian Jira
(v8.3.4#803005)