[
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994307#comment-15994307
]
Vincent Poon commented on PHOENIX-3824:
---------------------------------------
[~lhofhansl] it turned out that the two are related. Short summary is,
normally when you do an update to a data table row, in the preBatchMutate hook
you generate the index update (so you can write it to WAL). To get the index
update, you grab the current state of the row (since you're in preBatchMutate,
it's the pre-update state of the row). That way, you can figure out the
existing index row, and issue a Delete for it, and then Put the new index row.
Well, when you're doing an index rebuild, all your data table rows are written
already. So when you "grab the current state of the row", it's the same as the
mutation you're replaying. Since nothing has 'changed', so to speak, the
delete isn't issued. Hence you end up with the extra index row.
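
To make the above concrete, here is a minimal toy sketch in Python. It is not Phoenix's actual code: the index is modeled as a set of (indexed value, row key) pairs, and the names index_updates and apply_updates are made up for illustration. It only shows why reading the "current state" after the data row is already written suppresses the Delete.

```python
def index_updates(current_value, new_value, row_key):
    """Toy model: index rows to delete and put for one data-row mutation."""
    deletes = []
    # A Delete is only generated if the row's state differs from the mutation.
    if current_value is not None and current_value != new_value:
        deletes.append((current_value, row_key))
    puts = [(new_value, row_key)]
    return deletes, puts


def apply_updates(index, deletes, puts):
    """Apply the generated index updates to the toy index (a set)."""
    index.difference_update(deletes)
    index.update(puts)


# Normal path: the hook sees the pre-update state "a", so the old index row goes away.
data = {"r1": "a"}
index = {("a", "r1")}
deletes, puts = index_updates(data["r1"], "b", "r1")
apply_updates(index, deletes, puts)
data["r1"] = "b"
normal_index = set(index)

# Rebuild path: the index was disabled, so the data row was already updated to "b".
data = {"r1": "a"}
index = {("a", "r1")}
data["r1"] = "b"  # written while the index was disabled, no index maintenance
deletes, puts = index_updates(data["r1"], "b", "r1")  # replay sees "b" == "b"
apply_updates(index, deletes, puts)
rebuilt_index = set(index)  # the stale ("a", "r1") row is still there
```

In this toy model the normal path leaves one index row, while the rebuild path leaves both the old and the new one, which matches the symptom in the issue description.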
PHOENIX-3806 then gets triggered because there's some logic to handle
out-of-order updates. The way they handle out-of-order updates is, if you get
a mutation that isn't the latest timestamp (i.e. backwards in time), the code
then rolls up through each version up to present. That way you know the present
index state, and if it has changed, you hide your current (back in time) index
update by issuing a Delete after your Put. If you have many versions, this
"roll up" ends up being done for each one, hence the arithmetic summation
problem.
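
A rough way to see the summation, as a Python sketch. This is only a counting model under my own assumption that replaying a version costs one roll-up step per newer version of the same row; rollup_steps is a hypothetical name, not something in the Phoenix code.

```python
def rollup_steps(num_versions):
    """Count roll-up steps when every version of one row is replayed.

    Assumption: replaying a version that is not the latest rolls up through
    each newer version, one step per newer version.
    """
    total = 0
    for replayed in range(num_versions):
        newer_versions = num_versions - 1 - replayed
        total += newer_versions
    return total


steps_for_4 = rollup_steps(4)        # 3 + 2 + 1 + 0
steps_for_1000 = rollup_steps(1000)  # grows as n * (n - 1) / 2
```

Under that assumption the total is n(n-1)/2 for n versions, so the work grows quadratically with the number of versions of a row.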
I believe the simple fix is to make sure you don't scan for newer versions when
you "grab the current state of the row". There's actually code that tries to
do that but I think there's a bug. I'm still writing proper tests, etc, but I
think that should fix it.
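
A sketch of the idea in Python, again a toy model and not the actual patch. It assumes the read of the "current state" is capped strictly below the timestamp of the mutation being replayed; whether the real fix uses an exclusive or inclusive bound is not something I'm stating here. The names state_before and replay are hypothetical.

```python
def state_before(versions, ts):
    """Latest value written strictly before ts, or None if there is none."""
    older = [(t, v) for t, v in versions if t < ts]
    if not older:
        return None
    return max(older, key=lambda tv: tv[0])[1]


def replay(versions, index, row_key):
    """Replay all versions of one row, reading state only from older versions."""
    for ts, value in sorted(versions):
        previous = state_before(versions, ts)  # newer versions are never scanned
        if previous is not None and previous != value:
            index.discard((previous, row_key))  # Delete for the old index row
        index.add((value, row_key))             # Put for the new index row
    return index


# Same scenario as the bug report: the row went from "a" to "b" while the
# index was disabled, so the index still holds only the stale ("a", "r1") row.
versions = [(1, "a"), (2, "b")]
rebuilt = replay(versions, {("a", "r1")}, "r1")
```

With the bounded read, replaying the second version sees "a" as the prior state, so the Delete is issued and only one index row remains in this toy model.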
I haven't figured out PHOENIX-3825, though. I don't know if the code is built
to handle that, and actually it's tricky to make it work with this one.
> Mutable Index partial rebuild adds more than one index row for updated data
> row
> -------------------------------------------------------------------------------
>
> Key: PHOENIX-3824
> URL: https://issues.apache.org/jira/browse/PHOENIX-3824
> Project: Phoenix
> Issue Type: Bug
> Reporter: Vincent Poon
>
> If you follow this sequence:
> 1) disable index
> 2) write an update to a data table row
> 3) trigger the BuildIndexScheduleTask partial rebuild
> then you end up with two index rows for the one data table row.