[ 
https://issues.apache.org/jira/browse/IGNITE-25652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987143#comment-17987143
 ] 

Ivan Bessonov edited comment on IGNITE-25652 at 7/1/25 8:48 AM:
----------------------------------------------------------------

[~sergeychugunov] I'm not aware of any mathematically formal way to prove that 
the state was too broad. We relied on common sense and on tests that clearly 
showed the presence of a race. Let me try to explain it more thoroughly:
 * Let's take a look at "{{{}finishTail(){}}}".
 * It can do a "{{{}needReplaceInner = READY;{}}}", but only if 
"{{{}needReplaceInner == TRUE{}}}"
 * Later in code there's a "{{{}if (needMergeEmptyBranch == TRUE) {{}}}"
 * That branch of code contains "{{{}return RETRY;{}}}"
 * Very important: we *do not* revert the value of "{{{}needReplaceInner{}}}", 
it stays as "{{{}READY{}}}"
 * On the next retry we check the condition "{{{}needReplaceInner == TRUE{}}}" 
and it doesn't pass, because the value is already "{{{}READY{}}}"
 ** Thus we skip the "{{{}if (!isInnerKeyInTail()) {{}}}" check, *and* later 
execute "{{{}replaceInner();{}}}" on the *wrong* node
 * Then we don't execute "{{{}replaceInner();{}}}" on the *right* node and end 
up with a corrupted tree
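
The sequence above can be sketched as a toy model. This is not the actual BPlusTree code: the class name, the "mergeMustRetry" parameter and the two counters are hypothetical, and the real "isInnerKeyInTail()" and "replaceInner()" calls are reduced to bookkeeping. It only illustrates how a "READY" value that survives a "RETRY" lets the next attempt skip the tail check.

```java
// Toy model of the retry sequence described above; NOT the real Ignite code.
final class ReadyStateSketch {
    enum Bool { FALSE, TRUE, READY }
    enum Result { DONE, RETRY }

    Bool needReplaceInner = Bool.TRUE;
    Bool needMergeEmptyBranch = Bool.TRUE;

    // Hypothetical bookkeeping, used only to observe the behaviour.
    int tailChecks = 0;
    boolean replacedWithoutCheck = false;

    // mergeMustRetry is a hypothetical stand-in for whatever makes the
    // merge branch give up and return RETRY on a given attempt.
    Result finishTail(boolean mergeMustRetry) {
        boolean checkedThisAttempt = false;

        if (needReplaceInner == Bool.TRUE) {
            tailChecks++;                 // stands in for isInnerKeyInTail()
            checkedThisAttempt = true;
            needReplaceInner = Bool.READY;
        }

        if (needMergeEmptyBranch == Bool.TRUE && mergeMustRetry)
            return Result.RETRY;          // needReplaceInner stays READY

        if (needReplaceInner == Bool.READY) {
            // Stands in for replaceInner(); records whether it ran on an
            // attempt that never re-validated the tail.
            replacedWithoutCheck = !checkedThisAttempt;
            needReplaceInner = Bool.FALSE;
        }

        return Result.DONE;
    }

    public static void main(String[] args) {
        ReadyStateSketch r = new ReadyStateSketch();
        Result first = r.finishTail(true);   // sets READY, then retries
        Result second = r.finishTail(false); // skips the check, still replaces
        System.out.println(first + " " + second
            + " checks=" + r.tailChecks
            + " replacedWithoutCheck=" + r.replacedWithoutCheck);
    }
}
```

In this model the check runs once, on the first attempt, while the replacement happens on the second attempt against whatever the tail looks like by then.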

I hope the situation is clear, and that it is equally clear why the "ready" 
state is unnecessary. Of course, there were two ways of fixing the bug, and we 
chose the one that makes the code less confusing. Please reach out if you have 
more questions.

EDIT: I might have made a mistake when saying that we don't execute 
"{{{}replaceInner();{}}}" on the right node; there's probably no need to do 
that. Anyway, I suggest looking very closely at situations like this. There's a 
clear data race, it's just hard to describe precisely right now.



> Fix BPlusTree corruption during concurrent removes (AI2)
> --------------------------------------------------------
>
>                 Key: IGNITE-25652
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25652
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Philipp Shergalis
>            Assignee: Philipp Shergalis
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Port fix from AI3 https://issues.apache.org/jira/browse/IGNITE-23588



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
