Hello Tidy Bot, Zoltan Chovan, Ashwani Raina, Kudu Jenkins, Anonymous Coward 
(763),

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#6).

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
......................................................................

[ARM] Concurrent binary tree memory barriers fixed.

TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart
from incorrect order, there are no other errors in the tree. All inserted
elements are present, but CBTree::GetCopy cannot find some due to the incorrect
ordering.

This is not a unit test error, but a real error in the CBTree implementation.
Modifying the test to only do the inserts first, and only start checking when
the tree is finalized does not prevent the bug. So calling only the tree->Insert
function from multiple threads is sufficient to reproduce the issue (only on
ARM).

Root cause:

Some memory order restrictions are not strict enough. It does not cause a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores (TSO).
ARM, in contrast, allows stores to different memory locations to be observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390

Solution:

The following 6 barriers need to be more strict:

1. When we set the Splitting/Inserting flag on a node, then it is not enough
  to flush all previous changes. It is also very important not to reorder any
  write before it. So instead of Release_Store (which is practically equivalent
  to std::memory_order_release), we need a 2 way barrier.

2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we
  check if the version is unchanged, then a load_acquire (equivalent to
  std::memory_order_acquire) is not sufficient. We have to also make sure no
  read is reordered in the other direction.

Putting the appropriate std::atomic_thread_fence(...) calls to this 6 places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the c++ standard functions will make the code
more consistent.

Reason for changing to std::atomics:

atomicops-internals-arm64.h was picked from chromium source and the missing
functions was reimplemented. The header is gone since, and even chromium uses a
solution based on std::atomic (see base/atomicops_internals_portable.h in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction layer, that
is basically just "function aliases" at this point.

Reason for removing PACKED:

Although the padding of the Nodes was turn of, the following happened (both on
ARM and x86):
+ NodeBase was 16 byte, so there was no padding after the base class.
+ LeafNode and InternalNode was 249, 244 bytes (used in memrowset and
  deltamemstore)
+ When we reserved a new node, we used arena_->AllocateBytesAligned(...), with
  8 bytes alignment.

So the end result was exactly the same, as if there were no "PACKED" attribute.
Also compiler kept warning about "&version might be unaligned", but it were
never unaligned.
By removing the packed PACKED we change nothing in the memory layout. (neither
on ARM or x86). It just gets rid of the old compiler warning. (std::atomic
disable packing anyway).

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/concurrent_btree.h
1 file changed, 76 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/6
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka <zmarto...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina <ara...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan <zcho...@cloudera.com>
Gerrit-Reviewer: Zoltan Martonka <zmarto...@cloudera.com>

Reply via email to