This is an automated email from the ASF dual-hosted git repository.
Yicong-Huang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/main by this push:
new bb795a0c8f fix(amber): avoid AtomicInteger.get_and_set deadlock (#5010)
bb795a0c8f is described below
commit bb795a0c8febbdaecd6878dcbb81498e8b5c0338
Author: nathant27 <[email protected]>
AuthorDate: Wed May 13 01:17:37 2026 -0700
fix(amber): avoid AtomicInteger.get_and_set deadlock (#5010)
### What changes were proposed in this PR?
Fixed deadlock happening in AtomicInteger.get_and_set
(amber/src/main/python/core/util/atomic.py).
Before, get_and_set acquired non-reentrant lock and then accessed value
property, which attempts to grab the same lock, causing a deadlock.
After the change, get_and_set now accesses the objects self._value
property directly inside the existing critical section, which avoids the
nested lock acquisition.
-Chose to do the inline change instead of changing to Reentrant lock to
avoid potentially unnecessary overhead and because it seems like the
more appropriate way to access in this context, since we're already
grabbing the lock anyway in get_and_set
### Any related issues, documentation, or discussions?
Fixes #4794
### How was this PR tested?
in src/test/python/core/util/test_atomic.py
test_get_and_set_currently_deadlocks_on_non_reentrant_lock, bug pinned
test
after the fixes now fails on both asserts lines 99 to 106 on removed
test(test_get_and_set_currently_deadlocks_on_non_reentrant_lock)
```
assert worker.is_alive(), (
"worker thread exited unexpectedly — get_and_set neither
deadlocked "
"nor completed; the test no longer pins the documented bug."
)
assert not completed.is_set(), (
"get_and_set unexpectedly returned — the deadlock bug appears
fixed; "
"delete this pinned test along with the xfail below."
)
```
This is expected behavior because worker should not be alive after
fixing deadlock, and should be completed.
#### 2 changes to test_atomic.py
**REPLACED**
test_get_and_set_currently_deadlocks_on_non_reentrant_lock
WITH
test_get_and_set_does_not_deadlock_on_non_reentrant_lock
- Mostly the same functionality of old test, but instead of checking if
deadlocks, checks if it does not deadlock by asserting "not
worker.is_alive()" and "completed.is_set" from the thread. Basically
just swapped the two asserts at the end
**UPDATED** test_get_and_set_should_return_old_value_and_replace_state
- REMOVED the xfail with strict = true, because now passes
### Was this PR authored or co-authored using generative AI tooling?
No
---
amber/src/main/python/core/util/atomic.py | 2 +-
amber/src/test/python/core/util/test_atomic.py | 26 +++-----------------------
2 files changed, 4 insertions(+), 24 deletions(-)
diff --git a/amber/src/main/python/core/util/atomic.py
b/amber/src/main/python/core/util/atomic.py
index a73619489f..55fecb0f45 100644
--- a/amber/src/main/python/core/util/atomic.py
+++ b/amber/src/main/python/core/util/atomic.py
@@ -53,6 +53,6 @@ class AtomicInteger:
def get_and_set(self, v):
with self._lock:
- old_value = self.value
+ old_value = self._value
self._value = int(v)
return old_value
diff --git a/amber/src/test/python/core/util/test_atomic.py
b/amber/src/test/python/core/util/test_atomic.py
index 0f6b393233..824b25b671 100644
--- a/amber/src/test/python/core/util/test_atomic.py
+++ b/amber/src/test/python/core/util/test_atomic.py
@@ -65,13 +65,7 @@ class TestAtomicIntegerSingleThreaded:
a.value = "100"
assert a.value == 100
- def test_get_and_set_currently_deadlocks_on_non_reentrant_lock(self):
- # Bug pin: get_and_set acquires self._lock and then reads self.value,
- # which is a property that ALSO tries to acquire self._lock. The lock
- # is a non-reentrant threading.Lock, so the call deadlocks the moment
- # it is invoked. Document via thread + timeout so the test surfaces
- # the deadlock without hanging the whole suite, and pair it with an
- # xfail-strict test below that asserts the intended contract.
+ def test_get_and_set_does_not_deadlock_on_non_reentrant_lock(self):
a = AtomicInteger(10)
started = threading.Event()
completed = threading.Event()
@@ -96,23 +90,9 @@ class TestAtomicIntegerSingleThreaded:
assert not errors, (
f"get_and_set raised before reaching the deadlock spin:
{errors[0]!r}"
)
- assert worker.is_alive(), (
- "worker thread exited unexpectedly — get_and_set neither
deadlocked "
- "nor completed; the test no longer pins the documented bug."
- )
- assert not completed.is_set(), (
- "get_and_set unexpectedly returned — the deadlock bug appears
fixed; "
- "delete this pinned test along with the xfail below."
- )
+ assert not worker.is_alive()
+ assert completed.is_set()
- @pytest.mark.xfail(
- strict=True,
- reason=(
- "Known bug: AtomicInteger.get_and_set deadlocks because it holds "
- "the non-reentrant lock while accessing the value property. "
- "This xfail flips to XPASS when the bug is fixed."
- ),
- )
@pytest.mark.timeout(2)
def test_get_and_set_should_return_old_value_and_replace_state(self):
a = AtomicInteger(10)