chenBright opened a new pull request, #3286:
URL: https://github.com/apache/brpc/pull/3286
### What problem does this PR solve?
Issue Number: resolve #3051
Problem Summary:
The current bthread rwlock is implemented with a Go-like state machine in
which the reader count is bumped up front, on the optimistic assumption that
every later step succeeds. As pointed out in #3051, this state is **not
safely reversible on a partial failure**:
* When a read-lock attempt times out while a writer holds the lock, the
pre-incremented reader counter is left dangling. Subsequent writers can
never acquire the lock again, even after every legitimate reader has left.
* For the same structural reason, bthread rwlock cannot offer correct
`try_*` / `timed_*` APIs at all -- any cancellable wait risks corrupting
the lock's accounting.
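The failure mode above can be sketched in a few lines. This is illustrative code, not the brpc source: `std::atomic` stands in for the butex, and the names `OldStyleSketch` / `old_timedrdlock_sketch` are invented for the example.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the failure mode described above (illustrative, not brpc code):
// the old design bumps the reader count before waiting, so a timed-out wait
// returns without undoing the increment.
struct OldStyleSketch {
    std::atomic<int32_t> readers{0};
    bool writer_held = true;  // assume a writer currently owns the lock
};

// Hypothetical timed read-lock in the old, non-reversible style.
bool old_timedrdlock_sketch(OldStyleSketch& s) {
    s.readers.fetch_add(1);   // optimistic pre-increment
    if (s.writer_held) {
        // The wait times out and we bail out WITHOUT rolling the count
        // back -- the dangling reader blocks all future writers.
        return false;
    }
    return true;
}
```

After a timed-out call, `readers` is still 1 even though no reader holds the lock, which is exactly the dangling counter the PR removes.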
PR #1031 already proposed a writer-priority rwlock that meets this
requirement; this PR builds on that proposal.
### What is changed and the side effects?
Changed:
**Reimplement `bthread_rwlock_t` on top of a writer-priority algorithm
derived from PR #1031.** The lock state is split into:
* `lock_word` (butex): bit 31 = writer held, bits 0..30 = reader count;
* `writer_wait_count` (butex): in-flight writer counter used by readers
to honor writer-priority;
* `writer_queue_mutex`: serializes writers competing for `lock_word`.
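The three-part state can be sketched as follows. This is illustrative only: `std::atomic` stands in for brpc's butex, the commented-out mutex stands in for the bthread mutex, and the helper names are invented; the real layout lives in the bthread sources.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative sketch of the split lock state (not the brpc source).
constexpr uint32_t kWriterBit  = 1u << 31;       // bit 31: writer holds the lock
constexpr uint32_t kReaderMask = kWriterBit - 1; // bits 0..30: reader count

struct RWLockStateSketch {
    std::atomic<uint32_t> lock_word{0};          // a butex in the real lock
    std::atomic<uint32_t> writer_wait_count{0};  // a butex in the real lock
    // bthread_mutex_t writer_queue_mutex;       // serializes competing writers
};

uint32_t reader_count(const RWLockStateSketch& s) {
    return s.lock_word.load() & kReaderMask;
}

bool writer_held(const RWLockStateSketch& s) {
    return (s.lock_word.load() & kWriterBit) != 0;
}
```

Packing the writer bit and the reader count into one word lets a single atomic operation observe (and wait on) both facts at once.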
Every wait/park step is fully reversible, so the lock now correctly
supports `try_*` / `timed_*` operations:
* `bthread_rwlock_tryrdlock` / `bthread_rwlock_timedrdlock`
* `bthread_rwlock_trywrlock` / `bthread_rwlock_timedwrlock`
* `bthread_rwlock_unlock` (auto-dispatches to read/write unlock by
inspecting `lock_word`).
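Since the new bthread calls mirror the POSIX rwlock surface, their try/timed semantics can be previewed with plain pthreads (assuming `bthread_rwlock_tryrdlock` behaves like `pthread_rwlock_tryrdlock`, and so on):

```cpp
#include <pthread.h>
#include <cerrno>

// Preview of the try-lock semantics using POSIX rwlocks: a trywrlock
// attempted while a reader holds the lock fails immediately with EBUSY
// instead of blocking -- the behavior bthread_rwlock_trywrlock now offers.
int try_write_while_read_held() {
    pthread_rwlock_t rw;
    pthread_rwlock_init(&rw, nullptr);
    pthread_rwlock_rdlock(&rw);
    int rc = pthread_rwlock_trywrlock(&rw); // EBUSY: a reader is inside
    pthread_rwlock_unlock(&rw);
    pthread_rwlock_destroy(&rw);
    return rc;
}
```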
On any failure path (`EBUSY`, `ETIMEDOUT`, or an `EINTR` that turns into a
failure), the side effects are rolled back via `rwlock_wrlock_cleanup`:
release the inner mutex if held, decrement `writer_wait_count`, and wake
parked readers if we were the last in-flight writer.
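The three rollback steps can be sketched like this. Again illustrative, not the brpc source: `std::mutex` stands in for `writer_queue_mutex`, and `wake_all_readers()` stands in for `butex_wake_all(writer_wait_count)`.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hand-written sketch of the rollback steps named above (not brpc code).
struct CleanupSketch {
    std::atomic<uint32_t> writer_wait_count{0};
    std::mutex writer_queue_mutex;
    bool readers_woken = false;
    void wake_all_readers() { readers_woken = true; }
};

void wrlock_cleanup_sketch(CleanupSketch& s, bool queue_mutex_held) {
    if (queue_mutex_held) {
        s.writer_queue_mutex.unlock();  // 1) release the inner mutex if held
    }
    // 2) undo our in-flight writer accounting; fetch_sub returns the old
    //    value, so a result of 1 means we were the last in-flight writer,
    // 3) in which case readers parked to honor writer-priority are woken.
    if (s.writer_wait_count.fetch_sub(1) == 1) {
        s.wake_all_readers();
    }
}
```

Because every step a failing writer took is undone here, a timeout leaves the lock in exactly the state it would have had if the writer had never arrived.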
**Replace the unconditional broadcast in unlock with selective wakeup**:
* `unrdlock`: only the last reader (`lock_word` 1 -> 0) does
`butex_wake(lock_word)` for the at-most-one writer parked on it
(writers are serialized by `writer_queue_mutex`, so a single wake
suffices). Earlier readers do not wake anyone.
* `unwrlock`: deliberately unlocks `writer_queue_mutex` *first* (so the
next queued writer -- which has already self-accounted in
`writer_wait_count` -- gets a chance), and only does
`butex_wake_all(writer_wait_count)` when *we* were the last in-flight
writer (`fetch_sub` returned 1). The intentional ordering preserves
strict writer-priority and is documented in detail in the source.
This eliminates the previous "wake everybody, let them re-contend" pattern
for both readers and writers, removing a wakeup stampede (thundering herd)
on contended workloads.
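Both selective-wakeup rules hinge on the value returned by the decrement. A sketch (illustrative, not the brpc source; the `wake_*` hooks stand in for `butex_wake(lock_word)` and `butex_wake_all(writer_wait_count)`, and the unwrlock sketch omits the preceding `writer_queue_mutex` unlock):

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the two selective-wakeup rules (not the brpc source).
struct WakeupSketch {
    std::atomic<uint32_t> lock_word{0};          // readers currently inside
    std::atomic<uint32_t> writer_wait_count{0};  // in-flight writers
    int writer_wakes = 0;
    int reader_wakes = 0;
    void wake_one_writer() { ++writer_wakes; }   // stands in for butex_wake
    void wake_all_readers() { ++reader_wakes; }  // stands in for butex_wake_all
};

void unrdlock_sketch(WakeupSketch& s) {
    // fetch_sub returns the previous value: only the reader taking the
    // count 1 -> 0 wakes the at-most-one writer parked on lock_word;
    // earlier readers wake nobody.
    if (s.lock_word.fetch_sub(1) == 1) {
        s.wake_one_writer();
    }
}

void unwrlock_sketch(WakeupSketch& s) {
    // (writer_queue_mutex would be unlocked first; omitted here.)
    // Only the last in-flight writer wakes the readers parked to honor
    // writer-priority.
    if (s.writer_wait_count.fetch_sub(1) == 1) {
        s.wake_all_readers();
    }
}
```

With three readers unlocking, exactly one wake is issued instead of three broadcasts, which is where the stampede savings come from.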
Side effects:
- Performance effects:
- Breaking backward compatibility:
---
### Check List:
- Please make sure your changes are compilable.
- When providing us with a new feature, it is best to add related tests.
- Please follow [Contributor Covenant Code of
Conduct](https://github.com/apache/brpc/blob/master/CODE_OF_CONDUCT.md).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]