Hi, Peter, Paolo,
On 2023/1/10 δΈε10:45, Peter Xu wrote:
On Tue, Jan 10, 2023 at 12:09:41AM -0800, Chuang Xu wrote:
Hi, Peter and Paolo,
Hi, Chuang, Paolo,
I'm sorry I didn't reply to your email in time. I was infected with
COVID-19 two weeks ago, so I couldn't think about the problems discussed
in your email for a long time.π·
On 2023/1/4 δΈε1:43, Peter Xu wrote:
Hi, Paolo,
On Wed, Dec 28, 2022 at 09:27:50AM +0100, Paolo Bonzini wrote:
Il ven 23 dic 2022, 16:54 Peter Xu ha scritto:
This is not valid because the transaction could happen in *another*
thread.
In that case memory_region_transaction_depth() will be > 0, but RCU is
needed.
Do you mean the code is wrong, or the comment? Note that the code has
checked rcu_read_locked() where introduced in patch 1, but maybe
something
else was missed?
The assertion is wrong. It will succeed even if RCU is unlocked in this
thread but a transaction is in progress in another thread.
IIUC this is the case where the context:
(1) doesn't have RCU read lock held, and,
(2) doesn't have BQL held.
Is it safe at all to reference any flatview in such a context? The thing
is I think the flatview pointer can be freed anytime if both locks are
not
taken.
Perhaps you can check (memory_region_transaction_depth() > 0 &&
!qemu_mutex_iothread_locked()) || rcu_read_locked() instead?
What if one thread calls address_space_to_flatview() with BQL held but
not
RCU read lock held? I assume it's a legal operation, but it seems to be
able to trigger the assert already?
Thanks,
I'm not sure whether I understand the content of your discussion correctly,
so here I want to explain my understanding firstly.
From my perspective, Paolo thinks that when thread 1 is in a transaction,
thread 2 will trigger the assertion when accessing the flatview without
holding RCU read lock, although sometimes the thread 2's access to flatview
is legal. So Paolo suggests checking (memory_region_transaction_depth() > 0
&& !qemu_mutex_iothread_locked()) || rcu_read_locked() instead.
And Peter thinks that as long as no thread holds the BQL or RCU read lock,
the old flatview can be released (actually executed by the rcu thread with
BQL held). When thread 1 is in a transaction, if thread 2 access the
flatview
with BQL held but not RCU read lock held, it's a legal operation. In this
legal case, it seems that both my code and Paolo's code will trigger
assertion.
IIUC your original patch is fine in this case (BQL held, RCU not held), as
long as depth==0. IMHO what we want to trap here is when BQL held (while
RCU is not) and depth>0 which can cause unpredictable side effect of using
obsolete flatview.
I don't quite understand the side effects of depth>0 when BQL is held (while
RCU is not).
In my perspective, both BQL and RCU can ensure that the flatview will not be
released when the worker thread accesses the flatview, because before the rcu
thread releases the flatview, it will make sure itself holding BQL and the
worker thread not holding RCU. So, whether the depth is 0 or not, as long as
BQL or RCU is held, the worker thread will not access the obsolete flatview
(IIUC 'obsolete' means that flatview is released).
To summarize, the original check didn't consider BQL, and if to consider
BQL I think it should be something like:
/* Guarantees valid access to the flatview, either lock works */
assert(BQL_HELD() || RCU_HELD());
/*
* Guarantees any BQL holder is not reading obsolete flatview (e.g. when
* during vm load)
*/
if (BQL_HELD())
assert(depth==0);
IIUC it can be merged into:
assert((BQL_HELD() && depth==0) || RCU_HELD());
IMHO assert(BQL_HELD() || RCU_HELD()) is enough..
Or you think that once a mr transaction is in progress, the old flatview has
been obsolete? If we regard flatview as obsolete when a mr transaction is in
progress, How can RCU ensure that flatview is not obsolete?
What does Paolo think of this check?
Thanks!
I'm not sure if I have a good understanding of your emails? I think
checking(memory_region_transaction_get_depth() == 0 || rcu_read_locked() ||
qemu_mutex_iothread_locked()) should cover the case you discussed.
This seems still problematic too? Since the assert can pass even if
neither BQL nor RCU is held (as long as depth==0).
Thanks,