Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-05 Thread Paolo Bonzini


On 05/07/2016 00:31, Emilio G. Cota wrote:
> My mistake. An atomic_read here isn't needed: as the commit message
> points out, we only need atomic_read when tb_lock isn't held. In this
> case tb_lock is held, so we only use atomic accesses for writing
> to the array.

It's harmless though.  In C11 and C++11 it would even be required, so I
think it's better to add it even though our compilers don't yet enforce it.

Paolo



Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-04 Thread Emilio G. Cota
On Sat, Jul 02, 2016 at 08:09:35 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota  writes:
> 
> > On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> >> From: Sergey Fedorov 
> > (snip)
> >> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState 
> >> *cpu,
> >> is executed. */
> >>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >>  tb_lock();
> >> -tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> >> +tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
> >>  if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
> >>   tb->flags != flags)) {
> >>  tb = tb_find_slow(cpu, pc, cs_base, flags);
> >> diff --git a/translate-all.c b/translate-all.c
> >> index eaa95e4..1fcfe79 100644
> >> --- a/translate-all.c
> >> +++ b/translate-all.c
> >> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, 
> >> tb_page_addr_t page_addr)
> >>  invalidate_page_bitmap(p);
> >>  }
> >>
> >> +/* Ensure that we won't find the TB in the shared hash table
> >> + * if we con't see it in CPU's local cache.
> >
> > s/con't/can't/
> >
> >> + * Pairs with smp_rmb() in tb_find_slow(). */
> >> +smp_wmb();
> >
> > This fence is already embedded in qht_remove, since it internally
> > calls seqlock_write_end() on a successful removal, so we could get
> > away with a comment instead of emitting a redundant fence.
> > However, if qht ever changed its implementation this would have
> > to be taken into account. So I'd be OK with emitting the
> > fence here too.
> >
> >> +
> >>  /* remove the TB from the hash list */
> >>  h = tb_jmp_cache_hash_func(tb->pc);
> >>  CPU_FOREACH(cpu) {
> >>  if (cpu->tb_jmp_cache[h] == tb) {
> >
> > Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {
> 
> Oops, good catch.

My mistake. An atomic_read here isn't needed: as the commit message
points out, we only need atomic_read when tb_lock isn't held. In this
case tb_lock is held, so we only use atomic accesses for writing
to the array.

E.



Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-04 Thread Emilio G. Cota
On Fri, Jul 01, 2016 at 17:32:01 -0700, Richard Henderson wrote:
> On 07/01/2016 05:17 PM, Emilio G. Cota wrote:
> >On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> >>From: Sergey Fedorov 
> >(snip)
> >>@@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState 
> >>*cpu,
> >>is executed. */
> >> cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >> tb_lock();
> >>-tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> >>+tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
> >> if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
> >>  tb->flags != flags)) {
> >> tb = tb_find_slow(cpu, pc, cs_base, flags);
> >>diff --git a/translate-all.c b/translate-all.c
> >>index eaa95e4..1fcfe79 100644
> >>--- a/translate-all.c
> >>+++ b/translate-all.c
> >>@@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, 
> >>tb_page_addr_t page_addr)
> >> invalidate_page_bitmap(p);
> >> }
> >>
> >>+/* Ensure that we won't find the TB in the shared hash table
> >>+ * if we con't see it in CPU's local cache.
> >
> >s/con't/can't/
> >
> >>+ * Pairs with smp_rmb() in tb_find_slow(). */
> >>+smp_wmb();
> >
> >This fence is already embedded in qht_remove, since it internally
> >calls seqlock_write_end() on a successful removal...
> 
> No.  There's stuff that happens after qht_remove and before this barrier:
> tb_page_remove and invalidate_page_bitmap.

I can't see how that "tb page" stuff you refer to is relevant here.

AFAICT the barrier pairing in this patch only applies to tb_jmp_cache
and qht. If as you say it does not, then all comments and the commit message
are wrong.

What am I missing?

Emilio



Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-02 Thread Alex Bennée

Emilio G. Cota  writes:

> On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov 
> (snip)
>> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState 
>> *cpu,
>> is executed. */
>>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>  tb_lock();
>> -tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>> +tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>  if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>   tb->flags != flags)) {
>>  tb = tb_find_slow(cpu, pc, cs_base, flags);
>> diff --git a/translate-all.c b/translate-all.c
>> index eaa95e4..1fcfe79 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, 
>> tb_page_addr_t page_addr)
>>  invalidate_page_bitmap(p);
>>  }
>>
>> +/* Ensure that we won't find the TB in the shared hash table
>> + * if we con't see it in CPU's local cache.
>
> s/con't/can't/
>
>> + * Pairs with smp_rmb() in tb_find_slow(). */
>> +smp_wmb();
>
> This fence is already embedded in qht_remove, since it internally
> calls seqlock_write_end() on a successful removal, so we could get
> away with a comment instead of emitting a redundant fence.
> However, if qht ever changed its implementation this would have
> to be taken into account. So I'd be OK with emitting the
> fence here too.
>
>> +
>>  /* remove the TB from the hash list */
>>  h = tb_jmp_cache_hash_func(tb->pc);
>>  CPU_FOREACH(cpu) {
>>  if (cpu->tb_jmp_cache[h] == tb) {
>
> Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {

Oops, good catch.

>
>> -cpu->tb_jmp_cache[h] = NULL;
>> +atomic_set(&cpu->tb_jmp_cache[h], NULL);
>
> Other than that,
>
>   Reviewed-by: Emilio G. Cota 


--
Alex Bennée



Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-01 Thread Richard Henderson

On 07/01/2016 05:17 PM, Emilio G. Cota wrote:

On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:

From: Sergey Fedorov 

(snip)

@@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
is executed. */
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
 tb_lock();
-tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
 if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
  tb->flags != flags)) {
 tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index eaa95e4..1fcfe79 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 invalidate_page_bitmap(p);
 }

+/* Ensure that we won't find the TB in the shared hash table
+ * if we con't see it in CPU's local cache.


s/con't/can't/


+ * Pairs with smp_rmb() in tb_find_slow(). */
+smp_wmb();


This fence is already embedded in qht_remove, since it internally
calls seqlock_write_end() on a successful removal...


No.  There's stuff that happens after qht_remove and before this barrier: 
tb_page_remove and invalidate_page_bitmap.



r~



Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-01 Thread Emilio G. Cota
On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> From: Sergey Fedorov 
(snip)
> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState 
> *cpu,
> is executed. */
>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>  tb_lock();
> -tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> +tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>  if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>   tb->flags != flags)) {
>  tb = tb_find_slow(cpu, pc, cs_base, flags);
> diff --git a/translate-all.c b/translate-all.c
> index eaa95e4..1fcfe79 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, 
> tb_page_addr_t page_addr)
>  invalidate_page_bitmap(p);
>  }
>  
> +/* Ensure that we won't find the TB in the shared hash table
> + * if we con't see it in CPU's local cache.

s/con't/can't/

> + * Pairs with smp_rmb() in tb_find_slow(). */
> +smp_wmb();

This fence is already embedded in qht_remove, since it internally
calls seqlock_write_end() on a successful removal, so we could get
away with a comment instead of emitting a redundant fence.
However, if qht ever changed its implementation this would have
to be taken into account. So I'd be OK with emitting the
fence here too.

> +
>  /* remove the TB from the hash list */
>  h = tb_jmp_cache_hash_func(tb->pc);
>  CPU_FOREACH(cpu) {
>  if (cpu->tb_jmp_cache[h] == tb) {

Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {

> -cpu->tb_jmp_cache[h] = NULL;
> +atomic_set(&cpu->tb_jmp_cache[h], NULL);

Other than that,

  Reviewed-by: Emilio G. Cota 



Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-01 Thread Richard Henderson

On 07/01/2016 09:16 AM, Alex Bennée wrote:

From: Sergey Fedorov 

First, ensure atomicity of CPU's 'tb_jmp_cache' access by:
 * using atomic_read() to look up a TB when not holding 'tb_lock';
 * using atomic_write() to remove a TB from each CPU's local cache on
   TB invalidation.

Second, add some memory barriers to ensure we don't put the TB being
invalidated back to CPU's 'tb_jmp_cache'. If we fail to look up a TB in
CPU's local cache because it is being invalidated by some other thread
then it must not be found in the shared TB hash table. Otherwise we'd
put it back to CPU's local cache.

Note that this patch does *not* make CPU's TLB invalidation safe if it
is done from some other thread while the CPU is in its execution loop.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
[AJB: fixed missing atomic set, tweak title]
Signed-off-by: Alex Bennée 

---
AJB:
  - tweak title
  - fixed missing set of tb_jmp_cache
---
 cpu-exec.c  | 9 +++--
 translate-all.c | 7 ++-
 2 files changed, 13 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

2016-07-01 Thread Alex Bennée
From: Sergey Fedorov 

First, ensure atomicity of CPU's 'tb_jmp_cache' access by:
 * using atomic_read() to look up a TB when not holding 'tb_lock';
 * using atomic_write() to remove a TB from each CPU's local cache on
   TB invalidation.

Second, add some memory barriers to ensure we don't put the TB being
invalidated back to CPU's 'tb_jmp_cache'. If we fail to look up a TB in
CPU's local cache because it is being invalidated by some other thread
then it must not be found in the shared TB hash table. Otherwise we'd
put it back to CPU's local cache.

Note that this patch does *not* make CPU's TLB invalidation safe if it
is done from some other thread while the CPU is in its execution loop.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
[AJB: fixed missing atomic set, tweak title]
Signed-off-by: Alex Bennée 

---
AJB:
  - tweak title
  - fixed missing set of tb_jmp_cache
---
 cpu-exec.c  | 9 +++--
 translate-all.c | 7 ++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b840e1d..10ce1cb 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -285,6 +285,11 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 {
     TranslationBlock *tb;
 
+    /* Ensure that we won't find a TB in the shared hash table
+     * if it is being invalidated by some other thread.
+     * Otherwise we'd put it back to CPU's local cache.
+     * Pairs with smp_wmb() in tb_phys_invalidate(). */
+    smp_rmb();
     tb = tb_find_physical(cpu, pc, cs_base, flags);
     if (tb) {
         goto found;
@@ -315,7 +320,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 
 found:
     /* we add the TB in the virtual pc hash table */
-    cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
 
@@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb_lock();
-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index eaa95e4..1fcfe79 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
+    /* Ensure that we won't find the TB in the shared hash table
+     * if we con't see it in CPU's local cache.
+     * Pairs with smp_rmb() in tb_find_slow(). */
+    smp_wmb();
+
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
         if (cpu->tb_jmp_cache[h] == tb) {
-            cpu->tb_jmp_cache[h] = NULL;
+            atomic_set(&cpu->tb_jmp_cache[h], NULL);
         }
     }
 
-- 
2.7.4