On Mon, Jul 17, 2017 at 17:40:29 -1000, Richard Henderson wrote:
> On 07/17/2017 02:27 PM, Emilio G. Cota wrote:
> >On Mon, Jul 17, 2017 at 12:55:03 -1000, Richard Henderson wrote:
> >>On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
> >>>@@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >>>     assert_tb_locked();
> >>>
> >>>-    atomic_set(&tb->invalid, true);
> >>>-
> >>>     /* remove the TB from the hash list */
> >>>     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
> >>>     h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
> >>>     qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
> >>>+    /*
> >>>+     * Mark the TB as invalid *after* it's been removed from tb_hash, which
> >>>+     * eliminates the need to check this bit on lookups.
> >>>+     */
> >>>+    tb->invalid = true;
> >>
> >>I believe you need atomic_store_release here. Previously we were relying on
> >>the lock acquisition in qht_remove to provide the required memory barrier.
> >>
> >>We definitely need to make sure this reaches memory before we zap the TB in
> >>the CPU_FOREACH loop.
> >
> >After this patch tb->invalid is only read/set with tb_lock held, so no need
> >for atomics while accessing it.
>
> I think there's a path by which we do get stale data.
> For threads A and B,
>
>   (A) Lookup succeeds for TB in hash without tb_lock
>   (B) Removes TB from hash
>   (B) Sets tb->invalid
>   (B) Clears FORALL_CPU jmp_cache
>   (A) Store TB into local jmp_cache
>
> ... and since we never check for invalid again, there's nothing to evict TB
> from the jmp_cache again.
Ouch. Yes, I see it now. What threw me off was that in lookup_tb_ptr we're not
checking tb->invalid, and that biased me into thinking that it's not needed.
But I should have tried harder. Also, that's a bug, and yet another reason to
have tb_lookup, so that we fix these things at once, in one place.

> Here's a plan that will make me happy:
>
>   (1) Drop this patch, leaving the set of tb->invalid (or CF_INVALID)
>       in place.
>
>   (2) Include CF_INVALID in the mask of bits compared in
>       tb_lookup__cpu_state.
>
>       (a) At this point in the patch set that's just
>
>             (tb->flags & CF_INVALID) == 0
>
>       (b) Later in the patch series when CF_PARALLEL is introduced
>           (and CF_HASH_MASK, let's call it, instead of the cf_mask
>           function you have now), this becomes
>
>             (tb->flags & (CF_HASH_MASK | CF_INVALID)) == cf_mask
>
> So that we continue to check CF_INVALID each time we look up a TB, but now
> we get it for free as part of the other flags check.

With the annoying atomic_read thrown in there :-) but yes, will do.

		E.