On 08/07/16 17:07, Paolo Bonzini wrote:
>
> On 08/07/2016 14:32, Sergey Fedorov wrote:
>>>>>> I think we can do even better. One option is using a separate tiny lock
>>>>>> to protect direct jump set/reset instead of tb_lock.
>>>>
>>>> If you have to use a separate tiny lock, you don't gain anything compared
>>>> to the two critical sections, do you?
>>
>> If we have a separate lock for direct jump set/reset then we can do fast
>> TB lookup + direct jump patching without taking tb_lock at all. How much
>> this would reduce lock contention largely depends on the workload we use.
>
> Yeah, it probably would be easy enough that it's hard to object to it
> (unlike the other idea below, which I'm not very comfortable with, at
> least without seeing patches).
>
> The main advantage would be that this tiny lock could be a spinlock
> rather than a mutex.
Well, the problem is more subtle than we thought: tb_find_fast() can race
with tb_phys_invalidate(). The first tb_find_phys() done outside the lock
can return a TB which is being invalidated. A direct jump can then be set
up to this TB, and that can happen after the concurrent
tb_phys_invalidate() has already reset all the direct jumps to the TB.
Thus we can end up with a direct jump to an invalidated TB. Even
extending the tb_lock critical section wouldn't help if at least one
tb_find_phys() is performed outside the lock.

Kind regards,
Sergey

>
>>> The one below is even more complicated. I'm all for simple lock-free
>>> stacks (e.g. QSLIST_INSERT_HEAD_ATOMIC and QSLIST_MOVE_ATOMIC), but
>>> lock-free lists are too much, especially with the complicated
>>> first/next mechanism of TCG's chained block lists.
>>
>> Direct jump handling code is pretty isolated and self-contained. It
>> would require backing out of tb_remove_from_jmp_list() and sprinkling a
>> couple of atomic_rcu_read()/atomic_rcu_set() with some comments, I
>> think. Maybe it would be easier to justify looking at actual patches.
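P.S. To make the interleaving concrete, here is a toy model of the race
described above. This is not QEMU code: the names mirror tb_find_phys()
and tb_phys_invalidate(), but the data structures are invented for
illustration, and the three steps are played out sequentially in one
thread so the bad outcome is deterministic rather than timing-dependent.

```python
# Toy model (not QEMU code) of the tb_find_fast()/tb_phys_invalidate()
# race: a lookup outside tb_lock returns a TB, the TB is invalidated
# (resetting all direct jumps currently pointing at it), and only then
# is the direct jump to it patched in -- leaving a stale link that the
# invalidation can never undo.

class TB:
    """Hypothetical stand-in for a translation block."""
    def __init__(self, name):
        self.name = name
        self.valid = True
        self.direct_jump = None    # TB this block's direct jump targets
        self.incoming_jumps = []   # TBs whose direct jump targets this TB

tb_a = TB("A")   # currently executing TB
tb_b = TB("B")   # candidate jump target

# Step 1 (cpu-exec thread): lookup outside the lock returns tb_b.
found = tb_b

# Step 2 (invalidating thread): tb_phys_invalidate(tb_b) runs to
# completion -- reset every direct jump pointing at tb_b, mark it dead.
for src in tb_b.incoming_jumps:
    src.direct_jump = None
tb_b.incoming_jumps.clear()
tb_b.valid = False

# Step 3 (cpu-exec thread): patch a direct jump to the TB found in
# step 1, unaware that it has been invalidated in the meantime.
tb_a.direct_jump = found
found.incoming_jumps.append(tb_a)

# tb_a now jumps directly to an invalidated TB, and the reset in step 2
# has already run, so nothing will ever clear this stale link.
print(tb_a.direct_jump.valid)   # prints False
```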