On Mon, Jul 10, 2017 at 17:33:07 -0400, Paolo Bonzini wrote:
> > > I agree that it would be nice to have the same mechanism for all.
> >
> > The main hurdle I see is how to allow for concurrent code generation while
> > minimizing flushes of the single, fixed-size[*] code_gen_buffer.
> > In user-mode this is tricky because there is no way to bound the number
> > of threads that might be spawned by the guest code (I don't think reading
> > /proc/sys/kernel/threads-max is a viable solution here).
> >
> > Switching to a "__thread *tcg_ctx_ptr" model will help minimize
> > user-mode/softmmu differences though. The only remaining difference would be
> > that user-mode would need tb_lock() around tb_gen_code, whereas softmmu
> > wouldn't, but everything else would be the same.
>
> Hmm, tb_gen_code is already protected by mmap_lock in linux-user, so you
> wouldn't get any parallelism. On the other hand, you could just say that the
> fixed-size code_gen_buffer is protected by mmap_lock, which doesn't exist
> for softmmu.

Yes. tb_lock/mmap_lock, or, as they're called in some asserts, memory_lock.

A way to get some parallelism in user-mode given the constraints would
be to share regions among TCG threads. Threads would still need to take
a per-region lock, but since it wouldn't be a global lock, that would
scale better.

I'm not sure we really need that much parallelism for code generation in
user-mode, though. So I wouldn't focus on this until we see benchmarks
that show a clear bottleneck due to "memory_lock".

		E.
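
For illustration only, here is a rough sketch of the per-region locking idea described above. It is not QEMU code: the names (struct region, regions_init, region_for_thread, region_alloc), the region count and the region size are all hypothetical, and it only models reserving space in a fixed-size buffer that is split into regions, each with its own mutex. An unbounded number of threads maps onto a fixed number of regions, so two threads contend only if they land on the same region.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N_REGIONS   4      /* hypothetical: number of regions in the buffer */
#define REGION_SIZE 4096   /* hypothetical: bytes per region */

/* One slice of the fixed-size buffer, guarded by its own lock. */
struct region {
    pthread_mutex_t lock;
    size_t used;           /* bytes already handed out in this region */
    uint8_t *base;         /* start of this region inside code_buf */
};

/* Stand-in for the single, fixed-size code buffer. */
static uint8_t code_buf[N_REGIONS * REGION_SIZE];
static struct region regions[N_REGIONS];

static void regions_init(void)
{
    for (size_t i = 0; i < N_REGIONS; i++) {
        pthread_mutex_init(&regions[i].lock, NULL);
        regions[i].used = 0;
        regions[i].base = code_buf + i * REGION_SIZE;
    }
}

/* Any number of threads share N_REGIONS regions (simple modulo mapping). */
static struct region *region_for_thread(unsigned thread_idx)
{
    return &regions[thread_idx % N_REGIONS];
}

/*
 * Reserve `size` bytes in the calling thread's region.
 * Only that region's lock is taken, never a global one.
 * Returns NULL when the region is full; a real implementation
 * would have to flush or pick another region at that point.
 */
static void *region_alloc(unsigned thread_idx, size_t size)
{
    struct region *r = region_for_thread(thread_idx);
    void *p = NULL;

    pthread_mutex_lock(&r->lock);
    assert(r->used <= REGION_SIZE);
    if (size <= REGION_SIZE - r->used) {
        p = r->base + r->used;
        r->used += size;
    }
    pthread_mutex_unlock(&r->lock);
    return p;
}
```

In this sketch the lock is held only while space is reserved. If code were generated directly into the region, the lock would presumably need to be held for the whole generation, which is where the cost of sharing a region between threads would show up.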