On Tue, 14 Mar 2017 13:34:38 +1100 Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote:
> On Tue, 2017-03-14 at 11:49 +1000, Nicholas Piggin wrote:
> > On Tue, 14 Mar 2017 10:31:08 +1100
> > Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote:
> >
> > > On Mon, 2017-03-13 at 03:13 +1000, Nicholas Piggin wrote:
> > > > Hi,
> > > >
> > > > Just after the previous two fixes, I would like to propose changing
> > > > the way we do doorbell vs interrupt controller IPIs, and add support
> > > > for global doorbells supported by POWER9 in HV mode.
> > > >
> > > > After this, the platform code knows about doorbells and interrupt
> > > > controller IPIs, rather than they know about each other.
> > >
> > > A few things come to mind:
> > >
> > > - We don't want to use doorbells under KVM. They are going to turn
> > > into traps and be emulated, slower than using H_IPI, at least on P9.
> > > Even for core only doorbells. I'm not sure how to convey that to the
> > > guest.
> >
> > msgsndp will be okay, won't it? Guest just chooses that based on
> > HVMODE (which pseries platform knows is core only).
>
> No. It will suck. Because KVM can run each guest thread on a different core,
> the HW won't work, so we have to disable it and trap the instructions &
> emulate them. We really don't want P9 guests to use it under KVM (it's fine
> under pHyp).

Ah, gotcha.

> > > - On PP9 DD1 we need a CI load instead of msgsync (a DARN instruction
> > > would do too if it works)
> >
> > Yes, Paul pointed this out too. I'll add an alt patch for it. Apparently
> > also msgsync needs lwsync afterwards for DD2.
>
> Odd. Ok.
>
> > > - Can we get rid of the atomic ops for manipulating the IPI mux ? What
> > > about a cache line per message and just set/clear ? If we clear in the
> > > doorbell handler before we call the respective targets, we shouldn't
> > > "lose" messages no ? As long as the actual handlers "loop" as necessary
> > > of course.
> >
> > Yes I think that would work. Good idea. A single cacheline with messages
> > being independently stored bytes within it might work better, so the
> > receiver CPU does not have to go through and load multiple cachelines
> > to check for messages. It could load up to 8 message types with one load.
>
> Ok. But we need to make sure we use multiple stores to not lose messages.
>
> Ie.
>
> - Load all
> - For each byte if set
>     - clear byte
>     - then call handler

Yes. I think that will be okay because we shouldn't get any load-hit-store
issues. I'll do some benchmarking anyway.

Thanks,
Nick
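
For illustration only, here is a minimal userspace sketch of the sequence
listed above: one load covering all message bytes, then for each set byte a
separate byte store to clear it before the handler is called. Everything named
here (ipi_mailbox, ipi_post, ipi_demux, count_handler, NR_MSG) is hypothetical
and not taken from the patches under discussion. The sketch is single-threaded
and leaves out the memory barriers, doorbell instructions and per-CPU storage
that real kernel code would need.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MSG 8	/* assumption: up to 8 message types, one byte each */

/*
 * Hypothetical mailbox: 8 independently stored bytes that can also be
 * read with a single 64-bit load. Aligned so it sits in one cache line.
 */
union ipi_mailbox {
	_Alignas(64) volatile uint64_t all;
	volatile uint8_t msg[NR_MSG];
};

_Static_assert(NR_MSG == sizeof(uint64_t), "one load must cover all bytes");

typedef void (*ipi_handler_t)(int type);

/* Sender side: a plain byte store, no atomic read-modify-write. */
static void ipi_post(union ipi_mailbox *mb, int type)
{
	assert(type >= 0 && type < NR_MSG);
	mb->msg[type] = 1;
	/* Real code would need a barrier here before ringing the doorbell. */
}

/*
 * Receiver side: load all bytes at once, then for each byte that is set,
 * clear that byte with its own store and only then call the handler.
 * Returns the number of handler calls made.
 */
static int ipi_demux(union ipi_mailbox *mb, ipi_handler_t handler)
{
	union { uint64_t all; uint8_t msg[NR_MSG]; } snap;
	int handled = 0;

	snap.all = mb->all;		/* load all */
	if (!snap.all)
		return 0;

	for (int i = 0; i < NR_MSG; i++) {
		if (!snap.msg[i])
			continue;
		mb->msg[i] = 0;		/* clear byte (separate store per byte) */
		handler(i);		/* then call handler */
		handled++;
	}
	return handled;
}

/* Demo handler for the sketch: counts how often each type was delivered. */
static int seen[NR_MSG];

static void count_handler(int type)
{
	seen[type]++;
}
```

Clearing each byte individually, rather than storing zero to the whole word,
is what keeps a message posted to a different byte in the meantime from being
wiped out. Posting the same type twice before the receiver runs collapses into
a single handler call, which is why the handlers have to "loop" as necessary.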