> >
> > Signed-off-by: Anand K Mistry <amis...@google.com>
> > Signed-off-by: Anand K Mistry <amis...@chromium.org>
>
> Two SoBs by you, why?
Tooling issues probably. Not intentional.

>
> > ---
> > Background:
> > IBPB is slow on some CPUs.
> >
> > More detailed background:
> > On some CPUs, issuing an IBPB can cause the address space switch to be
> > 10x more expensive (yes, 10x, not 10%).
>
> Which CPUs are those?!

AMD A4-9120C. Probably the A6-9220C too, but I don't have one of those
machines to test with.

>
> > On a system that makes heavy use of processes, this can cause a very
> > significant performance hit.
>
> You're not really trying to convince reviewers for why you need to add
> more complexity to an already too complex and confusing code. "some
> CPUs" and "can cause" is not good enough.

On a simple ping-pong test between two processes (using a pair of
pipes), a process switch takes ~7us with IBPB disabled. With IBPB
enabled, it increases to around 80us (tested with the powersave CPU
governor).

On Chrome's IPC system, a perftest running 50,000 ping-pong messages:
  without IBPB:   5579.49 ms
  with IBPB:     21396 ms
  (~4x difference)

And when doing video playback in the browser (which is already very
optimised), the IBPB hit turns out to be ~2.5% of CPU cycles. During a
WebRTC video call (tested using http://appr.tc), it's ~9% of CPU cycles.
I don't have exact numbers, but it's worse on some real VC apps.

>
> > I understand this is likely to be very contentious. Obviously, this
> > isn't ready for code review, but I'm hoping to get some thoughts on the
> > problem and this approach.
>
> Yes, in the absence of hard performance data, I'm not convinced at all.

With this change, I can get a >80% reduction in CPU cycles consumed by
IBPB. A video call on my test device goes from ~9% to ~0.80% of cycles
used by IBPB. It doesn't sound like much, but it's a significant
difference on these devices.