Hi Thomas,

On 21.05.20 22:05, Thomas Gleixner wrote:
From: Thomas Gleixner <t...@linutronix.de>

Device interrupts which go through do_IRQ() or the spurious interrupt
handler have their separate entry code on 64 bit for no good reason.

Both 32 and 64 bit transport the vector number through ORIG_[RE]AX in
pt_regs. Further the vector number is forced to fit into an u8 and is
complemented and offset by 0x80 so it's in the signed character
range. Otherwise GAS would expand the pushq to a 5 byte instruction for any
vector > 0x7F.

Treat the vector number like an error code and hand it to the C function as
argument. This allows to get rid of the extra entry code in a later step.

Simplify the error code push magic by implementing the pushq imm8 via a
'.byte 0x6a, vector' sequence so GAS is not able to screw it up. As the
pushq imm8 is sign extending the resulting error code needs to be truncated
to 8 bits in C code.

Originally-by: Andy Lutomirski <l...@kernel.org>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Acked-by: Andy Lutomirski <l...@kernel.org>

I'm currently trying to understand a performance regression with ScyllaDB on i3en.3xlarge (KVM based VM on Skylake) which we reliably bisected down to this commit:

  https://github.com/scylladb/scylla/issues/7036

What we're seeing is that syscalls such as membarrier() take forever (0-10 µs would be normal):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.26   12.458881      185953        67           membarrier
 15.79    3.693651       17843       207        49 read
 11.17    2.613350       67008        39           io_pgetevents
 10.89    2.547772       11795       216           timerfd_settime
  6.91    1.616802       11073       146           rt_sigprocmask
  1.39    0.325955        3542        92           timer_settime
  0.36    0.083691         526       159           io_submit
  0.22    0.051399         535        96           write
  0.00    0.000783          37        21           sendmsg
  0.00    0.000057           3        18         9 ioctl
------ ----------- ----------- --------- --------- ----------------
100.00   23.392341                  1061        58 total

That again seems to stem from a vastly slowed down smp_call_function_many_cond():

Samples: 218K of event 'cpu-clock', 4000 Hz
Overhead  Shared Object        Symbol
  94.51%  [kernel]             [k] smp_call_function_many_cond
   0.76%  [kernel]             [k] __do_softirq
   0.32%  [kernel]             [k] native_queued_spin_lock_slowpath
[...]

which is stuck in

       │     csd_lock_wait():
       │             smp_cond_load_acquire(&csd->flags, !(VAL &
  0.00 │       mov    0x8(%rcx),%edx
  0.00 │       and    $0x1,%edx
       │     ↓ je     2b9
       │     rep_nop():
  0.70 │2af:   pause
       │     csd_lock_wait():
 92.82 │       mov    0x8(%rcx),%edx
  6.48 │       and    $0x1,%edx
  0.00 │     ↑ jne    2af
  0.00 │2b9: ↑ jmp    282


Given the patch at hand I was expecting lost IPIs, but I can't quite see anything getting lost.

Do you have any further pointers I could look at?


Thanks,

Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


Reply via email to