On Wed, Nov 11, 2020 at 06:46:37PM +0000, Andrew Cooper wrote:

> Well...
> 
> static_calls are a newer, and more generic, form of pvops.  Most of the
> magic is to do with inlining small fragments, but static calls can do
> that now too, IIRC?

If you're referring to this glorious hack:

  
https://lkml.kernel.org/r/20201110101307.go2...@hirez.programming.kicks-ass.net

that only 'works' because it's a single instruction. That is,
static_call can only poke single instructions. They cannot replace a
call with "PUSHF; POP" / "PUSH; POPF" for example. They also cannot do
NOP padding for 'short' sequences.
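
For reference, NOP padding a 'short' sequence means roughly the
following. This is a minimal userspace sketch, not kernel code:
patch_with_nop_padding() is a hypothetical helper, and it fills with
single-byte 0x90 NOPs where real patching code would likely prefer
longer NOP encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOP1 0x90 /* single-byte x86 NOP */

/*
 * Hypothetical helper, not a kernel API: overwrite a patch site of
 * site_len bytes with a shorter replacement and fill the tail with NOPs,
 * so the site keeps its original length.
 */
static void patch_with_nop_padding(unsigned char *site, size_t site_len,
                                   const unsigned char *repl, size_t repl_len)
{
        assert(repl_len <= site_len);   /* replacement must fit the site */
        memcpy(site, repl, repl_len);
        memset(site + repl_len, NOP1, site_len - repl_len);
}
```

E.g. a 7-byte indirect CALL site patched with the 2-byte
PUSHF; POP %rax (0x9c 0x58) ends up as those two bytes followed by five
NOPs.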

Paravirt patching, like alternatives, is special in that it only happens
once, before SMP bringup.

> >> Something really disgusting we could do is recognise the indirect call
> >> offset and emit an extra ORC entry for RIP+1. So the cases are:
> >>
> >>    CALL *pv_ops.save_fl    -- 7 bytes IIRC
> >>    CALL $imm;              -- 5 bytes
> >>    PUSHF; POP %[RE]AX      -- 2 bytes
> >>
> >> so the RIP+1 (the POP insn) will only ever exist in this case. The
> >> indirect and direct call cases would never land on that IP.
> > I had a similar idea, and a bit of deja vu - we may have talked about
> > this before.  At least I know we talked about doing something similar
> > for alternatives which muck with the stack.

Vague memories... luckily we managed to get alternatives to a state
where they match, which is much saner.
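
The three cases quoted above can be told apart by their leading bytes.
A rough sketch, assuming the usual 64-bit encodings (ff 14 25 disp32 for
the indirect CALL, e8 rel32 for the direct CALL, 9c 58 for
PUSHF; POP %rax); classify_site() and rip_plus_one_is_insn() are
hypothetical names, not objtool functions:

```c
#include <stddef.h>

enum site_form {
        SITE_INDIRECT_CALL,     /* CALL *pv_ops.save_fl, 7 bytes */
        SITE_DIRECT_CALL,       /* CALL $imm, 5 bytes */
        SITE_PUSHF_POP,         /* PUSHF; POP %rax, 2 bytes */
        SITE_UNKNOWN,
};

/* Classify a patch site by its first bytes (assumed encodings, see above). */
static enum site_form classify_site(const unsigned char *insn, size_t len)
{
        if (len >= 7 && insn[0] == 0xff && insn[1] == 0x14 && insn[2] == 0x25)
                return SITE_INDIRECT_CALL;
        if (len >= 5 && insn[0] == 0xe8)
                return SITE_DIRECT_CALL;
        if (len >= 2 && insn[0] == 0x9c && insn[1] == 0x58)
                return SITE_PUSHF_POP;
        return SITE_UNKNOWN;
}

/*
 * RIP+1 is an instruction boundary (the POP) only in the native form;
 * both CALL forms are a single instruction starting at RIP+0.
 */
static int rip_plus_one_is_insn(enum site_form form)
{
        return form == SITE_PUSHF_POP;
}
```

So an extra ORC entry for RIP+1 would only ever be consulted when the
site has been patched to the native form.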

> The main complexity with pvops is that the
> 
>     CALL *pv_ops.save_fl
> 
> form needs to be usable from extremely early in the day (pre general
> patching), hence the use of function pointers and some non-standard ABIs.

The performance reasons mentioned below are a large part of the
non-standard ABI (e.g. CALLEE_SAVE).

> For performance reasons, the end result of this pvop wants to be `pushf;
> pop %[re]ax` in the native case, and `call xen_pv_save_fl` in the Xen
> case, but this doesn't mean that the compiled instruction needs to be a
> function pointer to begin with.

Not sure emitting the native code would be feasible. Also,
cpu_usergs_sysret64 is 6 bytes.

> Would objtool have an easier time coping if this were implemented in
> terms of a static call?

I doubt it, the big problem is that there is no visibility into the
actual alternative text. Runtime patching fragments into static call
would have the exact same problem.

Something that _might_ maybe work is trying to morph the immediate
fragments into an alternative. That is, instead of this:

static inline notrace unsigned long arch_local_save_flags(void)
{
        return PVOP_CALLEE0(unsigned long, irq.save_fl);
}

Write it something like:

static inline notrace unsigned long arch_local_save_flags(void)
{
        PVOP_CALL_ARGS;
        PVOP_TEST_NULL(irq.save_fl);
        asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
                                        "PUSHF; POP _ASM_AX",
                                        X86_FEATURE_NATIVE)
                            : CLBR_RET_REG, ASM_CALL_CONSTRAINT
                            : paravirt_type(irq.save_fl.func),
                              paravirt_clobber(PVOP_CALLEE_CLOBBERS)
                            : "memory", "cc");
        return __eax;
}

And then we have to teach objtool how to deal with conflicting
alternatives...
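
One possible shape for that, purely as an illustration and not how
objtool works today: record the stack depth at every instruction
boundary of each alternative, and only complain when two alternatives
disagree at an offset both can land on. merge_stack_states(),
NOT_BOUNDARY and SITE_LEN are made-up names for this sketch.

```c
#include <stddef.h>

#define SITE_LEN        7       /* length of the patched site in bytes */
#define NOT_BOUNDARY    (-1)    /* offset is not an instruction boundary */

/*
 * a[] and b[] give, per byte offset, the number of bytes pushed since the
 * start of the site, or NOT_BOUNDARY if no instruction starts there.
 * Returns 0 and fills merged[] if the two alternatives can share one
 * per-offset table, or -1 if they conflict at a common boundary.
 */
static int merge_stack_states(const int a[SITE_LEN], const int b[SITE_LEN],
                              int merged[SITE_LEN])
{
        for (size_t i = 0; i < SITE_LEN; i++) {
                if (a[i] != NOT_BOUNDARY && b[i] != NOT_BOUNDARY &&
                    a[i] != b[i])
                        return -1;      /* same IP, different stack depth */
                merged[i] = (a[i] != NOT_BOUNDARY) ? a[i] : b[i];
        }
        return 0;
}
```

With the CALL variant having a boundary only at offset 0 and the
PUSHF; POP variant having pushed 8 bytes at offset 1, the merge succeeds
and offset 1 simply takes the native variant's depth.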

That would remove most (all, if we can figure out a form that deals with
the spinlock fragments) of paravirt_patch.c

Hmm?
