Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Jessica Yu

+++ Miroslav Benes [06/04/16 10:43 +0200]:

On Wed, 6 Apr 2016, Miroslav Benes wrote:


Anyway I see there are some new comments on github. I'll look at those.
But I'd prefer to discuss all the relevant things (that is kpatch
unspecific) here. It would make it easier.


And you do (after seeing dates of the posts there), sorry for the noise.

Jessica, I think I am perfectly fine with introducing some arch-specific
code because of this problem.

We used generic apply_relocate_add() because it was a single
arch-independent entry point. There is no such things for paravirt_ops,
alternatives, jump labels and such things. In fact only module_finalize()
is there and that is not enough. So some arch-specific code in livepatch
seems to be unnecessary.



Yeah, unfortunately that appears to be the case..Luckily I don't think
we need to add much code; the calls should be similar to the way we
call apply_relocate_add(), just pass in the right sections.

Jessica


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Jessica Yu

+++ Miroslav Benes [06/04/16 10:43 +0200]:

On Wed, 6 Apr 2016, Miroslav Benes wrote:


Anyway I see there are some new comments on github. I'll look at those.
But I'd prefer to discuss all the relevant things (that is kpatch
unspecific) here. It would make it easier.


And you do (after seeing dates of the posts there), sorry for the noise.

Jessica, I think I am perfectly fine with introducing some arch-specific
code because of this problem.

We used generic apply_relocate_add() because it was a single
arch-independent entry point. There is no such things for paravirt_ops,
alternatives, jump labels and such things. In fact only module_finalize()
is there and that is not enough. So some arch-specific code in livepatch
seems to be unnecessary.



Yeah, unfortunately that appears to be the case..Luckily I don't think
we need to add much code; the calls should be similar to the way we
call apply_relocate_add(), just pass in the right sections.

Jessica


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Jessica Yu

+++ Jessica Yu [05/04/16 15:19 -0400]:

+++ Miroslav Benes [05/04/16 15:07 +0200]:

On Mon, 4 Apr 2016, Josh Poimboeuf wrote:


So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
  instruction which is patched by apply_paravirt(), even though the
  patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
  apply a klp_reloc to the instruction in F' which was already patched
  by apply_paravirt() in step 1.  This results in undefined behavior
  because it tries to patch the original instruction but instead
  patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Hi,

we are trying really hard to understand the actual culprit here and as it
is quite confusing I have several questions/comments...


I don't have a 100% clear understanding of the whole picture either,
but I'll try to help clarify up some things..


1. can you provide dynrela sections of the patch module from
https://github.com/dynup/kpatch/issues/580? What is interesting is that
kvm_arch_vm_ioctl() function contains rela records only for trivial (==
exported) symbols from the first look. The problem should be there only if
you want to patch a function which reference some paravirt_ops unexported
symbol. For that symbol dynrela should be created.


Just to dispel some confusion over this, kpatch isn't "smart" enough
yet to differentiate between exported and non-exported symbols, as
Evgenii already mentioned. Just global and local, and whether the
symbol belongs to a module or vmlinux. So that means dynrelas are
indeed being created for the pv_*_ops symbols, despite the fact they
are exported.


Gah, slight mistake, I should have mentioned that the above case only
applies to patches to modules (i.e. there is no mechanism to detect
exported symbols for patches to modules), but for vmlinux patches
there is (and those stay normal relas). I'm sorry if I confused
anybody. The relevant dynrela generation code is here if anyone's
interested:
https://github.com/dynup/kpatch/blob/master/kpatch-build/create-diff-object.c#L2661

Jessica


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Jessica Yu

+++ Jessica Yu [05/04/16 15:19 -0400]:

+++ Miroslav Benes [05/04/16 15:07 +0200]:

On Mon, 4 Apr 2016, Josh Poimboeuf wrote:


So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
  instruction which is patched by apply_paravirt(), even though the
  patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
  apply a klp_reloc to the instruction in F' which was already patched
  by apply_paravirt() in step 1.  This results in undefined behavior
  because it tries to patch the original instruction but instead
  patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Hi,

we are trying really hard to understand the actual culprit here and as it
is quite confusing I have several questions/comments...


I don't have a 100% clear understanding of the whole picture either,
but I'll try to help clarify up some things..


1. can you provide dynrela sections of the patch module from
https://github.com/dynup/kpatch/issues/580? What is interesting is that
kvm_arch_vm_ioctl() function contains rela records only for trivial (==
exported) symbols from the first look. The problem should be there only if
you want to patch a function which reference some paravirt_ops unexported
symbol. For that symbol dynrela should be created.


Just to dispel some confusion over this, kpatch isn't "smart" enough
yet to differentiate between exported and non-exported symbols, as
Evgenii already mentioned. Just global and local, and whether the
symbol belongs to a module or vmlinux. So that means dynrelas are
indeed being created for the pv_*_ops symbols, despite the fact they
are exported.


Gah, slight mistake, I should have mentioned that the above case only
applies to patches to modules (i.e. there is no mechanism to detect
exported symbols for patches to modules), but for vmlinux patches
there is (and those stay normal relas). I'm sorry if I confused
anybody. The relevant dynrela generation code is here if anyone's
interested:
https://github.com/dynup/kpatch/blob/master/kpatch-build/create-diff-object.c#L2661

Jessica


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Chris J Arges wrote:

> On Wed, Apr 06, 2016 at 02:09:01PM +0200, Miroslav Benes wrote:
> > On Wed, 6 Apr 2016, Chris J Arges wrote:
> > 
> > > I think this approach needs more thought and my code has bug(s).
> > 
> > And indeed there is...
> > 
> > long (*__kvm_arch_vm_ioctl)(struct file *filp, unsigned long ioctl, 
> > unsigned long arg) = NULL;
> > 
> > Use a different name than __kvm_arch_vm_ioctl and (ideally) make it 
> > static.
> > 
> > kallsyms_lookup_name("__kvm_arch_vm_ioctl") returns the address of this 
> > variable from the patch module.
> > 
> > Miroslav
> >
> 
> Well that was the bug, I was really stumped why it was giving me a wierd
> address for a function. Once I changed my pointer name to something else it
> worked, so there was no difference to these approaches. I also had to modify
> the symbol lookup to happen in the livepatch so we ensure that the module is
> loaded in this case and not get a NULL deref.

Just a remark. With this change there is a call to kallsyms_lookup_name 
for each call to patched function. This is not optimal. What we do in 
kgraft is that we register a module notifier which calls 
kallsyms_lookup_name when to-be-patched module arrives. It is not nice but 
it works.

Miroslav

> 
> The fixed code is here:
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works.2/
> 
> This out of tree patch doesn't have the same failure as building a patch with
> kpatch-build which is what we expect since it doesn't have livepatch relocs. 
> In
> addition I tested with the kvm module loaded _after_ the livepatch module and
> no failure was observed.
> 
> --chris
> 



Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Chris J Arges wrote:

> On Wed, Apr 06, 2016 at 02:09:01PM +0200, Miroslav Benes wrote:
> > On Wed, 6 Apr 2016, Chris J Arges wrote:
> > 
> > > I think this approach needs more thought and my code has bug(s).
> > 
> > And indeed there is...
> > 
> > long (*__kvm_arch_vm_ioctl)(struct file *filp, unsigned long ioctl, 
> > unsigned long arg) = NULL;
> > 
> > Use a different name than __kvm_arch_vm_ioctl and (ideally) make it 
> > static.
> > 
> > kallsyms_lookup_name("__kvm_arch_vm_ioctl") returns the address of this 
> > variable from the patch module.
> > 
> > Miroslav
> >
> 
> Well that was the bug, I was really stumped why it was giving me a wierd
> address for a function. Once I changed my pointer name to something else it
> worked, so there was no difference to these approaches. I also had to modify
> the symbol lookup to happen in the livepatch so we ensure that the module is
> loaded in this case and not get a NULL deref.

Just a remark. With this change there is a call to kallsyms_lookup_name 
for each call to patched function. This is not optimal. What we do in 
kgraft is that we register a module notifier which calls 
kallsyms_lookup_name when to-be-patched module arrives. It is not nice but 
it works.

Miroslav

> 
> The fixed code is here:
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works.2/
> 
> This out of tree patch doesn't have the same failure as building a patch with
> kpatch-build which is what we expect since it doesn't have livepatch relocs. 
> In
> addition I tested with the kvm module loaded _after_ the livepatch module and
> no failure was observed.
> 
> --chris
> 



Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Chris J Arges
On Wed, Apr 06, 2016 at 02:09:01PM +0200, Miroslav Benes wrote:
> On Wed, 6 Apr 2016, Chris J Arges wrote:
> 
> > I think this approach needs more thought and my code has bug(s).
> 
> And indeed there is...
> 
> long (*__kvm_arch_vm_ioctl)(struct file *filp, unsigned long ioctl, unsigned 
> long arg) = NULL;
> 
> Use a different name than __kvm_arch_vm_ioctl and (ideally) make it 
> static.
> 
> kallsyms_lookup_name("__kvm_arch_vm_ioctl") returns the address of this 
> variable from the patch module.
> 
> Miroslav
>

Well that was the bug, I was really stumped why it was giving me a wierd
address for a function. Once I changed my pointer name to something else it
worked, so there was no difference to these approaches. I also had to modify
the symbol lookup to happen in the livepatch so we ensure that the module is
loaded in this case and not get a NULL deref.

The fixed code is here:
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works.2/

This out of tree patch doesn't have the same failure as building a patch with
kpatch-build which is what we expect since it doesn't have livepatch relocs. In
addition I tested with the kvm module loaded _after_ the livepatch module and
no failure was observed.

--chris



Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Chris J Arges
On Wed, Apr 06, 2016 at 02:09:01PM +0200, Miroslav Benes wrote:
> On Wed, 6 Apr 2016, Chris J Arges wrote:
> 
> > I think this approach needs more thought and my code has bug(s).
> 
> And indeed there is...
> 
> long (*__kvm_arch_vm_ioctl)(struct file *filp, unsigned long ioctl, unsigned 
> long arg) = NULL;
> 
> Use a different name than __kvm_arch_vm_ioctl and (ideally) make it 
> static.
> 
> kallsyms_lookup_name("__kvm_arch_vm_ioctl") returns the address of this 
> variable from the patch module.
> 
> Miroslav
>

Well that was the bug, I was really stumped why it was giving me a wierd
address for a function. Once I changed my pointer name to something else it
worked, so there was no difference to these approaches. I also had to modify
the symbol lookup to happen in the livepatch so we ensure that the module is
loaded in this case and not get a NULL deref.

The fixed code is here:
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works.2/

This out of tree patch doesn't have the same failure as building a patch with
kpatch-build which is what we expect since it doesn't have livepatch relocs. In
addition I tested with the kvm module loaded _after_ the livepatch module and
no failure was observed.

--chris



Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Chris J Arges wrote:

> I think this approach needs more thought and my code has bug(s).

And indeed there is...

long (*__kvm_arch_vm_ioctl)(struct file *filp, unsigned long ioctl, unsigned 
long arg) = NULL;

Use a different name than __kvm_arch_vm_ioctl and (ideally) make it 
static.

kallsyms_lookup_name("__kvm_arch_vm_ioctl") returns the address of this 
variable from the patch module.

Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Chris J Arges wrote:

> I think this approach needs more thought and my code has bug(s).

And indeed there is...

long (*__kvm_arch_vm_ioctl)(struct file *filp, unsigned long ioctl, unsigned 
long arg) = NULL;

Use a different name than __kvm_arch_vm_ioctl and (ideally) make it 
static.

kallsyms_lookup_name("__kvm_arch_vm_ioctl") returns the address of this 
variable from the patch module.

Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Chris J Arges
On Wed, Apr 06, 2016 at 11:09:04AM +0200, Miroslav Benes wrote:
> On Wed, 6 Apr 2016, Chris J Arges wrote:
> 
> > On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> > > On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> > > 
> > > > So I think this doesn't fix the problem.  Dynamic relocations are
> > > > applied to the "patch module", whereas the above code deals with the
> > > > initialization order of the "patched module".  This distinction
> > > > originally confused me as well, until Jessica set me straight.
> > > > 
> > > > Let me try to illustrate the problem with an example.  Imagine you have
> > > > a patch module P which applies a patch to module M.  P replaces M's
> > > > function F with a new function F', which uses paravirt ops.
> > > > 
> > > > 1) Patch P is loaded before module M.  P's new function F' has an
> > > >instruction which is patched by apply_paravirt(), even though the
> > > >patch hasn't been applied yet.
> > > > 
> > > > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> > > >apply a klp_reloc to the instruction in F' which was already patched
> > > >by apply_paravirt() in step 1.  This results in undefined behavior
> > > >because it tries to patch the original instruction but instead
> > > >patches the new paravirt instruction.
> > > > 
> > > > So the above patch makes no difference because the paravirt module
> > > > loading order doesn't really matter.
> > > 
> > > Hi,
> > > 
> > > we are trying really hard to understand the actual culprit here and as it 
> > > is quite confusing I have several questions/comments...
> > > 
> > > 1. can you provide dynrela sections of the patch module from 
> > > https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> > > kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> > > exported) symbols from the first look. The problem should be there only 
> > > if 
> > > you want to patch a function which reference some paravirt_ops unexported 
> > > symbol. For that symbol dynrela should be created.
> > > 
> > > 2. I am almost 100 percent sure that the second problem Chris mentions in 
> > > github post is something different. If he patches only 
> > > kvm_arch_vm_ioctl() 
> > > so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> > > problem. Because it is a trivial livepatching case. There are no dynrelas 
> > > and no alternatives in the patch module. The crash is suspicious and we 
> > > have a problem somewhere. Chris, can you please provide more info about 
> > > that? That is how exactly you used kallsyms_lookup_name() and so on...
> > > 
> > 
> > Miroslav,
> > 
> > In my original comment I was trying to see if this was a kpatch-build 
> > specific
> > issue and that's why I tried to reproduce using just a simple out of tree 
> > built
> > livepatch module. For this case I copied kvm_arch_vm_ioctl into
> > __kvm_arch_vm_ioctl (to simplify patching). I then built and installed this
> > base kernel and kvm.ko module.  However, for the crashing and non-crashing
> > cases I used two slightly different base kernels and livepatch code.
> > 
> > I just re-tested this using the latest livepatch.git/for-next code and have 
> > the
> > following:
> > 
> > This crashes if I use kallsyms_lookup_name to find the __kvm_arch_vm_ioctl
> > symbol and then call it from my livepatch:
> > http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.crashes/
> > 
> > This works if I export __kvm_arch_vm_ioctl in the base kernel, and in the
> > livepatch just call it directly:
> > http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works/
> > 
> > Where livepatch.c is the livepatch and kernel.patch is the base kernel 
> > patch for
> > both directories.
> 
> Thank you, Chris.
> 
> First (before to reproduce it) I have to ask one thing. What is the order 
> of module loading? Do you first load kvm.ko and then livepatch or the 
> opposite.
> 

I just tested this experimentally, in both cases kvm and kvm_intel was loaded
before the livepatch module was inserted. The bug is triggered only when I
actually start the VM and only when using my kallsyms_lookup_name hack.

> It does not matter in the first case where the function is exported. 
> Because of it there would be a dependency of the modules so once you load 
> livepatch module kvm.ko is loaded before it. This is the only way to do 
> it. Undefined symbols of livepatch module are thus easily resolved.
> 
> The situation differs in the second case though. You use 
> kallsyms_lookup_name to circumvent dynrela for __kvm_arch_vm_ioctl (it is 
> not exported now). So when you load livepatch module kallsyms_lookup_name 
> would in fact fail and the kernel would then crash. Although the bug would 
> be different (NULL pointer dereference) and I guess you are aware of that 
> because you have a printk there. But just to make sure...
> 
> Thanks
> Miroslav

I 

Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Chris J Arges
On Wed, Apr 06, 2016 at 11:09:04AM +0200, Miroslav Benes wrote:
> On Wed, 6 Apr 2016, Chris J Arges wrote:
> 
> > On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> > > On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> > > 
> > > > So I think this doesn't fix the problem.  Dynamic relocations are
> > > > applied to the "patch module", whereas the above code deals with the
> > > > initialization order of the "patched module".  This distinction
> > > > originally confused me as well, until Jessica set me straight.
> > > > 
> > > > Let me try to illustrate the problem with an example.  Imagine you have
> > > > a patch module P which applies a patch to module M.  P replaces M's
> > > > function F with a new function F', which uses paravirt ops.
> > > > 
> > > > 1) Patch P is loaded before module M.  P's new function F' has an
> > > >instruction which is patched by apply_paravirt(), even though the
> > > >patch hasn't been applied yet.
> > > > 
> > > > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> > > >apply a klp_reloc to the instruction in F' which was already patched
> > > >by apply_paravirt() in step 1.  This results in undefined behavior
> > > >because it tries to patch the original instruction but instead
> > > >patches the new paravirt instruction.
> > > > 
> > > > So the above patch makes no difference because the paravirt module
> > > > loading order doesn't really matter.
> > > 
> > > Hi,
> > > 
> > > we are trying really hard to understand the actual culprit here and as it 
> > > is quite confusing I have several questions/comments...
> > > 
> > > 1. can you provide dynrela sections of the patch module from 
> > > https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> > > kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> > > exported) symbols from the first look. The problem should be there only 
> > > if 
> > > you want to patch a function which reference some paravirt_ops unexported 
> > > symbol. For that symbol dynrela should be created.
> > > 
> > > 2. I am almost 100 percent sure that the second problem Chris mentions in 
> > > github post is something different. If he patches only 
> > > kvm_arch_vm_ioctl() 
> > > so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> > > problem. Because it is a trivial livepatching case. There are no dynrelas 
> > > and no alternatives in the patch module. The crash is suspicious and we 
> > > have a problem somewhere. Chris, can you please provide more info about 
> > > that? That is how exactly you used kallsyms_lookup_name() and so on...
> > > 
> > 
> > Miroslav,
> > 
> > In my original comment I was trying to see if this was a kpatch-build 
> > specific
> > issue and that's why I tried to reproduce using just a simple out of tree 
> > built
> > livepatch module. For this case I copied kvm_arch_vm_ioctl into
> > __kvm_arch_vm_ioctl (to simplify patching). I then built and installed this
> > base kernel and kvm.ko module.  However, for the crashing and non-crashing
> > cases I used two slightly different base kernels and livepatch code.
> > 
> > I just re-tested this using the latest livepatch.git/for-next code and have 
> > the
> > following:
> > 
> > This crashes if I use kallsyms_lookup_name to find the __kvm_arch_vm_ioctl
> > symbol and then call it from my livepatch:
> > http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.crashes/
> > 
> > This works if I export __kvm_arch_vm_ioctl in the base kernel, and in the
> > livepatch just call it directly:
> > http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works/
> > 
> > Where livepatch.c is the livepatch and kernel.patch is the base kernel 
> > patch for
> > both directories.
> 
> Thank you, Chris.
> 
> First (before to reproduce it) I have to ask one thing. What is the order 
> of module loading? Do you first load kvm.ko and then livepatch or the 
> opposite.
> 

I just tested this experimentally, in both cases kvm and kvm_intel was loaded
before the livepatch module was inserted. The bug is triggered only when I
actually start the VM and only when using my kallsyms_lookup_name hack.

> It does not matter in the first case where the function is exported. 
> Because of it there would be a dependency of the modules so once you load 
> livepatch module kvm.ko is loaded before it. This is the only way to do 
> it. Undefined symbols of livepatch module are thus easily resolved.
> 
> The situation differs in the second case though. You use 
> kallsyms_lookup_name to circumvent dynrela for __kvm_arch_vm_ioctl (it is 
> not exported now). So when you load livepatch module kallsyms_lookup_name 
> would in fact fail and the kernel would then crash. Although the bug would 
> be different (NULL pointer dereference) and I guess you are aware of that 
> because you have a printk there. But just to make sure...
> 
> Thanks
> Miroslav

I 

Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Chris J Arges wrote:

> On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> > On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> > 
> > > So I think this doesn't fix the problem.  Dynamic relocations are
> > > applied to the "patch module", whereas the above code deals with the
> > > initialization order of the "patched module".  This distinction
> > > originally confused me as well, until Jessica set me straight.
> > > 
> > > Let me try to illustrate the problem with an example.  Imagine you have
> > > a patch module P which applies a patch to module M.  P replaces M's
> > > function F with a new function F', which uses paravirt ops.
> > > 
> > > 1) Patch P is loaded before module M.  P's new function F' has an
> > >instruction which is patched by apply_paravirt(), even though the
> > >patch hasn't been applied yet.
> > > 
> > > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> > >apply a klp_reloc to the instruction in F' which was already patched
> > >by apply_paravirt() in step 1.  This results in undefined behavior
> > >because it tries to patch the original instruction but instead
> > >patches the new paravirt instruction.
> > > 
> > > So the above patch makes no difference because the paravirt module
> > > loading order doesn't really matter.
> > 
> > Hi,
> > 
> > we are trying really hard to understand the actual culprit here and as it 
> > is quite confusing I have several questions/comments...
> > 
> > 1. can you provide dynrela sections of the patch module from 
> > https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> > kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> > exported) symbols from the first look. The problem should be there only if 
> > you want to patch a function which reference some paravirt_ops unexported 
> > symbol. For that symbol dynrela should be created.
> > 
> > 2. I am almost 100 percent sure that the second problem Chris mentions in 
> > github post is something different. If he patches only kvm_arch_vm_ioctl() 
> > so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> > problem. Because it is a trivial livepatching case. There are no dynrelas 
> > and no alternatives in the patch module. The crash is suspicious and we 
> > have a problem somewhere. Chris, can you please provide more info about 
> > that? That is how exactly you used kallsyms_lookup_name() and so on...
> > 
> 
> Miroslav,
> 
> In my original comment I was trying to see if this was a kpatch-build specific
> issue and that's why I tried to reproduce using just a simple out of tree 
> built
> livepatch module. For this case I copied kvm_arch_vm_ioctl into
> __kvm_arch_vm_ioctl (to simplify patching). I then built and installed this
> base kernel and kvm.ko module.  However, for the crashing and non-crashing
> cases I used two slightly different base kernels and livepatch code.
> 
> I just re-tested this using the latest livepatch.git/for-next code and have 
> the
> following:
> 
> This crashes if I use kallsyms_lookup_name to find the __kvm_arch_vm_ioctl
> symbol and then call it from my livepatch:
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.crashes/
> 
> This works if I export __kvm_arch_vm_ioctl in the base kernel, and in the
> livepatch just call it directly:
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works/
> 
> Where livepatch.c is the livepatch and kernel.patch is the base kernel patch 
> for
> both directories.

Thank you, Chris.

First (before to reproduce it) I have to ask one thing. What is the order 
of module loading? Do you first load kvm.ko and then livepatch or the 
opposite.

It does not matter in the first case where the function is exported. 
Because of it there would be a dependency of the modules so once you load 
livepatch module kvm.ko is loaded before it. This is the only way to do 
it. Undefined symbols of livepatch module are thus easily resolved.

The situation differs in the second case though. You use 
kallsyms_lookup_name to circumvent dynrela for __kvm_arch_vm_ioctl (it is 
not exported now). So when you load livepatch module kallsyms_lookup_name 
would in fact fail and the kernel would then crash. Although the bug would 
be different (NULL pointer dereference) and I guess you are aware of that 
because you have a printk there. But just to make sure...

Thanks
Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Chris J Arges wrote:

> On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> > On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> > 
> > > So I think this doesn't fix the problem.  Dynamic relocations are
> > > applied to the "patch module", whereas the above code deals with the
> > > initialization order of the "patched module".  This distinction
> > > originally confused me as well, until Jessica set me straight.
> > > 
> > > Let me try to illustrate the problem with an example.  Imagine you have
> > > a patch module P which applies a patch to module M.  P replaces M's
> > > function F with a new function F', which uses paravirt ops.
> > > 
> > > 1) Patch P is loaded before module M.  P's new function F' has an
> > >instruction which is patched by apply_paravirt(), even though the
> > >patch hasn't been applied yet.
> > > 
> > > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> > >apply a klp_reloc to the instruction in F' which was already patched
> > >by apply_paravirt() in step 1.  This results in undefined behavior
> > >because it tries to patch the original instruction but instead
> > >patches the new paravirt instruction.
> > > 
> > > So the above patch makes no difference because the paravirt module
> > > loading order doesn't really matter.
> > 
> > Hi,
> > 
> > we are trying really hard to understand the actual culprit here and as it 
> > is quite confusing I have several questions/comments...
> > 
> > 1. can you provide dynrela sections of the patch module from 
> > https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> > kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> > exported) symbols from the first look. The problem should be there only if 
> > you want to patch a function which reference some paravirt_ops unexported 
> > symbol. For that symbol dynrela should be created.
> > 
> > 2. I am almost 100 percent sure that the second problem Chris mentions in 
> > github post is something different. If he patches only kvm_arch_vm_ioctl() 
> > so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> > problem. Because it is a trivial livepatching case. There are no dynrelas 
> > and no alternatives in the patch module. The crash is suspicious and we 
> > have a problem somewhere. Chris, can you please provide more info about 
> > that? That is how exactly you used kallsyms_lookup_name() and so on...
> > 
> 
> Miroslav,
> 
> In my original comment I was trying to see if this was a kpatch-build specific
> issue and that's why I tried to reproduce using just a simple out of tree 
> built
> livepatch module. For this case I copied kvm_arch_vm_ioctl into
> __kvm_arch_vm_ioctl (to simplify patching). I then built and installed this
> base kernel and kvm.ko module.  However, for the crashing and non-crashing
> cases I used two slightly different base kernels and livepatch code.
> 
> I just re-tested this using the latest livepatch.git/for-next code and have 
> the
> following:
> 
> This crashes if I use kallsyms_lookup_name to find the __kvm_arch_vm_ioctl
> symbol and then call it from my livepatch:
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.crashes/
> 
> This works if I export __kvm_arch_vm_ioctl in the base kernel, and in the
> livepatch just call it directly:
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works/
> 
> Where livepatch.c is the livepatch and kernel.patch is the base kernel patch 
> for
> both directories.

Thank you, Chris.

First (before to reproduce it) I have to ask one thing. What is the order 
of module loading? Do you first load kvm.ko and then livepatch or the 
opposite.

It does not matter in the first case where the function is exported. 
Because of it there would be a dependency of the modules so once you load 
livepatch module kvm.ko is loaded before it. This is the only way to do 
it. Undefined symbols of livepatch module are thus easily resolved.

The situation differs in the second case though. You use 
kallsyms_lookup_name to circumvent dynrela for __kvm_arch_vm_ioctl (it is 
not exported now). So when you load livepatch module kallsyms_lookup_name 
would in fact fail and the kernel would then crash. Although the bug would 
be different (NULL pointer dereference) and I guess you are aware of that 
because you have a printk there. But just to make sure...

Thanks
Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Miroslav Benes wrote:

> On Wed, 6 Apr 2016, Miroslav Benes wrote:
> 
> > Anyway I see there are some new comments on github. I'll look at those. 
> > But I'd prefer to discuss all the relevant things (that is kpatch 
> > unspecific) here. It would make it easier.
> 
> And you do (after seeing dates of the posts there), sorry for the noise.
> 
> Jessica, I think I am perfectly fine with introducing some arch-specific 
> code because of this problem.
> 
> We used generic apply_relocate_add() because it was a single 
> arch-independent entry point. There is no such things for paravirt_ops, 
> alternatives, jump labels and such things. In fact only module_finalize() 
> is there and that is not enough. So some arch-specific code in livepatch 
> seems to be unnecessary.

s/unnecessary/necessary/


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Miroslav Benes wrote:

> On Wed, 6 Apr 2016, Miroslav Benes wrote:
> 
> > Anyway I see there are some new comments on github. I'll look at those. 
> > But I'd prefer to discuss all the relevant things (that is kpatch 
> > unspecific) here. It would make it easier.
> 
> And you do (after seeing dates of the posts there), sorry for the noise.
> 
> Jessica, I think I am perfectly fine with introducing some arch-specific 
> code because of this problem.
> 
> We used generic apply_relocate_add() because it was a single 
> arch-independent entry point. There is no such things for paravirt_ops, 
> alternatives, jump labels and such things. In fact only module_finalize() 
> is there and that is not enough. So some arch-specific code in livepatch 
> seems to be unnecessary.

s/unnecessary/necessary/


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Miroslav Benes wrote:

> Anyway I see there are some new comments on github. I'll look at those. 
> But I'd prefer to discuss all the relevant things (that is kpatch 
> unspecific) here. It would make it easier.

And you do (after seeing dates of the posts there), sorry for the noise.

Jessica, I think I am perfectly fine with introducing some arch-specific 
code because of this problem.

We used generic apply_relocate_add() because it was a single 
arch-independent entry point. There is no such things for paravirt_ops, 
alternatives, jump labels and such things. In fact only module_finalize() 
is there and that is not enough. So some arch-specific code in livepatch 
seems to be unnecessary.

Cheers,
Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Wed, 6 Apr 2016, Miroslav Benes wrote:

> Anyway I see there are some new comments on github. I'll look at those. 
> But I'd prefer to discuss all the relevant things (that is kpatch 
> unspecific) here. It would make it easier.

And you do (after seeing dates of the posts there), sorry for the noise.

Jessica, I think I am perfectly fine with introducing some arch-specific 
code because of this problem.

We used generic apply_relocate_add() because it was a single 
arch-independent entry point. There is no such things for paravirt_ops, 
alternatives, jump labels and such things. In fact only module_finalize() 
is there and that is not enough. So some arch-specific code in livepatch 
seems to be unnecessary.

Cheers,
Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Tue, 5 Apr 2016, Jessica Yu wrote:

> +++ Miroslav Benes [05/04/16 15:07 +0200]:
> > On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> > 
> > > So I think this doesn't fix the problem.  Dynamic relocations are
> > > applied to the "patch module", whereas the above code deals with the
> > > initialization order of the "patched module".  This distinction
> > > originally confused me as well, until Jessica set me straight.
> > > 
> > > Let me try to illustrate the problem with an example.  Imagine you have
> > > a patch module P which applies a patch to module M.  P replaces M's
> > > function F with a new function F', which uses paravirt ops.
> > > 
> > > 1) Patch P is loaded before module M.  P's new function F' has an
> > >instruction which is patched by apply_paravirt(), even though the
> > >patch hasn't been applied yet.
> > > 
> > > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> > >apply a klp_reloc to the instruction in F' which was already patched
> > >by apply_paravirt() in step 1.  This results in undefined behavior
> > >because it tries to patch the original instruction but instead
> > >patches the new paravirt instruction.
> > > 
> > > So the above patch makes no difference because the paravirt module
> > > loading order doesn't really matter.
> > 
> > Hi,
> > 
> > we are trying really hard to understand the actual culprit here and as it
> > is quite confusing I have several questions/comments...
> 
> I don't have a 100% clear understanding of the whole picture either,
> but I'll try to help clarify up some things..
> 
> > 1. can you provide dynrela sections of the patch module from
> > https://github.com/dynup/kpatch/issues/580? What is interesting is that
> > kvm_arch_vm_ioctl() function contains rela records only for trivial (==
> > exported) symbols from the first look. The problem should be there only if
> > you want to patch a function which reference some paravirt_ops unexported
> > symbol. For that symbol dynrela should be created.
> 
> Just to dispel some confusion over this, kpatch isn't "smart" enough
> yet to differentiate between exported and non-exported symbols, as
> Evgenii already mentioned. Just global and local, and whether the
> symbol belongs to a module or vmlinux. So that means dynrelas are
> indeed being created for the pv_*_ops symbols, despite the fact they
> are exported.

Ah, ok. So for vmlinux you do not generate dynrelas for exported symbols 
while for modules there is no such logic. This makes the problem only more 
pronounced. We were wondering how it was possible there were crashes in 
this case as there were only exported symbols there. This is an 
explanation.

> So this is part of the problem, since apply_paravirt() runs first (as
> part of module_finalize()), and dynrelas are written second,
> livepatch is clobbering/overwriting those paravirt patch sites that
> apply_paravirt had fixed up already.

Yes.

> If you skip generating dynrelas that affect those paravirt patch
> sites, I can verify that the crash disappears, since in that case
> we're not stepping over those paravirt patch sites with our dynrelas
> (just to test and verify the problem, I kind of cheated and forced
> kpatch to not generate dynrelas for pv_*_ops symbols, and the panics
> disappear). Another temporary solution was to not apply the dynrela
> if the target memory is not all zero's. These approaches aren't
> reliable enough to serve as a permanent solution but they do give us
> a better idea of what's happening. Here are some more relevant
> comments -

Agreed.

> https://github.com/dynup/kpatch/issues/580#issuecomment-199314636

As Josh correctly pointed out there could be paravirt instructions that
need relocations. Maybe spinlocks. And as I browsed through the code I got 
the feeling that the relocations were needed just for the very application 
of the paravirt instruction.

See PVOP_VCALL* macros and especially PARAVIRT_CALL. There is a relocation 
record for this call in an object file. Must be.

Anyway I see there are some new comments on github. I'll look at those. 
But I'd prefer to discuss all the relevant things (that is kpatch 
unspecific) here. It would make it easier.

Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-06 Thread Miroslav Benes
On Tue, 5 Apr 2016, Jessica Yu wrote:

> +++ Miroslav Benes [05/04/16 15:07 +0200]:
> > On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> > 
> > > So I think this doesn't fix the problem.  Dynamic relocations are
> > > applied to the "patch module", whereas the above code deals with the
> > > initialization order of the "patched module".  This distinction
> > > originally confused me as well, until Jessica set me straight.
> > > 
> > > Let me try to illustrate the problem with an example.  Imagine you have
> > > a patch module P which applies a patch to module M.  P replaces M's
> > > function F with a new function F', which uses paravirt ops.
> > > 
> > > 1) Patch P is loaded before module M.  P's new function F' has an
> > >instruction which is patched by apply_paravirt(), even though the
> > >patch hasn't been applied yet.
> > > 
> > > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> > >apply a klp_reloc to the instruction in F' which was already patched
> > >by apply_paravirt() in step 1.  This results in undefined behavior
> > >because it tries to patch the original instruction but instead
> > >patches the new paravirt instruction.
> > > 
> > > So the above patch makes no difference because the paravirt module
> > > loading order doesn't really matter.
> > 
> > Hi,
> > 
> > we are trying really hard to understand the actual culprit here and as it
> > is quite confusing I have several questions/comments...
> 
> I don't have a 100% clear understanding of the whole picture either,
> but I'll try to help clarify up some things..
> 
> > 1. can you provide dynrela sections of the patch module from
> > https://github.com/dynup/kpatch/issues/580? What is interesting is that
> > kvm_arch_vm_ioctl() function contains rela records only for trivial (==
> > exported) symbols from the first look. The problem should be there only if
> > you want to patch a function which reference some paravirt_ops unexported
> > symbol. For that symbol dynrela should be created.
> 
> Just to dispel some confusion over this, kpatch isn't "smart" enough
> yet to differentiate between exported and non-exported symbols, as
> Evgenii already mentioned. Just global and local, and whether the
> symbol belongs to a module or vmlinux. So that means dynrelas are
> indeed being created for the pv_*_ops symbols, despite the fact they
> are exported.

Ah, ok. So for vmlinux you do not generate dynrelas for exported symbols 
while for modules there is no such logic. This makes the problem only more 
pronounced. We were wondering how it was possible there were crashes in 
this case as there were only exported symbols there. This is an 
explanation.

> So this is part of the problem, since apply_paravirt() runs first (as
> part of module_finalize()), and dynrelas are written second,
> livepatch is clobbering/overwriting those paravirt patch sites that
> apply_paravirt had fixed up already.

Yes.

> If you skip generating dynrelas that affect those paravirt patch
> sites, I can verify that the crash disappears, since in that case
> we're not stepping over those paravirt patch sites with our dynrelas
> (just to test and verify the problem, I kind of cheated and forced
> kpatch to not generate dynrelas for pv_*_ops symbols, and the panics
> disappear). Another temporary solution was to not apply the dynrela
> if the target memory is not all zero's. These approaches aren't
> reliable enough to serve as a permanent solution but they do give us
> a better idea of what's happening. Here are some more relevant
> comments -

Agreed.

> https://github.com/dynup/kpatch/issues/580#issuecomment-199314636

As Josh correctly pointed out there could be paravirt instructions that
need relocations. Maybe spinlocks. And as I browsed through the code I got 
the feeling that the relocations were needed just for the very application 
of the paravirt instruction.

See PVOP_VCALL* macros and especially PARAVIRT_CALL. There is a relocation 
record for this call in an object file. Must be.

Anyway I see there are some new comments on github. I'll look at those. 
But I'd prefer to discuss all the relevant things (that is kpatch 
unspecific) here. It would make it easier.

Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Chris J Arges
On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> 
> > So I think this doesn't fix the problem.  Dynamic relocations are
> > applied to the "patch module", whereas the above code deals with the
> > initialization order of the "patched module".  This distinction
> > originally confused me as well, until Jessica set me straight.
> > 
> > Let me try to illustrate the problem with an example.  Imagine you have
> > a patch module P which applies a patch to module M.  P replaces M's
> > function F with a new function F', which uses paravirt ops.
> > 
> > 1) Patch P is loaded before module M.  P's new function F' has an
> >instruction which is patched by apply_paravirt(), even though the
> >patch hasn't been applied yet.
> > 
> > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> >apply a klp_reloc to the instruction in F' which was already patched
> >by apply_paravirt() in step 1.  This results in undefined behavior
> >because it tries to patch the original instruction but instead
> >patches the new paravirt instruction.
> > 
> > So the above patch makes no difference because the paravirt module
> > loading order doesn't really matter.
> 
> Hi,
> 
> we are trying really hard to understand the actual culprit here and as it 
> is quite confusing I have several questions/comments...
> 
> 1. can you provide dynrela sections of the patch module from 
> https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> exported) symbols from the first look. The problem should be there only if 
> you want to patch a function which reference some paravirt_ops unexported 
> symbol. For that symbol dynrela should be created.
> 
> 2. I am almost 100 percent sure that the second problem Chris mentions in 
> github post is something different. If he patches only kvm_arch_vm_ioctl() 
> so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> problem. Because it is a trivial livepatching case. There are no dynrelas 
> and no alternatives in the patch module. The crash is suspicious and we 
> have a problem somewhere. Chris, can you please provide more info about 
> that? That is how exactly you used kallsyms_lookup_name() and so on...
> 

Miroslav,

In my original comment I was trying to see if this was a kpatch-build specific
issue and that's why I tried to reproduce using just a simple out of tree built
livepatch module. For this case I copied kvm_arch_vm_ioctl into
__kvm_arch_vm_ioctl (to simplify patching). I then built and installed this
base kernel and kvm.ko module.  However, for the crashing and non-crashing
cases I used two slightly different base kernels and livepatch code.

I just re-tested this using the latest livepatch.git/for-next code and have the
following:

This crashes if I use kallsyms_lookup_name to find the __kvm_arch_vm_ioctl
symbol and then call it from my livepatch:
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.crashes/

This works if I export __kvm_arch_vm_ioctl in the base kernel, and in the
livepatch just call it directly:
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works/

Where livepatch.c is the livepatch and kernel.patch is the base kernel patch for
both directories.

--chris


Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Chris J Arges
On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> 
> > So I think this doesn't fix the problem.  Dynamic relocations are
> > applied to the "patch module", whereas the above code deals with the
> > initialization order of the "patched module".  This distinction
> > originally confused me as well, until Jessica set me straight.
> > 
> > Let me try to illustrate the problem with an example.  Imagine you have
> > a patch module P which applies a patch to module M.  P replaces M's
> > function F with a new function F', which uses paravirt ops.
> > 
> > 1) Patch P is loaded before module M.  P's new function F' has an
> >instruction which is patched by apply_paravirt(), even though the
> >patch hasn't been applied yet.
> > 
> > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> >apply a klp_reloc to the instruction in F' which was already patched
> >by apply_paravirt() in step 1.  This results in undefined behavior
> >because it tries to patch the original instruction but instead
> >patches the new paravirt instruction.
> > 
> > So the above patch makes no difference because the paravirt module
> > loading order doesn't really matter.
> 
> Hi,
> 
> we are trying really hard to understand the actual culprit here and as it 
> is quite confusing I have several questions/comments...
> 
> 1. can you provide dynrela sections of the patch module from 
> https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> exported) symbols from the first look. The problem should be there only if 
> you want to patch a function which reference some paravirt_ops unexported 
> symbol. For that symbol dynrela should be created.
> 
> 2. I am almost 100 percent sure that the second problem Chris mentions in 
> github post is something different. If he patches only kvm_arch_vm_ioctl() 
> so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> problem. Because it is a trivial livepatching case. There are no dynrelas 
> and no alternatives in the patch module. The crash is suspicious and we 
> have a problem somewhere. Chris, can you please provide more info about 
> that? That is how exactly you used kallsyms_lookup_name() and so on...
> 

Miroslav,

In my original comment I was trying to see if this was a kpatch-build specific
issue and that's why I tried to reproduce using just a simple out of tree built
livepatch module. For this case I copied kvm_arch_vm_ioctl into
__kvm_arch_vm_ioctl (to simplify patching). I then built and installed this
base kernel and kvm.ko module.  However, for the crashing and non-crashing
cases I used two slightly different base kernels and livepatch code.

I just re-tested this using the latest livepatch.git/for-next code and have the
following:

This crashes if I use kallsyms_lookup_name to find the __kvm_arch_vm_ioctl
symbol and then call it from my livepatch:
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.crashes/

This works if I export __kvm_arch_vm_ioctl in the base kernel, and in the
livepatch just call it directly:
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl.works/

Where livepatch.c is the livepatch and kernel.patch is the base kernel patch for
both directories.

--chris


Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Jessica Yu

+++ Miroslav Benes [05/04/16 15:07 +0200]:

On Mon, 4 Apr 2016, Josh Poimboeuf wrote:


So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
   instruction which is patched by apply_paravirt(), even though the
   patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
   apply a klp_reloc to the instruction in F' which was already patched
   by apply_paravirt() in step 1.  This results in undefined behavior
   because it tries to patch the original instruction but instead
   patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Hi,

we are trying really hard to understand the actual culprit here and as it
is quite confusing I have several questions/comments...


I don't have a 100% clear understanding of the whole picture either,
but I'll try to help clarify up some things..


1. can you provide dynrela sections of the patch module from
https://github.com/dynup/kpatch/issues/580? What is interesting is that
kvm_arch_vm_ioctl() function contains rela records only for trivial (==
exported) symbols from the first look. The problem should be there only if
you want to patch a function which reference some paravirt_ops unexported
symbol. For that symbol dynrela should be created.


Just to dispel some confusion over this, kpatch isn't "smart" enough
yet to differentiate between exported and non-exported symbols, as
Evgenii already mentioned. Just global and local, and whether the
symbol belongs to a module or vmlinux. So that means dynrelas are
indeed being created for the pv_*_ops symbols, despite the fact they
are exported.

So this is part of the problem, since apply_paravirt() runs first (as
part of module_finalize()), and dynrelas are written second,
livepatch is clobbering/overwriting those paravirt patch sites that
apply_paravirt had fixed up already.

If you skip generating dynrelas that affect those paravirt patch
sites, I can verify that the crash disappears, since in that case
we're not stepping over those paravirt patch sites with our dynrelas
(just to test and verify the problem, I kind of cheated and forced
kpatch to not generate dynrelas for pv_*_ops symbols, and the panics
disappear). Another temporary solution was to not apply the dynrela
if the target memory is not all zero's. These approaches aren't
reliable enough to serve as a permanent solution but they do give us
a better idea of what's happening. Here are some more relevant
comments -
https://github.com/dynup/kpatch/issues/580#issuecomment-199314636
https://github.com/dynup/kpatch/issues/580#issuecomment-202395452

Jessica


Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Jessica Yu

+++ Miroslav Benes [05/04/16 15:07 +0200]:

On Mon, 4 Apr 2016, Josh Poimboeuf wrote:


So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
   instruction which is patched by apply_paravirt(), even though the
   patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
   apply a klp_reloc to the instruction in F' which was already patched
   by apply_paravirt() in step 1.  This results in undefined behavior
   because it tries to patch the original instruction but instead
   patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Hi,

we are trying really hard to understand the actual culprit here and as it
is quite confusing I have several questions/comments...


I don't have a 100% clear understanding of the whole picture either,
but I'll try to help clarify up some things..


1. can you provide dynrela sections of the patch module from
https://github.com/dynup/kpatch/issues/580? What is interesting is that
kvm_arch_vm_ioctl() function contains rela records only for trivial (==
exported) symbols from the first look. The problem should be there only if
you want to patch a function which reference some paravirt_ops unexported
symbol. For that symbol dynrela should be created.


Just to dispel some confusion over this, kpatch isn't "smart" enough
yet to differentiate between exported and non-exported symbols, as
Evgenii already mentioned. Just global and local, and whether the
symbol belongs to a module or vmlinux. So that means dynrelas are
indeed being created for the pv_*_ops symbols, despite the fact they
are exported.

So this is part of the problem, since apply_paravirt() runs first (as
part of module_finalize()), and dynrelas are written second,
livepatch is clobbering/overwriting those paravirt patch sites that
apply_paravirt had fixed up already.

If you skip generating dynrelas that affect those paravirt patch
sites, I can verify that the crash disappears, since in that case
we're not stepping over those paravirt patch sites with our dynrelas
(just to test and verify the problem, I kind of cheated and forced
kpatch to not generate dynrelas for pv_*_ops symbols, and the panics
disappear). Another temporary solution was to not apply the dynrela
if the target memory is not all zero's. These approaches aren't
reliable enough to serve as a permanent solution but they do give us
a better idea of what's happening. Here are some more relevant
comments -
https://github.com/dynup/kpatch/issues/580#issuecomment-199314636
https://github.com/dynup/kpatch/issues/580#issuecomment-202395452

Jessica


Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Evgenii Shatokhin

05.04.2016 16:07, Miroslav Benes пишет:

On Mon, 4 Apr 2016, Josh Poimboeuf wrote:


So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
instruction which is patched by apply_paravirt(), even though the
patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
apply a klp_reloc to the instruction in F' which was already patched
by apply_paravirt() in step 1.  This results in undefined behavior
because it tries to patch the original instruction but instead
patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Hi,

we are trying really hard to understand the actual culprit here and as it
is quite confusing I have several questions/comments...

1. can you provide dynrela sections of the patch module from
https://github.com/dynup/kpatch/issues/580? What is interesting is that
kvm_arch_vm_ioctl() function contains rela records only for trivial (==
exported) symbols from the first look. The problem should be there only if
you want to patch a function which reference some paravirt_ops unexported
symbol. For that symbol dynrela should be created.


As far as I understand it, create-diff-object does not check if a symbol 
is exported or not when a patch for a module rather than for vmlinux is 
being prepared.


In such cases, it only checks if the referenced global symbol is defined 
somewhere in that module and if not, creates a dynrela for it.


The code is in kpatch_create_dynamic_rela_sections() from 
https://github.com/dynup/kpatch/blob/master/kpatch-build/create-diff-object.c.




2. I am almost 100 percent sure that the second problem Chris mentions in
github post is something different. If he patches only kvm_arch_vm_ioctl()
so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no
problem. Because it is a trivial livepatching case. There are no dynrelas
and no alternatives in the patch module. The crash is suspicious and we
have a problem somewhere. Chris, can you please provide more info about
that? That is how exactly you used kallsyms_lookup_name() and so on...

3. Now I'll try to explain what really happens here... because of 1. and
the way the alternatives and relocations are implemented the only
problematic case is when one wants to patch a module which introduces its
own alternatives. Which is probably the case of KVM. Why?

When alternative is used, the call to some pv_*_ops.function is placed
somewhere. The address of this location is stored in a separate elf
section and when apply_paravirt() is called it takes the address and
place the new code there (or it does not, it depends :)). When the
alternative is in some module and pv_*_ops is exported, which is the
usual case, there is no problem. No dynrela needs to be used when a user
wants to patch such function with the alternative.

The only problem is when the function uses unexported pv_*_ops (or some
other alternative symbol) from its own object file. When the user wants to
patch this one there is a need for dynrela.

So what really happens is that we load a patch module, we do not apply
our relocations but we do apply alternatives to the patch module, which is
problematic because there is a reference to something which is not yet
resolved (that is unexported pv_*_ops). When a patched module arrives we
happily resolve relocations but since we do not apply alternatives again
there is a rubbish in the patching code.

Is this all correct?


Jessica proposed some novel fixes here:

https://github.com/dynup/kpatch/issues/580#issuecomment-183001652


Yes, I think that fix should be the same we have for relocations. To
postpone the alternatives applications. Maybe it is even possible to do it
in a similar way. And yes on one hand it is gonna be ugly, on the other
hand it is gonna be consistent with relocations.


I think the *real* problem here (and one that we've seen before) is that
we have a feature which allows you to load a patch to a module before
loading the module itself.  That really goes against the grain of how
module dependencies work.  It has already given us several headaches and
it makes the livepatch code a lot more complex.

I really think we need to take another hard look about whether it's
really worth it.  My current feeling is that it's not.


I can only say that maybe about 1/3 of kgraft patches we currently have
are for modules (1/3 is probably not correct but it is definitely

Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Evgenii Shatokhin

05.04.2016 16:07, Miroslav Benes пишет:

On Mon, 4 Apr 2016, Josh Poimboeuf wrote:


So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
instruction which is patched by apply_paravirt(), even though the
patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
apply a klp_reloc to the instruction in F' which was already patched
by apply_paravirt() in step 1.  This results in undefined behavior
because it tries to patch the original instruction but instead
patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Hi,

we are trying really hard to understand the actual culprit here and as it
is quite confusing I have several questions/comments...

1. can you provide dynrela sections of the patch module from
https://github.com/dynup/kpatch/issues/580? What is interesting is that
kvm_arch_vm_ioctl() function contains rela records only for trivial (==
exported) symbols from the first look. The problem should be there only if
you want to patch a function which reference some paravirt_ops unexported
symbol. For that symbol dynrela should be created.


As far as I understand it, create-diff-object does not check if a symbol 
is exported or not when a patch for a module rather than for vmlinux is 
being prepared.


In such cases, it only checks if the referenced global symbol is defined 
somewhere in that module and if not, creates a dynrela for it.


The code is in kpatch_create_dynamic_rela_sections() from 
https://github.com/dynup/kpatch/blob/master/kpatch-build/create-diff-object.c.




2. I am almost 100 percent sure that the second problem Chris mentions in
github post is something different. If he patches only kvm_arch_vm_ioctl()
so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no
problem. Because it is a trivial livepatching case. There are no dynrelas
and no alternatives in the patch module. The crash is suspicious and we
have a problem somewhere. Chris, can you please provide more info about
that? That is how exactly you used kallsyms_lookup_name() and so on...

3. Now I'll try to explain what really happens here... because of 1. and
the way the alternatives and relocations are implemented the only
problematic case is when one wants to patch a module which introduces its
own alternatives. Which is probably the case of KVM. Why?

When alternative is used, the call to some pv_*_ops.function is placed
somewhere. The address of this location is stored in a separate elf
section and when apply_paravirt() is called it takes the address and
place the new code there (or it does not, it depends :)). When the
alternative is in some module and pv_*_ops is exported, which is the
usual case, there is no problem. No dynrela needs to be used when a user
wants to patch such function with the alternative.

The only problem is when the function uses unexported pv_*_ops (or some
other alternative symbol) from its own object file. When the user wants to
patch this one there is a need for dynrela.

So what really happens is that we load a patch module, we do not apply
our relocations but we do apply alternatives to the patch module, which is
problematic because there is a reference to something which is not yet
resolved (that is unexported pv_*_ops). When a patched module arrives we
happily resolve relocations but since we do not apply alternatives again
there is a rubbish in the patching code.

Is this all correct?


Jessica proposed some novel fixes here:

https://github.com/dynup/kpatch/issues/580#issuecomment-183001652


Yes, I think that fix should be the same we have for relocations. To
postpone the alternatives applications. Maybe it is even possible to do it
in a similar way. And yes on one hand it is gonna be ugly, on the other
hand it is gonna be consistent with relocations.


I think the *real* problem here (and one that we've seen before) is that
we have a feature which allows you to load a patch to a module before
loading the module itself.  That really goes against the grain of how
module dependencies work.  It has already given us several headaches and
it makes the livepatch code a lot more complex.

I really think we need to take another hard look about whether it's
really worth it.  My current feeling is that it's not.


I can only say that maybe about 1/3 of kgraft patches we currently have
are for modules (1/3 is probably not correct but it is definitely

Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Josh Poimboeuf
On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> 
> > So I think this doesn't fix the problem.  Dynamic relocations are
> > applied to the "patch module", whereas the above code deals with the
> > initialization order of the "patched module".  This distinction
> > originally confused me as well, until Jessica set me straight.
> > 
> > Let me try to illustrate the problem with an example.  Imagine you have
> > a patch module P which applies a patch to module M.  P replaces M's
> > function F with a new function F', which uses paravirt ops.
> > 
> > 1) Patch P is loaded before module M.  P's new function F' has an
> >instruction which is patched by apply_paravirt(), even though the
> >patch hasn't been applied yet.
> > 
> > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> >apply a klp_reloc to the instruction in F' which was already patched
> >by apply_paravirt() in step 1.  This results in undefined behavior
> >because it tries to patch the original instruction but instead
> >patches the new paravirt instruction.
> > 
> > So the above patch makes no difference because the paravirt module
> > loading order doesn't really matter.
> 
> Hi,
> 
> we are trying really hard to understand the actual culprit here and as it 
> is quite confusing I have several questions/comments...
> 
> 1. can you provide dynrela sections of the patch module from 
> https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> exported) symbols from the first look. The problem should be there only if 
> you want to patch a function which reference some paravirt_ops unexported 
> symbol. For that symbol dynrela should be created.

Agreed, kpatch-build could be smarter here.  However the problem still
exists for non-exported symbols, at least in theory.

> 2. I am almost 100 percent sure that the second problem Chris mentions in 
> github post is something different. If he patches only kvm_arch_vm_ioctl() 
> so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> problem. Because it is a trivial livepatching case. There are no dynrelas 
> and no alternatives in the patch module. The crash is suspicious and we 
> have a problem somewhere. Chris, can you please provide more info about 
> that? That is how exactly you used kallsyms_lookup_name() and so on...
> 
> 3. Now I'll try to explain what really happens here... because of 1. and 
> the way the alternatives and relocations are implemented the only 
> problematic case is when one wants to patch a module which introduces its 
> own alternatives. Which is probably the case of KVM. Why?
> 
> When alternative is used, the call to some pv_*_ops.function is placed 
> somewhere. The address of this location is stored in a separate elf 
> section and when apply_paravirt() is called it takes the address and 
> place the new code there (or it does not, it depends :)). When the 
> alternative is in some module and pv_*_ops is exported, which is the 
> usual case, there is no problem. No dynrela needs to be used when a user 
> wants to patch such function with the alternative.
> 
> The only problem is when the function uses unexported pv_*_ops (or some 
> other alternative symbol) from its own object file. When the user wants to 
> patch this one there is a need for dynrela.
> 
> So what really happens is that we load a patch module, we do not apply 
> our relocations but we do apply alternatives to the patch module, which is 
> problematic because there is a reference to something which is not yet 
> resolved (that is unexported pv_*_ops). When a patched module arrives we 
> happily resolve relocations but since we do not apply alternatives again 
> there is a rubbish in the patching code.
> 
> Is this all correct?

That all sounds mostly right to me, except it sounds like you're
conflating alternatives with paravirt patching, when in fact they're two
separate mechanisms.  Paravirt patches are specific to virtualization,
but alternatives can be used on both virt and HW.

And then there's also jump labels.

In any case I don't think we should get too bogged down into details.
Even it's not currently possible to hit this bug today, it could easily
come back to bite us later.

> > Jessica proposed some novel fixes here:
> >
> >https://github.com/dynup/kpatch/issues/580#issuecomment-183001652
> 
> Yes, I think that fix should be the same we have for relocations. To 
> postpone the alternatives applications. Maybe it is even possible to do it 
> in a similar way. And yes on one hand it is gonna be ugly, on the other 
> hand it is gonna be consistent with relocations. 

Yeah.  If we made it consistent with how we do relocations in Jessica's
arch-independent patches, it looks like apply_paravirt() and
apply_alternatives() already take section data as input, so we might be
able to 

Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Josh Poimboeuf
On Tue, Apr 05, 2016 at 03:07:13PM +0200, Miroslav Benes wrote:
> On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> 
> > So I think this doesn't fix the problem.  Dynamic relocations are
> > applied to the "patch module", whereas the above code deals with the
> > initialization order of the "patched module".  This distinction
> > originally confused me as well, until Jessica set me straight.
> > 
> > Let me try to illustrate the problem with an example.  Imagine you have
> > a patch module P which applies a patch to module M.  P replaces M's
> > function F with a new function F', which uses paravirt ops.
> > 
> > 1) Patch P is loaded before module M.  P's new function F' has an
> >instruction which is patched by apply_paravirt(), even though the
> >patch hasn't been applied yet.
> > 
> > 2) Module M is loaded.  Before applying the patch, livepatch tries to
> >apply a klp_reloc to the instruction in F' which was already patched
> >by apply_paravirt() in step 1.  This results in undefined behavior
> >because it tries to patch the original instruction but instead
> >patches the new paravirt instruction.
> > 
> > So the above patch makes no difference because the paravirt module
> > loading order doesn't really matter.
> 
> Hi,
> 
> we are trying really hard to understand the actual culprit here and as it 
> is quite confusing I have several questions/comments...
> 
> 1. can you provide dynrela sections of the patch module from 
> https://github.com/dynup/kpatch/issues/580? What is interesting is that 
> kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
> exported) symbols from the first look. The problem should be there only if 
> you want to patch a function which reference some paravirt_ops unexported 
> symbol. For that symbol dynrela should be created.

Agreed, kpatch-build could be smarter here.  However the problem still
exists for non-exported symbols, at least in theory.

> 2. I am almost 100 percent sure that the second problem Chris mentions in 
> github post is something different. If he patches only kvm_arch_vm_ioctl() 
> so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
> problem. Because it is a trivial livepatching case. There are no dynrelas 
> and no alternatives in the patch module. The crash is suspicious and we 
> have a problem somewhere. Chris, can you please provide more info about 
> that? That is how exactly you used kallsyms_lookup_name() and so on...
> 
> 3. Now I'll try to explain what really happens here... because of 1. and 
> the way the alternatives and relocations are implemented the only 
> problematic case is when one wants to patch a module which introduces its 
> own alternatives. Which is probably the case of KVM. Why?
> 
> When alternative is used, the call to some pv_*_ops.function is placed 
> somewhere. The address of this location is stored in a separate elf 
> section and when apply_paravirt() is called it takes the address and 
> place the new code there (or it does not, it depends :)). When the 
> alternative is in some module and pv_*_ops is exported, which is the 
> usual case, there is no problem. No dynrela needs to be used when a user 
> wants to patch such function with the alternative.
> 
> The only problem is when the function uses unexported pv_*_ops (or some 
> other alternative symbol) from its own object file. When the user wants to 
> patch this one there is a need for dynrela.
> 
> So what really happens is that we load a patch module, we do not apply 
> our relocations but we do apply alternatives to the patch module, which is 
> problematic because there is a reference to something which is not yet 
> resolved (that is unexported pv_*_ops). When a patched module arrives we 
> happily resolve relocations but since we do not apply alternatives again 
> there is a rubbish in the patching code.
> 
> Is this all correct?

That all sounds mostly right to me, except it sounds like you're
conflating alternatives with paravirt patching, when in fact they're two
separate mechanisms.  Paravirt patches are specific to virtualization,
but alternatives can be used on both virt and HW.

And then there's also jump labels.

In any case I don't think we should get too bogged down into details.
Even it's not currently possible to hit this bug today, it could easily
come back to bite us later.

> > Jessica proposed some novel fixes here:
> >
> >https://github.com/dynup/kpatch/issues/580#issuecomment-183001652
> 
> Yes, I think that fix should be the same we have for relocations. To 
> postpone the alternatives applications. Maybe it is even possible to do it 
> in a similar way. And yes on one hand it is gonna be ugly, on the other 
> hand it is gonna be consistent with relocations. 

Yeah.  If we made it consistent with how we do relocations in Jessica's
arch-independent patches, it looks like apply_paravirt() and
apply_alternatives() already take section data as input, so we might be
able to 

Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Miroslav Benes
On Mon, 4 Apr 2016, Josh Poimboeuf wrote:

> So I think this doesn't fix the problem.  Dynamic relocations are
> applied to the "patch module", whereas the above code deals with the
> initialization order of the "patched module".  This distinction
> originally confused me as well, until Jessica set me straight.
> 
> Let me try to illustrate the problem with an example.  Imagine you have
> a patch module P which applies a patch to module M.  P replaces M's
> function F with a new function F', which uses paravirt ops.
> 
> 1) Patch P is loaded before module M.  P's new function F' has an
>instruction which is patched by apply_paravirt(), even though the
>patch hasn't been applied yet.
> 
> 2) Module M is loaded.  Before applying the patch, livepatch tries to
>apply a klp_reloc to the instruction in F' which was already patched
>by apply_paravirt() in step 1.  This results in undefined behavior
>because it tries to patch the original instruction but instead
>patches the new paravirt instruction.
> 
> So the above patch makes no difference because the paravirt module
> loading order doesn't really matter.

Hi,

we are trying really hard to understand the actual culprit here and as it 
is quite confusing I have several questions/comments...

1. can you provide dynrela sections of the patch module from 
https://github.com/dynup/kpatch/issues/580? What is interesting is that 
kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
exported) symbols from the first look. The problem should be there only if 
you want to patch a function which reference some paravirt_ops unexported 
symbol. For that symbol dynrela should be created.

2. I am almost 100 percent sure that the second problem Chris mentions in 
github post is something different. If he patches only kvm_arch_vm_ioctl() 
so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
problem. Because it is a trivial livepatching case. There are no dynrelas 
and no alternatives in the patch module. The crash is suspicious and we 
have a problem somewhere. Chris, can you please provide more info about 
that? That is how exactly you used kallsyms_lookup_name() and so on...

3. Now I'll try to explain what really happens here... because of 1. and 
the way the alternatives and relocations are implemented the only 
problematic case is when one wants to patch a module which introduces its 
own alternatives. Which is probably the case of KVM. Why?

When alternative is used, the call to some pv_*_ops.function is placed 
somewhere. The address of this location is stored in a separate elf 
section and when apply_paravirt() is called it takes the address and 
place the new code there (or it does not, it depends :)). When the 
alternative is in some module and pv_*_ops is exported, which is the 
usual case, there is no problem. No dynrela needs to be used when a user 
wants to patch such function with the alternative.

The only problem is when the function uses unexported pv_*_ops (or some 
other alternative symbol) from its own object file. When the user wants to 
patch this one there is a need for dynrela.

So what really happens is that we load a patch module, we do not apply 
our relocations but we do apply alternatives to the patch module, which is 
problematic because there is a reference to something which is not yet 
resolved (that is unexported pv_*_ops). When a patched module arrives we 
happily resolve relocations but since we do not apply alternatives again 
there is a rubbish in the patching code.

Is this all correct?

> Jessica proposed some novel fixes here:
>
>https://github.com/dynup/kpatch/issues/580#issuecomment-183001652

Yes, I think that fix should be the same we have for relocations. To 
postpone the alternatives applications. Maybe it is even possible to do it 
in a similar way. And yes on one hand it is gonna be ugly, on the other 
hand it is gonna be consistent with relocations. 

> I think the *real* problem here (and one that we've seen before) is that
> we have a feature which allows you to load a patch to a module before
> loading the module itself.  That really goes against the grain of how
> module dependencies work.  It has already given us several headaches and
> it makes the livepatch code a lot more complex.
> 
> I really think we need to take another hard look about whether it's
> really worth it.  My current feeling is that it's not.

I can only say that maybe about 1/3 of kgraft patches we currently have 
are for modules (1/3 is probably not correct but it is definitely 
non-trivial number). It would be really unfortunate to load all the 
to-be-patched modules when a patch module is applied.

This does not mean that we cannot solve it in another way as you propose 
below. I have to think about it.

Miroslav

> If we were able to get rid of that "feature", yes, the livepatch code
> would be simpler, but there might be another awesome benefit: I suspect
> we'd also be 

Re: Bug with paravirt ops and livepatches

2016-04-05 Thread Miroslav Benes
On Mon, 4 Apr 2016, Josh Poimboeuf wrote:

> So I think this doesn't fix the problem.  Dynamic relocations are
> applied to the "patch module", whereas the above code deals with the
> initialization order of the "patched module".  This distinction
> originally confused me as well, until Jessica set me straight.
> 
> Let me try to illustrate the problem with an example.  Imagine you have
> a patch module P which applies a patch to module M.  P replaces M's
> function F with a new function F', which uses paravirt ops.
> 
> 1) Patch P is loaded before module M.  P's new function F' has an
>instruction which is patched by apply_paravirt(), even though the
>patch hasn't been applied yet.
> 
> 2) Module M is loaded.  Before applying the patch, livepatch tries to
>apply a klp_reloc to the instruction in F' which was already patched
>by apply_paravirt() in step 1.  This results in undefined behavior
>because it tries to patch the original instruction but instead
>patches the new paravirt instruction.
> 
> So the above patch makes no difference because the paravirt module
> loading order doesn't really matter.

Hi,

we are trying really hard to understand the actual culprit here and as it 
is quite confusing I have several questions/comments...

1. can you provide dynrela sections of the patch module from 
https://github.com/dynup/kpatch/issues/580? What is interesting is that 
kvm_arch_vm_ioctl() function contains rela records only for trivial (== 
exported) symbols from the first look. The problem should be there only if 
you want to patch a function which reference some paravirt_ops unexported 
symbol. For that symbol dynrela should be created.

2. I am almost 100 percent sure that the second problem Chris mentions in 
github post is something different. If he patches only kvm_arch_vm_ioctl() 
so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no 
problem. Because it is a trivial livepatching case. There are no dynrelas 
and no alternatives in the patch module. The crash is suspicious and we 
have a problem somewhere. Chris, can you please provide more info about 
that? That is how exactly you used kallsyms_lookup_name() and so on...

3. Now I'll try to explain what really happens here... because of 1. and 
the way the alternatives and relocations are implemented the only 
problematic case is when one wants to patch a module which introduces its 
own alternatives. Which is probably the case of KVM. Why?

When alternative is used, the call to some pv_*_ops.function is placed 
somewhere. The address of this location is stored in a separate elf 
section and when apply_paravirt() is called it takes the address and 
place the new code there (or it does not, it depends :)). When the 
alternative is in some module and pv_*_ops is exported, which is the 
usual case, there is no problem. No dynrela needs to be used when a user 
wants to patch such function with the alternative.

The only problem is when the function uses unexported pv_*_ops (or some 
other alternative symbol) from its own object file. When the user wants to 
patch this one there is a need for dynrela.

So what really happens is that we load a patch module, we do not apply 
our relocations but we do apply alternatives to the patch module, which is 
problematic because there is a reference to something which is not yet 
resolved (that is unexported pv_*_ops). When a patched module arrives we 
happily resolve relocations but since we do not apply alternatives again 
there is a rubbish in the patching code.

Is this all correct?

> Jessica proposed some novel fixes here:
>
>https://github.com/dynup/kpatch/issues/580#issuecomment-183001652

Yes, I think that fix should be the same we have for relocations. To 
postpone the alternatives applications. Maybe it is even possible to do it 
in a similar way. And yes on one hand it is gonna be ugly, on the other 
hand it is gonna be consistent with relocations. 

> I think the *real* problem here (and one that we've seen before) is that
> we have a feature which allows you to load a patch to a module before
> loading the module itself.  That really goes against the grain of how
> module dependencies work.  It has already given us several headaches and
> it makes the livepatch code a lot more complex.
> 
> I really think we need to take another hard look about whether it's
> really worth it.  My current feeling is that it's not.

I can only say that maybe about 1/3 of kgraft patches we currently have 
are for modules (1/3 is probably not correct but it is definitely 
non-trivial number). It would be really unfortunate to load all the 
to-be-patched modules when a patch module is applied.

This does not mean that we cannot solve it in another way as you propose 
below. I have to think about it.

Miroslav

> If we were able to get rid of that "feature", yes, the livepatch code
> would be simpler, but there might be another awesome benefit: I suspect
> we'd also be 

Re: Bug with paravirt ops and livepatches

2016-04-04 Thread Jessica Yu

+++ Josh Poimboeuf [04/04/16 11:14 -0500]:

On Fri, Apr 01, 2016 at 09:35:34PM +0200, Jiri Kosina wrote:

On Fri, 1 Apr 2016, Chris J Arges wrote:

> Loading, please wait...
> starting version 229
> [1.182869] random: udevadm urandom read with 2 bits of entropy available
> [1.241404] BUG: unable to handle kernel paging request at c000f35f

Gah, we surely can't change pages with RO PTE. Thanks for such a prompt
testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?

The patch below should fix that by marking the module RO (and relevant
parts NX) only when it's guaranteed that .text is not going to be modified
any more (and includes the error handling fix Miroslav spotted as well).

Thanks.



diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa6..430606d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
 }

-static int post_relocation(struct module *mod, const struct load_info *info)
+static void post_relocation(struct module *mod, const struct load_info *info)
 {
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)

/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
-
-   /* Arch-specific module finalizing. */
-   return module_finalize(info->hdr, info->sechdrs, mod);
 }

 /* Is this module of this name done loading?  No locks held. */
@@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, struct 
load_info *info)
/* This relies on module_mutex for list integrity. */
module_bug_finalize(info->hdr, info->sechdrs, mod);

-   /* Set RO and NX regions */
-   module_enable_ro(mod);
-   module_enable_nx(mod);
-
/* Mark state as coming so strong_try_module_get() ignores us,
 * but kallsyms etc. can see us. */
mod->state = MODULE_STATE_COMING;
@@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
if (err < 0)
goto free_modinfo;

-   err = post_relocation(mod, info);
-   if (err < 0)
-   goto free_modinfo;
+   post_relocation(mod, info);

flush_module_icache(mod);

@@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
if (err)
goto bug_cleanup;

+   /* Arch-specific module finalizing. */
+   err = module_finalize(info->hdr, info->sechdrs, mod);
+   if (err)
+   goto coming_cleanup;
+
+   /* Set RO and NX regions */
+   module_enable_ro(mod);
+   module_enable_nx(mod);
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
  -32768, 32767, mod,


So I think this doesn't fix the problem. Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
  instruction which is patched by apply_paravirt(), even though the
  patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
  apply a klp_reloc to the instruction in F' which was already patched
  by apply_paravirt() in step 1. This results in undefined behavior
  because it tries to patch the original instruction but instead
  patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Yup, exactly, the crux of the issue is the fact that we're hacking the
module load order and delying certain operations (e.g. applying
relocations) in order to allow patching of a not-yet-loaded module.

In this case, the problem can be briefly summarized as follows:
For any patch module, apply_paravirt/alternatives needs to come after
apply_relocate_add (recall that these operations affect the
replacement functions in the patch module). Right now we have it the
other way around: apply_paravirt() is patching our replacement
functions in module_finalize() before the to-be-patched module is
loaded and apply_relocate_add() can be called. 


The hack fix I had involved delaying the apply_paravirt() call
(basically calling is_livepatch_module() in the x86 code, and skipping
the paravirt/alternative calls if true), and when the to-be-patched
module loads, having livepatch finish up those apply_{paravirt,alternative}
calls just 

Re: Bug with paravirt ops and livepatches

2016-04-04 Thread Jessica Yu

+++ Josh Poimboeuf [04/04/16 11:14 -0500]:

On Fri, Apr 01, 2016 at 09:35:34PM +0200, Jiri Kosina wrote:

On Fri, 1 Apr 2016, Chris J Arges wrote:

> Loading, please wait...
> starting version 229
> [1.182869] random: udevadm urandom read with 2 bits of entropy available
> [1.241404] BUG: unable to handle kernel paging request at c000f35f

Gah, we surely can't change pages with RO PTE. Thanks for such a prompt
testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?

The patch below should fix that by marking the module RO (and relevant
parts NX) only when it's guaranteed that .text is not going to be modified
any more (and includes the error handling fix Miroslav spotted as well).

Thanks.



diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa6..430606d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
 }

-static int post_relocation(struct module *mod, const struct load_info *info)
+static void post_relocation(struct module *mod, const struct load_info *info)
 {
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)

/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
-
-   /* Arch-specific module finalizing. */
-   return module_finalize(info->hdr, info->sechdrs, mod);
 }

 /* Is this module of this name done loading?  No locks held. */
@@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, struct 
load_info *info)
/* This relies on module_mutex for list integrity. */
module_bug_finalize(info->hdr, info->sechdrs, mod);

-   /* Set RO and NX regions */
-   module_enable_ro(mod);
-   module_enable_nx(mod);
-
/* Mark state as coming so strong_try_module_get() ignores us,
 * but kallsyms etc. can see us. */
mod->state = MODULE_STATE_COMING;
@@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
if (err < 0)
goto free_modinfo;

-   err = post_relocation(mod, info);
-   if (err < 0)
-   goto free_modinfo;
+   post_relocation(mod, info);

flush_module_icache(mod);

@@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
if (err)
goto bug_cleanup;

+   /* Arch-specific module finalizing. */
+   err = module_finalize(info->hdr, info->sechdrs, mod);
+   if (err)
+   goto coming_cleanup;
+
+   /* Set RO and NX regions */
+   module_enable_ro(mod);
+   module_enable_nx(mod);
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
  -32768, 32767, mod,


So I think this doesn't fix the problem. Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
  instruction which is patched by apply_paravirt(), even though the
  patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
  apply a klp_reloc to the instruction in F' which was already patched
  by apply_paravirt() in step 1. This results in undefined behavior
  because it tries to patch the original instruction but instead
  patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.


Yup, exactly, the crux of the issue is the fact that we're hacking the
module load order and delying certain operations (e.g. applying
relocations) in order to allow patching of a not-yet-loaded module.

In this case, the problem can be briefly summarized as follows:
For any patch module, apply_paravirt/alternatives needs to come after
apply_relocate_add (recall that these operations affect the
replacement functions in the patch module). Right now we have it the
other way around: apply_paravirt() is patching our replacement
functions in module_finalize() before the to-be-patched module is
loaded and apply_relocate_add() can be called. 


The hack fix I had involved delaying the apply_paravirt() call
(basically calling is_livepatch_module() in the x86 code, and skipping
the paravirt/alternative calls if true), and when the to-be-patched
module loads, having livepatch finish up those apply_{paravirt,alternative}
calls just 

Re: Bug with paravirt ops and livepatches

2016-04-04 Thread Josh Poimboeuf
On Fri, Apr 01, 2016 at 09:35:34PM +0200, Jiri Kosina wrote:
> On Fri, 1 Apr 2016, Chris J Arges wrote:
> 
> > Loading, please wait...
> > starting version 229
> > [1.182869] random: udevadm urandom read with 2 bits of entropy available
> > [1.241404] BUG: unable to handle kernel paging request at 
> > c000f35f
> 
> Gah, we surely can't change pages with RO PTE. Thanks for such a prompt 
> testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?
> 
> The patch below should fix that by marking the module RO (and relevant 
> parts NX) only when it's guaranteed that .text is not going to be modified 
> any more (and includes the error handling fix Miroslav spotted as well).
> 
> Thanks.
> 
> 
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 5f71aa6..430606d 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
>   return 0;
>  }
>  
> -static int post_relocation(struct module *mod, const struct load_info *info)
> +static void post_relocation(struct module *mod, const struct load_info *info)
>  {
>   /* Sort exception table now relocations are done. */
>   sort_extable(mod->extable, mod->extable + mod->num_exentries);
> @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> struct load_info *info)
>  
>   /* Setup kallsyms-specific fields. */
>   add_kallsyms(mod, info);
> -
> - /* Arch-specific module finalizing. */
> - return module_finalize(info->hdr, info->sechdrs, mod);
>  }
>  
>  /* Is this module of this name done loading?  No locks held. */
> @@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, 
> struct load_info *info)
>   /* This relies on module_mutex for list integrity. */
>   module_bug_finalize(info->hdr, info->sechdrs, mod);
>  
> - /* Set RO and NX regions */
> - module_enable_ro(mod);
> - module_enable_nx(mod);
> -
>   /* Mark state as coming so strong_try_module_get() ignores us,
>* but kallsyms etc. can see us. */
>   mod->state = MODULE_STATE_COMING;
> @@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err < 0)
>   goto free_modinfo;
>  
> - err = post_relocation(mod, info);
> - if (err < 0)
> - goto free_modinfo;
> + post_relocation(mod, info);
>  
>   flush_module_icache(mod);
>  
> @@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err)
>   goto bug_cleanup;
>  
> + /* Arch-specific module finalizing. */
> + err = module_finalize(info->hdr, info->sechdrs, mod);
> + if (err)
> + goto coming_cleanup;
> +
> + /* Set RO and NX regions */
> + module_enable_ro(mod);
> + module_enable_nx(mod);
> +
>   /* Module is ready to execute: parsing args may do that. */
>   after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> -32768, 32767, mod,

So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
   instruction which is patched by apply_paravirt(), even though the
   patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
   apply a klp_reloc to the instruction in F' which was already patched
   by apply_paravirt() in step 1.  This results in undefined behavior
   because it tries to patch the original instruction but instead
   patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.

Jessica proposed some novel fixes here:
   
   https://github.com/dynup/kpatch/issues/580#issuecomment-183001652

But I get the feeling that any fix would be quite ugly and brittle.

I think the *real* problem here (and one that we've seen before) is that
we have a feature which allows you to load a patch to a module before
loading the module itself.  That really goes against the grain of how
module dependencies work.  It has already given us several headaches and
it makes the livepatch code a lot more complex.

I really think we need to take another hard look about whether it's
really worth it.  My current feeling is that it's not.

If we were able to get rid of that "feature", yes, the livepatch code
would be simpler, but there might be another awesome benefit: I suspect
we'd also be able to get rid of the need for specialized patch creation

Re: Bug with paravirt ops and livepatches

2016-04-04 Thread Josh Poimboeuf
On Fri, Apr 01, 2016 at 09:35:34PM +0200, Jiri Kosina wrote:
> On Fri, 1 Apr 2016, Chris J Arges wrote:
> 
> > Loading, please wait...
> > starting version 229
> > [1.182869] random: udevadm urandom read with 2 bits of entropy available
> > [1.241404] BUG: unable to handle kernel paging request at 
> > c000f35f
> 
> Gah, we surely can't change pages with RO PTE. Thanks for such a prompt 
> testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?
> 
> The patch below should fix that by marking the module RO (and relevant 
> parts NX) only when it's guaranteed that .text is not going to be modified 
> any more (and includes the error handling fix Miroslav spotted as well).
> 
> Thanks.
> 
> 
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 5f71aa6..430606d 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
>   return 0;
>  }
>  
> -static int post_relocation(struct module *mod, const struct load_info *info)
> +static void post_relocation(struct module *mod, const struct load_info *info)
>  {
>   /* Sort exception table now relocations are done. */
>   sort_extable(mod->extable, mod->extable + mod->num_exentries);
> @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> struct load_info *info)
>  
>   /* Setup kallsyms-specific fields. */
>   add_kallsyms(mod, info);
> -
> - /* Arch-specific module finalizing. */
> - return module_finalize(info->hdr, info->sechdrs, mod);
>  }
>  
>  /* Is this module of this name done loading?  No locks held. */
> @@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, 
> struct load_info *info)
>   /* This relies on module_mutex for list integrity. */
>   module_bug_finalize(info->hdr, info->sechdrs, mod);
>  
> - /* Set RO and NX regions */
> - module_enable_ro(mod);
> - module_enable_nx(mod);
> -
>   /* Mark state as coming so strong_try_module_get() ignores us,
>* but kallsyms etc. can see us. */
>   mod->state = MODULE_STATE_COMING;
> @@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err < 0)
>   goto free_modinfo;
>  
> - err = post_relocation(mod, info);
> - if (err < 0)
> - goto free_modinfo;
> + post_relocation(mod, info);
>  
>   flush_module_icache(mod);
>  
> @@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err)
>   goto bug_cleanup;
>  
> + /* Arch-specific module finalizing. */
> + err = module_finalize(info->hdr, info->sechdrs, mod);
> + if (err)
> + goto coming_cleanup;
> +
> + /* Set RO and NX regions */
> + module_enable_ro(mod);
> + module_enable_nx(mod);
> +
>   /* Module is ready to execute: parsing args may do that. */
>   after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> -32768, 32767, mod,

So I think this doesn't fix the problem.  Dynamic relocations are
applied to the "patch module", whereas the above code deals with the
initialization order of the "patched module".  This distinction
originally confused me as well, until Jessica set me straight.

Let me try to illustrate the problem with an example.  Imagine you have
a patch module P which applies a patch to module M.  P replaces M's
function F with a new function F', which uses paravirt ops.

1) Patch P is loaded before module M.  P's new function F' has an
   instruction which is patched by apply_paravirt(), even though the
   patch hasn't been applied yet.

2) Module M is loaded.  Before applying the patch, livepatch tries to
   apply a klp_reloc to the instruction in F' which was already patched
   by apply_paravirt() in step 1.  This results in undefined behavior
   because it tries to patch the original instruction but instead
   patches the new paravirt instruction.

So the above patch makes no difference because the paravirt module
loading order doesn't really matter.

Jessica proposed some novel fixes here:
   
   https://github.com/dynup/kpatch/issues/580#issuecomment-183001652

But I get the feeling that any fix would be quite ugly and brittle.

I think the *real* problem here (and one that we've seen before) is that
we have a feature which allows you to load a patch to a module before
loading the module itself.  That really goes against the grain of how
module dependencies work.  It has already given us several headaches and
it makes the livepatch code a lot more complex.

I really think we need to take another hard look about whether it's
really worth it.  My current feeling is that it's not.

If we were able to get rid of that "feature", yes, the livepatch code
would be simpler, but there might be another awesome benefit: I suspect
we'd also be able to get rid of the need for specialized patch creation

Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Jiri Kosina
On Fri, 1 Apr 2016, Chris J Arges wrote:

> Loading, please wait...
> starting version 229
> [1.182869] random: udevadm urandom read with 2 bits of entropy available
> [1.241404] BUG: unable to handle kernel paging request at c000f35f

Gah, we surely can't change pages with RO PTE. Thanks for such a prompt 
testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?

The patch below should fix that by marking the module RO (and relevant 
parts NX) only when it's guaranteed that .text is not going to be modified 
any more (and includes the error handling fix Miroslav spotted as well).

Thanks.



diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa6..430606d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
 }
 
-static int post_relocation(struct module *mod, const struct load_info *info)
+static void post_relocation(struct module *mod, const struct load_info *info)
 {
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)
 
/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
-
-   /* Arch-specific module finalizing. */
-   return module_finalize(info->hdr, info->sechdrs, mod);
 }
 
 /* Is this module of this name done loading?  No locks held. */
@@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, struct 
load_info *info)
/* This relies on module_mutex for list integrity. */
module_bug_finalize(info->hdr, info->sechdrs, mod);
 
-   /* Set RO and NX regions */
-   module_enable_ro(mod);
-   module_enable_nx(mod);
-
/* Mark state as coming so strong_try_module_get() ignores us,
 * but kallsyms etc. can see us. */
mod->state = MODULE_STATE_COMING;
@@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
if (err < 0)
goto free_modinfo;
 
-   err = post_relocation(mod, info);
-   if (err < 0)
-   goto free_modinfo;
+   post_relocation(mod, info);
 
flush_module_icache(mod);
 
@@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
if (err)
goto bug_cleanup;
 
+   /* Arch-specific module finalizing. */
+   err = module_finalize(info->hdr, info->sechdrs, mod);
+   if (err)
+   goto coming_cleanup;
+
+   /* Set RO and NX regions */
+   module_enable_ro(mod);
+   module_enable_nx(mod);
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
  -32768, 32767, mod,

-- 
Jiri Kosina
SUSE Labs



Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Jiri Kosina
On Fri, 1 Apr 2016, Chris J Arges wrote:

> Loading, please wait...
> starting version 229
> [1.182869] random: udevadm urandom read with 2 bits of entropy available
> [1.241404] BUG: unable to handle kernel paging request at c000f35f

Gah, we surely can't change pages with RO PTE. Thanks for such a prompt 
testing. You do have CONFIG_DEBUG_SET_MODULE_RONX set, don't you?

The patch below should fix that by marking the module RO (and relevant 
parts NX) only when it's guaranteed that .text is not going to be modified 
any more (and includes the error handling fix Miroslav spotted as well).

Thanks.



diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa6..430606d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
 }
 
-static int post_relocation(struct module *mod, const struct load_info *info)
+static void post_relocation(struct module *mod, const struct load_info *info)
 {
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)
 
/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
-
-   /* Arch-specific module finalizing. */
-   return module_finalize(info->hdr, info->sechdrs, mod);
 }
 
 /* Is this module of this name done loading?  No locks held. */
@@ -3441,10 +3438,6 @@ static int complete_formation(struct module *mod, struct 
load_info *info)
/* This relies on module_mutex for list integrity. */
module_bug_finalize(info->hdr, info->sechdrs, mod);
 
-   /* Set RO and NX regions */
-   module_enable_ro(mod);
-   module_enable_nx(mod);
-
/* Mark state as coming so strong_try_module_get() ignores us,
 * but kallsyms etc. can see us. */
mod->state = MODULE_STATE_COMING;
@@ -3562,9 +3555,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
if (err < 0)
goto free_modinfo;
 
-   err = post_relocation(mod, info);
-   if (err < 0)
-   goto free_modinfo;
+   post_relocation(mod, info);
 
flush_module_icache(mod);
 
@@ -3589,6 +3580,15 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
if (err)
goto bug_cleanup;
 
+   /* Arch-specific module finalizing. */
+   err = module_finalize(info->hdr, info->sechdrs, mod);
+   if (err)
+   goto coming_cleanup;
+
+   /* Set RO and NX regions */
+   module_enable_ro(mod);
+   module_enable_nx(mod);
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
  -32768, 32767, mod,

-- 
Jiri Kosina
SUSE Labs



Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Chris J Arges
On Fri, Apr 01, 2016 at 05:46:52PM +0200, Miroslav Benes wrote:
> On Fri, 1 Apr 2016, Jiri Kosina wrote:
> 
> > On Tue, 29 Mar 2016, Jiri Kosina wrote:
> > 
> > > Agreed; I think we should be safe applying all the alternatives (with 
> > > paravirt being really just a special case of those) to the coming module 
> > > at the very last phase; they really are required only during runtime, 
> > > but nothing else should be depending on them. Right? If anyone is able 
> > > to come up with and counter-example, please speak up :)
> > 
> > So I have quickly gone through all the architectures that actually do 
> > overload __weak module_finalize() by their own implementation, and except 
> > for applying self-modifying code changes and registering unwind tables, 
> > there doesn't seem to be any relevant heavy-lifting, that'd need to be 
> > done before relocations have been written.
> > 
> > Is the (completely untested) sort-of-a-patch below a complete rubbish 
> > (on top of current livepatching.git's for-next)?
> > 
> > 
> > 
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 5f71aa6..c003648 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
> > return 0;
> >  }
> >  
> > -static int post_relocation(struct module *mod, const struct load_info 
> > *info)
> > +static void post_relocation(struct module *mod, const struct load_info 
> > *info)
> >  {
> > /* Sort exception table now relocations are done. */
> > sort_extable(mod->extable, mod->extable + mod->num_exentries);
> > @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> > struct load_info *info)
> >  
> > /* Setup kallsyms-specific fields. */
> > add_kallsyms(mod, info);
> > -
> > -   /* Arch-specific module finalizing. */
> > -   return module_finalize(info->hdr, info->sechdrs, mod);
> >  }
> >  
> >  /* Is this module of this name done loading?  No locks held. */
> > @@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err < 0)
> > goto free_modinfo;
> >  
> > -   err = post_relocation(mod, info);
> > -   if (err < 0)
> > -   goto free_modinfo;
> > +   post_relocation(mod, info);
> >  
> > flush_module_icache(mod);
> >  
> > @@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err)
> > goto bug_cleanup;
> >  
> > +   /* Arch-specific module finalizing. */
> > +   err = module_finalize(info->hdr, info->sechdrs, mod);
> > +   if (err)
> > +   goto bug_cleanup;
> 
> goto coming_cleanup;
> 
> Otherwise it looks ok. I'll give it a proper look on Monday though.
> 
> Miroslav
>

Loading, please wait...
starting version 229
[1.182869] random: udevadm urandom read with 2 bits of entropy available
[1.241404] BUG: unable to handle kernel paging request at c000f35f
[1.242760] IP: [] __memcpy+0x17/0x20
[1.243870] PGD 1e09067 PUD 1e0b067 PMD 1edb0067 PTE 1ee61161
[1.245172] Oops: 0003 [#1] SMP 
[1.245975] Modules linked in: floppy(+) pata_acpi
[1.247086] CPU: 0 PID: 135 Comm: systemd-udevd Not tainted 4.5.0+ #3
[1.248176] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[1.249765] task: 88001f36 ti: 88001ed4 task.ti: 
88001ed4
[1.251097] RIP: 0010:[]  [] 
__memcpy+0x17/0x20
[1.252534] RSP: 0018:88001ed43b78  EFLAGS: 00010002
[1.253441] RAX: c000f35f RBX: c0011fa8 RCX: 0007
[1.254584] RDX: 0007 RSI: 88001ed43ba2 RDI: c000f35f
[1.255736] RBP: 88001ed43b90 R08: 81a046c6 R09: 81063f77
[1.256899] R10: 81f33f00 R11: 81f33ee0 R12: 0246
[1.258042] R13: c0011fb4 R14: 81c70005 R15: 81c6
[1.259195] FS:  7fc8b518e8c0() GS:88001fc0() 
knlGS:
[1.260564] CS:  0010 DS:  ES:  CR0: 80050033
[1.261488] CR2: c000f35f CR3: 1ed2d000 CR4: 06f0
[1.262587] Stack:
[1.263039]  810350f2 c0011fa8 88001ed43ba2 
88001ed43cc0
[1.264565]  8103584d 401f0f0007c63ec0 ff57a000 

[1.266084]  0001 88001ed43c9f 1600 
88001ed43bf0
[1.267630] Call Trace:
[1.268139]  [] ? text_poke_early+0x22/0x40
[1.269065]  [] apply_paravirt.part.2+0xad/0x140
[1.270046]  [] ? set_pte_vaddr_pud+0x3c/0x50
[1.270995]  [] ? set_pte_vaddr+0x63/0xa0
[1.271909]  [] ? redo_fd_request+0x122f/0x13ea [floppy]
[1.272933]  [] ? __native_set_fixmap+0x28/0x40
[1.273860]  [] ? native_set_fixmap+0x3a/0x40
[1.274765]  [] ? 0xc0008000
[1.285259]  [] ? redo_fd_request+0x13ea/0x13ea [floppy]
[1.286251]  [] ? 

Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Chris J Arges
On Fri, Apr 01, 2016 at 05:46:52PM +0200, Miroslav Benes wrote:
> On Fri, 1 Apr 2016, Jiri Kosina wrote:
> 
> > On Tue, 29 Mar 2016, Jiri Kosina wrote:
> > 
> > > Agreed; I think we should be safe applying all the alternatives (with 
> > > paravirt being really just a special case of those) to the coming module 
> > > at the very last phase; they really are required only during runtime, 
> > > but nothing else should be depending on them. Right? If anyone is able 
> > > to come up with and counter-example, please speak up :)
> > 
> > So I have quickly gone through all the architectures that actually do 
> > overload __weak module_finalize() by their own implementation, and except 
> > for applying self-modifying code changes and registering unwind tables, 
> > there doesn't seem to be any relevant heavy-lifting, that'd need to be 
> > done before relocations have been written.
> > 
> > Is the (completely untested) sort-of-a-patch below a complete rubbish 
> > (on top of current livepatching.git's for-next)?
> > 
> > 
> > 
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 5f71aa6..c003648 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
> > return 0;
> >  }
> >  
> > -static int post_relocation(struct module *mod, const struct load_info 
> > *info)
> > +static void post_relocation(struct module *mod, const struct load_info 
> > *info)
> >  {
> > /* Sort exception table now relocations are done. */
> > sort_extable(mod->extable, mod->extable + mod->num_exentries);
> > @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> > struct load_info *info)
> >  
> > /* Setup kallsyms-specific fields. */
> > add_kallsyms(mod, info);
> > -
> > -   /* Arch-specific module finalizing. */
> > -   return module_finalize(info->hdr, info->sechdrs, mod);
> >  }
> >  
> >  /* Is this module of this name done loading?  No locks held. */
> > @@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err < 0)
> > goto free_modinfo;
> >  
> > -   err = post_relocation(mod, info);
> > -   if (err < 0)
> > -   goto free_modinfo;
> > +   post_relocation(mod, info);
> >  
> > flush_module_icache(mod);
> >  
> > @@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err)
> > goto bug_cleanup;
> >  
> > +   /* Arch-specific module finalizing. */
> > +   err = module_finalize(info->hdr, info->sechdrs, mod);
> > +   if (err)
> > +   goto bug_cleanup;
> 
> goto coming_cleanup;
> 
> Otherwise it looks ok. I'll give it a proper look on Monday though.
> 
> Miroslav
>

Loading, please wait...
starting version 229
[1.182869] random: udevadm urandom read with 2 bits of entropy available
[1.241404] BUG: unable to handle kernel paging request at c000f35f
[1.242760] IP: [] __memcpy+0x17/0x20
[1.243870] PGD 1e09067 PUD 1e0b067 PMD 1edb0067 PTE 1ee61161
[1.245172] Oops: 0003 [#1] SMP 
[1.245975] Modules linked in: floppy(+) pata_acpi
[1.247086] CPU: 0 PID: 135 Comm: systemd-udevd Not tainted 4.5.0+ #3
[1.248176] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[1.249765] task: 88001f36 ti: 88001ed4 task.ti: 
88001ed4
[1.251097] RIP: 0010:[]  [] 
__memcpy+0x17/0x20
[1.252534] RSP: 0018:88001ed43b78  EFLAGS: 00010002
[1.253441] RAX: c000f35f RBX: c0011fa8 RCX: 0007
[1.254584] RDX: 0007 RSI: 88001ed43ba2 RDI: c000f35f
[1.255736] RBP: 88001ed43b90 R08: 81a046c6 R09: 81063f77
[1.256899] R10: 81f33f00 R11: 81f33ee0 R12: 0246
[1.258042] R13: c0011fb4 R14: 81c70005 R15: 81c6
[1.259195] FS:  7fc8b518e8c0() GS:88001fc0() 
knlGS:
[1.260564] CS:  0010 DS:  ES:  CR0: 80050033
[1.261488] CR2: c000f35f CR3: 1ed2d000 CR4: 06f0
[1.262587] Stack:
[1.263039]  810350f2 c0011fa8 88001ed43ba2 
88001ed43cc0
[1.264565]  8103584d 401f0f0007c63ec0 ff57a000 

[1.266084]  0001 88001ed43c9f 1600 
88001ed43bf0
[1.267630] Call Trace:
[1.268139]  [] ? text_poke_early+0x22/0x40
[1.269065]  [] apply_paravirt.part.2+0xad/0x140
[1.270046]  [] ? set_pte_vaddr_pud+0x3c/0x50
[1.270995]  [] ? set_pte_vaddr+0x63/0xa0
[1.271909]  [] ? redo_fd_request+0x122f/0x13ea [floppy]
[1.272933]  [] ? __native_set_fixmap+0x28/0x40
[1.273860]  [] ? native_set_fixmap+0x3a/0x40
[1.274765]  [] ? 0xc0008000
[1.285259]  [] ? redo_fd_request+0x13ea/0x13ea [floppy]
[1.286251]  [] ? 

Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Chris J Arges
On Fri, Apr 01, 2016 at 05:46:52PM +0200, Miroslav Benes wrote:
> On Fri, 1 Apr 2016, Jiri Kosina wrote:
> 
> > On Tue, 29 Mar 2016, Jiri Kosina wrote:
> > 
> > > Agreed; I think we should be safe applying all the alternatives (with 
> > > paravirt being really just a special case of those) to the coming module 
> > > at the very last phase; they really are required only during runtime, 
> > > but nothing else should be depending on them. Right? If anyone is able 
> > > to come up with and counter-example, please speak up :)
> > 
> > So I have quickly gone through all the architectures that actually do 
> > overload __weak module_finalize() by their own implementation, and except 
> > for applying self-modifying code changes and registering unwind tables, 
> > there doesn't seem to be any relevant heavy-lifting, that'd need to be 
> > done before relocations have been written.
> > 
> > Is the (completely untested) sort-of-a-patch below a complete rubbish 
> > (on top of current livepatching.git's for-next)?
> > 
> > 
> > 
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 5f71aa6..c003648 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
> > return 0;
> >  }
> >  
> > -static int post_relocation(struct module *mod, const struct load_info 
> > *info)
> > +static void post_relocation(struct module *mod, const struct load_info 
> > *info)
> >  {
> > /* Sort exception table now relocations are done. */
> > sort_extable(mod->extable, mod->extable + mod->num_exentries);
> > @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> > struct load_info *info)
> >  
> > /* Setup kallsyms-specific fields. */
> > add_kallsyms(mod, info);
> > -
> > -   /* Arch-specific module finalizing. */
> > -   return module_finalize(info->hdr, info->sechdrs, mod);
> >  }
> >  
> >  /* Is this module of this name done loading?  No locks held. */
> > @@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err < 0)
> > goto free_modinfo;
> >  
> > -   err = post_relocation(mod, info);
> > -   if (err < 0)
> > -   goto free_modinfo;
> > +   post_relocation(mod, info);
> >  
> > flush_module_icache(mod);
> >  
> > @@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err)
> > goto bug_cleanup;
> >  
> > +   /* Arch-specific module finalizing. */
> > +   err = module_finalize(info->hdr, info->sechdrs, mod);
> > +   if (err)
> > +   goto bug_cleanup;
> 
> goto coming_cleanup;
> 
> Otherwise it looks ok. I'll give it a proper look on Monday though.
> 
> Miroslav

I'll test this out and see if it fixes the original issue.
--chris


Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Chris J Arges
On Fri, Apr 01, 2016 at 05:46:52PM +0200, Miroslav Benes wrote:
> On Fri, 1 Apr 2016, Jiri Kosina wrote:
> 
> > On Tue, 29 Mar 2016, Jiri Kosina wrote:
> > 
> > > Agreed; I think we should be safe applying all the alternatives (with 
> > > paravirt being really just a special case of those) to the coming module 
> > > at the very last phase; they really are required only during runtime, 
> > > but nothing else should be depending on them. Right? If anyone is able 
> > > to come up with and counter-example, please speak up :)
> > 
> > So I have quickly gone through all the architectures that actually do 
> > overload __weak module_finalize() by their own implementation, and except 
> > for applying self-modifying code changes and registering unwind tables, 
> > there doesn't seem to be any relevant heavy-lifting, that'd need to be 
> > done before relocations have been written.
> > 
> > Is the (completely untested) sort-of-a-patch below a complete rubbish 
> > (on top of current livepatching.git's for-next)?
> > 
> > 
> > 
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 5f71aa6..c003648 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
> > return 0;
> >  }
> >  
> > -static int post_relocation(struct module *mod, const struct load_info 
> > *info)
> > +static void post_relocation(struct module *mod, const struct load_info 
> > *info)
> >  {
> > /* Sort exception table now relocations are done. */
> > sort_extable(mod->extable, mod->extable + mod->num_exentries);
> > @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> > struct load_info *info)
> >  
> > /* Setup kallsyms-specific fields. */
> > add_kallsyms(mod, info);
> > -
> > -   /* Arch-specific module finalizing. */
> > -   return module_finalize(info->hdr, info->sechdrs, mod);
> >  }
> >  
> >  /* Is this module of this name done loading?  No locks held. */
> > @@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err < 0)
> > goto free_modinfo;
> >  
> > -   err = post_relocation(mod, info);
> > -   if (err < 0)
> > -   goto free_modinfo;
> > +   post_relocation(mod, info);
> >  
> > flush_module_icache(mod);
> >  
> > @@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
> > char __user *uargs,
> > if (err)
> > goto bug_cleanup;
> >  
> > +   /* Arch-specific module finalizing. */
> > +   err = module_finalize(info->hdr, info->sechdrs, mod);
> > +   if (err)
> > +   goto bug_cleanup;
> 
> goto coming_cleanup;
> 
> Otherwise it looks ok. I'll give it a proper look on Monday though.
> 
> Miroslav

I'll test this out and see if it fixes the original issue.
--chris


Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Miroslav Benes
On Fri, 1 Apr 2016, Jiri Kosina wrote:

> On Tue, 29 Mar 2016, Jiri Kosina wrote:
> 
> > Agreed; I think we should be safe applying all the alternatives (with 
> > paravirt being really just a special case of those) to the coming module 
> > at the very last phase; they really are required only during runtime, 
> > but nothing else should be depending on them. Right? If anyone is able 
> > to come up with and counter-example, please speak up :)
> 
> So I have quickly gone through all the architectures that actually do 
> overload __weak module_finalize() by their own implementation, and except 
> for applying self-modifying code changes and registering unwind tables, 
> there doesn't seem to be any relevant heavy-lifting, that'd need to be 
> done before relocations have been written.
> 
> Is the (completely untested) sort-of-a-patch below a complete rubbish 
> (on top of current livepatching.git's for-next)?
> 
> 
> 
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 5f71aa6..c003648 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
>   return 0;
>  }
>  
> -static int post_relocation(struct module *mod, const struct load_info *info)
> +static void post_relocation(struct module *mod, const struct load_info *info)
>  {
>   /* Sort exception table now relocations are done. */
>   sort_extable(mod->extable, mod->extable + mod->num_exentries);
> @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> struct load_info *info)
>  
>   /* Setup kallsyms-specific fields. */
>   add_kallsyms(mod, info);
> -
> - /* Arch-specific module finalizing. */
> - return module_finalize(info->hdr, info->sechdrs, mod);
>  }
>  
>  /* Is this module of this name done loading?  No locks held. */
> @@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err < 0)
>   goto free_modinfo;
>  
> - err = post_relocation(mod, info);
> - if (err < 0)
> - goto free_modinfo;
> + post_relocation(mod, info);
>  
>   flush_module_icache(mod);
>  
> @@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err)
>   goto bug_cleanup;
>  
> + /* Arch-specific module finalizing. */
> + err = module_finalize(info->hdr, info->sechdrs, mod);
> + if (err)
> + goto bug_cleanup;

goto coming_cleanup;

Otherwise it looks ok. I'll give it a proper look on Monday though.

Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Miroslav Benes
On Fri, 1 Apr 2016, Jiri Kosina wrote:

> On Tue, 29 Mar 2016, Jiri Kosina wrote:
> 
> > Agreed; I think we should be safe applying all the alternatives (with 
> > paravirt being really just a special case of those) to the coming module 
> > at the very last phase; they really are required only during runtime, 
> > but nothing else should be depending on them. Right? If anyone is able 
> > to come up with and counter-example, please speak up :)
> 
> So I have quickly gone through all the architectures that actually do 
> overload __weak module_finalize() by their own implementation, and except 
> for applying self-modifying code changes and registering unwind tables, 
> there doesn't seem to be any relevant heavy-lifting, that'd need to be 
> done before relocations have been written.
> 
> Is the (completely untested) sort-of-a-patch below a complete rubbish 
> (on top of current livepatching.git's for-next)?
> 
> 
> 
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 5f71aa6..c003648 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
>   return 0;
>  }
>  
> -static int post_relocation(struct module *mod, const struct load_info *info)
> +static void post_relocation(struct module *mod, const struct load_info *info)
>  {
>   /* Sort exception table now relocations are done. */
>   sort_extable(mod->extable, mod->extable + mod->num_exentries);
> @@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
> struct load_info *info)
>  
>   /* Setup kallsyms-specific fields. */
>   add_kallsyms(mod, info);
> -
> - /* Arch-specific module finalizing. */
> - return module_finalize(info->hdr, info->sechdrs, mod);
>  }
>  
>  /* Is this module of this name done loading?  No locks held. */
> @@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err < 0)
>   goto free_modinfo;
>  
> - err = post_relocation(mod, info);
> - if (err < 0)
> - goto free_modinfo;
> + post_relocation(mod, info);
>  
>   flush_module_icache(mod);
>  
> @@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   if (err)
>   goto bug_cleanup;
>  
> + /* Arch-specific module finalizing. */
> + err = module_finalize(info->hdr, info->sechdrs, mod);
> + if (err)
> + goto bug_cleanup;

goto coming_cleanup;

Otherwise it looks ok. I'll give it a proper look on Monday though.

Miroslav


Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Jiri Kosina
On Tue, 29 Mar 2016, Jiri Kosina wrote:

> Agreed; I think we should be safe applying all the alternatives (with 
> paravirt being really just a special case of those) to the coming module 
> at the very last phase; they really are required only during runtime, 
> but nothing else should be depending on them. Right? If anyone is able 
> to come up with and counter-example, please speak up :)

So I have quickly gone through all the architectures that actually do 
overload __weak module_finalize() by their own implementation, and except 
for applying self-modifying code changes and registering unwind tables, 
there doesn't seem to be any relevant heavy-lifting, that'd need to be 
done before relocations have been written.

Is the (completely untested) sort-of-a-patch below a complete rubbish 
(on top of current livepatching.git's for-next)?




diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa6..c003648 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
 }
 
-static int post_relocation(struct module *mod, const struct load_info *info)
+static void post_relocation(struct module *mod, const struct load_info *info)
 {
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)
 
/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
-
-   /* Arch-specific module finalizing. */
-   return module_finalize(info->hdr, info->sechdrs, mod);
 }
 
 /* Is this module of this name done loading?  No locks held. */
@@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
if (err < 0)
goto free_modinfo;
 
-   err = post_relocation(mod, info);
-   if (err < 0)
-   goto free_modinfo;
+   post_relocation(mod, info);
 
flush_module_icache(mod);
 
@@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
if (err)
goto bug_cleanup;
 
+   /* Arch-specific module finalizing. */
+   err = module_finalize(info->hdr, info->sechdrs, mod);
+   if (err)
+   goto bug_cleanup;
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
  -32768, 32767, mod,

-- 
Jiri Kosina
SUSE Labs



Re: Bug with paravirt ops and livepatches

2016-04-01 Thread Jiri Kosina
On Tue, 29 Mar 2016, Jiri Kosina wrote:

> Agreed; I think we should be safe applying all the alternatives (with 
> paravirt being really just a special case of those) to the coming module 
> at the very last phase; they really are required only during runtime, 
> but nothing else should be depending on them. Right? If anyone is able 
> to come up with and counter-example, please speak up :)

So I have quickly gone through all the architectures that actually do 
overload __weak module_finalize() by their own implementation, and except 
for applying self-modifying code changes and registering unwind tables, 
there doesn't seem to be any relevant heavy-lifting, that'd need to be 
done before relocations have been written.

Is the (completely untested) sort-of-a-patch below a complete rubbish 
(on top of current livepatching.git's for-next)?




diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa6..c003648 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3211,7 +3211,7 @@ int __weak module_finalize(const Elf_Ehdr *hdr,
return 0;
 }
 
-static int post_relocation(struct module *mod, const struct load_info *info)
+static void post_relocation(struct module *mod, const struct load_info *info)
 {
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -3222,9 +3222,6 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)
 
/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
-
-   /* Arch-specific module finalizing. */
-   return module_finalize(info->hdr, info->sechdrs, mod);
 }
 
 /* Is this module of this name done loading?  No locks held. */
@@ -3562,9 +3559,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
if (err < 0)
goto free_modinfo;
 
-   err = post_relocation(mod, info);
-   if (err < 0)
-   goto free_modinfo;
+   post_relocation(mod, info);
 
flush_module_icache(mod);
 
@@ -3589,6 +3584,11 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
if (err)
goto bug_cleanup;
 
+   /* Arch-specific module finalizing. */
+   err = module_finalize(info->hdr, info->sechdrs, mod);
+   if (err)
+   goto bug_cleanup;
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
  -32768, 32767, mod,

-- 
Jiri Kosina
SUSE Labs



Re: Bug with paravirt ops and livepatches

2016-03-29 Thread Jiri Kosina
On Tue, 29 Mar 2016, Miroslav Benes wrote:

> > 1) Jessica proposed using the Arch-independent patchset ensure that 
> > livepatch
> > finishes writing its relas before apply_paravirt() is called. However, this
> > introduces a bit more arch-dependent code. It would be useful to see if 
> > other
> > arches are affected by this as well.
> 
> I think this is the way to go. Provided we have Jessica's two patch sets 
> applied (arch-independent and notifiers removal) there are two options. We 
> either move a call to klp_coming_module() somewhere before 
> module_finalize(), or we move the problematic parts of module_finalize() 
> to the end of load_module() (on x86 it is probably module_finalize() as a 
> whole). The former is almost impossible because of the dependencies 
> (ftrace and such), the latter should be doable (with very careful check we 
> won't break anything).

Agreed; I think we should be safe applying all the alternatives (with 
paravirt being really just a special case of those) to the coming module 
at the very last phase; they really are required only during runtime, but 
nothing else should be depending on them. Right? If anyone is able to come 
up with and counter-example, please speak up :)

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: Bug with paravirt ops and livepatches

2016-03-29 Thread Jiri Kosina
On Tue, 29 Mar 2016, Miroslav Benes wrote:

> > 1) Jessica proposed using the Arch-independent patchset ensure that 
> > livepatch
> > finishes writing its relas before apply_paravirt() is called. However, this
> > introduces a bit more arch-dependent code. It would be useful to see if 
> > other
> > arches are affected by this as well.
> 
> I think this is the way to go. Provided we have Jessica's two patch sets 
> applied (arch-independent and notifiers removal) there are two options. We 
> either move a call to klp_coming_module() somewhere before 
> module_finalize(), or we move the problematic parts of module_finalize() 
> to the end of load_module() (on x86 it is probably module_finalize() as a 
> whole). The former is almost impossible because of the dependencies 
> (ftrace and such), the latter should be doable (with very careful check we 
> won't break anything).

Agreed; I think we should be safe applying all the alternatives (with 
paravirt being really just a special case of those) to the coming module 
at the very last phase; they really are required only during runtime, but 
nothing else should be depending on them. Right? If anyone is able to come 
up with and counter-example, please speak up :)

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: Bug with paravirt ops and livepatches

2016-03-29 Thread Miroslav Benes

[ adding CCs ]

On Tue, 29 Mar 2016, Chris J Arges wrote:

> Paravirtualized ops and livepatching currently don't mix very well and can
> cause undefined behavor such as oops, invalid opcodes or corrupted stacks.
> The original discussion of this issue can be found here [1].
> 
> I've written an example livepatch module that reproduces the issue [2].
> In order to trigger the issue you must first insert the module then trigger
> the paravirt ops by starting a VM.
> 
> In the thread here [1] a couple of solutions have been proposed:

Hi,

oh no... so this is not only about paravirt ops but also about 
alternatives, jump labels and so on, isn't it?

> 1) Jessica proposed using the Arch-independent patchset ensure that livepatch
> finishes writing its relas before apply_paravirt() is called. However, this
> introduces a bit more arch-dependent code. It would be useful to see if other
> arches are affected by this as well.

I think this is the way to go. Provided we have Jessica's two patch sets 
applied (arch-independent and notifiers removal) there are two options. We 
either move a call to klp_coming_module() somewhere before 
module_finalize(), or we move the problematic parts of module_finalize() 
to the end of load_module() (on x86 it is probably module_finalize() as a 
whole). The former is almost impossible because of the dependencies 
(ftrace and such), the latter should be doable (with very careful check we 
won't break anything).

> 2) Eugene proposed skipping application of the rela if the instruction to be
> relocated has already been changed. This passes the initial example [2];
> however its unclear if/how this will break things.

Hm, I don't like this one. It really depends on that the paravirt 
instructions which are supposed to be patched do not contain the 
code which needs to be relocated. This can be true for now, but we have to 
think long-term... which leads me to... If the new instructions need to be 
relocated... this is indeed a problem, right? You'd need to fix 
kpatch-build somehow to generate appropriate dynrelas for the paravirt 
patched code. But, during the livepatch module generation one does not 
know if the code would be patched by alternatives. Crap :/

Miroslav

> It may be good to weigh in here and get more eyes on this.
> Thanks,
> --chris
> 
> [1]: https://github.com/dynup/kpatch/issues/580
> [2]: 
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl/livepatch.c
> --
> To unsubscribe from this list: send the line "unsubscribe live-patching" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Re: Bug with paravirt ops and livepatches

2016-03-29 Thread Miroslav Benes

[ adding CCs ]

On Tue, 29 Mar 2016, Chris J Arges wrote:

> Paravirtualized ops and livepatching currently don't mix very well and can
> cause undefined behavor such as oops, invalid opcodes or corrupted stacks.
> The original discussion of this issue can be found here [1].
> 
> I've written an example livepatch module that reproduces the issue [2].
> In order to trigger the issue you must first insert the module then trigger
> the paravirt ops by starting a VM.
> 
> In the thread here [1] a couple of solutions have been proposed:

Hi,

oh no... so this is not only about paravirt ops but also about 
alternatives, jump labels and so on, isn't it?

> 1) Jessica proposed using the Arch-independent patchset ensure that livepatch
> finishes writing its relas before apply_paravirt() is called. However, this
> introduces a bit more arch-dependent code. It would be useful to see if other
> arches are affected by this as well.

I think this is the way to go. Provided we have Jessica's two patch sets 
applied (arch-independent and notifiers removal) there are two options. We 
either move a call to klp_coming_module() somewhere before 
module_finalize(), or we move the problematic parts of module_finalize() 
to the end of load_module() (on x86 it is probably module_finalize() as a 
whole). The former is almost impossible because of the dependencies 
(ftrace and such), the latter should be doable (with very careful check we 
won't break anything).

> 2) Eugene proposed skipping application of the rela if the instruction to be
> relocated has already been changed. This passes the initial example [2];
> however its unclear if/how this will break things.

Hm, I don't like this one. It really depends on that the paravirt 
instructions which are supposed to be patched do not contain the 
code which needs to be relocated. This can be true for now, but we have to 
think long-term... which leads me to... If the new instructions need to be 
relocated... this is indeed a problem, right? You'd need to fix 
kpatch-build somehow to generate appropriate dynrelas for the paravirt 
patched code. But, during the livepatch module generation one does not 
know if the code would be patched by alternatives. Crap :/

Miroslav

> It may be good to weigh in here and get more eyes on this.
> Thanks,
> --chris
> 
> [1]: https://github.com/dynup/kpatch/issues/580
> [2]: 
> http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl/livepatch.c
> --
> To unsubscribe from this list: send the line "unsubscribe live-patching" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Bug with paravirt ops and livepatches

2016-03-29 Thread Chris J Arges
Paravirtualized ops and livepatching currently don't mix very well and can
cause undefined behavor such as oops, invalid opcodes or corrupted stacks.
The original discussion of this issue can be found here [1].

I've written an example livepatch module that reproduces the issue [2].
In order to trigger the issue you must first insert the module then trigger
the paravirt ops by starting a VM.

In the thread here [1] a couple of solutions have been proposed:

1) Jessica proposed using the Arch-independent patchset ensure that livepatch
finishes writing its relas before apply_paravirt() is called. However, this
introduces a bit more arch-dependent code. It would be useful to see if other
arches are affected by this as well.

2) Eugene proposed skipping application of the rela if the instruction to be
relocated has already been changed. This passes the initial example [2];
however its unclear if/how this will break things.

It may be good to weigh in here and get more eyes on this.
Thanks,
--chris

[1]: https://github.com/dynup/kpatch/issues/580
[2]: 
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl/livepatch.c


Bug with paravirt ops and livepatches

2016-03-29 Thread Chris J Arges
Paravirtualized ops and livepatching currently don't mix very well and can
cause undefined behavor such as oops, invalid opcodes or corrupted stacks.
The original discussion of this issue can be found here [1].

I've written an example livepatch module that reproduces the issue [2].
In order to trigger the issue you must first insert the module then trigger
the paravirt ops by starting a VM.

In the thread here [1] a couple of solutions have been proposed:

1) Jessica proposed using the Arch-independent patchset ensure that livepatch
finishes writing its relas before apply_paravirt() is called. However, this
introduces a bit more arch-dependent code. It would be useful to see if other
arches are affected by this as well.

2) Eugene proposed skipping application of the rela if the instruction to be
relocated has already been changed. This passes the initial example [2];
however its unclear if/how this will break things.

It may be good to weigh in here and get more eyes on this.
Thanks,
--chris

[1]: https://github.com/dynup/kpatch/issues/580
[2]: 
http://people.canonical.com/~arges/livepatch_issue/livepatch_kvm_arch_vm_ioctl/livepatch.c