Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-30 Thread Boris Ostrovsky

On 07/30/2015 04:05 PM, Andy Lutomirski wrote:

On Thu, Jul 30, 2015 at 1:01 PM, Boris Ostrovsky
 wrote:

On 07/30/2015 02:54 PM, Andrew Cooper wrote:

On 30/07/15 19:30, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
 wrote:

On 30/07/2015 00:13, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
 wrote:

On 29/07/2015 23:49, Boris Ostrovsky wrote:

On 07/29/2015 06:46 PM, David Vrabel wrote:

On 29/07/2015 23:11, Andrew Cooper wrote:

On 29/07/2015 23:05, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
 wrote:

On 29/07/2015 22:26, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
 wrote:

On 07/29/2015 03:03 PM, Andrew Cooper wrote:

On 29/07/15 15:43, Boris Ostrovsky wrote:

FYI, I have got a repro now and am investigating.

Good and bad news.  This bug has nothing to do with LDTs
themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c
b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
pgprot_t prot)
  pte = pfn_pte(pfn, prot);
 +   (void)*(volatile int*)v;
if (HYPERVISOR_update_va_mapping((unsigned long)v,
pte, 0)) {
pr_err("set_aliased_prot va update failed
w/
lazy mode
%u\n", paravirt_get_lazy_mode());
BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same
problem.

I think in most cases we know that page is mapped so hopefully
this is the
only site that we need to be careful about.

Is there any chance we can get some kind of quick-and-dirty fix
that
can go to x86/urgent in the next few days even if a clean fix
isn't
available yet?

Quick and dirty?

Reading from v is the most obvious and quick way, for areas where
we are
certain v exists, is kernel memory and is expected to have a
backing
page.  I don't know offhand how many of current
HYPERVISOR_update_va_mapping() callsites this applies to.

__get_user((char *)v, tmp), perhaps, unless there's something
better
in the wings.  Keep in mind that we need this for -stable, and
it's
likely to get backported quite quickly due to CVE-2015-5157.

Hmm - something like that tucked inside
HYPERVISOR_update_va_mapping()
would probably work, and certainly be minimal hassle for -stable.

Altering the hypercall used is certainly not something to backport,
nor
are we sure it is a viable fix at this time.

Changing this one use of update_va_mapping to use
mmu_update_normal_pt
is the correct fix to unblock this LDT series.  I see no reason why
this
cannot be backported.

To properly fix it should include batching and that is not something
that I think we should target for stable.

Batching is absolutely not necessary to alter update_va_mapping to
mmu_update_normal_pt.  After all, update_va_mapping isn't batched.

However this isn't the first issue issue we have had lazy mmu
faulting,
and I doubt it is the last.  There are not many callsites of
update_va_mapping - I will audit them tomorrow and see if any similar
issues are lurking elsewhere.

One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
access to fault.  Is this something we should be worried about?

Yes.  update_va_mapping() will function perfectly well taking one RW
mapping to RO even if there is a second RW mapping.  In such a case, the
next LDT access will fault.

Which is a problem because that alias might still exist, and also
because Linux really doesn't expect that fault.


On closer inspection, Xen is rather unhelpful with the fault.  Xen's
lazy #PF will be bounced back to the guest with cr2 adjusted to appear
in the range passed to set_ldt().  The error code however will be
unmodified (and limited only by not-user and not-reserved), so will
appear as a non-present read or write supervisor access to an address
which the kernel has a valid read mapping of.

More yuck.

I think I'm just going to stick an unconditional vm_flush_aliases in
alloc_ldt.


Therefore, set_ldt() needs to be confident that there are no writeable
mappings to the frames used to make up the LDT.  It could proactively
fault them in by accessing one descriptor in each page inside the limit,
but by the time a fault is received it is probably too late to work out
where the other mapping is which prevented the typechange (or indeed,
whether Xen objected to one of the descriptors instead).

This seems like overkill.

I'm still a bit confused, though: the failure is in xen_free_ldt.  How
do we make it all the way to xen_free_ldt without the vmapped page
existing in the guest's page tables?  After all, we had to survive
xen_alloc_ldt first, and ISTM that should fail in exactly the same
way.

(Summarising part of a discussion which has just occurred on IRC)

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-30 Thread Andy Lutomirski
On Thu, Jul 30, 2015 at 1:01 PM, Boris Ostrovsky
 wrote:
> On 07/30/2015 02:54 PM, Andrew Cooper wrote:
>>
>> On 30/07/15 19:30, Andy Lutomirski wrote:
>>>
>>> On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
>>>  wrote:

 On 30/07/2015 00:13, Andy Lutomirski wrote:
>
> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
>  wrote:
>>
>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>>
>>> On 07/29/2015 06:46 PM, David Vrabel wrote:

 On 29/07/2015 23:11, Andrew Cooper wrote:
>
> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>
>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>  wrote:
>>>
>>> On 29/07/2015 22:26, Andy Lutomirski wrote:

 On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
  wrote:
>
> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>
>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>
>>> FYI, I have got a repro now and am investigating.
>>
>> Good and bad news.  This bug has nothing to do with LDTs
>> themselves.
>>
>> I have worked out what is going on, but this:
>>
>> diff --git a/arch/x86/xen/enlighten.c
>> b/arch/x86/xen/enlighten.c
>> index 5abeaac..7e1a82e 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
>> pgprot_t prot)
>>  pte = pfn_pte(pfn, prot);
>> +   (void)*(volatile int*)v;
>>if (HYPERVISOR_update_va_mapping((unsigned long)v,
>> pte, 0)) {
>>pr_err("set_aliased_prot va update failed
>> w/
>> lazy mode
>> %u\n", paravirt_get_lazy_mode());
>>BUG();
>>
>> Is perhaps not the fix we are looking for, and every use of
>> HYPERVISOR_update_va_mapping() is susceptible to the same
>> problem.
>
> I think in most cases we know that page is mapped so hopefully
> this is the
> only site that we need to be careful about.

 Is there any chance we can get some kind of quick-and-dirty fix
 that
 can go to x86/urgent in the next few days even if a clean fix
 isn't
 available yet?
>>>
>>> Quick and dirty?
>>>
>>> Reading from v is the most obvious and quick way, for areas where
>>> we are
>>> certain v exists, is kernel memory and is expected to have a
>>> backing
>>> page.  I don't know offhand how many of current
>>> HYPERVISOR_update_va_mapping() callsites this applies to.
>>
>> __get_user((char *)v, tmp), perhaps, unless there's something
>> better
>> in the wings.  Keep in mind that we need this for -stable, and
>> it's
>> likely to get backported quite quickly due to CVE-2015-5157.
>
> Hmm - something like that tucked inside
> HYPERVISOR_update_va_mapping()
> would probably work, and certainly be minimal hassle for -stable.
>
> Altering the hypercall used is certainly not something to backport,
> nor
> are we sure it is a viable fix at this time.

 Changing this one use of update_va_mapping to use
 mmu_update_normal_pt
 is the correct fix to unblock this LDT series.  I see no reason why
 this
 cannot be backported.
>>>
>>> To properly fix it should include batching and that is not something
>>> that I think we should target for stable.
>>
>> Batching is absolutely not necessary to alter update_va_mapping to
>> mmu_update_normal_pt.  After all, update_va_mapping isn't batched.
>>
>> However this isn't the first issue issue we have had lazy mmu
>> faulting,
>> and I doubt it is the last.  There are not many callsites of
>> update_va_mapping - I will audit them tomorrow and see if any similar
>> issues are lurking elsewhere.
>
> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
> access to fault.  Is this something we should be worried about?

 Yes.  update_va_mapping() will function perfectly well taking one RW
 mapping to RO even if there is a second RW mapping.  In such a case, the
 next LDT access will fault.
>>>
>>> Which is a problem because that alias might still exist, and also
>>> because Linux really doesn't expect that fault.
>>>
 On closer inspection, Xen is rather unhelpful with the fault.  Xen's
>>>

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-30 Thread Boris Ostrovsky

On 07/30/2015 02:54 PM, Andrew Cooper wrote:

On 30/07/15 19:30, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
 wrote:

On 30/07/2015 00:13, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
 wrote:

On 29/07/2015 23:49, Boris Ostrovsky wrote:

On 07/29/2015 06:46 PM, David Vrabel wrote:

On 29/07/2015 23:11, Andrew Cooper wrote:

On 29/07/2015 23:05, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
 wrote:

On 29/07/2015 22:26, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
 wrote:

On 07/29/2015 03:03 PM, Andrew Cooper wrote:

On 29/07/15 15:43, Boris Ostrovsky wrote:

FYI, I have got a repro now and am investigating.

Good and bad news.  This bug has nothing to do with LDTs
themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
pgprot_t prot)
 pte = pfn_pte(pfn, prot);
+   (void)*(volatile int*)v;
   if (HYPERVISOR_update_va_mapping((unsigned long)v,
pte, 0)) {
   pr_err("set_aliased_prot va update failed w/
lazy mode
%u\n", paravirt_get_lazy_mode());
   BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same problem.

I think in most cases we know that page is mapped so hopefully
this is the
only site that we need to be careful about.

Is there any chance we can get some kind of quick-and-dirty fix that
can go to x86/urgent in the next few days even if a clean fix isn't
available yet?

Quick and dirty?

Reading from v is the most obvious and quick way, for areas where
we are
certain v exists, is kernel memory and is expected to have a backing
page.  I don't know offhand how many of current
HYPERVISOR_update_va_mapping() callsites this applies to.

__get_user((char *)v, tmp), perhaps, unless there's something better
in the wings.  Keep in mind that we need this for -stable, and it's
likely to get backported quite quickly due to CVE-2015-5157.

Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
would probably work, and certainly be minimal hassle for -stable.

Altering the hypercall used is certainly not something to backport, nor
are we sure it is a viable fix at this time.

Changing this one use of update_va_mapping to use mmu_update_normal_pt
is the correct fix to unblock this LDT series.  I see no reason why this
cannot be backported.

To properly fix it should include batching and that is not something
that I think we should target for stable.

Batching is absolutely not necessary to alter update_va_mapping to
mmu_update_normal_pt.  After all, update_va_mapping isn't batched.

However this isn't the first issue issue we have had lazy mmu faulting,
and I doubt it is the last.  There are not many callsites of
update_va_mapping - I will audit them tomorrow and see if any similar
issues are lurking elsewhere.

One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
access to fault.  Is this something we should be worried about?

Yes.  update_va_mapping() will function perfectly well taking one RW
mapping to RO even if there is a second RW mapping.  In such a case, the
next LDT access will fault.

Which is a problem because that alias might still exist, and also
because Linux really doesn't expect that fault.


On closer inspection, Xen is rather unhelpful with the fault.  Xen's
lazy #PF will be bounced back to the guest with cr2 adjusted to appear
in the range passed to set_ldt().  The error code however will be
unmodified (and limited only by not-user and not-reserved), so will
appear as a non-present read or write supervisor access to an address
which the kernel has a valid read mapping of.

More yuck.

I think I'm just going to stick an unconditional vm_flush_aliases in alloc_ldt.


Therefore, set_ldt() needs to be confident that there are no writeable
mappings to the frames used to make up the LDT.  It could proactively
fault them in by accessing one descriptor in each page inside the limit,
but by the time a fault is received it is probably too late to work out
where the other mapping is which prevented the typechange (or indeed,
whether Xen objected to one of the descriptors instead).

This seems like overkill.

I'm still a bit confused, though: the failure is in xen_free_ldt.  How
do we make it all the way to xen_free_ldt without the vmapped page
existing in the guest's page tables?  After all, we had to survive
xen_alloc_ldt first, and ISTM that should fail in exactly the same
way.

(Summarising part of a discussion which has just occurred on IRC)

I presume that xen_free_ldt() is called while in the context of an mm
which doesn't have the particular area 

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-30 Thread Andrew Cooper
On 30/07/15 19:30, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
>  wrote:
>> On 30/07/2015 00:13, Andy Lutomirski wrote:
>>> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
>>>  wrote:
 On 29/07/2015 23:49, Boris Ostrovsky wrote:
> On 07/29/2015 06:46 PM, David Vrabel wrote:
>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
 On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
  wrote:
> On 29/07/2015 22:26, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>  wrote:
>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
 On 29/07/15 15:43, Boris Ostrovsky wrote:
> FYI, I have got a repro now and am investigating.
 Good and bad news.  This bug has nothing to do with LDTs
 themselves.

 I have worked out what is going on, but this:

 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
 index 5abeaac..7e1a82e 100644
 --- a/arch/x86/xen/enlighten.c
 +++ b/arch/x86/xen/enlighten.c
 @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
 pgprot_t prot)
 pte = pfn_pte(pfn, prot);
+   (void)*(volatile int*)v;
   if (HYPERVISOR_update_va_mapping((unsigned long)v,
 pte, 0)) {
   pr_err("set_aliased_prot va update failed w/
 lazy mode
 %u\n", paravirt_get_lazy_mode());
   BUG();

 Is perhaps not the fix we are looking for, and every use of
 HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>> I think in most cases we know that page is mapped so hopefully
>>> this is the
>>> only site that we need to be careful about.
>> Is there any chance we can get some kind of quick-and-dirty fix that
>> can go to x86/urgent in the next few days even if a clean fix isn't
>> available yet?
> Quick and dirty?
>
> Reading from v is the most obvious and quick way, for areas where
> we are
> certain v exists, is kernel memory and is expected to have a backing
> page.  I don't know offhand how many of current
> HYPERVISOR_update_va_mapping() callsites this applies to.
 __get_user((char *)v, tmp), perhaps, unless there's something better
 in the wings.  Keep in mind that we need this for -stable, and it's
 likely to get backported quite quickly due to CVE-2015-5157.
>>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>>> would probably work, and certainly be minimal hassle for -stable.
>>>
>>> Altering the hypercall used is certainly not something to backport, nor
>>> are we sure it is a viable fix at this time.
>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>> is the correct fix to unblock this LDT series.  I see no reason why this
>> cannot be backported.
> To properly fix it should include batching and that is not something
> that I think we should target for stable.
 Batching is absolutely not necessary to alter update_va_mapping to
 mmu_update_normal_pt.  After all, update_va_mapping isn't batched.

 However this isn't the first issue issue we have had lazy mmu faulting,
 and I doubt it is the last.  There are not many callsites of
 update_va_mapping - I will audit them tomorrow and see if any similar
 issues are lurking elsewhere.
>>> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
>>> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
>>> access to fault.  Is this something we should be worried about?
>> Yes.  update_va_mapping() will function perfectly well taking one RW
>> mapping to RO even if there is a second RW mapping.  In such a case, the
>> next LDT access will fault.
> Which is a problem because that alias might still exist, and also
> because Linux really doesn't expect that fault.
>
>> On closer inspection, Xen is rather unhelpful with the fault.  Xen's
>> lazy #PF will be bounced back to the guest with cr2 adjusted to appear
>> in the range passed to set_ldt().  The error code however will be
>> unmodified (and limited only by not-user and not-reserved), so will
>> appear as a non-present read or write supervisor access to an address
>> which the kernel has a valid read mapping of.
> More yuck.
>
> I think I'm just going to stick an unconditional vm_flush_aliases in 
> alloc_ldt.
>
>> Therefore, set_ldt() needs to be confident that there are no writeable
>> mappings to the frames used to make up the LDT.  It could proactively
>> fault them in by accessing one descriptor in each page inside the limit,
>> but by the t

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-30 Thread Andy Lutomirski
On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
 wrote:
> On 30/07/2015 00:13, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
>>  wrote:
>>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
 On 07/29/2015 06:46 PM, David Vrabel wrote:
> On 29/07/2015 23:11, Andrew Cooper wrote:
>> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>>  wrote:
 On 29/07/2015 22:26, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>  wrote:
>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
 FYI, I have got a repro now and am investigating.
>>> Good and bad news.  This bug has nothing to do with LDTs
>>> themselves.
>>>
>>> I have worked out what is going on, but this:
>>>
>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>> index 5abeaac..7e1a82e 100644
>>> --- a/arch/x86/xen/enlighten.c
>>> +++ b/arch/x86/xen/enlighten.c
>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
>>> pgprot_t prot)
>>> pte = pfn_pte(pfn, prot);
>>>+   (void)*(volatile int*)v;
>>>   if (HYPERVISOR_update_va_mapping((unsigned long)v,
>>> pte, 0)) {
>>>   pr_err("set_aliased_prot va update failed w/
>>> lazy mode
>>> %u\n", paravirt_get_lazy_mode());
>>>   BUG();
>>>
>>> Is perhaps not the fix we are looking for, and every use of
>>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>> I think in most cases we know that page is mapped so hopefully
>> this is the
>> only site that we need to be careful about.
> Is there any chance we can get some kind of quick-and-dirty fix that
> can go to x86/urgent in the next few days even if a clean fix isn't
> available yet?
 Quick and dirty?

 Reading from v is the most obvious and quick way, for areas where
 we are
 certain v exists, is kernel memory and is expected to have a backing
 page.  I don't know offhand how many of current
 HYPERVISOR_update_va_mapping() callsites this applies to.
>>> __get_user((char *)v, tmp), perhaps, unless there's something better
>>> in the wings.  Keep in mind that we need this for -stable, and it's
>>> likely to get backported quite quickly due to CVE-2015-5157.
>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>> would probably work, and certainly be minimal hassle for -stable.
>>
>> Altering the hypercall used is certainly not something to backport, nor
>> are we sure it is a viable fix at this time.
> Changing this one use of update_va_mapping to use mmu_update_normal_pt
> is the correct fix to unblock this LDT series.  I see no reason why this
> cannot be backported.
 To properly fix it should include batching and that is not something
 that I think we should target for stable.
>>> Batching is absolutely not necessary to alter update_va_mapping to
>>> mmu_update_normal_pt.  After all, update_va_mapping isn't batched.
>>>
>>> However this isn't the first issue issue we have had lazy mmu faulting,
>>> and I doubt it is the last.  There are not many callsites of
>>> update_va_mapping - I will audit them tomorrow and see if any similar
>>> issues are lurking elsewhere.
>> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
>> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
>> access to fault.  Is this something we should be worried about?
>
> Yes.  update_va_mapping() will function perfectly well taking one RW
> mapping to RO even if there is a second RW mapping.  In such a case, the
> next LDT access will fault.

Which is a problem because that alias might still exist, and also
because Linux really doesn't expect that fault.

>
> On closer inspection, Xen is rather unhelpful with the fault.  Xen's
> lazy #PF will be bounced back to the guest with cr2 adjusted to appear
> in the range passed to set_ldt().  The error code however will be
> unmodified (and limited only by not-user and not-reserved), so will
> appear as a non-present read or write supervisor access to an address
> which the kernel has a valid read mapping of.

More yuck.

I think I'm just going to stick an unconditional vm_flush_aliases in alloc_ldt.

> Therefore, set_ldt() needs to be confident that there are no writeable
> mappings to the frames used to make up the LDT.  It could proactively
> fault them in by accessing one descriptor in each page inside the limit,
> but by the time a fault is received it is probably too late to work out
> where the other mapping is which prevented the typechange (or indeed,
> whethe

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andrew Cooper
On 30/07/2015 00:13, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
>  wrote:
>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>> On 07/29/2015 06:46 PM, David Vrabel wrote:
 On 29/07/2015 23:11, Andrew Cooper wrote:
> On 29/07/2015 23:05, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>  wrote:
>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
 On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
  wrote:
> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>> FYI, I have got a repro now and am investigating.
>> Good and bad news.  This bug has nothing to do with LDTs
>> themselves.
>>
>> I have worked out what is going on, but this:
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 5abeaac..7e1a82e 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
>> pgprot_t prot)
>> pte = pfn_pte(pfn, prot);
>>+   (void)*(volatile int*)v;
>>   if (HYPERVISOR_update_va_mapping((unsigned long)v,
>> pte, 0)) {
>>   pr_err("set_aliased_prot va update failed w/
>> lazy mode
>> %u\n", paravirt_get_lazy_mode());
>>   BUG();
>>
>> Is perhaps not the fix we are looking for, and every use of
>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
> I think in most cases we know that page is mapped so hopefully
> this is the
> only site that we need to be careful about.
 Is there any chance we can get some kind of quick-and-dirty fix that
 can go to x86/urgent in the next few days even if a clean fix isn't
 available yet?
>>> Quick and dirty?
>>>
>>> Reading from v is the most obvious and quick way, for areas where
>>> we are
>>> certain v exists, is kernel memory and is expected to have a backing
>>> page.  I don't know offhand how many of current
>>> HYPERVISOR_update_va_mapping() callsites this applies to.
>> __get_user((char *)v, tmp), perhaps, unless there's something better
>> in the wings.  Keep in mind that we need this for -stable, and it's
>> likely to get backported quite quickly due to CVE-2015-5157.
> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
> would probably work, and certainly be minimal hassle for -stable.
>
> Altering the hypercall used is certainly not something to backport, nor
> are we sure it is a viable fix at this time.
 Changing this one use of update_va_mapping to use mmu_update_normal_pt
 is the correct fix to unblock this LDT series.  I see no reason why this
 cannot be backported.
>>> To properly fix it should include batching and that is not something
>>> that I think we should target for stable.
>> Batching is absolutely not necessary to alter update_va_mapping to
>> mmu_update_normal_pt.  After all, update_va_mapping isn't batched.
>>
>> However this isn't the first issue issue we have had lazy mmu faulting,
>> and I doubt it is the last.  There are not many callsites of
>> update_va_mapping - I will audit them tomorrow and see if any similar
>> issues are lurking elsewhere.
> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
> access to fault.  Is this something we should be worried about?

Yes.  update_va_mapping() will function perfectly well taking one RW
mapping to RO even if there is a second RW mapping.  In such a case, the
next LDT access will fault.

On closer inspection, Xen is rather unhelpful with the fault.  Xen's
lazy #PF will be bounced back to the guest with cr2 adjusted to appear
in the range passed to set_ldt().  The error code however will be
unmodified (and limited only by not-user and not-reserved), so will
appear as a non-present read or write supervisor access to an address
which the kernel has a valid read mapping of.

Unlike pagetables, there is no notion of pinning a segdesc page in the
Xen ABI.  Pinning to a type allows the guest to take a single extra type
ref, and as a side effect forces eager validation of the contents.  It
also prevents another unsuspecting vcpu from coming along, constructing
a writeable mapping and turning the soon-to-be-faulted-in LDT into a
plain writeable page and forcing a fault.

This frankly looks like an oversight, as pinning a segdesc page would
work work fine in the existing page model; it is just that there isn't a
hypercall to make such an action happen.

Therefore, set_ldt() needs to be confident that there are no writeable
mappings to the frames used to make up the LDT.  It could proactively
fault t

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andy Lutomirski
On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
 wrote:
> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>>
>>> On 29/07/2015 23:11, Andrew Cooper wrote:
 On 29/07/2015 23:05, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>  wrote:
>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>>  wrote:
 On 07/29/2015 03:03 PM, Andrew Cooper wrote:
> On 29/07/15 15:43, Boris Ostrovsky wrote:
>> FYI, I have got a repro now and am investigating.
> Good and bad news.  This bug has nothing to do with LDTs
> themselves.
>
> I have worked out what is going on, but this:
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 5abeaac..7e1a82e 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
> pgprot_t prot)
> pte = pfn_pte(pfn, prot);
>+   (void)*(volatile int*)v;
>   if (HYPERVISOR_update_va_mapping((unsigned long)v,
> pte, 0)) {
>   pr_err("set_aliased_prot va update failed w/
> lazy mode
> %u\n", paravirt_get_lazy_mode());
>   BUG();
>
> Is perhaps not the fix we are looking for, and every use of
> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
 I think in most cases we know that page is mapped so hopefully
 this is the
 only site that we need to be careful about.
>>> Is there any chance we can get some kind of quick-and-dirty fix that
>>> can go to x86/urgent in the next few days even if a clean fix isn't
>>> available yet?
>> Quick and dirty?
>>
>> Reading from v is the most obvious and quick way, for areas where
>> we are
>> certain v exists, is kernel memory and is expected to have a backing
>> page.  I don't know offhand how many of current
>> HYPERVISOR_update_va_mapping() callsites this applies to.
> __get_user((char *)v, tmp), perhaps, unless there's something better
> in the wings.  Keep in mind that we need this for -stable, and it's
> likely to get backported quite quickly due to CVE-2015-5157.
 Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
 would probably work, and certainly be minimal hassle for -stable.

 Altering the hypercall used is certainly not something to backport, nor
 are we sure it is a viable fix at this time.
>>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>>> is the correct fix to unblock this LDT series.  I see no reason why this
>>> cannot be backported.
>>
>> To properly fix it should include batching and that is not something
>> that I think we should target for stable.
>
> Batching is absolutely not necessary to alter update_va_mapping to
> mmu_update_normal_pt.  After all, update_va_mapping isn't batched.
>
> However this isn't the first issue issue we have had lazy mmu faulting,
> and I doubt it is the last.  There are not many callsites of
> update_va_mapping - I will audit them tomorrow and see if any similar
> issues are lurking elsewhere.

One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
access to fault.  Is this something we should be worried about?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread David Vrabel


On 29/07/2015 23:11, Andrew Cooper wrote:
> On 29/07/2015 23:05, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>  wrote:
>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
 On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
  wrote:
> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>> FYI, I have got a repro now and am investigating.
>> Good and bad news.  This bug has nothing to do with LDTs themselves.
>>
>> I have worked out what is going on, but this:
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 5abeaac..7e1a82e 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>pte = pfn_pte(pfn, prot);
>>   +   (void)*(volatile int*)v;
>>  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>  pr_err("set_aliased_prot va update failed w/ lazy mode
>> %u\n", paravirt_get_lazy_mode());
>>  BUG();
>>
>> Is perhaps not the fix we are looking for, and every use of
>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
> I think in most cases we know that page is mapped so hopefully this is the
> only site that we need to be careful about.
 Is there any chance we can get some kind of quick-and-dirty fix that
 can go to x86/urgent in the next few days even if a clean fix isn't
 available yet?
>>> Quick and dirty?
>>>
>>> Reading from v is the most obvious and quick way, for areas where we are
>>> certain v exists, is kernel memory and is expected to have a backing
>>> page.  I don't know offhand how many of current
>>> HYPERVISOR_update_va_mapping() callsites this applies to.
>> __get_user((char *)v, tmp), perhaps, unless there's something better
>> in the wings.  Keep in mind that we need this for -stable, and it's
>> likely to get backported quite quickly due to CVE-2015-5157.
> 
> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
> would probably work, and certainly be minimal hassle for -stable.
> 
> Altering the hypercall used is certainly not something to backport, nor
> are we sure it is a viable fix at this time.

Changing this one use of update_va_mapping to use mmu_update_normal_pt
is the correct fix to unblock this LDT series.  I see no reason why this
cannot be backported.

We can address any other potential update_va_mapping calls at a later
date (if they are shown to be problematic).

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andrew Cooper
On 29/07/2015 23:49, Boris Ostrovsky wrote:
> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>
>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
 On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
  wrote:
> On 29/07/2015 22:26, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>  wrote:
>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
 On 29/07/15 15:43, Boris Ostrovsky wrote:
> FYI, I have got a repro now and am investigating.
 Good and bad news.  This bug has nothing to do with LDTs
 themselves.

 I have worked out what is going on, but this:

 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
 index 5abeaac..7e1a82e 100644
 --- a/arch/x86/xen/enlighten.c
 +++ b/arch/x86/xen/enlighten.c
 @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
 pgprot_t prot)
 pte = pfn_pte(pfn, prot);
+   (void)*(volatile int*)v;
   if (HYPERVISOR_update_va_mapping((unsigned long)v,
 pte, 0)) {
   pr_err("set_aliased_prot va update failed w/
 lazy mode
 %u\n", paravirt_get_lazy_mode());
   BUG();

 Is perhaps not the fix we are looking for, and every use of
 HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>> I think in most cases we know that page is mapped so hopefully
>>> this is the
>>> only site that we need to be careful about.
>> Is there any chance we can get some kind of quick-and-dirty fix that
>> can go to x86/urgent in the next few days even if a clean fix isn't
>> available yet?
> Quick and dirty?
>
> Reading from v is the most obvious and quick way, for areas where
> we are
> certain v exists, is kernel memory and is expected to have a backing
> page.  I don't know offhand how many of current
> HYPERVISOR_update_va_mapping() callsites this applies to.
 __get_user((char *)v, tmp), perhaps, unless there's something better
 in the wings.  Keep in mind that we need this for -stable, and it's
 likely to get backported quite quickly due to CVE-2015-5157.
>>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>>> would probably work, and certainly be minimal hassle for -stable.
>>>
>>> Altering the hypercall used is certainly not something to backport, nor
>>> are we sure it is a viable fix at this time.
>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>> is the correct fix to unblock this LDT series.  I see no reason why this
>> cannot be backported.
>
> To properly fix it should include batching and that is not something
> that I think we should target for stable.

Batching is absolutely not necessary to alter update_va_mapping to
mmu_update_normal_pt.  After all, update_va_mapping isn't batched.

However this isn't the first issue issue we have had lazy mmu faulting,
and I doubt it is the last.  There are not many callsites of
update_va_mapping - I will audit them tomorrow and see if any similar
issues are lurking elsewhere.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread David Vrabel


On 29/07/2015 23:49, Boris Ostrovsky wrote:
> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>
>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
 On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
  wrote:
> On 29/07/2015 22:26, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>  wrote:
>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
 On 29/07/15 15:43, Boris Ostrovsky wrote:
> FYI, I have got a repro now and am investigating.
 Good and bad news.  This bug has nothing to do with LDTs
 themselves.

 I have worked out what is going on, but this:

 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
 index 5abeaac..7e1a82e 100644
 --- a/arch/x86/xen/enlighten.c
 +++ b/arch/x86/xen/enlighten.c
 @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
 pgprot_t prot)
 pte = pfn_pte(pfn, prot);
+   (void)*(volatile int*)v;
   if (HYPERVISOR_update_va_mapping((unsigned long)v,
 pte, 0)) {
   pr_err("set_aliased_prot va update failed w/
 lazy mode
 %u\n", paravirt_get_lazy_mode());
   BUG();

 Is perhaps not the fix we are looking for, and every use of
 HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>> I think in most cases we know that page is mapped so hopefully
>>> this is the
>>> only site that we need to be careful about.
>> Is there any chance we can get some kind of quick-and-dirty fix that
>> can go to x86/urgent in the next few days even if a clean fix isn't
>> available yet?
> Quick and dirty?
>
> Reading from v is the most obvious and quick way, for areas where
> we are
> certain v exists, is kernel memory and is expected to have a backing
> page.  I don't know offhand how many of current
> HYPERVISOR_update_va_mapping() callsites this applies to.
 __get_user((char *)v, tmp), perhaps, unless there's something better
 in the wings.  Keep in mind that we need this for -stable, and it's
 likely to get backported quite quickly due to CVE-2015-5157.
>>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>>> would probably work, and certainly be minimal hassle for -stable.
>>>
>>> Altering the hypercall used is certainly not something to backport, nor
>>> are we sure it is a viable fix at this time.
>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>> is the correct fix to unblock this LDT series.  I see no reason why this
>> cannot be backported.
> 
> To properly fix it should include batching and that is not something
> that I think we should target for stable.

The original call isn't batched, so it's replacement doesn't need to be,

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Boris Ostrovsky

On 07/29/2015 06:46 PM, David Vrabel wrote:


On 29/07/2015 23:11, Andrew Cooper wrote:

On 29/07/2015 23:05, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
 wrote:

On 29/07/2015 22:26, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
 wrote:

On 07/29/2015 03:03 PM, Andrew Cooper wrote:

On 29/07/15 15:43, Boris Ostrovsky wrote:

FYI, I have got a repro now and am investigating.

Good and bad news.  This bug has nothing to do with LDTs themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte = pfn_pte(pfn, prot);
   +   (void)*(volatile int*)v;
  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
  pr_err("set_aliased_prot va update failed w/ lazy mode
%u\n", paravirt_get_lazy_mode());
  BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same problem.

I think in most cases we know that page is mapped so hopefully this is the
only site that we need to be careful about.

Is there any chance we can get some kind of quick-and-dirty fix that
can go to x86/urgent in the next few days even if a clean fix isn't
available yet?

Quick and dirty?

Reading from v is the most obvious and quick way, for areas where we are
certain v exists, is kernel memory and is expected to have a backing
page.  I don't know offhand how many of current
HYPERVISOR_update_va_mapping() callsites this applies to.

__get_user((char *)v, tmp), perhaps, unless there's something better
in the wings.  Keep in mind that we need this for -stable, and it's
likely to get backported quite quickly due to CVE-2015-5157.

Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
would probably work, and certainly be minimal hassle for -stable.

Altering the hypercall used is certainly not something to backport, nor
are we sure it is a viable fix at this time.

Changing this one use of update_va_mapping to use mmu_update_normal_pt
is the correct fix to unblock this LDT series.  I see no reason why this
cannot be backported.


To properly fix it should include batching and that is not something 
that I think we should target for stable.


-boris




We can address any other potential update_va_mapping calls at a later
date (if they are shown to be problematic).

David


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Boris Ostrovsky

On 07/29/2015 06:11 PM, Andrew Cooper wrote:

On 29/07/2015 23:05, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
 wrote:

On 29/07/2015 22:26, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
 wrote:

On 07/29/2015 03:03 PM, Andrew Cooper wrote:

On 29/07/15 15:43, Boris Ostrovsky wrote:

FYI, I have got a repro now and am investigating.

Good and bad news.  This bug has nothing to do with LDTs themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte = pfn_pte(pfn, prot);
   +   (void)*(volatile int*)v;
  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
  pr_err("set_aliased_prot va update failed w/ lazy mode
%u\n", paravirt_get_lazy_mode());
  BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same problem.

I think in most cases we know that page is mapped so hopefully this is the
only site that we need to be careful about.

Is there any chance we can get some kind of quick-and-dirty fix that
can go to x86/urgent in the next few days even if a clean fix isn't
available yet?

Quick and dirty?

Reading from v is the most obvious and quick way, for areas where we are
certain v exists, is kernel memory and is expected to have a backing
page.  I don't know offhand how many of current
HYPERVISOR_update_va_mapping() callsites this applies to.

__get_user((char *)v, tmp), perhaps, unless there's something better
in the wings.  Keep in mind that we need this for -stable, and it's
likely to get backported quite quickly due to CVE-2015-5157.

Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
would probably work, and certainly be minimal hassle for -stable.

Altering the hypercall used is certainly not something to backport, nor
are we sure it is a viable fix at this time.


OK, I'll test tonight this quick fix and will defer a more proper patch 
for 4.3 then.


-boris


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andrew Cooper
On 29/07/2015 23:05, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>  wrote:
>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>>  wrote:
 On 07/29/2015 03:03 PM, Andrew Cooper wrote:
> On 29/07/15 15:43, Boris Ostrovsky wrote:
>> FYI, I have got a repro now and am investigating.
> Good and bad news.  This bug has nothing to do with LDTs themselves.
>
> I have worked out what is going on, but this:
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 5abeaac..7e1a82e 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>pte = pfn_pte(pfn, prot);
>   +   (void)*(volatile int*)v;
>  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>  pr_err("set_aliased_prot va update failed w/ lazy mode
> %u\n", paravirt_get_lazy_mode());
>  BUG();
>
> Is perhaps not the fix we are looking for, and every use of
> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
 I think in most cases we know that page is mapped so hopefully this is the
 only site that we need to be careful about.
>>> Is there any chance we can get some kind of quick-and-dirty fix that
>>> can go to x86/urgent in the next few days even if a clean fix isn't
>>> available yet?
>> Quick and dirty?
>>
>> Reading from v is the most obvious and quick way, for areas where we are
>> certain v exists, is kernel memory and is expected to have a backing
>> page.  I don't know offhand how many of current
>> HYPERVISOR_update_va_mapping() callsites this applies to.
> __get_user((char *)v, tmp), perhaps, unless there's something better
> in the wings.  Keep in mind that we need this for -stable, and it's
> likely to get backported quite quickly due to CVE-2015-5157.

Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
would probably work, and certainly be minimal hassle for -stable.

Altering the hypercall used is certainly not something to backport, nor
are we sure it is a viable fix at this time.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andy Lutomirski
On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
 wrote:
> On 29/07/2015 22:26, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>  wrote:
>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
 On 29/07/15 15:43, Boris Ostrovsky wrote:
> FYI, I have got a repro now and am investigating.
 Good and bad news.  This bug has nothing to do with LDTs themselves.

 I have worked out what is going on, but this:

 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
 index 5abeaac..7e1a82e 100644
 --- a/arch/x86/xen/enlighten.c
 +++ b/arch/x86/xen/enlighten.c
 @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte = pfn_pte(pfn, prot);
   +   (void)*(volatile int*)v;
  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
  pr_err("set_aliased_prot va update failed w/ lazy mode
 %u\n", paravirt_get_lazy_mode());
  BUG();

 Is perhaps not the fix we are looking for, and every use of
 HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>>
>>> I think in most cases we know that page is mapped so hopefully this is the
>>> only site that we need to be careful about.
>> Is there any chance we can get some kind of quick-and-dirty fix that
>> can go to x86/urgent in the next few days even if a clean fix isn't
>> available yet?
>
> Quick and dirty?
>
> Reading from v is the most obvious and quick way, for areas where we are
> certain v exists, is kernel memory and is expected to have a backing
> page.  I don't know offhand how many of current
> HYPERVISOR_update_va_mapping() callsites this applies to.

__get_user((char *)v, tmp), perhaps, unless there's something better
in the wings.  Keep in mind that we need this for -stable, and it's
likely to get backported quite quickly due to CVE-2015-5157.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andrew Cooper
On 29/07/2015 22:26, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>  wrote:
>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
 FYI, I have got a repro now and am investigating.
>>> Good and bad news.  This bug has nothing to do with LDTs themselves.
>>>
>>> I have worked out what is going on, but this:
>>>
>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>> index 5abeaac..7e1a82e 100644
>>> --- a/arch/x86/xen/enlighten.c
>>> +++ b/arch/x86/xen/enlighten.c
>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>>pte = pfn_pte(pfn, prot);
>>>   +   (void)*(volatile int*)v;
>>>  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>>  pr_err("set_aliased_prot va update failed w/ lazy mode
>>> %u\n", paravirt_get_lazy_mode());
>>>  BUG();
>>>
>>> Is perhaps not the fix we are looking for, and every use of
>>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>
>> I think in most cases we know that page is mapped so hopefully this is the
>> only site that we need to be careful about.
> Is there any chance we can get some kind of quick-and-dirty fix that
> can go to x86/urgent in the next few days even if a clean fix isn't
> available yet?

Quick and dirty?

Reading from v is the most obvious and quick way, for areas where we are
certain v exists, is kernel memory and is expected to have a backing
page.  I don't know offhand how many of current
HYPERVISOR_update_va_mapping() callsites this applies to.

>
>>> The update_va_mapping hypercall is designed to emulate writing the pte
>>> for v, with auditing applied.  As part of this, it does a pagewalk on v
>>> to locate and map the l1.  During this walk, Xen it finds the l2 not
>>> present, and fails the hypercall.  i.e. v is not reachable from the
>>> current cr3.
>>>
>>> Reading the virtual address immediately before issuing the hypercall
>>> causes Linux's memory faulting logic to fault in the l2.  This also
>>> explains why vm_unmap_aliases() appears to fix the issue; it is likely
>>> to fault in enough of the paging structure for v to be reachable.
>>
>> We've just touched this page (in write_ldt()) in this test so why would it
>> not be mapped?
> With my patches applied, the LDT is never written via any paravirt
> hook -- I write it once (possibly implicitly using kzalloc/vzalloc)
> before paravirt_alloc_ldt(), and write_ldt() is never called.  We
> could even remove it write_ldt() :)

Even better!

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Boris Ostrovsky

On 07/29/2015 05:26 PM, Andy Lutomirski wrote:

On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
 wrote:

On 07/29/2015 03:03 PM, Andrew Cooper wrote:

On 29/07/15 15:43, Boris Ostrovsky wrote:

FYI, I have got a repro now and am investigating.

Good and bad news.  This bug has nothing to do with LDTs themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte = pfn_pte(pfn, prot);
   +   (void)*(volatile int*)v;
  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
  pr_err("set_aliased_prot va update failed w/ lazy mode
%u\n", paravirt_get_lazy_mode());
  BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same problem.


I think in most cases we know that page is mapped so hopefully this is the
only site that we need to be careful about.

Is there any chance we can get some kind of quick-and-dirty fix that
can go to x86/urgent in the next few days even if a clean fix isn't
available yet?


I'll try to have it tomorrow.




The update_va_mapping hypercall is designed to emulate writing the pte
for v, with auditing applied.  As part of this, it does a pagewalk on v
to locate and map the l1.  During this walk, Xen it finds the l2 not
present, and fails the hypercall.  i.e. v is not reachable from the
current cr3.

Reading the virtual address immediately before issuing the hypercall
causes Linux's memory faulting logic to fault in the l2.  This also
explains why vm_unmap_aliases() appears to fix the issue; it is likely
to fault in enough of the paging structure for v to be reachable.


We've just touched this page (in write_ldt()) in this test so why would it
not be mapped?

With my patches applied, the LDT is never written via any paravirt
hook -- I write it once (possibly implicitly using kzalloc/vzalloc)
before paravirt_alloc_ldt(), and write_ldt() is never called.  We
could even remove it write_ldt() :)


I was referring to 'new_ldt->entries[ldt_info.entry_number] = ldt;' 
which we do write in this test, so it will fault the page in.


-boris

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andy Lutomirski
On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
 wrote:
> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>
>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>
>>> FYI, I have got a repro now and am investigating.
>>
>> Good and bad news.  This bug has nothing to do with LDTs themselves.
>>
>> I have worked out what is going on, but this:
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 5abeaac..7e1a82e 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>pte = pfn_pte(pfn, prot);
>>   +   (void)*(volatile int*)v;
>>  if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>  pr_err("set_aliased_prot va update failed w/ lazy mode
>> %u\n", paravirt_get_lazy_mode());
>>  BUG();
>>
>> Is perhaps not the fix we are looking for, and every use of
>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>
>
> I think in most cases we know that page is mapped so hopefully this is the
> only site that we need to be careful about.

Is there any chance we can get some kind of quick-and-dirty fix that
can go to x86/urgent in the next few days even if a clean fix isn't
available yet?

>
>>
>> The update_va_mapping hypercall is designed to emulate writing the pte
>> for v, with auditing applied.  As part of this, it does a pagewalk on v
>> to locate and map the l1.  During this walk, Xen it finds the l2 not
>> present, and fails the hypercall.  i.e. v is not reachable from the
>> current cr3.
>>
>> Reading the virtual address immediately before issuing the hypercall
>> causes Linux's memory faulting logic to fault in the l2.  This also
>> explains why vm_unmap_aliases() appears to fix the issue; it is likely
>> to fault in enough of the paging structure for v to be reachable.
>
>
> We've just touched this page (in write_ldt()) in this test so why would it
> not be mapped?

With my patches applied, the LDT is never written via any paravirt
hook -- I write it once (possibly implicitly using kzalloc/vzalloc)
before paravirt_alloc_ldt(), and write_ldt() is never called.  We
could even remove it write_ldt() :)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Boris Ostrovsky

On 07/29/2015 03:03 PM, Andrew Cooper wrote:

On 29/07/15 15:43, Boris Ostrovsky wrote:

FYI, I have got a repro now and am investigating.

Good and bad news.  This bug has nothing to do with LDTs themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
  
 pte = pfn_pte(pfn, prot);
  
+   (void)*(volatile int*)v;

 if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
 pr_err("set_aliased_prot va update failed w/ lazy mode
%u\n", paravirt_get_lazy_mode());
 BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same problem.


I think in most cases we know that page is mapped so hopefully this is 
the only site that we need to be careful about.




The update_va_mapping hypercall is designed to emulate writing the pte
for v, with auditing applied.  As part of this, it does a pagewalk on v
to locate and map the l1.  During this walk, Xen it finds the l2 not
present, and fails the hypercall.  i.e. v is not reachable from the
current cr3.

Reading the virtual address immediately before issuing the hypercall
causes Linux's memory faulting logic to fault in the l2.  This also
explains why vm_unmap_aliases() appears to fix the issue; it is likely
to fault in enough of the paging structure for v to be reachable.


We've just touched this page (in write_ldt()) in this test so why would 
it not be mapped?





One solution might be to use MMU_NORMAL_PT_UPDATE hypercall instead,
which take the physical address of pte to update.  This won't fail in
Xen if part of the paging structure is missing, and can be batched.


Yes, it does work. Thanks Andrew.


-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andrew Cooper
On 29/07/15 15:43, Boris Ostrovsky wrote:
> FYI, I have got a repro now and am investigating.

Good and bad news.  This bug has nothing to do with LDTs themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
pte = pfn_pte(pfn, prot);
 
+   (void)*(volatile int*)v;
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
pr_err("set_aliased_prot va update failed w/ lazy mode
%u\n", paravirt_get_lazy_mode());
BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same problem.

The update_va_mapping hypercall is designed to emulate writing the pte
for v, with auditing applied.  As part of this, it does a pagewalk on v
to locate and map the l1.  During this walk, Xen it finds the l2 not
present, and fails the hypercall.  i.e. v is not reachable from the
current cr3.

Reading the virtual address immediately before issuing the hypercall
causes Linux's memory faulting logic to fault in the l2.  This also
explains why vm_unmap_aliases() appears to fix the issue; it is likely
to fault in enough of the paging structure for v to be reachable.

One solution might be to use MMU_NORMAL_PT_UPDATE hypercall instead,
which take the physical address of pte to update.  This won't fail in
Xen if part of the paging structure is missing, and can be batched.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Boris Ostrovsky

On 07/29/2015 10:21 AM, Andrew Cooper wrote:

On 29/07/15 06:28, Andy Lutomirski wrote:

On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky
 wrote:

On 07/28/2015 08:47 PM, Andrew Cooper wrote:

On 29/07/2015 01:21, Andy Lutomirski wrote:

On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
 wrote:

On 07/28/2015 01:07 PM, Andy Lutomirski wrote:

On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
 wrote:

I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
xen_free_ldt() is attempting to nab back the pages which Xen still has
mapped as an LDT.


I just instrumented it with yet more LSL instructions.  I'm pretty
sure that set_ldt really is clearing at least LDT entry zero.
Nonetheless the free_ldt call still oopses.


Yes, I added some instrumentation to the hypervisor and we definitely
set
LDT to NULL before failing.

-boris

Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
getting incremented once on each CPU at the same time if both CPUs
fault in the same shadow LDT page at the same time?

Nothing, but that is fine.  If a page is in use in two vcpus LDTs, it is
expected to have a type refcount of 2.


Similarly, what
keeps both CPUs from calling get_page_type at the same time and
therefore losing track of the page type reference count?

a cmpxchg() loop in the depths of __get_page_type().


I don't see why vmalloc or vm_unmap_aliases would have anything to do
with this, though.

So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately
and the problem went away.

As far as I can tell, this affects TLB flushes but not unmaps.  That
means that my patch is totally bogus -- vm_unmap_aliases() *flushed*
aliases but isn't involved in removing them from the page tables.
That must be why xen_alloc_ldt and xen_set_ldt work today.

So what does flushing the TLB have to do with anything?  The only
thing I can think of is that it might force some deferred hypercalls
out.  I can reproduce this easily on UP, so IPIs aren't involved.

The other odd thing is that it seems like this happens when clearing
the LDT and freeing the old one but not when setting the LDT and
freeing the old one.  This is plausibly related to the lazy mode in
effect at the time, but I have no evidence for that.

Two more data points:  Putting xen_flush_mc before and after the
SET_LDT multicall has no effect.  Putting flush_tlb_all() in
xen_free_ldt doesn't help either, while vm_unmap_aliases() in the
exact same place does help.

FYI, I have got a repro now and am investigating.



To simplify your test case, this is sufficient for me to trigger this:


#define _GNU_SOURCE
#include 
#include 
#include 


int main()
{
int i;

struct user_desc desc = {
.entry_number= 0,
.base_addr   = 0,
.limit   = 10,
.seg_32bit   = 1,
.contents= 2, /* Code, not conforming */
.read_exec_only  = 0,
.limit_in_pages  = 0,
.seg_not_present = 0,
.useable = 0
};

for (i = 0; i < 500; i++)
syscall(SYS_modify_ldt, 0x11, &desc, sizeof(desc));
}


Run this program in a loop --- the error is triggered (again, for me), 
when it exits.


-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-29 Thread Andrew Cooper
On 29/07/15 06:28, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky
>  wrote:
>> On 07/28/2015 08:47 PM, Andrew Cooper wrote:
>>> On 29/07/2015 01:21, Andy Lutomirski wrote:
 On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
  wrote:
> On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
>> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
>>  wrote:
>>> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
>>> xen_free_ldt() is attempting to nab back the pages which Xen still has
>>> mapped as an LDT.
>>>
>> I just instrumented it with yet more LSL instructions.  I'm pretty
>> sure that set_ldt really is clearing at least LDT entry zero.
>> Nonetheless the free_ldt call still oopses.
>>
> Yes, I added some instrumentation to the hypervisor and we definitely
> set
> LDT to NULL before failing.
>
> -boris
 Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
 getting incremented once on each CPU at the same time if both CPUs
 fault in the same shadow LDT page at the same time?
>>> Nothing, but that is fine.  If a page is in use in two vcpus LDTs, it is
>>> expected to have a type refcount of 2.
>>>
 Similarly, what
 keeps both CPUs from calling get_page_type at the same time and
 therefore losing track of the page type reference count?
>>> a cmpxchg() loop in the depths of __get_page_type().
>>>
 I don't see why vmalloc or vm_unmap_aliases would have anything to do
 with this, though.
>>
>> So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately
>> and the problem went away.
> As far as I can tell, this affects TLB flushes but not unmaps.  That
> means that my patch is totally bogus -- vm_unmap_aliases() *flushed*
> aliases but isn't involved in removing them from the page tables.
> That must be why xen_alloc_ldt and xen_set_ldt work today.
>
> So what does flushing the TLB have to do with anything?  The only
> thing I can think of is that it might force some deferred hypercalls
> out.  I can reproduce this easily on UP, so IPIs aren't involved.
>
> The other odd thing is that it seems like this happens when clearing
> the LDT and freeing the old one but not when setting the LDT and
> freeing the old one.  This is plausibly related to the lazy mode in
> effect at the time, but I have no evidence for that.
>
> Two more data points:  Putting xen_flush_mc before and after the
> SET_LDT multicall has no effect.  Putting flush_tlb_all() in
> xen_free_ldt doesn't help either, while vm_unmap_aliases() in the
> exact same place does help.

FYI, I have got a repro now and am investigating.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andy Lutomirski
On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky
 wrote:
> On 07/28/2015 08:47 PM, Andrew Cooper wrote:
>>
>> On 29/07/2015 01:21, Andy Lutomirski wrote:
>>>
>>> On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
>>>  wrote:

 On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
>
> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
>  wrote:
>>
>> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
>> xen_free_ldt() is attempting to nab back the pages which Xen still has
>> mapped as an LDT.
>>
> I just instrumented it with yet more LSL instructions.  I'm pretty
> sure that set_ldt really is clearing at least LDT entry zero.
> Nonetheless the free_ldt call still oopses.
>
 Yes, I added some instrumentation to the hypervisor and we definitely
 set
 LDT to NULL before failing.

 -boris
>>>
>>> Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
>>> getting incremented once on each CPU at the same time if both CPUs
>>> fault in the same shadow LDT page at the same time?
>>
>> Nothing, but that is fine.  If a page is in use in two vcpus LDTs, it is
>> expected to have a type refcount of 2.
>>
>>> Similarly, what
>>> keeps both CPUs from calling get_page_type at the same time and
>>> therefore losing track of the page type reference count?
>>
>> a cmpxchg() loop in the depths of __get_page_type().
>>
>>> I don't see why vmalloc or vm_unmap_aliases would have anything to do
>>> with this, though.
>
>
> So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately
> and the problem went away.

As far as I can tell, this affects TLB flushes but not unmaps.  That
means that my patch is totally bogus -- vm_unmap_aliases() *flushed*
aliases but isn't involved in removing them from the page tables.
That must be why xen_alloc_ldt and xen_set_ldt work today.

So what does flushing the TLB have to do with anything?  The only
thing I can think of is that it might force some deferred hypercalls
out.  I can reproduce this easily on UP, so IPIs aren't involved.

The other odd thing is that it seems like this happens when clearing
the LDT and freeing the old one but not when setting the LDT and
freeing the old one.  This is plausibly related to the lazy mode in
effect at the time, but I have no evidence for that.

Two more data points:  Putting xen_flush_mc before and after the
SET_LDT multicall has no effect.  Putting flush_tlb_all() in
xen_free_ldt doesn't help either, while vm_unmap_aliases() in the
exact same place does help.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andy Lutomirski
On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky
 wrote:
> On 07/28/2015 08:47 PM, Andrew Cooper wrote:
>>
>> On 29/07/2015 01:21, Andy Lutomirski wrote:
>>>
>>> On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
>>>  wrote:

 On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
>
> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
>  wrote:
>>
>> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
>> xen_free_ldt() is attempting to nab back the pages which Xen still has
>> mapped as an LDT.
>>
> I just instrumented it with yet more LSL instructions.  I'm pretty
> sure that set_ldt really is clearing at least LDT entry zero.
> Nonetheless the free_ldt call still oopses.
>
 Yes, I added some instrumentation to the hypervisor and we definitely
 set
 LDT to NULL before failing.

 -boris
>>>
>>> Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
>>> getting incremented once on each CPU at the same time if both CPUs
>>> fault in the same shadow LDT page at the same time?
>>
>> Nothing, but that is fine.  If a page is in use in two vcpus LDTs, it is
>> expected to have a type refcount of 2.
>>
>>> Similarly, what
>>> keeps both CPUs from calling get_page_type at the same time and
>>> therefore losing track of the page type reference count?
>>
>> a cmpxchg() loop in the depths of __get_page_type().
>>
>>> I don't see why vmalloc or vm_unmap_aliases would have anything to do
>>> with this, though.
>
>
> So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately
> and the problem went away.
>
> I also saw this warning, BTW:
>
> [  178.686542] [ cut here ]
> [  178.686554] WARNING: CPU: 0 PID: 16440 at
> ./arch/x86/include/asm/mmu_context.h:96 load_mm_ldt+0x70/0x76()
> [  178.686558] DEBUG_LOCKS_WARN_ON(!irqs_disabled())

Whoops!  That should be checking preemptible(), not irqs_disabled().

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Boris Ostrovsky

On 07/28/2015 08:47 PM, Andrew Cooper wrote:

On 29/07/2015 01:21, Andy Lutomirski wrote:

On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
 wrote:

On 07/28/2015 01:07 PM, Andy Lutomirski wrote:

On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
 wrote:

I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
xen_free_ldt() is attempting to nab back the pages which Xen still has
mapped as an LDT.


I just instrumented it with yet more LSL instructions.  I'm pretty
sure that set_ldt really is clearing at least LDT entry zero.
Nonetheless the free_ldt call still oopses.


Yes, I added some instrumentation to the hypervisor and we definitely set
LDT to NULL before failing.

-boris

Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
getting incremented once on each CPU at the same time if both CPUs
fault in the same shadow LDT page at the same time?

Nothing, but that is fine.  If a page is in use in two vcpus LDTs, it is
expected to have a type refcount of 2.


Similarly, what
keeps both CPUs from calling get_page_type at the same time and
therefore losing track of the page type reference count?

a cmpxchg() loop in the depths of __get_page_type().


I don't see why vmalloc or vm_unmap_aliases would have anything to do
with this, though.


So just for kicks I made lazy_max_pages() return 0 to free vmaps 
immediately and the problem went away.


I also saw this warning, BTW:

[  178.686542] [ cut here ]
[  178.686554] WARNING: CPU: 0 PID: 16440 at 
./arch/x86/include/asm/mmu_context.h:96 load_mm_ldt+0x70/0x76()

[  178.686558] DEBUG_LOCKS_WARN_ON(!irqs_disabled())
[  178.686561] Modules linked in:
[  178.686566] CPU: 0 PID: 16440 Comm: kworker/u2:1 Not tainted 
4.1.0-32b #80
[  178.686570]    ea4e3df8 c1670e71  ea4e3e28 
c106ac1e c1814e43
[  178.686577]  ea4e3e54 4038 c181bc2c 0060 c166fd3b c166fd3b 
e6705dc0 
[  178.686583]  ea665000 ea4e3e40 c106ad03 0009 ea4e3e38 c1814e43 
ea4e3e54 ea4e3e5c

[  178.686589] Call Trace:
[  178.686594]  [] dump_stack+0x41/0x52
[  178.686598]  [] warn_slowpath_common+0x8e/0xd0
[  178.686602]  [] ? load_mm_ldt+0x70/0x76
[  178.686609]  [] ? load_mm_ldt+0x70/0x76
[  178.686612]  [] warn_slowpath_fmt+0x33/0x40
[  178.686615]  [] load_mm_ldt+0x70/0x76
[  178.686619]  [] flush_old_exec+0x6f9/0x750
[  178.686626]  [] load_elf_binary+0x2b4/0x1040
[  178.686630]  [] ? page_address+0x15/0xf0
[  178.686633]  [] ? kunmap+0x1f/0x70
[  178.686636]  [] search_binary_handler+0x89/0x1c0
[  178.686639]  [] do_execveat_common+0x4c0/0x620
[  178.686653]  [] ? kmemdup+0x33/0x50
[  178.686659]  [] ? __call_rcu.constprop.66+0xbb/0x220
[  178.686673]  [] do_execve+0x24/0x30
[  178.686679]  [] call_usermodehelper+0xde/0x120
[  178.686684]  [] ret_from_kernel_thread+0x21/0x30
[  178.686696]  [] ? __request_module+0x240/0x240
[  178.686701] ---[ end trace 8b3f5341f50e6c88 ]---


-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andrew Cooper
On 29/07/2015 01:21, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
>  wrote:
>> On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
>>> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
>>>  wrote:
 I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
 xen_free_ldt() is attempting to nab back the pages which Xen still has
 mapped as an LDT.

>>> I just instrumented it with yet more LSL instructions.  I'm pretty
>>> sure that set_ldt really is clearing at least LDT entry zero.
>>> Nonetheless the free_ldt call still oopses.
>>>
>> Yes, I added some instrumentation to the hypervisor and we definitely set
>> LDT to NULL before failing.
>>
>> -boris
> Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
> getting incremented once on each CPU at the same time if both CPUs
> fault in the same shadow LDT page at the same time?

Nothing, but that is fine.  If a page is in use in two vcpus LDTs, it is
expected to have a type refcount of 2.

> Similarly, what
> keeps both CPUs from calling get_page_type at the same time and
> therefore losing track of the page type reference count?

a cmpxchg() loop in the depths of __get_page_type().

>
> I don't see why vmalloc or vm_unmap_aliases would have anything to do
> with this, though.

Nor me.  I have compiled your branch and will see about reproducing the
issue myself tomorrow.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andy Lutomirski
On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
 wrote:
> On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
>>
>> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
>>  wrote:
>>>
>>> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
>>> xen_free_ldt() is attempting to nab back the pages which Xen still has
>>> mapped as an LDT.
>>>
>> I just instrumented it with yet more LSL instructions.  I'm pretty
>> sure that set_ldt really is clearing at least LDT entry zero.
>> Nonetheless the free_ldt call still oopses.
>>
>
> Yes, I added some instrumentation to the hypervisor and we definitely set
> LDT to NULL before failing.
>
> -boris

Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
getting incremented once on each CPU at the same time if both CPUs
fault in the same shadow LDT page at the same time?  Similarly, what
keeps both CPUs from calling get_page_type at the same time and
therefore losing track of the page type reference count?

I don't see why vmalloc or vm_unmap_aliases would have anything to do
with this, though.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Boris Ostrovsky

On 07/28/2015 01:07 PM, Andy Lutomirski wrote:

On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
 wrote:

I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
xen_free_ldt() is attempting to nab back the pages which Xen still has
mapped as an LDT.


I just instrumented it with yet more LSL instructions.  I'm pretty
sure that set_ldt really is clearing at least LDT entry zero.
Nonetheless the free_ldt call still oopses.



Yes, I added some instrumentation to the hypervisor and we definitely 
set LDT to NULL before failing.


-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andy Lutomirski
On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
 wrote:
> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
> xen_free_ldt() is attempting to nab back the pages which Xen still has
> mapped as an LDT.
>

I just instrumented it with yet more LSL instructions.  I'm pretty
sure that set_ldt really is clearing at least LDT entry zero.
Nonetheless the free_ldt call still oopses.

--Andy

> ~Andrew



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andrew Cooper
On 28/07/15 16:43, Andy Lutomirski wrote:
>
 After forward-porting my virtio patches, I got this thing to run on
 Xen.  After several tries, I got:

 [   53.985707] [ cut here ]
 [   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
 [   53.986677] invalid opcode:  [#1] SMP
 [   53.986677] Modules linked in:
 [   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
 [   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
 BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
 04/01/2014
 [   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
 [   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
 [   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
 [   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
 [   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
 [   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
 [   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
 [   53.986677] Stack:
 [   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
 0b4a 0200
 [   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
 c1062310 c01861c0
 [   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
 c2373a80 
 [   53.986677] Call Trace:
 [   53.986677]  [] xen_free_ldt+0x2d/0x40
 [   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
 [   53.986677]  [] destroy_context+0x25/0x40
 [   53.986677]  [] __mmdrop+0x1e/0xc0
 [   53.986677]  [] finish_task_switch+0xd8/0x1a0
 [   53.986677]  [] __schedule+0x316/0x950
 [   53.986677]  [] schedule+0x26/0x70
 [   53.986677]  [] do_wait+0x1b3/0x200
 [   53.986677]  [] SyS_waitpid+0x67/0xd0
 [   53.986677]  [] ? task_stopped_code+0x50/0x50
 [   53.986677]  [] syscall_call+0x7/0x7
 [   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
 4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
 c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
 89 e5
 [   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 
 0069:c0875e74
 [   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

 Is that the error you're seeing?

 If I change xen_free_ldt to:

 static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
 {
 const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 int i;

 vm_unmap_aliases();
 xen_mc_flush();

 for(i = 0; i < entries; i += entries_per_page)
 set_aliased_prot(ldt + i, PAGE_KERNEL);
 }

 then it works.  I don't know why this makes a difference.
 (xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
 doesn't.)

>>> That fix makes sense if there's some way that the vmalloc area we're
>>> freeing has an extra alias somewhere, which is very much possible.  On
>>> the other hand, I don't see how this happens without first doing an
>>> MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
>>> expected that to blow up and/or result in test case failures.
>>>
>>> But I'm still confused, because it seems like Xen will never populate
>>> the actual (hidden) LDT mapping unless the pages backing it are
>>> unaliased and well-formed, which make me wonder why this stuff ever
>>> worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
>>> in segfaults?
>>>
>>> The semantics seem to be very odd.  xen_free_ldt with an aliased
>>> address might fail (and OOPS), but actual access to the LDT with an
>>> aliased address page faults.
>>>
>>> Also, using kzalloc for everything fixes the problem, which suggests
>>> that there really is something to my theory that the problem involves
>>> unexpected aliases.
>> Xen does lazily populate the LDT frames.  The first time a page is ever
>> referenced via the LDT, Xen will perform a typechange.
>>
>> Under Xen, guest mappings are reference counted with both a plain
>> reference, and a type count.  Types of writeable, segdec and pagetables
>> are mutually exclusive.  This prevents the guest from having writeable
>> mappings of interesting datastructures, but readable mappings are fine.
>> Typechanges may only occur when the type reference count is 0.
> Makes sense.
>
>> At the point of the typechange, no writeable mappings of the frame may
>> exist (and it must not be referenced by a L2 or greater page directory),
>> or the typechange will fail.  Additionally the descriptors are audited
>> at this point, so if Xen objects to any of the descriptors in the same
>> page, the typechange will also fail.
>>
>> If the typechange fails, the pagefault gets propagated back to the guest.
> The part I don't understand is that I didn't observe any page faults.
>
>> The corollary to this is that, for xe

Re: [Xen-devel] [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Boris Ostrovsky

On 07/28/2015 11:23 AM, Andrew Cooper wrote:

On 28/07/15 15:50, Boris Ostrovsky wrote:

On 07/28/2015 10:35 AM, Andrew Cooper wrote:

On 28/07/15 15:05, Boris Ostrovsky wrote:

On 07/28/2015 06:29 AM, Andrew Cooper wrote:

After forward-porting my virtio patches, I got this thing to run on
Xen.  After several tries, I got:

[   53.985707] [ cut here ]
[   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
[   53.986677] invalid opcode:  [#1] SMP
[   53.986677] Modules linked in:
[   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
[   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
[   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
[   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
[   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX:
8000
[   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP:
c0875e74
[   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4:
00042660
[   53.986677] Stack:
[   53.986677]  8000 0672e063 21c0 cc3d1000 0001
cc3d2000
0b4a 0200
[   53.986677]  c0875ea8 c105312d c2317940 c2373a80 
c0875eb4
c1062310 c01861c0
[   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e
c7007a00
c2373a80 
[   53.986677] Call Trace:
[   53.986677]  [] xen_free_ldt+0x2d/0x40
[   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
[   53.986677]  [] destroy_context+0x25/0x40
[   53.986677]  [] __mmdrop+0x1e/0xc0
[   53.986677]  [] finish_task_switch+0xd8/0x1a0
[   53.986677]  [] __schedule+0x316/0x950
[   53.986677]  [] schedule+0x26/0x70
[   53.986677]  [] do_wait+0x1b3/0x200
[   53.986677]  [] SyS_waitpid+0x67/0xd0
[   53.986677]  [] ? task_stopped_code+0x50/0x50
[   53.986677]  [] syscall_call+0x7/0x7
[   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74
31 55
89 e5
[   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP
0069:c0875e74
[   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

Is that the error you're seeing?

If I change xen_free_ldt to:

static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
{
   const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
   int i;

   vm_unmap_aliases();
   xen_mc_flush();

   for(i = 0; i < entries; i += entries_per_page)
   set_aliased_prot(ldt + i, PAGE_KERNEL);
}

then it works.  I don't know why this makes a difference.
(xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
doesn't.)


That fix makes sense if there's some way that the vmalloc area we're
freeing has an extra alias somewhere, which is very much
possible.  On
the other hand, I don't see how this happens without first doing an
MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
expected that to blow up and/or result in test case failures.

But I'm still confused, because it seems like Xen will never populate
the actual (hidden) LDT mapping unless the pages backing it are
unaliased and well-formed, which make me wonder why this stuff ever
worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
in segfaults?

The semantics seem to be very odd.  xen_free_ldt with an aliased
address might fail (and OOPS), but actual access to the LDT with an
aliased address page faults.

Also, using kzalloc for everything fixes the problem, which suggests
that there really is something to my theory that the problem involves
unexpected aliases.

Xen does lazily populate the LDT frames.  The first time a page is
ever
referenced via the LDT, Xen will perform a typechange.

Under Xen, guest mappings are reference counted with both a plain
reference, and a type count.  Types of writeable, segdec and
pagetables
are mutually exclusive.  This prevents the guest from having writeable
mappings of interesting datastructures, but readable mappings are
fine.
Typechanges may only occur when the type reference count is 0.

At the point of the typechange, no writeable mappings of the frame may
exist (and it must not be referenced by a L2 or greater page
directory),
or the typechange will fail.  Additionally the descriptors are audited
at this point, so if Xen objects to any of the descriptors in the same
page, the typechange will also fail.

If the typechange fails, the pagefault gets propagated back to the
guest.

The corollary to this is that, for xen_free_ldt() to create writeable
mappings again, a typechange back to writeable is needed.  This will
fail if the LDT frames are still referenced in any vcpus LDT.

It would be interesting to know which of the two BUG()s in
set_aliased_prot() tripped.

The first one (i.e. not the alias)


In which case the page in question 

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andy Lutomirski
On Jul 28, 2015 3:30 AM, "Andrew Cooper"  wrote:
>
> On 28/07/15 04:16, Andy Lutomirski wrote:
> > On Mon, Jul 27, 2015 at 7:20 PM, Andy Lutomirski  
> > wrote:
> >> On Mon, Jul 27, 2015 at 9:18 AM, Boris Ostrovsky
> >>  wrote:
> >>> On 07/27/2015 11:53 AM, Andy Lutomirski wrote:
>  On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
>   wrote:
> > On 07/25/2015 01:36 AM, Andy Lutomirski wrote:
> >> Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
> >> probably a good general attack surface reduction, and it replaces some
> >> scary code with IMO less scary code.
> >>
> >> Also, servers and embedded systems should probably turn off modify_ldt.
> >> This makes that possible.
> >>
> >> Xen people, can you take a look at this?
> >>
> >> Willy and Kees: I left the config option alone.  The -tiny people will
> >> like it, and we can always add a sysctl of some sort later.
> >>
> >> Changes from v3:
> >>- Hopefully fixed Xen.
> >
> > 32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)
> >
> >>- Fixed 32-bit test case on 32-bit native kernel.
> >
> > I am not sure I see what changed.
>  I misplaced the fix in the wrong git commit, so I failed to sent it.
>  Oops.
> 
>  I just sent v4.1 of patch 3.  Can you try that?
> >>>
> >>>
> >>> I am hitting BUG() in Xen code (returning from a hypercall) when freeing 
> >>> LDT
> >>> in destroy_context(). Interestingly though when I run the test in the
> >>> debugger I get SIGILL (just like before) but no BUG().
> >>>
> >>> Let me get back to you on that later today.
> >>>
> >>>
> >> After forward-porting my virtio patches, I got this thing to run on
> >> Xen.  After several tries, I got:
> >>
> >> [   53.985707] [ cut here ]
> >> [   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
> >> [   53.986677] invalid opcode:  [#1] SMP
> >> [   53.986677] Modules linked in:
> >> [   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
> >> [   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
> >> 04/01/2014
> >> [   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
> >> [   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
> >> [   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
> >> [   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
> >> [   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
> >> [   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> >> [   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
> >> [   53.986677] Stack:
> >> [   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
> >> 0b4a 0200
> >> [   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
> >> c1062310 c01861c0
> >> [   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
> >> c2373a80 
> >> [   53.986677] Call Trace:
> >> [   53.986677]  [] xen_free_ldt+0x2d/0x40
> >> [   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
> >> [   53.986677]  [] destroy_context+0x25/0x40
> >> [   53.986677]  [] __mmdrop+0x1e/0xc0
> >> [   53.986677]  [] finish_task_switch+0xd8/0x1a0
> >> [   53.986677]  [] __schedule+0x316/0x950
> >> [   53.986677]  [] schedule+0x26/0x70
> >> [   53.986677]  [] do_wait+0x1b3/0x200
> >> [   53.986677]  [] SyS_waitpid+0x67/0xd0
> >> [   53.986677]  [] ? task_stopped_code+0x50/0x50
> >> [   53.986677]  [] syscall_call+0x7/0x7
> >> [   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
> >> 4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
> >> c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
> >> 89 e5
> >> [   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 
> >> 0069:c0875e74
> >> [   54.010069] ---[ end trace 89ac35b29c1c59bb ]---
> >>
> >> Is that the error you're seeing?
> >>
> >> If I change xen_free_ldt to:
> >>
> >> static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
> >> {
> >> const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
> >> int i;
> >>
> >> vm_unmap_aliases();
> >> xen_mc_flush();
> >>
> >> for(i = 0; i < entries; i += entries_per_page)
> >> set_aliased_prot(ldt + i, PAGE_KERNEL);
> >> }
> >>
> >> then it works.  I don't know why this makes a difference.
> >> (xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
> >> doesn't.)
> >>
> > That fix makes sense if there's some way that the vmalloc area we're
> > freeing has an extra alias somewhere, which is very much possible.  On
> > the other hand, I don't see how this happens without first doing an
> > MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
> > expected that to blow up and/or result in test case failures.
> >
> > But I'm still confused, because it seems like Xen 

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Boris Ostrovsky

On 07/28/2015 11:15 AM, Konrad Rzeszutek Wilk wrote:

On Tue, Jul 28, 2015 at 10:50:39AM -0400, Boris Ostrovsky wrote:

On 07/28/2015 10:35 AM, Andrew Cooper wrote:

On 28/07/15 15:05, Boris Ostrovsky wrote:

On 07/28/2015 06:29 AM, Andrew Cooper wrote:

After forward-porting my virtio patches, I got this thing to run on
Xen.  After several tries, I got:

[   53.985707] [ cut here ]
[   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
[   53.986677] invalid opcode:  [#1] SMP
[   53.986677] Modules linked in:
[   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
[   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
[   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
[   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
[   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
[   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
[   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
[   53.986677] Stack:
[   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
0b4a 0200
[   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
c1062310 c01861c0
[   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
c2373a80 
[   53.986677] Call Trace:
[   53.986677]  [] xen_free_ldt+0x2d/0x40
[   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
[   53.986677]  [] destroy_context+0x25/0x40
[   53.986677]  [] __mmdrop+0x1e/0xc0
[   53.986677]  [] finish_task_switch+0xd8/0x1a0
[   53.986677]  [] __schedule+0x316/0x950
[   53.986677]  [] schedule+0x26/0x70
[   53.986677]  [] do_wait+0x1b3/0x200
[   53.986677]  [] SyS_waitpid+0x67/0xd0
[   53.986677]  [] ? task_stopped_code+0x50/0x50
[   53.986677]  [] syscall_call+0x7/0x7
[   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
89 e5
[   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP
0069:c0875e74
[   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

Is that the error you're seeing?

If I change xen_free_ldt to:

static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
{
  const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
  int i;

  vm_unmap_aliases();
  xen_mc_flush();

  for(i = 0; i < entries; i += entries_per_page)
  set_aliased_prot(ldt + i, PAGE_KERNEL);
}

then it works.  I don't know why this makes a difference.
(xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
doesn't.)


That fix makes sense if there's some way that the vmalloc area we're
freeing has an extra alias somewhere, which is very much possible.  On
the other hand, I don't see how this happens without first doing an
MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
expected that to blow up and/or result in test case failures.

But I'm still confused, because it seems like Xen will never populate
the actual (hidden) LDT mapping unless the pages backing it are
unaliased and well-formed, which make me wonder why this stuff ever
worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
in segfaults?

The semantics seem to be very odd.  xen_free_ldt with an aliased
address might fail (and OOPS), but actual access to the LDT with an
aliased address page faults.

Also, using kzalloc for everything fixes the problem, which suggests
that there really is something to my theory that the problem involves
unexpected aliases.

Xen does lazily populate the LDT frames.  The first time a page is ever
referenced via the LDT, Xen will perform a typechange.

Under Xen, guest mappings are reference counted with both a plain
reference, and a type count.  Types of writeable, segdec and pagetables
are mutually exclusive.  This prevents the guest from having writeable
mappings of interesting datastructures, but readable mappings are fine.
Typechanges may only occur when the type reference count is 0.

At the point of the typechange, no writeable mappings of the frame may
exist (and it must not be referenced by a L2 or greater page directory),
or the typechange will fail.  Additionally the descriptors are audited
at this point, so if Xen objects to any of the descriptors in the same
page, the typechange will also fail.

If the typechange fails, the pagefault gets propagated back to the
guest.

The corollary to this is that, for xen_free_ldt() to create writeable
mappings again, a typechange back to writeable is needed.  This will
fail if the LDT frames are still referenced in any vcpus LDT.

It would be interesting to know which of the two BUG()s in
set_aliased_prot() tripped.

The first one (i.e. not the alias)


In which c

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andrew Cooper
On 28/07/15 15:50, Boris Ostrovsky wrote:
> On 07/28/2015 10:35 AM, Andrew Cooper wrote:
>> On 28/07/15 15:05, Boris Ostrovsky wrote:
>>> On 07/28/2015 06:29 AM, Andrew Cooper wrote:
>> After forward-porting my virtio patches, I got this thing to run on
>> Xen.  After several tries, I got:
>>
>> [   53.985707] [ cut here ]
>> [   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
>> [   53.986677] invalid opcode:  [#1] SMP
>> [   53.986677] Modules linked in:
>> [   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
>> [   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> 1996),
>> BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
>> 04/01/2014
>> [   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
>> [   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
>> [   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
>> [   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX:
>> 8000
>> [   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP:
>> c0875e74
>> [   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>> [   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4:
>> 00042660
>> [   53.986677] Stack:
>> [   53.986677]  8000 0672e063 21c0 cc3d1000 0001
>> cc3d2000
>> 0b4a 0200
>> [   53.986677]  c0875ea8 c105312d c2317940 c2373a80 
>> c0875eb4
>> c1062310 c01861c0
>> [   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e
>> c7007a00
>> c2373a80 
>> [   53.986677] Call Trace:
>> [   53.986677]  [] xen_free_ldt+0x2d/0x40
>> [   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
>> [   53.986677]  [] destroy_context+0x25/0x40
>> [   53.986677]  [] __mmdrop+0x1e/0xc0
>> [   53.986677]  [] finish_task_switch+0xd8/0x1a0
>> [   53.986677]  [] __schedule+0x316/0x950
>> [   53.986677]  [] schedule+0x26/0x70
>> [   53.986677]  [] do_wait+0x1b3/0x200
>> [   53.986677]  [] SyS_waitpid+0x67/0xd0
>> [   53.986677]  [] ? task_stopped_code+0x50/0x50
>> [   53.986677]  [] syscall_call+0x7/0x7
>> [   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
>> 4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
>> c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74
>> 31 55
>> 89 e5
>> [   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP
>> 0069:c0875e74
>> [   54.010069] ---[ end trace 89ac35b29c1c59bb ]---
>>
>> Is that the error you're seeing?
>>
>> If I change xen_free_ldt to:
>>
>> static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
>> {
>>   const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
>>   int i;
>>
>>   vm_unmap_aliases();
>>   xen_mc_flush();
>>
>>   for(i = 0; i < entries; i += entries_per_page)
>>   set_aliased_prot(ldt + i, PAGE_KERNEL);
>> }
>>
>> then it works.  I don't know why this makes a difference.
>> (xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
>> doesn't.)
>>
> That fix makes sense if there's some way that the vmalloc area we're
> freeing has an extra alias somewhere, which is very much
> possible.  On
> the other hand, I don't see how this happens without first doing an
> MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
> expected that to blow up and/or result in test case failures.
>
> But I'm still confused, because it seems like Xen will never populate
> the actual (hidden) LDT mapping unless the pages backing it are
> unaliased and well-formed, which make me wonder why this stuff ever
> worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
> in segfaults?
>
> The semantics seem to be very odd.  xen_free_ldt with an aliased
> address might fail (and OOPS), but actual access to the LDT with an
> aliased address page faults.
>
> Also, using kzalloc for everything fixes the problem, which suggests
> that there really is something to my theory that the problem involves
> unexpected aliases.
 Xen does lazily populate the LDT frames.  The first time a page is
 ever
 referenced via the LDT, Xen will perform a typechange.

 Under Xen, guest mappings are reference counted with both a plain
 reference, and a type count.  Types of writeable, segdec and
 pagetables
 are mutually exclusive.  This prevents the guest from having writeable
 mappings of interesting datastructures, but readable mappings are
 fine.
 Typechanges may only occur when the type reference count is 0.

 At the point of the typechange, no writeable mappings of the frame may
 exist (and it must not be referenced by a L2 or grea

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Konrad Rzeszutek Wilk
On Tue, Jul 28, 2015 at 10:50:39AM -0400, Boris Ostrovsky wrote:
> On 07/28/2015 10:35 AM, Andrew Cooper wrote:
> >On 28/07/15 15:05, Boris Ostrovsky wrote:
> >>On 07/28/2015 06:29 AM, Andrew Cooper wrote:
> >After forward-porting my virtio patches, I got this thing to run on
> >Xen.  After several tries, I got:
> >
> >[   53.985707] [ cut here ]
> >[   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
> >[   53.986677] invalid opcode:  [#1] SMP
> >[   53.986677] Modules linked in:
> >[   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
> >[   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
> >04/01/2014
> >[   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
> >[   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
> >[   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
> >[   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
> >[   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
> >[   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> >[   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
> >[   53.986677] Stack:
> >[   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
> >0b4a 0200
> >[   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
> >c1062310 c01861c0
> >[   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
> >c2373a80 
> >[   53.986677] Call Trace:
> >[   53.986677]  [] xen_free_ldt+0x2d/0x40
> >[   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
> >[   53.986677]  [] destroy_context+0x25/0x40
> >[   53.986677]  [] __mmdrop+0x1e/0xc0
> >[   53.986677]  [] finish_task_switch+0xd8/0x1a0
> >[   53.986677]  [] __schedule+0x316/0x950
> >[   53.986677]  [] schedule+0x26/0x70
> >[   53.986677]  [] do_wait+0x1b3/0x200
> >[   53.986677]  [] SyS_waitpid+0x67/0xd0
> >[   53.986677]  [] ? task_stopped_code+0x50/0x50
> >[   53.986677]  [] syscall_call+0x7/0x7
> >[   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
> >4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
> >c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
> >89 e5
> >[   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP
> >0069:c0875e74
> >[   54.010069] ---[ end trace 89ac35b29c1c59bb ]---
> >
> >Is that the error you're seeing?
> >
> >If I change xen_free_ldt to:
> >
> >static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
> >{
> >  const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
> >  int i;
> >
> >  vm_unmap_aliases();
> >  xen_mc_flush();
> >
> >  for(i = 0; i < entries; i += entries_per_page)
> >  set_aliased_prot(ldt + i, PAGE_KERNEL);
> >}
> >
> >then it works.  I don't know why this makes a difference.
> >(xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
> >doesn't.)
> >
> That fix makes sense if there's some way that the vmalloc area we're
> freeing has an extra alias somewhere, which is very much possible.  On
> the other hand, I don't see how this happens without first doing an
> MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
> expected that to blow up and/or result in test case failures.
> 
> But I'm still confused, because it seems like Xen will never populate
> the actual (hidden) LDT mapping unless the pages backing it are
> unaliased and well-formed, which make me wonder why this stuff ever
> worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
> in segfaults?
> 
> The semantics seem to be very odd.  xen_free_ldt with an aliased
> address might fail (and OOPS), but actual access to the LDT with an
> aliased address page faults.
> 
> Also, using kzalloc for everything fixes the problem, which suggests
> that there really is something to my theory that the problem involves
> unexpected aliases.
> >>>Xen does lazily populate the LDT frames.  The first time a page is ever
> >>>referenced via the LDT, Xen will perform a typechange.
> >>>
> >>>Under Xen, guest mappings are reference counted with both a plain
> >>>reference, and a type count.  Types of writeable, segdec and pagetables
> >>>are mutually exclusive.  This prevents the guest from having writeable
> >>>mappings of interesting datastructures, but readable mappings are fine.
> >>>Typechanges may only occur when the type reference count is 0.
> >>>
> >>>At the point of the typechange, no writeable mappings of the frame may
> >>>exist (and it must not be referenced by a L2 or greater page directory),
> >>>or the typechan

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Boris Ostrovsky

On 07/28/2015 10:35 AM, Andrew Cooper wrote:

On 28/07/15 15:05, Boris Ostrovsky wrote:

On 07/28/2015 06:29 AM, Andrew Cooper wrote:

After forward-porting my virtio patches, I got this thing to run on
Xen.  After several tries, I got:

[   53.985707] [ cut here ]
[   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
[   53.986677] invalid opcode:  [#1] SMP
[   53.986677] Modules linked in:
[   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
[   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
[   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
[   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
[   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
[   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
[   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
[   53.986677] Stack:
[   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
0b4a 0200
[   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
c1062310 c01861c0
[   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
c2373a80 
[   53.986677] Call Trace:
[   53.986677]  [] xen_free_ldt+0x2d/0x40
[   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
[   53.986677]  [] destroy_context+0x25/0x40
[   53.986677]  [] __mmdrop+0x1e/0xc0
[   53.986677]  [] finish_task_switch+0xd8/0x1a0
[   53.986677]  [] __schedule+0x316/0x950
[   53.986677]  [] schedule+0x26/0x70
[   53.986677]  [] do_wait+0x1b3/0x200
[   53.986677]  [] SyS_waitpid+0x67/0xd0
[   53.986677]  [] ? task_stopped_code+0x50/0x50
[   53.986677]  [] syscall_call+0x7/0x7
[   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
89 e5
[   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP
0069:c0875e74
[   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

Is that the error you're seeing?

If I change xen_free_ldt to:

static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
{
  const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
  int i;

  vm_unmap_aliases();
  xen_mc_flush();

  for(i = 0; i < entries; i += entries_per_page)
  set_aliased_prot(ldt + i, PAGE_KERNEL);
}

then it works.  I don't know why this makes a difference.
(xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
doesn't.)


That fix makes sense if there's some way that the vmalloc area we're
freeing has an extra alias somewhere, which is very much possible.  On
the other hand, I don't see how this happens without first doing an
MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
expected that to blow up and/or result in test case failures.

But I'm still confused, because it seems like Xen will never populate
the actual (hidden) LDT mapping unless the pages backing it are
unaliased and well-formed, which make me wonder why this stuff ever
worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
in segfaults?

The semantics seem to be very odd.  xen_free_ldt with an aliased
address might fail (and OOPS), but actual access to the LDT with an
aliased address page faults.

Also, using kzalloc for everything fixes the problem, which suggests
that there really is something to my theory that the problem involves
unexpected aliases.

Xen does lazily populate the LDT frames.  The first time a page is ever
referenced via the LDT, Xen will perform a typechange.

Under Xen, guest mappings are reference counted with both a plain
reference, and a type count.  Types of writeable, segdec and pagetables
are mutually exclusive.  This prevents the guest from having writeable
mappings of interesting datastructures, but readable mappings are fine.
Typechanges may only occur when the type reference count is 0.

At the point of the typechange, no writeable mappings of the frame may
exist (and it must not be referenced by a L2 or greater page directory),
or the typechange will fail.  Additionally the descriptors are audited
at this point, so if Xen objects to any of the descriptors in the same
page, the typechange will also fail.

If the typechange fails, the pagefault gets propagated back to the
guest.

The corollary to this is that, for xen_free_ldt() to create writeable
mappings again, a typechange back to writeable is needed.  This will
fail if the LDT frames are still referenced in any vcpus LDT.

It would be interesting to know which of the two BUG()s in
set_aliased_prot() tripped.

The first one (i.e. not the alias)


In which case the page in question is still referenced in an LDT
(perhaps on a different vcpu)


The problem is reproducible on a 

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andrew Cooper
On 28/07/15 15:05, Boris Ostrovsky wrote:
> On 07/28/2015 06:29 AM, Andrew Cooper wrote:
>>
 After forward-porting my virtio patches, I got this thing to run on
 Xen.  After several tries, I got:

 [   53.985707] [ cut here ]
 [   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
 [   53.986677] invalid opcode:  [#1] SMP
 [   53.986677] Modules linked in:
 [   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
 [   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
 BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
 04/01/2014
 [   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
 [   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
 [   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
 [   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
 [   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
 [   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
 [   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
 [   53.986677] Stack:
 [   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
 0b4a 0200
 [   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
 c1062310 c01861c0
 [   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
 c2373a80 
 [   53.986677] Call Trace:
 [   53.986677]  [] xen_free_ldt+0x2d/0x40
 [   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
 [   53.986677]  [] destroy_context+0x25/0x40
 [   53.986677]  [] __mmdrop+0x1e/0xc0
 [   53.986677]  [] finish_task_switch+0xd8/0x1a0
 [   53.986677]  [] __schedule+0x316/0x950
 [   53.986677]  [] schedule+0x26/0x70
 [   53.986677]  [] do_wait+0x1b3/0x200
 [   53.986677]  [] SyS_waitpid+0x67/0xd0
 [   53.986677]  [] ? task_stopped_code+0x50/0x50
 [   53.986677]  [] syscall_call+0x7/0x7
 [   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
 4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
 c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
 89 e5
 [   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP
 0069:c0875e74
 [   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

 Is that the error you're seeing?

 If I change xen_free_ldt to:

 static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
 {
  const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
  int i;

  vm_unmap_aliases();
  xen_mc_flush();

  for(i = 0; i < entries; i += entries_per_page)
  set_aliased_prot(ldt + i, PAGE_KERNEL);
 }

 then it works.  I don't know why this makes a difference.
 (xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
 doesn't.)

>>> That fix makes sense if there's some way that the vmalloc area we're
>>> freeing has an extra alias somewhere, which is very much possible.  On
>>> the other hand, I don't see how this happens without first doing an
>>> MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
>>> expected that to blow up and/or result in test case failures.
>>>
>>> But I'm still confused, because it seems like Xen will never populate
>>> the actual (hidden) LDT mapping unless the pages backing it are
>>> unaliased and well-formed, which make me wonder why this stuff ever
>>> worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
>>> in segfaults?
>>>
>>> The semantics seem to be very odd.  xen_free_ldt with an aliased
>>> address might fail (and OOPS), but actual access to the LDT with an
>>> aliased address page faults.
>>>
>>> Also, using kzalloc for everything fixes the problem, which suggests
>>> that there really is something to my theory that the problem involves
>>> unexpected aliases.
>> Xen does lazily populate the LDT frames.  The first time a page is ever
>> referenced via the LDT, Xen will perform a typechange.
>>
>> Under Xen, guest mappings are reference counted with both a plain
>> reference, and a type count.  Types of writeable, segdec and pagetables
>> are mutually exclusive.  This prevents the guest from having writeable
>> mappings of interesting datastructures, but readable mappings are fine.
>> Typechanges may only occur when the type reference count is 0.
>>
>> At the point of the typechange, no writeable mappings of the frame may
>> exist (and it must not be referenced by a L2 or greater page directory),
>> or the typechange will fail.  Additionally the descriptors are audited
>> at this point, so if Xen objects to any of the descriptors in the same
>> page, the typechange will also fail.
>>
>> If the typechange fails, the pagefault gets propagated back to the
>> guest.
>>
>> The corollary to this is that, for xen_free_ldt() to create writea

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Boris Ostrovsky

On 07/28/2015 06:29 AM, Andrew Cooper wrote:



After forward-porting my virtio patches, I got this thing to run on
Xen.  After several tries, I got:

[   53.985707] [ cut here ]
[   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
[   53.986677] invalid opcode:  [#1] SMP
[   53.986677] Modules linked in:
[   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
[   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
[   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
[   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
[   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
[   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
[   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
[   53.986677] Stack:
[   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
0b4a 0200
[   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
c1062310 c01861c0
[   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
c2373a80 
[   53.986677] Call Trace:
[   53.986677]  [] xen_free_ldt+0x2d/0x40
[   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
[   53.986677]  [] destroy_context+0x25/0x40
[   53.986677]  [] __mmdrop+0x1e/0xc0
[   53.986677]  [] finish_task_switch+0xd8/0x1a0
[   53.986677]  [] __schedule+0x316/0x950
[   53.986677]  [] schedule+0x26/0x70
[   53.986677]  [] do_wait+0x1b3/0x200
[   53.986677]  [] SyS_waitpid+0x67/0xd0
[   53.986677]  [] ? task_stopped_code+0x50/0x50
[   53.986677]  [] syscall_call+0x7/0x7
[   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
89 e5
[   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 0069:c0875e74
[   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

Is that the error you're seeing?

If I change xen_free_ldt to:

static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
{
 const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 int i;

 vm_unmap_aliases();
 xen_mc_flush();

 for(i = 0; i < entries; i += entries_per_page)
 set_aliased_prot(ldt + i, PAGE_KERNEL);
}

then it works.  I don't know why this makes a difference.
(xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
doesn't.)


That fix makes sense if there's some way that the vmalloc area we're
freeing has an extra alias somewhere, which is very much possible.  On
the other hand, I don't see how this happens without first doing an
MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
expected that to blow up and/or result in test case failures.

But I'm still confused, because it seems like Xen will never populate
the actual (hidden) LDT mapping unless the pages backing it are
unaliased and well-formed, which make me wonder why this stuff ever
worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
in segfaults?

The semantics seem to be very odd.  xen_free_ldt with an aliased
address might fail (and OOPS), but actual access to the LDT with an
aliased address page faults.

Also, using kzalloc for everything fixes the problem, which suggests
that there really is something to my theory that the problem involves
unexpected aliases.

Xen does lazily populate the LDT frames.  The first time a page is ever
referenced via the LDT, Xen will perform a typechange.

Under Xen, guest mappings are reference counted with both a plain
reference, and a type count.  Types of writeable, segdec and pagetables
are mutually exclusive.  This prevents the guest from having writeable
mappings of interesting datastructures, but readable mappings are fine.
Typechanges may only occur when the type reference count is 0.

At the point of the typechange, no writeable mappings of the frame may
exist (and it must not be referenced by a L2 or greater page directory),
or the typechange will fail.  Additionally the descriptors are audited
at this point, so if Xen objects to any of the descriptors in the same
page, the typechange will also fail.

If the typechange fails, the pagefault gets propagated back to the guest.

The corollary to this is that, for xen_free_ldt() to create writeable
mappings again, a typechange back to writeable is needed.  This will
fail if the LDT frames are still referenced in any vcpus LDT.

It would be interesting to know which of the two BUG()s in
set_aliased_prot() tripped.


The first one (i.e. not the alias)

-boris


If writeable aliases did exist then
xen_alloc_ldt() could indeed be insufficient to make the frames usable
as an LDT, but xen_free_ldt() wouldn't fail when trying to recreate the
writeable mappings.

~Andrew


--
T

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-28 Thread Andrew Cooper
On 28/07/15 04:16, Andy Lutomirski wrote:
> On Mon, Jul 27, 2015 at 7:20 PM, Andy Lutomirski  wrote:
>> On Mon, Jul 27, 2015 at 9:18 AM, Boris Ostrovsky
>>  wrote:
>>> On 07/27/2015 11:53 AM, Andy Lutomirski wrote:
 On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
  wrote:
> On 07/25/2015 01:36 AM, Andy Lutomirski wrote:
>> Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
>> probably a good general attack surface reduction, and it replaces some
>> scary code with IMO less scary code.
>>
>> Also, servers and embedded systems should probably turn off modify_ldt.
>> This makes that possible.
>>
>> Xen people, can you take a look at this?
>>
>> Willy and Kees: I left the config option alone.  The -tiny people will
>> like it, and we can always add a sysctl of some sort later.
>>
>> Changes from v3:
>>- Hopefully fixed Xen.
>
> 32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)
>
>>- Fixed 32-bit test case on 32-bit native kernel.
>
> I am not sure I see what changed.
 I misplaced the fix in the wrong git commit, so I failed to sent it.
 Oops.

 I just sent v4.1 of patch 3.  Can you try that?
>>>
>>>
>>> I am hitting BUG() in Xen code (returning from a hypercall) when freeing LDT
>>> in destroy_context(). Interestingly though when I run the test in the
>>> debugger I get SIGILL (just like before) but no BUG().
>>>
>>> Let me get back to you on that later today.
>>>
>>>
>> After forward-porting my virtio patches, I got this thing to run on
>> Xen.  After several tries, I got:
>>
>> [   53.985707] [ cut here ]
>> [   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
>> [   53.986677] invalid opcode:  [#1] SMP
>> [   53.986677] Modules linked in:
>> [   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
>> [   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
>> 04/01/2014
>> [   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
>> [   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
>> [   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
>> [   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
>> [   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
>> [   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>> [   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
>> [   53.986677] Stack:
>> [   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
>> 0b4a 0200
>> [   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
>> c1062310 c01861c0
>> [   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
>> c2373a80 
>> [   53.986677] Call Trace:
>> [   53.986677]  [] xen_free_ldt+0x2d/0x40
>> [   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
>> [   53.986677]  [] destroy_context+0x25/0x40
>> [   53.986677]  [] __mmdrop+0x1e/0xc0
>> [   53.986677]  [] finish_task_switch+0xd8/0x1a0
>> [   53.986677]  [] __schedule+0x316/0x950
>> [   53.986677]  [] schedule+0x26/0x70
>> [   53.986677]  [] do_wait+0x1b3/0x200
>> [   53.986677]  [] SyS_waitpid+0x67/0xd0
>> [   53.986677]  [] ? task_stopped_code+0x50/0x50
>> [   53.986677]  [] syscall_call+0x7/0x7
>> [   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
>> 4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
>> c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
>> 89 e5
>> [   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 
>> 0069:c0875e74
>> [   54.010069] ---[ end trace 89ac35b29c1c59bb ]---
>>
>> Is that the error you're seeing?
>>
>> If I change xen_free_ldt to:
>>
>> static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
>> {
>> const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
>> int i;
>>
>> vm_unmap_aliases();
>> xen_mc_flush();
>>
>> for(i = 0; i < entries; i += entries_per_page)
>> set_aliased_prot(ldt + i, PAGE_KERNEL);
>> }
>>
>> then it works.  I don't know why this makes a difference.
>> (xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
>> doesn't.)
>>
> That fix makes sense if there's some way that the vmalloc area we're
> freeing has an extra alias somewhere, which is very much possible.  On
> the other hand, I don't see how this happens without first doing an
> MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
> expected that to blow up and/or result in test case failures.
>
> But I'm still confused, because it seems like Xen will never populate
> the actual (hidden) LDT mapping unless the pages backing it are
> unaliased and well-formed, which make me wonder why this stuff ever
> worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
> in segfaults?
>
> The semantics seem to be very odd.  xen_f

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-27 Thread Boris Ostrovsky

On 07/27/2015 11:16 PM, Andy Lutomirski wrote:

On Mon, Jul 27, 2015 at 7:20 PM, Andy Lutomirski  wrote:

On Mon, Jul 27, 2015 at 9:18 AM, Boris Ostrovsky
 wrote:

On 07/27/2015 11:53 AM, Andy Lutomirski wrote:

On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
 wrote:

On 07/25/2015 01:36 AM, Andy Lutomirski wrote:

Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
probably a good general attack surface reduction, and it replaces some
scary code with IMO less scary code.

Also, servers and embedded systems should probably turn off modify_ldt.
This makes that possible.

Xen people, can you take a look at this?

Willy and Kees: I left the config option alone.  The -tiny people will
like it, and we can always add a sysctl of some sort later.

Changes from v3:
- Hopefully fixed Xen.


32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)


- Fixed 32-bit test case on 32-bit native kernel.


I am not sure I see what changed.

I misplaced the fix in the wrong git commit, so I failed to sent it.
Oops.

I just sent v4.1 of patch 3.  Can you try that?



I am hitting BUG() in Xen code (returning from a hypercall) when freeing LDT
in destroy_context(). Interestingly though when I run the test in the
debugger I get SIGILL (just like before) but no BUG().

Let me get back to you on that later today.



After forward-porting my virtio patches, I got this thing to run on
Xen.  After several tries, I got:

[   53.985707] [ cut here ]
[   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
[   53.986677] invalid opcode:  [#1] SMP
[   53.986677] Modules linked in:
[   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
[   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
[   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
[   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
[   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
[   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
[   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
[   53.986677] Stack:
[   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
0b4a 0200
[   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
c1062310 c01861c0
[   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
c2373a80 
[   53.986677] Call Trace:
[   53.986677]  [] xen_free_ldt+0x2d/0x40
[   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
[   53.986677]  [] destroy_context+0x25/0x40
[   53.986677]  [] __mmdrop+0x1e/0xc0
[   53.986677]  [] finish_task_switch+0xd8/0x1a0
[   53.986677]  [] __schedule+0x316/0x950
[   53.986677]  [] schedule+0x26/0x70
[   53.986677]  [] do_wait+0x1b3/0x200
[   53.986677]  [] SyS_waitpid+0x67/0xd0
[   53.986677]  [] ? task_stopped_code+0x50/0x50
[   53.986677]  [] syscall_call+0x7/0x7
[   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
89 e5
[   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 0069:c0875e74
[   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

Is that the error you're seeing?

If I change xen_free_ldt to:

static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
{
 const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 int i;

 vm_unmap_aliases();
 xen_mc_flush();

 for(i = 0; i < entries; i += entries_per_page)
 set_aliased_prot(ldt + i, PAGE_KERNEL);
}

then it works.  I don't know why this makes a difference.
(xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
doesn't.)


That fix makes sense if there's some way that the vmalloc area we're
freeing has an extra alias somewhere, which is very much possible.  On
the other hand, I don't see how this happens without first doing an
MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
expected that to blow up and/or result in test case failures.

But I'm still confused, because it seems like Xen will never populate
the actual (hidden) LDT mapping unless the pages backing it are
unaliased and well-formed, which make me wonder why this stuff ever
worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
in segfaults?

The semantics seem to be very odd.  xen_free_ldt with an aliased
address might fail (and OOPS), but actual access to the LDT with an
aliased address page faults.

Also, using kzalloc for everything fixes the problem, which suggests
that there really is something to my theory that the problem involves
unexpected aliases.


Yes, this is as far as I got as well (I didn't try unaliasing but now 
that you found it -- it does indeed w

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-27 Thread Andy Lutomirski
On Mon, Jul 27, 2015 at 8:16 PM, Andy Lutomirski  wrote:
> On Mon, Jul 27, 2015 at 7:20 PM, Andy Lutomirski  wrote:
>> On Mon, Jul 27, 2015 at 9:18 AM, Boris Ostrovsky
>>  wrote:
>>> On 07/27/2015 11:53 AM, Andy Lutomirski wrote:

 On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
  wrote:
>
> On 07/25/2015 01:36 AM, Andy Lutomirski wrote:
>>
>> Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
>> probably a good general attack surface reduction, and it replaces some
>> scary code with IMO less scary code.
>>
>> Also, servers and embedded systems should probably turn off modify_ldt.
>> This makes that possible.
>>
>> Xen people, can you take a look at this?
>>
>> Willy and Kees: I left the config option alone.  The -tiny people will
>> like it, and we can always add a sysctl of some sort later.
>>
>> Changes from v3:
>>- Hopefully fixed Xen.
>
>
> 32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)
>
>>- Fixed 32-bit test case on 32-bit native kernel.
>
>
> I am not sure I see what changed.

 I misplaced the fix in the wrong git commit, so I failed to sent it.
 Oops.

 I just sent v4.1 of patch 3.  Can you try that?
>>>
>>>
>>>
>>> I am hitting BUG() in Xen code (returning from a hypercall) when freeing LDT
>>> in destroy_context(). Interestingly though when I run the test in the
>>> debugger I get SIGILL (just like before) but no BUG().
>>>
>>> Let me get back to you on that later today.
>>>
>>>
>>
>> After forward-porting my virtio patches, I got this thing to run on
>> Xen.  After several tries, I got:
>>
>> [   53.985707] [ cut here ]
>> [   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
>> [   53.986677] invalid opcode:  [#1] SMP
>> [   53.986677] Modules linked in:
>> [   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
>> [   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
>> 04/01/2014
>> [   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
>> [   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
>> [   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
>> [   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
>> [   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
>> [   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>> [   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
>> [   53.986677] Stack:
>> [   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
>> 0b4a 0200
>> [   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
>> c1062310 c01861c0
>> [   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
>> c2373a80 
>> [   53.986677] Call Trace:
>> [   53.986677]  [] xen_free_ldt+0x2d/0x40
>> [   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
>> [   53.986677]  [] destroy_context+0x25/0x40
>> [   53.986677]  [] __mmdrop+0x1e/0xc0
>> [   53.986677]  [] finish_task_switch+0xd8/0x1a0
>> [   53.986677]  [] __schedule+0x316/0x950
>> [   53.986677]  [] schedule+0x26/0x70
>> [   53.986677]  [] do_wait+0x1b3/0x200
>> [   53.986677]  [] SyS_waitpid+0x67/0xd0
>> [   53.986677]  [] ? task_stopped_code+0x50/0x50
>> [   53.986677]  [] syscall_call+0x7/0x7
>> [   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
>> 4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
>> c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
>> 89 e5
>> [   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 
>> 0069:c0875e74
>> [   54.010069] ---[ end trace 89ac35b29c1c59bb ]---
>>
>> Is that the error you're seeing?
>>
>> If I change xen_free_ldt to:
>>
>> static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
>> {
>> const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
>> int i;
>>
>> vm_unmap_aliases();
>> xen_mc_flush();
>>
>> for(i = 0; i < entries; i += entries_per_page)
>> set_aliased_prot(ldt + i, PAGE_KERNEL);
>> }
>>
>> then it works.  I don't know why this makes a difference.
>> (xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
>> doesn't.)
>>
>
> That fix makes sense if there's some way that the vmalloc area we're
> freeing has an extra alias somewhere, which is very much possible.  On
> the other hand, I don't see how this happens without first doing an
> MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
> expected that to blow up and/or result in test case failures.
>
> But I'm still confused, because it seems like Xen will never populate
> the actual (hidden) LDT mapping unless the pages backing it are
> unaliased and well-formed, which make me wonder why this stuff ever
> worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
> 

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-27 Thread Andy Lutomirski
On Mon, Jul 27, 2015 at 7:20 PM, Andy Lutomirski  wrote:
> On Mon, Jul 27, 2015 at 9:18 AM, Boris Ostrovsky
>  wrote:
>> On 07/27/2015 11:53 AM, Andy Lutomirski wrote:
>>>
>>> On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
>>>  wrote:

 On 07/25/2015 01:36 AM, Andy Lutomirski wrote:
>
> Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
> probably a good general attack surface reduction, and it replaces some
> scary code with IMO less scary code.
>
> Also, servers and embedded systems should probably turn off modify_ldt.
> This makes that possible.
>
> Xen people, can you take a look at this?
>
> Willy and Kees: I left the config option alone.  The -tiny people will
> like it, and we can always add a sysctl of some sort later.
>
> Changes from v3:
>- Hopefully fixed Xen.


 32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)

>- Fixed 32-bit test case on 32-bit native kernel.


 I am not sure I see what changed.
>>>
>>> I misplaced the fix in the wrong git commit, so I failed to sent it.
>>> Oops.
>>>
>>> I just sent v4.1 of patch 3.  Can you try that?
>>
>>
>>
>> I am hitting BUG() in Xen code (returning from a hypercall) when freeing LDT
>> in destroy_context(). Interestingly though when I run the test in the
>> debugger I get SIGILL (just like before) but no BUG().
>>
>> Let me get back to you on that later today.
>>
>>
>
> After forward-porting my virtio patches, I got this thing to run on
> Xen.  After several tries, I got:
>
> [   53.985707] [ cut here ]
> [   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
> [   53.986677] invalid opcode:  [#1] SMP
> [   53.986677] Modules linked in:
> [   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
> [   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
> 04/01/2014
> [   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
> [   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
> [   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
> [   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
> [   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
> [   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> [   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
> [   53.986677] Stack:
> [   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
> 0b4a 0200
> [   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
> c1062310 c01861c0
> [   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
> c2373a80 
> [   53.986677] Call Trace:
> [   53.986677]  [] xen_free_ldt+0x2d/0x40
> [   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
> [   53.986677]  [] destroy_context+0x25/0x40
> [   53.986677]  [] __mmdrop+0x1e/0xc0
> [   53.986677]  [] finish_task_switch+0xd8/0x1a0
> [   53.986677]  [] __schedule+0x316/0x950
> [   53.986677]  [] schedule+0x26/0x70
> [   53.986677]  [] do_wait+0x1b3/0x200
> [   53.986677]  [] SyS_waitpid+0x67/0xd0
> [   53.986677]  [] ? task_stopped_code+0x50/0x50
> [   53.986677]  [] syscall_call+0x7/0x7
> [   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
> 4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
> c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
> 89 e5
> [   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 
> 0069:c0875e74
> [   54.010069] ---[ end trace 89ac35b29c1c59bb ]---
>
> Is that the error you're seeing?
>
> If I change xen_free_ldt to:
>
> static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
> {
> const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
> int i;
>
> vm_unmap_aliases();
> xen_mc_flush();
>
> for(i = 0; i < entries; i += entries_per_page)
> set_aliased_prot(ldt + i, PAGE_KERNEL);
> }
>
> then it works.  I don't know why this makes a difference.
> (xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
> doesn't.)
>

That fix makes sense if there's some way that the vmalloc area we're
freeing has an extra alias somewhere, which is very much possible.  On
the other hand, I don't see how this happens without first doing an
MMUEXT_SET_LDT with an unexpectedly aliased address, and I would have
expected that to blow up and/or result in test case failures.

But I'm still confused, because it seems like Xen will never populate
the actual (hidden) LDT mapping unless the pages backing it are
unaliased and well-formed, which make me wonder why this stuff ever
worked.  Wouldn't LDT access with pre-existing vmalloc aliases result
in segfaults?

The semantics seem to be very odd.  xen_free_ldt with an aliased
address might fail (and OOPS), but actual access to the LDT with an
aliased address page faults.

Also, using kza

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-27 Thread Andy Lutomirski
On Mon, Jul 27, 2015 at 9:18 AM, Boris Ostrovsky
 wrote:
> On 07/27/2015 11:53 AM, Andy Lutomirski wrote:
>>
>> On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
>>  wrote:
>>>
>>> On 07/25/2015 01:36 AM, Andy Lutomirski wrote:

 Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
 probably a good general attack surface reduction, and it replaces some
 scary code with IMO less scary code.

 Also, servers and embedded systems should probably turn off modify_ldt.
 This makes that possible.

 Xen people, can you take a look at this?

 Willy and Kees: I left the config option alone.  The -tiny people will
 like it, and we can always add a sysctl of some sort later.

 Changes from v3:
- Hopefully fixed Xen.
>>>
>>>
>>> 32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)
>>>
- Fixed 32-bit test case on 32-bit native kernel.
>>>
>>>
>>> I am not sure I see what changed.
>>
>> I misplaced the fix in the wrong git commit, so I failed to sent it.
>> Oops.
>>
>> I just sent v4.1 of patch 3.  Can you try that?
>
>
>
> I am hitting BUG() in Xen code (returning from a hypercall) when freeing LDT
> in destroy_context(). Interestingly though when I run the test in the
> debugger I get SIGILL (just like before) but no BUG().
>
> Let me get back to you on that later today.
>
>

After forward-porting my virtio patches, I got this thing to run on
Xen.  After several tries, I got:

[   53.985707] [ cut here ]
[   53.986314] kernel BUG at arch/x86/xen/enlighten.c:496!
[   53.986677] invalid opcode:  [#1] SMP
[   53.986677] Modules linked in:
[   53.986677] CPU: 0 PID: 1400 Comm: bash Not tainted 4.2.0-rc4+ #4
[   53.986677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[   53.986677] task: c2376180 ti: c0874000 task.ti: c0874000
[   53.986677] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
[   53.986677] EIP is at set_aliased_prot+0xb2/0xc0
[   53.986677] EAX: ffea EBX: cc3d1000 ECX: 0672e063 EDX: 8000
[   53.986677] ESI:  EDI: 8000 EBP: c0875e94 ESP: c0875e74
[   53.986677]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[   53.986677] CR0: 80050033 CR2: b77404d4 CR3: 020b6000 CR4: 00042660
[   53.986677] Stack:
[   53.986677]  8000 0672e063 21c0 cc3d1000 0001 cc3d2000
0b4a 0200
[   53.986677]  c0875ea8 c105312d c2317940 c2373a80  c0875eb4
c1062310 c01861c0
[   53.986677]  c0875ec0 c1062735 c01861c0 c0875ed4 c10a764e c7007a00
c2373a80 
[   53.986677] Call Trace:
[   53.986677]  [] xen_free_ldt+0x2d/0x40
[   53.986677]  [] free_ldt_struct.part.1+0x10/0x40
[   53.986677]  [] destroy_context+0x25/0x40
[   53.986677]  [] __mmdrop+0x1e/0xc0
[   53.986677]  [] finish_task_switch+0xd8/0x1a0
[   53.986677]  [] __schedule+0x316/0x950
[   53.986677]  [] schedule+0x26/0x70
[   53.986677]  [] do_wait+0x1b3/0x200
[   53.986677]  [] SyS_waitpid+0x67/0xd0
[   53.986677]  [] ? task_stopped_code+0x50/0x50
[   53.986677]  [] syscall_call+0x7/0x7
[   53.986677] Code: e8 c1 e3 0c 81 eb 00 00 00 40 39 5d ec 74 11 8b
4d e4 8b 55 e0 31 f6 e8 dd e0 fa ff 85 c0 75 0d 83 c4 14 5b 5e 5f 5d
c3 90 0f 0b <0f> 0b 0f 0b 8d 76 00 8d bc 27 00 00 00 00 85 d2 74 31 55
89 e5
[   53.986677] EIP: [] set_aliased_prot+0xb2/0xc0 SS:ESP 0069:c0875e74
[   54.010069] ---[ end trace 89ac35b29c1c59bb ]---

Is that the error you're seeing?

If I change xen_free_ldt to:

static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
{
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;

vm_unmap_aliases();
xen_mc_flush();

for(i = 0; i < entries; i += entries_per_page)
set_aliased_prot(ldt + i, PAGE_KERNEL);
}

then it works.  I don't know why this makes a difference.
(xen_mc_flush makes a little bit of sense to me.  vm_unmap_aliases
doesn't.)

It's *possible* that there's some but in my code that causes a CPU to
retain a reference to a stale LDT, but I don't see it.  Hmm.  Is it
possible that, when a process exits, we kill the mm without
synchronously unlazying it everywhere else?  Seems a bit hard to
imagine to me -- I don't see why this wouldn't blow up when the pgt
went away.

My best guess is that there's a silly race in which one CPU frees and
LDT before the other CPU flushes its hypercalls.  But I don't really
believe this, because I got this trace:

[   14.257546] Free LDT cb912000: CPU0 cb923000 CPU1 cb923000
[OK]All 5 iterations succeeded
root@(none):/# [   15.824723] Free LDT cb923000: CPU0   (null) CPU1   (null)
[   15.827404] [ cut here ]
[   15.828349] kernel BUG at arch/x86/xen/enlighten.c:497!
[   15.828349] invalid opcode:  [#1] SMP

with this patch applied:

@@ -537,7 +542,9 @@ static void xen_set_ldt(const void *addr, unsigned entries)

MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);

-   xen_mc_i

Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-27 Thread Boris Ostrovsky

On 07/27/2015 11:53 AM, Andy Lutomirski wrote:

On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
 wrote:

On 07/25/2015 01:36 AM, Andy Lutomirski wrote:

Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
probably a good general attack surface reduction, and it replaces some
scary code with IMO less scary code.

Also, servers and embedded systems should probably turn off modify_ldt.
This makes that possible.

Xen people, can you take a look at this?

Willy and Kees: I left the config option alone.  The -tiny people will
like it, and we can always add a sysctl of some sort later.

Changes from v3:
   - Hopefully fixed Xen.


32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)


   - Fixed 32-bit test case on 32-bit native kernel.


I am not sure I see what changed.

I misplaced the fix in the wrong git commit, so I failed to sent it.  Oops.

I just sent v4.1 of patch 3.  Can you try that?



I am hitting BUG() in Xen code (returning from a hypercall) when freeing 
LDT in destroy_context(). Interestingly though when I run the test in 
the debugger I get SIGILL (just like before) but no BUG().


Let me get back to you on that later today.


-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-27 Thread Andy Lutomirski
On Mon, Jul 27, 2015 at 8:36 AM, Boris Ostrovsky
 wrote:
> On 07/25/2015 01:36 AM, Andy Lutomirski wrote:
>>
>> Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
>> probably a good general attack surface reduction, and it replaces some
>> scary code with IMO less scary code.
>>
>> Also, servers and embedded systems should probably turn off modify_ldt.
>> This makes that possible.
>>
>> Xen people, can you take a look at this?
>>
>> Willy and Kees: I left the config option alone.  The -tiny people will
>> like it, and we can always add a sysctl of some sort later.
>>
>> Changes from v3:
>>   - Hopefully fixed Xen.
>
>
> 32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)
>
>>   - Fixed 32-bit test case on 32-bit native kernel.
>
>
> I am not sure I see what changed.

I misplaced the fix in the wrong git commit, so I failed to sent it.  Oops.

I just sent v4.1 of patch 3.  Can you try that?

>
> -boris



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-27 Thread Boris Ostrovsky

On 07/25/2015 01:36 AM, Andy Lutomirski wrote:

Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
probably a good general attack surface reduction, and it replaces some
scary code with IMO less scary code.

Also, servers and embedded systems should probably turn off modify_ldt.
This makes that possible.

Xen people, can you take a look at this?

Willy and Kees: I left the config option alone.  The -tiny people will
like it, and we can always add a sysctl of some sort later.

Changes from v3:
  - Hopefully fixed Xen.


32b-on-32b fails in the same manner. (but non-zero LDT is taken care of)


  - Fixed 32-bit test case on 32-bit native kernel.


I am not sure I see what changed.

-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-24 Thread Willy Tarreau
On Fri, Jul 24, 2015 at 10:36:43PM -0700, Andy Lutomirski wrote:
> Willy and Kees: I left the config option alone.  The -tiny people will
> like it, and we can always add a sysctl of some sort later.

OK, please ignore my other e-mail I missed this part. I'll see if I
can propose the sysctl completement on top of this so that we can
hope a wider deployment asap.

Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 0/3] x86: modify_ldt improvement, test, and config option

2015-07-24 Thread Andy Lutomirski
Here's v3.  It fixes the "dazed and confused" issue, I hope.  It's also
probably a good general attack surface reduction, and it replaces some
scary code with IMO less scary code.

Also, servers and embedded systems should probably turn off modify_ldt.
This makes that possible.

Xen people, can you take a look at this?

Willy and Kees: I left the config option alone.  The -tiny people will
like it, and we can always add a sysctl of some sort later.

Changes from v3:
 - Hopefully fixed Xen.
 - Fixed 32-bit test case on 32-bit native kernel.
 - Fix bogus vumnap for some LDT sizes.
 - Strengthen test case to check all LDT sizes (catches bogus vunmap).
 - Lots of cleanups, mostly from Borislav.
 - Simplify IPI code using on_each_cpu_mask.

Changes from v2:
 - Allocate ldt_struct and the LDT entries separately.  This should fix Xen.
 - Stop using write_ldt_entry, since I'm pretty sure it's unnecessary now
   that we no longer mutate an in-use LDT.  (Xen people, can you check?)

Changes from v1:
 - The config option is new.
 - The test case is new.
 - Fixed a missing allocation failure check.
 - Fixed a use-after-free on fork().

Andy Lutomirski (3):
  x86/ldt: Make modify_ldt synchronous
  x86/ldt: Make modify_ldt optional
  selftests/x86, x86/ldt: Add a selftest for modify_ldt

 arch/x86/Kconfig  |  17 ++
 arch/x86/include/asm/desc.h   |  15 --
 arch/x86/include/asm/mmu.h|   5 +-
 arch/x86/include/asm/mmu_context.h|  68 -
 arch/x86/kernel/Makefile  |   3 +-
 arch/x86/kernel/cpu/common.c  |   4 +-
 arch/x86/kernel/cpu/perf_event.c  |  16 +-
 arch/x86/kernel/ldt.c | 262 +-
 arch/x86/kernel/process_64.c  |   6 +-
 arch/x86/kernel/step.c|   8 +-
 arch/x86/power/cpu.c  |   3 +-
 kernel/sys_ni.c   |   1 +
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/ldt_gdt.c | 492 ++
 14 files changed, 747 insertions(+), 155 deletions(-)
 create mode 100644 tools/testing/selftests/x86/ldt_gdt.c

-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/