Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-12 21:36+0100, Paolo Bonzini:
> From: Gleb Natapov 
> 
> A guest can cause a BUG_ON() leading to a host kernel crash.
> When the guest writes to the ICR to request an IPI, while in x2apic
> mode the following things happen, the destination is read from
> ICR2, which is a register that the guest can control.
> 
> kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
> cluster id.  A BUG_ON is triggered, which is a protection against
> accessing map->logical_map with an out-of-bounds access and manages
> to avoid that anything really unsafe occurs.
> 
> The logic in the code is correct from real HW point of view. The problem
> is that KVM supports only one cluster with ID 0 in clustered mode, but
> the code that has the bug does not take this into account.

The more I read about x2apic, the more confused I am ...

 - How was the cluster x2apic enabled?

   Linux won't enable cluster x2apic without interrupt remapping and I
   had no idea we were allowed to do it.

 - A hardware test-suite found this?

   This bug can only be hit when the destination cpu is > 256, so the
   request itself is buggy -- we don't support that many in kvm and it
   would crash when initializing the vcpus if we did.
   => It looks like we should just ignore the ipi, because we have no
  vcpus in that cluster.

 - Where does the 'only one supported cluster' come from?

   I only see we use 'struct kvm_lapic *logical_map[16][16];', which
   supports 16 clusters of 16 apics = first 256 vcpus, so if we map
   everything to logical_map[0][0:15], we would not work correctly in
   the cluster x2apic, with > 16 vcpus.

Thanks.

> Reported-by: Lars Bull 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gleb Natapov 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/lapic.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b8bec45c1610..801dc3fd66e1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -143,6 +143,8 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
>   return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
>  }
>  
> +#define KMV_X2APIC_CID_BITS 0
> +
>  static void recalculate_apic_map(struct kvm *kvm)
>  {
>   struct kvm_apic_map *new, *old = NULL;
> @@ -180,7 +182,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>   if (apic_x2apic_mode(apic)) {
>   new->ldr_bits = 32;
>   new->cid_shift = 16;
> - new->cid_mask = new->lid_mask = 0x;
> + new->cid_mask = (1 << KMV_X2APIC_CID_BITS) - 1;
> + new->lid_mask = 0x;
>   } else if (kvm_apic_sw_enabled(apic) &&
>   !new->cid_mask /* flat mode */ &&
>   kvm_apic_get_reg(apic, APIC_DFR) == 
> APIC_DFR_CLUSTER) {
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Paolo Bonzini
Il 13/12/2013 17:07, Radim Krčmář ha scritto:
>This bug can only be hit when the destination cpu is > 256, so the
>request itself is buggy -- we don't support that many in kvm and it
>would crash when initializing the vcpus if we did.
>=> It looks like we should just ignore the ipi, because we have no
>   vcpus in that cluster.

That's what should happen in physical mode.  Something like this patch:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5439117d5c4c..1f8e9e1abd3b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
 
if (irq->dest_mode == 0) { /* physical mode */
if (irq->delivery_mode == APIC_DM_LOWEST ||
-   irq->dest_id == 0xff)
+   irq->dest_id == 0xff ||
+   (apic_x2apic_mode(src) && irq->dest_id > 0xff))
goto out;
-   dst = &map->phys_map[irq->dest_id & 0xff];
+   dst = &map->phys_map[irq->dest_id];
} else {
u32 mda = irq->dest_id << (32 - map->ldr_bits);
 

On top of this, the x2apic spec documents a "broadcast" destination ID that
could be implemented as follows.  But I have no idea if this is correct and
how it differs from the usual broadcast shorthand:

@@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
irq.level = icr_low & APIC_INT_ASSERT;
irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
irq.shorthand = icr_low & APIC_SHORT_MASK;
-   if (apic_x2apic_mode(apic))
+   if (apic_x2apic_mode(apic)) {
irq.dest_id = icr_high;
-   else
+   if (icr_high == 0x)
+   irq.shorthand = APIC_DEST_ALLINC;
+   } else
irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
 
trace_kvm_apic_ipi(icr_low, irq.dest_id);


>  - Where does the 'only one supported cluster' come from?
> 
>I only see we use 'struct kvm_lapic *logical_map[16][16];', which
>supports 16 clusters of 16 apics = first 256 vcpus, so if we map
>everything to logical_map[0][0:15], we would not work correctly in
>the cluster x2apic, with > 16 vcpus.

I think you're right.  Something like this would be a better fix:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dec48bfaddb8..58745cbbb7e6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -198,7 +198,7 @@ static void recalculate_apic_map(struct kvm *kvm)
cid = apic_cluster_id(new, ldr);
lid = apic_logical_id(new, ldr);
 
-   if (lid)
+   if (cid < ARRAY_SIZE(new->logical_map) && lid)
new->logical_map[cid][ffs(lid) - 1] = apic;
}
 out:
@@ -621,9 +621,12 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
dst = &map->phys_map[irq->dest_id & 0xff];
} else {
u32 mda = irq->dest_id << (32 - map->ldr_bits);
+   u32 cid = apic_cluster_id(map, mda);
 
-   dst = map->logical_map[apic_cluster_id(map, mda)];
+   if (cid >= ARRAY_SIZE(map->logical_map))
+   goto out;
 
+   dst = map->logical_map[cid];
bitmap = apic_logical_id(map, mda);
 
if (irq->delivery_mode == APIC_DM_LOWEST) {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c8b0d0d2da5c..5ccb71658894 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -152,8 +152,6 @@ static inline u16 apic_cluster_id(struct kvm_apic_map *map, 
u32 ldr)
ldr >>= 32 - map->ldr_bits;
cid = (ldr >> map->cid_shift) & map->cid_mask;
 
-   BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
-
return cid;
 }
 
playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ vi kvm/lapic.c
playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ git diff !$
git diff kvm/lapic.c
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dec48bfaddb8..1005185972a9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -143,8 +143,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
-#define KVM_X2APIC_CID_BITS 0
-
 static void recalculate_apic_map(struct kvm *kvm)
 {
struct kvm_apic_map *new, *old = NULL;
@@ -182,8 +180,7 @@ static void recalculate_apic_map(struct kvm *kvm)
if (apic_x2apic_mode(apic)) {
new->ldr_bits = 32;
new->cid_shift = 16;
-   new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
-   new->lid_mask = 0x;
+   new->cid_mask = new->lid_mask = 0x;
} else if (kvm_apic_sw_enabled(apic) &&
!new->cid_mask /* flat mode */ &&
   

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-13 18:25+0100, Paolo Bonzini:
> Il 13/12/2013 17:07, Radim Krčmář ha scritto:
> >This bug can only be hit when the destination cpu is > 256, so the
> >request itself is buggy -- we don't support that many in kvm and it
> >would crash when initializing the vcpus if we did.
> >=> It looks like we should just ignore the ipi, because we have no
> >   vcpus in that cluster.
> 
> That's what should happen in physical mode.  Something like this patch:
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5439117d5c4c..1f8e9e1abd3b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>  
>   if (irq->dest_mode == 0) { /* physical mode */
>   if (irq->delivery_mode == APIC_DM_LOWEST ||
> - irq->dest_id == 0xff)
> + irq->dest_id == 0xff ||

The 0xff is xapic broadcast, we did not care about the x2apic one and
no-one noticed that it does not work :)

0xff isn't special in x2apic mode.

> + (apic_x2apic_mode(src) && irq->dest_id > 0xff))
>   goto out;
> - dst = &map->phys_map[irq->dest_id & 0xff];
> + dst = &map->phys_map[irq->dest_id];
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
>  
> 
> On top of this, the x2apic spec documents a "broadcast" destination ID that
> could be implemented as follows.  But I have no idea if this is correct and
> how it differs from the usual broadcast shorthand:
> 
> @@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
>   irq.level = icr_low & APIC_INT_ASSERT;
>   irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
>   irq.shorthand = icr_low & APIC_SHORT_MASK;
> - if (apic_x2apic_mode(apic))
> + if (apic_x2apic_mode(apic)) {
>   irq.dest_id = icr_high;
> - else
> + if (icr_high == 0x)
> + irq.shorthand = APIC_DEST_ALLINC;

The shorthand takes priority, so we shouldn't overwrite it.
This is better solved after we 'goto out' from the _fast().
(could be nicer still ...)

> + } else
>   irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
>   trace_kvm_apic_ipi(icr_low, irq.dest_id);
> 
> 

And another possibility occured to me now: Google was the first one to
encounter a broadcast; we don't handle it at all in the fast path and
x2apic has 0x as the target in both modes ...
(I think that xapic has 0xff in both of them too, but I'll check)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5439117..e4618c4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -512,19 +512,24 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
apic_update_ppr(apic);
 }
 
-int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
+int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)
 {
-   return dest == 0xff || kvm_apic_id(apic) == dest;
+   return dest == (apic_x2apic_mode(apic) ? 0x : 0xff)
+  || kvm_apic_id(apic) == dest;
 }
 
-int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
+int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
 {
int result = 0;
u32 logical_id;
 
+   if (mda == (apic_x2apic_mode(apic) ? 0x : 0xff))
+   return 1;
+
if (apic_x2apic_mode(apic)) {
logical_id = kvm_apic_get_reg(apic, APIC_LDR);
-   return logical_id & mda;
+   return mda >> 16 == logical_id >> 16
+  && logical_id & mda & 0x;
}
 
logical_id = GET_APIC_LOGICAL_ID(kvm_apic_get_reg(apic, APIC_LDR));
@@ -605,6 +610,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
if (irq->shorthand)
return false;
 
+   /* broadcast */
+   if (irq->dest_id == (apic_x2apic_mode(src) ? 0x : 0xff))
+   return false;
+
rcu_read_lock();
map = rcu_dereference(kvm->arch.apic_map);
 
@@ -612,10 +621,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
struct kvm_lapic *src,
goto out;
 
if (irq->dest_mode == 0) { /* physical mode */
-   if (irq->delivery_mode == APIC_DM_LOWEST ||
-   irq->dest_id == 0xff)
+   if (irq->dest_id > 0xff) {
+   ret = true;
+   goto out;
+   }
+   if (irq->delivery_mode == APIC_DM_LOWEST)
goto out;
-   dst = &map->phys_map[irq->dest_id & 0xff];
+   dst = &map->phys_map[irq->dest_id];
} else {
u32 mda = irq->dest_id << (32 - map->ldr_bits);
 

The logical x2apic slowpath was completely broken, maybe still is,
I have no way to test it ...

The broadcast checking s

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-14 Thread Gleb Natapov
On Fri, Dec 13, 2013 at 05:07:54PM +0100, Radim Krčmář wrote:
> 2013-12-12 21:36+0100, Paolo Bonzini:
> > From: Gleb Natapov 
> > 
> > A guest can cause a BUG_ON() leading to a host kernel crash.
> > When the guest writes to the ICR to request an IPI, while in x2apic
> > mode the following things happen, the destination is read from
> > ICR2, which is a register that the guest can control.
> > 
> > kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
> > cluster id.  A BUG_ON is triggered, which is a protection against
> > accessing map->logical_map with an out-of-bounds access and manages
> > to avoid that anything really unsafe occurs.
> > 
> > The logic in the code is correct from real HW point of view. The problem
> > is that KVM supports only one cluster with ID 0 in clustered mode, but
> > the code that has the bug does not take this into account.
> 
> The more I read about x2apic, the more confused I am ...
> 
>  - How was the cluster x2apic enabled?
> 
>Linux won't enable cluster x2apic without interrupt remapping and I
>had no idea we were allowed to do it.
> 
Malicious code can do it.

>  - A hardware test-suite found this?
> 
>This bug can only be hit when the destination cpu is > 256, so the
>request itself is buggy -- we don't support that many in kvm and it
>would crash when initializing the vcpus if we did.
>=> It looks like we should just ignore the ipi, because we have no
>   vcpus in that cluster.
> 
That's the nature of malicious code: it does what your code does not expects
it to do :)


>  - Where does the 'only one supported cluster' come from?
> 
"only one supported cluster" comes from 8 bit cpuid limitation of KVM's x2apic
implementation. With 8 bit cpuid you can only address cluster 0 in logical mode.

>I only see we use 'struct kvm_lapic *logical_map[16][16];', which
>supports 16 clusters of 16 apics = first 256 vcpus, so if we map
>everything to logical_map[0][0:15], we would not work correctly in
>the cluster x2apic, with > 16 vcpus.
> 
Such config cannot work today because of 8 bit cpuid limitation. When the 
limitation
will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
want to support.
It will likely be smaller then 16 bit though since full 18 bit support will 
require huge tables.

> Thanks.
> 
> > Reported-by: Lars Bull 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Gleb Natapov 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  arch/x86/kvm/lapic.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index b8bec45c1610..801dc3fd66e1 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -143,6 +143,8 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
> > return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
> >  }
> >  
> > +#define KMV_X2APIC_CID_BITS 0
> > +
> >  static void recalculate_apic_map(struct kvm *kvm)
> >  {
> > struct kvm_apic_map *new, *old = NULL;
> > @@ -180,7 +182,8 @@ static void recalculate_apic_map(struct kvm *kvm)
> > if (apic_x2apic_mode(apic)) {
> > new->ldr_bits = 32;
> > new->cid_shift = 16;
> > -   new->cid_mask = new->lid_mask = 0x;
> > +   new->cid_mask = (1 << KMV_X2APIC_CID_BITS) - 1;
> > +   new->lid_mask = 0x;
> > } else if (kvm_apic_sw_enabled(apic) &&
> > !new->cid_mask /* flat mode */ &&
> > kvm_apic_get_reg(apic, APIC_DFR) == 
> > APIC_DFR_CLUSTER) {
> > -- 
> > 1.8.3.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-14 Thread Gleb Natapov
On Fri, Dec 13, 2013 at 06:25:20PM +0100, Paolo Bonzini wrote:
> Il 13/12/2013 17:07, Radim Krčmář ha scritto:
> >This bug can only be hit when the destination cpu is > 256, so the
> >request itself is buggy -- we don't support that many in kvm and it
> >would crash when initializing the vcpus if we did.
> >=> It looks like we should just ignore the ipi, because we have no
> >   vcpus in that cluster.
> 
> That's what should happen in physical mode.  Something like this patch:
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5439117d5c4c..1f8e9e1abd3b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>  
>   if (irq->dest_mode == 0) { /* physical mode */
>   if (irq->delivery_mode == APIC_DM_LOWEST ||
> - irq->dest_id == 0xff)
> + irq->dest_id == 0xff ||
> + (apic_x2apic_mode(src) && irq->dest_id > 0xff))
Here you fall back to slow path wich still uses irq->dest_id == 0xff as 
broadcast test.

>   goto out;
> - dst = &map->phys_map[irq->dest_id & 0xff];
> + dst = &map->phys_map[irq->dest_id];
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
>  
> 
> On top of this, the x2apic spec documents a "broadcast" destination ID that
> could be implemented as follows.  But I have no idea if this is correct and
> how it differs from the usual broadcast shorthand:
> 
> @@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
I'd rather add kvm_apic_is_broadcast() function like that:
bool kvm_apic_is_broadcast(struct kvm_lapic *l, u32 dest)
{
 return dest == (apic_x2apic_mode(l) ? 0x : 0xff);
}

and use everywhere instead of  irq->dest_id == 0xff test.

>   irq.level = icr_low & APIC_INT_ASSERT;
>   irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
>   irq.shorthand = icr_low & APIC_SHORT_MASK;
> - if (apic_x2apic_mode(apic))
> + if (apic_x2apic_mode(apic)) {
>   irq.dest_id = icr_high;
> - else
> + if (icr_high == 0x)
> + irq.shorthand = APIC_DEST_ALLINC;
> + } else
>   irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
>  
>   trace_kvm_apic_ipi(icr_low, irq.dest_id);
> 
> 
> >  - Where does the 'only one supported cluster' come from?
> > 
> >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> >everything to logical_map[0][0:15], we would not work correctly in
> >the cluster x2apic, with > 16 vcpus.
> 
> I think you're right.  Something like this would be a better fix:
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
See my answer to Radim. There is not point in maintaining cid mapping that 
cannot be
addressed.

> index dec48bfaddb8..58745cbbb7e6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -198,7 +198,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>   cid = apic_cluster_id(new, ldr);
>   lid = apic_logical_id(new, ldr);
>  
> - if (lid)
> + if (cid < ARRAY_SIZE(new->logical_map) && lid)
>   new->logical_map[cid][ffs(lid) - 1] = apic;
>   }
>  out:
> @@ -621,9 +621,12 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>   dst = &map->phys_map[irq->dest_id & 0xff];
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
> + u32 cid = apic_cluster_id(map, mda);
>  
> - dst = map->logical_map[apic_cluster_id(map, mda)];
> + if (cid >= ARRAY_SIZE(map->logical_map))
> + goto out;
>  
> + dst = map->logical_map[cid];
>   bitmap = apic_logical_id(map, mda);
>  
>   if (irq->delivery_mode == APIC_DM_LOWEST) {
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index c8b0d0d2da5c..5ccb71658894 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -152,8 +152,6 @@ static inline u16 apic_cluster_id(struct kvm_apic_map 
> *map, u32 ldr)
>   ldr >>= 32 - map->ldr_bits;
>   cid = (ldr >> map->cid_shift) & map->cid_mask;
>  
> - BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
> -
>   return cid;
>  }
>  
> playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ vi kvm/lapic.c
> playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ git diff !$
> git diff kvm/lapic.c
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index dec48bfaddb8..1005185972a9 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -143,8 +143,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
>   return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
>  }
>  
> -#define KVM_X2APIC_CID_BITS 0
> -
>  static 

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-14 11:46+0200, Gleb Natapov:
> On Fri, Dec 13, 2013 at 05:07:54PM +0100, Radim Krčmář wrote:
> > 2013-12-12 21:36+0100, Paolo Bonzini:
> > > From: Gleb Natapov 
> > > 
> > > A guest can cause a BUG_ON() leading to a host kernel crash.
> > > When the guest writes to the ICR to request an IPI, while in x2apic
> > > mode the following things happen, the destination is read from
> > > ICR2, which is a register that the guest can control.
> > > 
> > > kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
> > > cluster id.  A BUG_ON is triggered, which is a protection against
> > > accessing map->logical_map with an out-of-bounds access and manages
> > > to avoid that anything really unsafe occurs.
> > > 
> > > The logic in the code is correct from real HW point of view. The problem
> > > is that KVM supports only one cluster with ID 0 in clustered mode, but
> > > the code that has the bug does not take this into account.
> > 
> > The more I read about x2apic, the more confused I am ...
> > 
> >  - How was the cluster x2apic enabled?
> > 
> >Linux won't enable cluster x2apic without interrupt remapping and I
> >had no idea we were allowed to do it.
> > 
> Malicious code can do it.
> 
> >  - A hardware test-suite found this?
> > 
> >This bug can only be hit when the destination cpu is > 256, so the
> >request itself is buggy -- we don't support that many in kvm and it
> >would crash when initializing the vcpus if we did.
> >=> It looks like we should just ignore the ipi, because we have no
> >   vcpus in that cluster.
> > 
> That's the nature of malicious code: it does what your code does not expects
> it to do :)

I was wondering if there wasn't malicious linux on the other side too :)

> >  - Where does the 'only one supported cluster' come from?
> > 
> "only one supported cluster" comes from 8 bit cpuid limitation of KVM's x2apic
> implementation. With 8 bit cpuid you can only address cluster 0 in logical 
> mode.

One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
so 8 bit cpuid can address first 16 clusters as well.

  u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));

> >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> >everything to logical_map[0][0:15], we would not work correctly in
> >the cluster x2apic, with > 16 vcpus.
> > 
> Such config cannot work today because of 8 bit cpuid limitation. When the 
> limitation
> will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
> want to support.

Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
cpuid, we would still deliver interrupts destined for cpuid > 256 to
potentially plugged cpus.

> It will likely be smaller then 16 bit though since full 18 bit support will 
> require huge tables.

Yeah, we'll have to do dynamic allocation if we are ever going to need
the full potential of x2apic.
(2^20-16 cpus in cluster and 2^32-1 in flat mode)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Gleb Natapov
On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > >  - Where does the 'only one supported cluster' come from?
> > > 
> > "only one supported cluster" comes from 8 bit cpuid limitation of KVM's 
> > x2apic
> > implementation. With 8 bit cpuid you can only address cluster 0 in logical 
> > mode.
> 
> One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> so 8 bit cpuid can address first 16 clusters as well.
> 
>   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> 
Interrupt from a device cannot generate such ldr, only IPI can. Only
4 cpus in cluster zero are addressable in clustering mode by a
device. Without irq remapping x2apic is a PV interface between host
and guest where guest needs to know KVM implementation's limitation to
use it. I do not see a point in fixing problems in x2apic logical mode
emulation right now since it will not make it usable, as long as
there is not security problems there.

> > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > >everything to logical_map[0][0:15], we would not work correctly in
> > >the cluster x2apic, with > 16 vcpus.
> > > 
> > Such config cannot work today because of 8 bit cpuid limitation. When the 
> > limitation
> > will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
> > want to support.
> 
> Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> cpuid, we would still deliver interrupts destined for cpuid > 256 to
> potentially plugged cpus.
Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
unfortunately, not sure what you mean by the second part of the sentence.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 14:16+0200, Gleb Natapov:
> On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > > >  - Where does the 'only one supported cluster' come from?
> > > > 
> > > "only one supported cluster" comes from 8 bit cpuid limitation of KVM's 
> > > x2apic
> > > implementation. With 8 bit cpuid you can only address cluster 0 in 
> > > logical mode.
> > 
> > One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> > so 8 bit cpuid can address first 16 clusters as well.
> > 
> >   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > 
> Interrupt from a device cannot generate such ldr, only IPI can. Only
> 4 cpus in cluster zero are addressable in clustering mode by a
> device. Without irq remapping x2apic is a PV interface between host
> and guest where guest needs to know KVM implementation's limitation to
> use it.

Thanks, I'll read more about devices ... still no idea how could they
address cluster > 15.

> I do not see a point in fixing problems in x2apic logical mode
> emulation right now since it will not make it usable, as long as
> there is not security problems there.

Agreed; I wanted to know why this patch was correct, if we cared.

> > > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > > >everything to logical_map[0][0:15], we would not work correctly in
> > > >the cluster x2apic, with > 16 vcpus.
> > > > 
> > > Such config cannot work today because of 8 bit cpuid limitation. When the 
> > > limitation
> > > will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits 
> > > we want to support.
> > 
> > Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> > cpuid, we would still deliver interrupts destined for cpuid > 256 to
> > potentially plugged cpus.
> Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
> unfortunately, not sure what you mean by the second part of the sentence.

Sorry, I meant that with this change, we map all clusters to cluster 0,
which has two flaws:
 - in kvm_lapic_set_base(), the order of vcpu creation determines those
   assigned to cluster 0, and the rest is unaddressable (overwritten)
 - we can send IPI to an unplugged high cpuid and it arives in cluster 0
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 13:55+0100, Radim Krčmář:
> 2013-12-16 14:16+0200, Gleb Natapov:
> > On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > > > >  - Where does the 'only one supported cluster' come from?
> > > > > 
> > > > "only one supported cluster" comes from 8 bit cpuid limitation of KVM's 
> > > > x2apic
> > > > implementation. With 8 bit cpuid you can only address cluster 0 in 
> > > > logical mode.
> > > 
> > > One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> > > so 8 bit cpuid can address first 16 clusters as well.
> > > 
> > >   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > > 
> > Interrupt from a device cannot generate such ldr, only IPI can. Only
> > 4 cpus in cluster zero are addressable in clustering mode by a
> > device. Without irq remapping x2apic is a PV interface between host
> > and guest where guest needs to know KVM implementation's limitation to
> > use it.
> 
> Thanks, I'll read more about devices ... still no idea how could they
> address cluster > 15.
> 
> > I do not see a point in fixing problems in x2apic logical mode
> > emulation right now since it will not make it usable, as long as
> > there is not security problems there.
> 
> Agreed; I wanted to know why this patch was correct, if we cared.
> 
> > > > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > > > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > > > >everything to logical_map[0][0:15], we would not work correctly in
> > > > >the cluster x2apic, with > 16 vcpus.
> > > > > 
> > > > Such config cannot work today because of 8 bit cpuid limitation. When 
> > > > the limitation
> > > > will be removed KMV_X2APIC_CID_BITS will be set to actual number of 
> > > > bits we want to support.
> > > 
> > > Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> > > cpuid, we would still deliver interrupts destined for cpuid > 256 to
> > > potentially plugged cpus.
> > Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
> > unfortunately, not sure what you mean by the second part of the sentence.
> 
> Sorry, I meant that with this change, we map all clusters to cluster 0,
> which has two flaws:
>  - in kvm_lapic_set_base(), the order of vcpu creation determines those

Gah, should have been 'recalculate_apic_map()' ...

The patch would be especially surprising with a dynamic adding of vcpus.

>assigned to cluster 0, and the rest is unaddressable (overwritten)
>  - we can send IPI to an unplugged high cpuid and it arives in cluster 0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Gleb Natapov
On Mon, Dec 16, 2013 at 02:31:43PM +0100, Radim Krčmář wrote:
> 2013-12-16 13:55+0100, Radim Krčmář:
> > 2013-12-16 14:16+0200, Gleb Natapov:
> > > On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > > > > >  - Where does the 'only one supported cluster' come from?
> > > > > > 
> > > > > "only one supported cluster" comes from 8 bit cpuid limitation of 
> > > > > KVM's x2apic
> > > > > implementation. With 8 bit cpuid you can only address cluster 0 in 
> > > > > logical mode.
> > > > 
> > > > One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> > > > so 8 bit cpuid can address first 16 clusters as well.
> > > > 
> > > >   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > > > 
> > > Interrupt from a device cannot generate such ldr, only IPI can. Only
> > > 4 cpus in cluster zero are addressable in clustering mode by a
> > > device. Without irq remapping x2apic is a PV interface between host
> > > and guest where guest needs to know KVM implementation's limitation to
> > > use it.
> > 
> > Thanks, I'll read more about devices ... still no idea how could they
> > address cluster > 15.
> > 
With irq remapping device they will be able to provide 32bit apic addresses.

> > > I do not see a point in fixing problems in x2apic logical mode
> > > emulation right now since it will not make it usable, as long as
> > > there is not security problems there.
> > 
> > Agreed; I wanted to know why this patch was correct, if we cared.
> > 
> > > > > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > > > > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > > > > >everything to logical_map[0][0:15], we would not work correctly 
> > > > > > in
> > > > > >the cluster x2apic, with > 16 vcpus.
> > > > > > 
> > > > > Such config cannot work today because of 8 bit cpuid limitation. When 
> > > > > the limitation
> > > > > will be removed KMV_X2APIC_CID_BITS will be set to actual number of 
> > > > > bits we want to support.
> > > > 
> > > > Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> > > > cpuid, we would still deliver interrupts destined for cpuid > 256 to
> > > > potentially plugged cpus.
> > > Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
> > > unfortunately, not sure what you mean by the second part of the sentence.
> > 
> > Sorry, I meant that with this change, we map all clusters to cluster 0,
> > which has two flaws:
> >  - in kvm_lapic_set_base(), the order of vcpu creation determines those
> 
> Gah, should have been 'recalculate_apic_map()' ...
> 
> The patch would be especially surprising with a dynamic adding of vcpus.
> 
> >assigned to cluster 0, and the rest is unaddressable (overwritten)
> >  - we can send IPI to an unplugged high cpuid and it arives in cluster 0
It does not matter since well behaved guest is not supposed to configure apic 
like
that.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html