Re: Remaining passthrough/VT-d tasks list

2008-10-05 Thread Avi Kivity
Yang, Sheng wrote:

 You're right, I didn't think it through.

 If there was a standard way to mask pci irqs, it might have worked, but
 there isn't, unfortunately.

 
 What if we had a way to mask pci irqs? We would also have to unmask the pci
 irq when the guest writes EOI to the vlapic (or at some other time). I think
 this still causes a problem. The problem is, we don't know if the guest would
 deassert the line. Maybe adding some time-based detection here might work?

 And about masking the pci irq, how about disabling the PCI device interrupt
 using the Interrupt Disable bit (bit 10) of the PCI Command register? Not sure
 if it would affect pending transactions, and also not sure all devices support
 this (though they should).
   

I didn't know about this. I'll try to find a copy of the pci spec and
read up on it.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Remaining passthrough/VT-d tasks list

2008-09-28 Thread Muli Ben-Yehuda
On Sat, Sep 27, 2008 at 01:16:22PM +0300, Avi Kivity wrote:

 Using MSI works around the issue nicely, since interrupts are no
 longer shared.  I imagine SR-IOV also fixes the issue.

SR-IOV mandates the use of either MSI or MSI-X for VFs.

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
  xxx
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/


Re: Remaining passthrough/VT-d tasks list

2008-09-28 Thread Yang, Sheng
On Wednesday 24 September 2008 17:51:24 Avi Kivity wrote:
 Yang, Sheng wrote:
  2. shared guest pci interrupts
 
  That's a correctness issue.  No matter how many interrupts we have, we
  may have sharing issues.  Of course with only three the issue is very
  pressing since we will get sharing with just a few devices.  Currently
  if two assigned devices share a guest interrupt, or if an emulated
  device shares an interrupt with an assigned device, things will break.
 
  They need to be fixed independently.
 
  About the second issue, I don't understand how it would break... Would
  you please give more details on this? It's a QEmu bug or IOAPIC bug?

 It's a kernel bug.

 Both the device assignment code and KVM_SET_IRQ ioctl() call
 kvm_set_irq(), so the last one wins.  We need logical-OR mixing between
 the various sources.  Just like pci_set_irq() in qemu, only for the kernel.

 Userspace is one source, each assigned device irq is a separate source.

I am working on this now.
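
A minimal sketch of the logical-OR mixing described above, written as a small Python model rather than kernel code. The class name, method names and source ids are hypothetical and do not come from the KVM sources; the only point is that each source keeps its own bit, so one source deasserting cannot override another that is still asserting.

```python
class GuestIrqLine:
    """Model of one level-triggered guest irq line fed by several sources."""

    def __init__(self):
        # One bit per source; a set bit means that source is asserting the line.
        self.source_state = 0

    def set_irq(self, source_id, level):
        """Record the level driven by one source and return the line level."""
        if level:
            self.source_state |= 1 << source_id
        else:
            self.source_state &= ~(1 << source_id)
        return self.level()

    def level(self):
        # Logical OR: the line is high while any source still asserts it.
        return 1 if self.source_state else 0


# Hypothetical source ids: 0 for userspace, 1 and up for assigned devices.
USERSPACE, ASSIGNED_DEV0 = 0, 1

line = GuestIrqLine()
line.set_irq(USERSPACE, 1)       # emulated device raises the line
line.set_irq(ASSIGNED_DEV0, 1)   # assigned device raises it too
line.set_irq(ASSIGNED_DEV0, 0)   # assigned device is serviced and drops out
still_asserted = line.level()    # userspace has not deasserted yet
```

With a "last one wins" scheme the final call would have pulled the line low even though the emulated device is still waiting for service.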

--
regards
Yang, Sheng


Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Yang, Sheng
On Wednesday 24 September 2008 16:38:35 Avi Kivity wrote:
 Yang, Sheng wrote:
  - Shared Interrupt support
 
  I still don't know who would do this. It's very important for VT-d real
  usable. If nobody interested in it, I would pick it up, but after Oct. 6
  (after National Holiday in China).

 Shared host interrupts?  What's your plan here?  The polarity trick?

Hi, Avi

After checking the host shared interrupt situation, I have a question:

If I understand correctly, the current solution doesn't block host shared
irqs; it just comes with a performance penalty. The penalty comes from the
host disabling the irq line for a period, since we have to wait for the guest
to write EOI. But I fail to see the correctness problem here (except for a lot
of spurious interrupts in the guest).

I've checked the mail archive, but can't find a clue about that. Can you
explain the situation?

Thanks!
--
regards
Yang, Sheng


Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity

Yang, Sheng wrote:

After checking the host shared interrupt situation, I have a question:

If I understand correctly, the current solution doesn't block host shared
irqs; it just comes with a performance penalty. The penalty comes from the
host disabling the irq line for a period, since we have to wait for the guest
to write EOI. But I fail to see the correctness problem here (except for a lot
of spurious interrupts in the guest).

I've checked the mail archive, but can't find a clue about that. Can you
explain the situation?


  


If the guest fails to disable interrupts on a device that shares an 
interrupt line with the host, the host will experience an interrupt 
flood.  Eventually the host will disable the host device as well.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Tian, Kevin
From:Avi Kivity
Sent: September 27, 2008 17:50

Yang, Sheng wrote:
 After checking the host shared interrupt situation, I have a question:

 If I understand correctly, the current solution doesn't block host shared
 irqs; it just comes with a performance penalty. The penalty comes from the
 host disabling the irq line for a period, since we have to wait for the
 guest to write EOI. But I fail to see the correctness problem here (except
 for a lot of spurious interrupts in the guest).

 I've checked the mail archive, but can't find a clue about that. Can you
 explain the situation?



If the guest fails to disable interrupts on a device that shares an
interrupt line with the host, the host will experience an interrupt
flood.  Eventually the host will disable the host device as well.


This issue also exists on the host side: one misbehaving driver
can hurt all other drivers sharing the same irq line. But there seems
to be no good way to avoid it. Since not all devices support MSI, we
still need to support irq sharing, possibly with the above caveats.

The existing approach at least works with a sane guest driver, with
some performance penalty.

Or do you have a better alternative?

Thanks,
Kevin


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Dong, Eddie
Tian, Kevin wrote:
 From:Avi Kivity
 Sent: September 27, 2008 17:50
 
 Yang, Sheng wrote:
 After check host shared interrupts situation, I got a
 question here: 
 
 If I understand correctly, current solution don't block
 host shared irq, just come with the performance penalty.
 The penalty come with host disabled irq line for a
 period. We have to wait guest to write EOI. But I fail
 to see the correctness problem here (except a lot of
 spurious interrupt in the guest).  
 
 I've checked mail, but can't find clue about that. Can
 you explain the situation? 
 
 
 
 If the guest fails to disable interrupts on a device
 that shares an interrupt line with the host, the host
 will experience an interrupt flood.  Eventually the host
 will disable the host device as well. 
 
 
 This issue also exists on host side, that one misbehaved
 driver can hurt all other drivers sharing same irq line.
 But it seems no good way to avoid it. Since not all
 devices support MSI, we still need support irq sharing
 possibly with above caveats given. 
 
 Existing approach at least works with a sane guest
 driver, with some performance penalty there.
 
 Or do you have better alternative?
 
 Thanks,
 Kevin


MSI is always the first choice, including using host MSI for a guest that uses
the IOAPIC, because we don't know whether the guest OS has MSI support but we
are sure host Linux does.

When MSI is impossible, I recommend we disable device assignment for devices
that share an interrupt, or we assign all devices with the same interrupt to
the same guest. Yes, the issue is the same in native, but in native the whole
OS (kernel) is in the same isolation domain, while now different guests have
different isolation domains :(
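
The recommendation above amounts to a policy check at assignment time. A minimal sketch, assuming a hypothetical mapping from device to irq line and from device to owner (devices missing from the owner map are treated as host-owned); none of these names or addresses come from the KVM code or the thread.

```python
from collections import defaultdict


def find_unsafe_shared_irqs(device_irq, device_owner):
    """Return {irq: sorted owners} for irq lines whose devices span several owners."""
    owners_by_irq = defaultdict(set)
    for dev, irq in device_irq.items():
        # Devices that are not assigned to a guest stay with the host.
        owners_by_irq[irq].add(device_owner.get(dev, "host"))
    return {irq: sorted(owners)
            for irq, owners in owners_by_irq.items() if len(owners) > 1}


# Hypothetical topology: two devices on irq 16, two devices on irq 17.
device_irq = {"00:19.0": 16, "00:1a.0": 16, "00:1d.0": 17, "02:00.0": 17}
device_owner = {"00:19.0": "guest1", "00:1d.0": "guest2", "02:00.0": "guest2"}

conflicts = find_unsafe_shared_irqs(device_irq, device_owner)
```

Here irq 17 is acceptable because both devices go to the same guest, while irq 16 is flagged because it crosses the host/guest boundary.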

In a word, MSI is pretty important for direct I/O, and SR-IOV is the #1 usage
in the future. I just want to advocate more and wish more people would ack the
SR-IOV patch from ZhaoYU so that we can see 2.6.28 work for direct I/O without
sacrificing sharing :)

Eddie


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Dong, Eddie

 I don't see how this relates to shared guest interrupts. 
 Whatever you have on the host side, you still need to
 support shared guest interrupts.  The only way to avoid
 the issue is by using MSI for the guest, and even then we
 still have to support interrupt sharing since not all
 guests have MSI support. 

Yes, but guest sharing is easy to solve by emulating the NAND gate of the PCI
line.


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Tian, Kevin
From: Dong, Eddie
Sent: September 28, 2008 10:04

Tian, Kevin wrote:
 From:Avi Kivity
 Sent: September 27, 2008 17:50

 Yang, Sheng wrote:
 After check host shared interrupts situation, I got a
 question here:

 If I understand correctly, current solution don't block
 host shared irq, just come with the performance penalty.
 The penalty come with host disabled irq line for a
 period. We have to wait guest to write EOI. But I fail
 to see the correctness problem here (except a lot of
 spurious interrupt in the guest).

 I've checked mail, but can't find clue about that. Can
 you explain the situation?



 If the guest fails to disable interrupts on a device
 that shares an interrupt line with the host, the host
 will experience an interrupt flood.  Eventually the host
 will disable the host device as well.


 This issue also exists on host side, that one misbehaved
 driver can hurt all other drivers sharing same irq line.
 But it seems no good way to avoid it. Since not all
 devices support MSI, we still need support irq sharing
 possibly with above caveats given.

 Existing approach at least works with a sane guest
 driver, with some performance penalty there.

 Or do you have better alternative?

 Thanks,
 Kevin


MSI is always the first choice, including using host MSI for a
guest that uses the IOAPIC, because we don't know whether the
guest OS has MSI support but we are sure host Linux does.

When MSI is impossible, I recommend we disable device
assignment for devices that share an interrupt, or we assign all
devices with the same interrupt to the same guest. Yes, the issue
is the same in native, but in native the whole OS (kernel) is in
the same isolation domain, while now different guests have
different isolation domains :(

In a word, MSI is pretty important for direct I/O, and
SR-IOV is the #1 usage in the future. I just want to advocate more
and wish more people would ack the SR-IOV patch from ZhaoYU so
that we can see 2.6.28 work for direct I/O without sacrificing
sharing :)


Yes, irq sharing is the trickiest part, and it is hard to make it
architecturally clean. Besides the irq storm mentioned by Avi,
driver timeouts or device buffer overflows can also be caused,
more subtly, by interference from the guest sharing the irq.
Guest inter-dependency can impact shared irq handling too. If
people do care about those issues, which known irq sharing
approaches can't address, your recommendation makes sense.

Thanks
Kevin

Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity
Tian, Kevin wrote:

 If the guest fails to disable interrupts on a device that shares an
 interrupt line with the host, the host will experience an interrupt
 flood.  Eventually the host will disable the host device as well.

 

 This issue also exists on host side, that one misbehaved driver
 can hurt all other drivers sharing same irq line. 

There is no issue on the host, since all drivers operate on the same
trust level. A misbehaving driver on the host will take down the entire
system even without shared interrupts, by corrupting memory, not
releasing a lock, etc.

But if you move a driver to the guest, you expect it will be isolated
from the rest of the system, and if there are shared interrupts, it isn't.

 But it seems no
 good way to avoid it. Since not all devices support MSI, we still
 need support irq sharing possibly with above caveats given.

 Existing approach at least works with a sane guest driver, with
 some performance penalty there.

   

How can we recommend it to users? We tell them, your guests are isolated
and secure as long as they don't misbehave?

 Or do you have better alternative?
   

No. Maybe the Neocleus polarity trick (which also reduces performance).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity

Dong, Eddie wrote:
I don't see how this relates to shared guest interrupts. 
Whatever you have on the host side, you still need to

support shared guest interrupts.  The only way to avoid
the issue is by using MSI for the guest, and even then we
still have to support interrupt sharing since not all
guests have MSI support. 



Yes, but guest sharing is easy to solve by emulating NAND gate of PCI
line. 
  


Certainly, it isn't difficult.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Tian, Kevin
From: Avi Kivity [mailto:[EMAIL PROTECTED]
Sent: September 28, 2008 12:23

There is no issue on the host, since all drivers operate on the same
trust level. A misbehaving driver on the host will take down the entire
system even without shared interrupts, by corrupting memory, not
releasing a lock, etc.

But if you move a driver to the guest, you expect it will be isolated
from the rest of the system, and if there are shared
interrupts, it isn't.


Yes, you're right

 Or do you have better alternative?


No. Maybe the Neocleus polarity trick (which also reduces performance).


To my knowledge, the Neocleus polarity trick can't solve this isolation
issue; it just provides an efficient way to track assertion/deassertion
transitions on the irq line. For example: reverse the polarity when receiving
an irq instance, so that a new irq instance occurs when all devices deassert
the shared irq line, and then restore the polarity. In the case you are
concerned about, where the guest driver misbehaves, this polarity trick can't
work either, since one device always asserts the line.
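
The argument above can be illustrated with a small Python model. This is only a sketch of the idea as described in this mail, not the Neocleus implementation; the class and function names are hypothetical. The tracker gets an interrupt whenever the shared line matches the currently programmed polarity, then flips the polarity so that the opposite transition produces the next interrupt.

```python
class PolarityTracker:
    """Detect assert/deassert transitions by flipping the trigger polarity."""

    def __init__(self):
        self.fire_when_asserted = True
        self.events = []

    def sample(self, line_asserted):
        # An interrupt is delivered only when the line matches the polarity.
        if line_asserted == self.fire_when_asserted:
            self.events.append("assert" if line_asserted else "deassert")
            self.fire_when_asserted = not self.fire_when_asserted


def shared_line(device_levels):
    # A shared level-triggered line is asserted while any device asserts it.
    return any(device_levels)


def run(timeline):
    tracker = PolarityTracker()
    for levels in timeline:
        tracker.sample(shared_line(levels))
    return tracker.events


# Each tuple is (dev_a, dev_b) at one sampling point.
well_behaved = run([(1, 0), (1, 1), (0, 1), (0, 0)])
stuck_device = run([(1, 0), (1, 1), (1, 0), (1, 0)])
```

In the second timeline dev_a never drops the line, so the deassert event never arrives and dev_b's interrupts stay hidden behind it.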

Thanks,
Kevin

Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Avi Kivity
Tian, Kevin wrote:

 No. Maybe the Neocleus polarity trick (which also reduces performance).
 

 To my knowledge, Neocleus polarity trick can't solve this isolation
 issue, which just provides one efficient way to track assertion/deassertion
 transition on the irq line. For example, reverse polarity when receiving an
 instance, and then a new irq instance would occur when all devices de-
 assert on shared irq line, and then recover the polarity. In your concerned
 case where guest driver misbehaves, this polarity trick can't work neither
 as one device always asserts the line.
   

You're right, I didn't think it through.

If there was a standard way to mask pci irqs, it might have worked, but
there isn't, unfortunately.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Yang, Sheng
On Sunday 28 September 2008 13:04:06 Avi Kivity wrote:
 Tian, Kevin wrote:
  No. Maybe the Neocleus polarity trick (which also reduces performance).
 
  To my knowledge, Neocleus polarity trick can't solve this isolation
  issue, which just provides one efficient way to track
  assertion/deassertion transition on the irq line. For example, reverse
  polarity when receiving an instance, and then a new irq instance would
  occur when all devices de- assert on shared irq line, and then recover
  the polarity. In your concerned case where guest driver misbehaves, this
  polarity trick can't work neither as one device always asserts the line.

 You're right, I didn't think it through.

 If there was a standard way to mask pci irqs, it might have worked, but
 there isn't, unfortunately.

What if we had a way to mask pci irqs? We would also have to unmask the pci
irq when the guest writes EOI to the vlapic (or at some other time). I think
this still causes a problem. The problem is, we don't know if the guest would
deassert the line. Maybe adding some time-based detection here might work?

And about masking the pci irq, how about disabling the PCI device interrupt
using the Interrupt Disable bit (bit 10) of the PCI Command register? Not sure
if it would affect pending transactions, and also not sure all devices support
this (though they should).
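
For reference, the interrupt disable bit is bit 10 of the PCI Command register at config offset 0x04 (added in PCI 2.3, so older devices may not implement it), and the matching Interrupt Status bit is bit 3 of the Status register at offset 0x06. A minimal sketch of the bit manipulation, operating on an in-memory copy of config space rather than real hardware; the helper names are hypothetical, the constants follow the PCI register layout.

```python
import struct

PCI_COMMAND = 0x04                 # Command register offset
PCI_STATUS = 0x06                  # Status register offset
PCI_COMMAND_INTX_DISABLE = 0x0400  # bit 10: Interrupt Disable
PCI_STATUS_INTERRUPT = 0x0008      # bit 3: Interrupt Status


def set_intx_masked(config, masked):
    """Set or clear Interrupt Disable in a config-space buffer; return the new value."""
    (cmd,) = struct.unpack_from("<H", config, PCI_COMMAND)
    if masked:
        cmd |= PCI_COMMAND_INTX_DISABLE
    else:
        cmd &= ~PCI_COMMAND_INTX_DISABLE & 0xFFFF
    struct.pack_into("<H", config, PCI_COMMAND, cmd)
    return cmd


def intx_pending(config):
    """Report whether the device says it has an INTx interrupt pending."""
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    return bool(status & PCI_STATUS_INTERRUPT)


# Simulated 256-byte config space with I/O, memory and bus mastering enabled.
config = bytearray(256)
struct.pack_into("<H", config, PCI_COMMAND, 0x0007)

masked_cmd = set_intx_masked(config, True)
unmasked_cmd = set_intx_masked(config, False)
```

Masking this way only stops the device from driving the line; whether it disturbs pending transactions, as asked above, is not something this sketch can answer.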

--
regards
Yang, Sheng


RE: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Dong, Eddie
Avi Kivity wrote:
 Dong, Eddie wrote:
 I don't see how this relates to shared guest interrupts.
 Whatever you have on the host side, you still need to
 support shared guest interrupts.  The only way to avoid
 the issue is by using MSI for the guest, and even then
 we still have to support interrupt sharing since not
 all guests have MSI support. 
 
 
 Yes, but guest sharing is easy to solve by emulating
 NAND gate of PCI line. 
 
 
 Certainly, it isn't difficult.

BTW, did you have a look at the SR-IOV patch? It addresses both Xen and KVM.
Thx, eddie


Re: Remaining passthrough/VT-d tasks list

2008-09-27 Thread Yang, Sheng
On Sunday 28 September 2008 13:04:06 Avi Kivity wrote:
 Tian, Kevin wrote:
  No. Maybe the Neocleus polarity trick (which also reduces performance).
 
  To my knowledge, Neocleus polarity trick can't solve this isolation
  issue, which just provides one efficient way to track
  assertion/deassertion transition on the irq line. For example, reverse
  polarity when receiving an instance, and then a new irq instance would
  occur when all devices de- assert on shared irq line, and then recover
  the polarity. In your concerned case where guest driver misbehaves, this
  polarity trick can't work neither as one device always asserts the line.

 You're right, I didn't think it through.

 If there was a standard way to mask pci irqs, it might have worked, but
 there isn't, unfortunately.

One proposal:

If we suffer from an IRQ storm on a level-triggered irq line, there are two
possibilities: a host issue or a guest issue.

If it's a host issue, the host should try to stop it. If it can't, the IRQ
line will be disabled, and the guest device won't be functional either.

If it's a guest issue, the guest should try to stop it and prevent it from
causing trouble in the host. KVM should do its best to enforce this, including
disabling the guest device. So the guest device won't be functional in this
case either.

Based on the above, we can assume that the IRQ storm is caused by the assigned
guest device, and try to stop the device from doing this. (Yeah, either way,
the guest device won't survive.)

I think we can bring in a little QoS concept here (stolen from Eddie :) ). The
assumption is that the normal rate at which a device delivers interrupts is
much lower than that of a continuously asserted level-triggered line when the
EOI is written immediately. So we can do something with the gap.

Measure the calling rate of our irq handler; if it exceeds some reasonable
threshold, KVM would try to stop the guest device for a while (even though it
doesn't know whether the guest device caused this).

First, try setting the Interrupt Disable bit (bit 10) in the PCI Command
register, wait for a period of time, then check again.

If the irq storm can't be stopped, KVM tries a more aggressive way: do a
Function Level Reset. That should be the end of the device's life...

Oh, of course, if even FLR doesn't stop the IRQ storm, it's the host's issue.
Let's wait for the host to disable the IRQ line - of course, the guest device
can't be recovered then either.

It's just an initial proposal, but I think it may work. The problem is whether
the gap is easy to catch... But at least, I think a physically continuous
assertion should look very different from any working device...
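
A minimal sketch of the rate check and escalation order proposed above, as a Python model rather than kernel code. The threshold (100 interrupts per 10 ms window), the class name and the action labels are all assumptions made for illustration; the mail itself does not give numbers.

```python
from collections import deque


class StormDetector:
    """Count irq handler calls inside a sliding time window."""

    def __init__(self, max_irqs, window_s):
        self.max_irqs = max_irqs
        self.window_s = window_s
        self.stamps = deque()

    def on_irq(self, now):
        """Record one handler call; return True if the rate exceeds the threshold."""
        self.stamps.append(now)
        # Drop calls that have fallen out of the window.
        while self.stamps and now - self.stamps[0] > self.window_s:
            self.stamps.popleft()
        return len(self.stamps) > self.max_irqs


# Escalation order from the proposal: mask INTx, then FLR, then give up.
ESCALATION = ["mask_intx", "function_level_reset", "leave_to_host"]


def next_action(failed_attempts):
    """Pick the next step given how many earlier steps failed to stop the storm."""
    return ESCALATION[min(failed_attempts, len(ESCALATION) - 1)]


def storms(timestamps, max_irqs=100, window_s=0.01):
    detector = StormDetector(max_irqs, window_s)
    return any(detector.on_irq(t) for t in timestamps)


normal = storms([i * 0.001 for i in range(1000)])     # about 1 kHz device
stuck = storms([i * 0.00001 for i in range(1000)])    # about 100 kHz re-assertion
```

The open question from the mail remains: the sketch only works if real devices stay well below whatever threshold is chosen.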

--
regards
Yang, Sheng


Remaining passthrough/VT-d tasks list

2008-09-24 Thread Han, Weidong
Hi all,

The initial passthrough/VT-d patches are now in kvm; it's time to
enhance them and push them into 2.6.28.

The following is the list of remaining passthrough/VT-d tasks:

- Multiple devices assignment (WIP)
- MSI support (WIP)
- MTRR/PAT support of EPT (WIP)
- MTRR/PAT support of shadow (WIP)
- Basic FLR support (WIP)
(The above tasks are works in progress; some patches have been sent out,
others will be sent out in the near future)
- architecture independent (such as x86, IPF)
- Shared Interrupt support
- Add dummy driver to hide/unbind passthrough device from host
kernel

If I have omitted any good features or you have good proposals, please
feel free to add them to this list.
If you are interested in any of the tasks, please reply to this mail directly
and let the others know your progress. I appreciate any effort from you!


Randy (Weidong)


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Yang, Sheng
On Wednesday 24 September 2008 14:15:15 Han, Weidong wrote:
 Hi all,

 The initial passthrough/VT-d patches have been in kvm, it's time to enhance
 it, and push them into 2.6.28.


Some supplements:

 Following is the remaining passthrough/VT-d tasks list:

 - Multiple devices assignment (WIP)

Weidong is working on this.

 - MSI support (WIP)
 - MTRR/PAT support of EPT (WIP)
 - MTRR/PAT support of shadow (WIP)
 - Basic FLR support (WIP)

The above four are my work. All of them work now, but more needs to be done to
polish the patches. And the main part of Function Level Reset would be picked
up by linux-pci.

Another thing is that we will send out/update the above patches before Sept.
28, and hope they can be picked up in the 2.6.28 merge window.

Avi, what's your opinion? Of course we will work hard. :) But what's the
deadline of the merge window?

 (Above tasks are working in process, some patches have been sent out,
 others will be sent out in near future)
 - architecture independent (such as x86, IPF)
 - Shared Interrupt support

I still don't know who would do this. It's very important for making VT-d
really usable. If nobody is interested in it, I will pick it up, but only
after Oct. 6 (after the National Holiday in China).

--
regards
Yang, Sheng

 - Add dummy driver to hide/unbind passthrough device from host
 kernel

 If I omit some good features or you have some good proposals, please feel
 free to add them to this list. If you are interest in any tasks, please
 reply the mail directly and let other guys to know your progress.
 Appreciate any effort from you!


 Randy (Weidong)




Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Amit Shah
* On Wednesday 24 Sep 2008 13:21:25 Han, Weidong wrote:
 Amit Shah wrote:
 
 - Add dummy driver to hide/unbind passthrough device from
 host kernel

  This isn't needed; we currently don't assign the device to the guest
  if we find that a driver is already loaded. I intend to change it to
  failing guest start altogether in case we find a module already using
  a device. When a guest exits, we release all the structures and hence
  even unloading kvm is not needed to reclaim the device on the host
  side.

 This task needn't target 2.6.28. For long term, we need it to make device
 assignment more user friendly.

How is the current scheme not user friendly? Or, how will adding a dummy 
driver be more user friendly?


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Han, Weidong wrote:

Hi all,

The initial passthrough/VT-d patches have been in kvm, it's time to
enhance it, and push them into 2.6.28.

- Shared Interrupt support
  


Shared guest interrupts is a prerequisite for merging into mainline.  
Without this, device assignment is useless in anything but a benchmark 
scenario.  I won't push device assignment for 2.6.28 without it.


Shared host interrupts are a different matter; which one did you mean?

--
error compiling committee.c: too many arguments to function



RE: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Han, Weidong
Amit Shah wrote:
 * On Wednesday 24 Sep 2008 13:21:25 Han, Weidong wrote:
 Amit Shah wrote:
 
 - Add dummy driver to hide/unbind passthrough device from
 host kernel
 
 This isn't needed; we currently don't assign the device to the guest
 if we find that a driver is already loaded. I intend to change it to
 failing guest start altogether in case we find a module already
 using a device. When a guest exits, we release all the structures
 and hence even unloading kvm is not needed to reclaim the device on
 the host side.
 
 This task needn't target 2.6.28. For long term, we need it to make
 device assignment more user friendly.
 
 How is the current scheme not user friendly? Or, how will adding a
 dummy driver be more user friendly?

We had some discussion on this a few months ago. Currently, users need to
remove the device driver before assignment. If there is more than one device
of the same type, removing the driver stops all of them from working, even
though the user just wants to assign one of them to a guest. Note that not
all drivers support the unbind function. If we can provide a mechanism to
hide a single device independently, e.g. implement a dummy driver to own the
devices that the user wants to assign to a guest, I think it's friendlier to
the end user than removing/unbinding the driver manually.

Randy (Weidong)


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Yang, Sheng wrote:



- MSI support (WIP)
- MTRR/PAT support of EPT (WIP)
- MTRR/PAT support of shadow (WIP)
- Basic FLR support (WIP)



The above four are my work. All of them work now, but more needs to be done to
polish the patches. And the main part of Function Level Reset would be picked
up by linux-pci.

Another thing is that we will send out/update the above patches before Sept.
28, and hope they can be picked up in the 2.6.28 merge window.


Avi, what's your opinion? Of course we will work hard. :) But what's the
deadline of the merge window?

  


No one knows, but it's very unlikely these features will make it for 
2.6.28.  To be merged, it is not sufficient for the patches to be 
ready.  They have to undergo some testing in the field.




- Shared Interrupt support



I still don't know who would do this. It's very important for making VT-d
really usable. If nobody is interested in it, I will pick it up, but only
after Oct. 6 (after the National Holiday in China).
  


Shared host interrupts?  What's your plan here?  The polarity trick?

--
error compiling committee.c: too many arguments to function



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Han, Weidong wrote:

- Add dummy driver to hide/unbind passthrough device from host
kernel
  



Maybe this can be implemented at the modprobe/hotplug level.


--
error compiling committee.c: too many arguments to function



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Yang, Sheng
On Wednesday 24 September 2008 16:34:22 Avi Kivity wrote:
 Han, Weidong wrote:
  Hi all,
 
  The initial passthrough/VT-d patches have been in kvm, it's time to
  enhance it, and push them into 2.6.28.
 
- Shared Interrupt support

 Shared guest interrupts is a prerequisite for merging into mainline.
 Without this, device assignment is useless in anything but a benchmark
 scenario.  I won't push device assignment for 2.6.28 without it.

 Shared host interrupts are a different matter; which one did you mean?


I'm confused...

I think we are talking about shared host interrupts, that is, the pre-assigned
device sharing an IRQ with other devices.

Why are shared guest interrupts a prerequisite?

--
regards
Yang, Sheng



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Yang, Sheng
On Wednesday 24 September 2008 16:38:35 Avi Kivity wrote:
 Yang, Sheng wrote:
  - MSI support (WIP)
  - MTRR/PAT support of EPT (WIP)
  - MTRR/PAT support of shadow (WIP)
  - Basic FLR support (WIP)
 
  Above four are my works. All of them work now. But more job should be
  done to polish the patches. And the main part of Function Level Reset
  would be picked by linux-pci.
 
  Another thing is we would send out/update above patches before Sept. 28,
  and hope they can picked by 2.6.28 merge window.
 
  Avi, what's your opinion? Of course we would work hard. :) But what's the
  deadline of merge window?

 No one knows, but it's very unlikely these features will make it for
 2.6.28.  To be merged, it is not sufficient for the patches to be
 ready.  They have to undergo some testing in the field.

..

  - Shared Interrupt support
 
  I still don't know who would do this. It's very important for VT-d real
  usable. If nobody interested in it, I would pick it up, but after Oct. 6
  (after National Holiday in China).

 Shared host interrupts?  What's your plan here?  The polarity trick?

Yeah, shared host interrupts. But I don't have a very clear idea yet.

-- 
regards
Yang, Sheng




Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Amit Shah wrote:
 I'd say we have about 3 weeks to get things in.

How do you figure? 2.6.26 was released July 13, we're more than 2.5 
months later.


Furthermore, I'm not queueing untested patches for 2.6.28 at this time.

--
error compiling committee.c: too many arguments to function



RE: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Han, Weidong
Avi Kivity wrote:
 Han, Weidong wrote:
  - Add dummy driver to hide/unbind passthrough device from host
 kernel 
 
 
 
 Maybe this can be implemented at the modprobe/hotplug level.

I think so.

Randy (Weidong)


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Han, Weidong wrote:
 We had some discussion on this few months ago. Currently, users need to
 remove device driver before assignment. If there are more than one same
 type devices, removing driver makes them cannot work at the same time,
 even though user just want to assign one of them to guest. Note that not
 all drivers support unbind function. If we can provide a mechanism to
 hide single device independently, e.g, implement a dummy driver to own
 devices that user want to assign to guest. I think it's more friendly to
 end user than remove/unbind driver manually.

That's a good point -- multiple devices with the same driver.

We may need a kernel parameter as well, for built-in drivers.

--
error compiling committee.c: too many arguments to function



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Yang, Sheng wrote:
  Shared guest interrupts is a prerequisite for merging into mainline.
  Without this, device assignment is useless in anything but a benchmark
  scenario.  I won't push device assignment for 2.6.28 without it.

  Shared host interrupts are a different matter; which one did you mean?

 Got confused...

 I think we are talking about share host interrupts, that is pre-assigned
 device shared IRQ with other devices.

 Why share guest interrupts is a prerequisite...

We only have three pci interrupts at this point (though this could be 
easily extended); if you start the guest with a non-trivial number of 
devices, you will have shared guest interrupts.


(of course, when I pointed this out during review, people said it could 
be done later, then forgot all about it)
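
The sharing argument above is the pigeonhole principle. A minimal Python sketch, not KVM or QEMU code: the round-robin routing, the device names and the helpers assign_lines/shared_lines are assumptions made for illustration only. With three lines, any guest with more than three interrupt-using devices must put at least two of them on the same line.

```python
from collections import defaultdict


def assign_lines(devices, num_lines=3):
    """Map each device to a guest interrupt line, round-robin.

    Round-robin is an assumption for illustration; the real
    slot-to-line routing is not described in this thread.
    """
    lines = defaultdict(list)
    for index, dev in enumerate(devices):
        lines[index % num_lines].append(dev)
    return dict(lines)


def shared_lines(assignment):
    """Return only the lines that carry more than one device."""
    return {line: devs for line, devs in assignment.items() if len(devs) > 1}


# Hypothetical guest: three emulated devices plus two assigned ones.
devices = ["ide", "nic", "usb", "assigned-nic", "assigned-hba"]
print(shared_lines(assign_lines(devices)))
```

Raising num_lines delays the first collision but cannot rule it out for an arbitrary number of devices.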


--
error compiling committee.c: too many arguments to function



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Yang, Sheng
On Wednesday 24 September 2008 16:53:15 Avi Kivity wrote:
 Yang, Sheng wrote:
  Shared guest interrupts is a prerequisite for merging into mainline.
  Without this, device assignment is useless in anything but a benchmark
  scenario.  I won't push device assignment for 2.6.28 without it.
 
  Shared host interrupts are a different matter; which one did you mean?
 
  Got confused...
 
  I think we are talking about share host interrupts, that is pre-assigned
  device shared IRQ with other devices.
 
  Why share guest interrupts is a prerequisite...

 We only have three pci interrupts at this point (though this could be
 easily extended); if you start the guest with a non-trivial number of
 devices, you will have shared guest interrupts.

 (of course, when I pointed this out during review, people said it could
 be done later, then forgot all about it)

.. 

I think it's a performance issue, not breakage? How about doing it the way Xen
does: try our best to avoid sharing, extend the number of pci interrupts, and
improve the hash algorithm. Is there anything else we can do?

-- 
regards
Yang, Sheng


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Han, Weidong wrote:
 Avi Kivity wrote:
  Han, Weidong wrote:
   - Add dummy driver to hide/unbind passthrough device from host
   kernel

  Maybe this can be implemented at the modprobe/hotplug level.

 I think so.

I'm not sure now -- after I saw the point about a driver binding to two 
devices.


Perhaps the deeper fix is to separate driver loading from binding to 
devices (or maybe it's separated already, but not exposed)?
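
For reference, a sketch of what per-device unbinding could look like from userspace, assuming the host kernel exposes the sysfs "driver" link and "unbind" file for PCI devices. The helper names driver_of and unbind_device and the sysfs_root parameter are hypothetical, and this is not what the KVM patches do. As noted earlier in the thread, not every driver supports unbind.

```python
import os


def driver_of(bdf, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        return None  # no driver currently bound (or no such device)
    return os.path.basename(os.readlink(link))


def unbind_device(bdf, sysfs_root="/sys"):
    """Detach one device from its driver; other devices stay bound.

    Returns the name of the driver that was unbound, or None if the
    device had no driver. Needs root when run against the real /sys.
    """
    driver = driver_of(bdf, sysfs_root)
    if driver is None:
        return None
    path = os.path.join(sysfs_root, "bus", "pci", "drivers", driver, "unbind")
    with open(path, "w") as f:
        f.write(bdf)  # the kernel expects the device address, e.g. 0000:01:00.0
    return driver
```

Calling unbind_device("0000:01:00.0") (an example address) would release only that device, so a second device using the same driver keeps working on the host.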


--
error compiling committee.c: too many arguments to function



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Yang, Sheng wrote:
  We only have three pci interrupts at this point (though this could be
  easily extended); if you start the guest with a non-trivial number of
  devices, you will have shared guest interrupts.

  (of course, when I pointed this out during review, people said it could
  be done later, then forgot all about it)

 I think it's a performance issue, not break it? How about do it like Xen side?
 Try best to avoid the share, extended the pci interrupts, improve hash
 algorithm. Is there anything else we can do?

Two separate issues:

1. only three guest pci interrupts

That's a performance issue, not correctness.  It can be fixed by using gsi 
16-23 in APIC mode, and by adding another IOAPIC (so we can use gsi 
16-47).  Anthony Xu posted some patches for this, not sure where this 
stands, but it was the right approach.


2. shared guest pci interrupts

That's a correctness issue.  No matter how many interrupts we have, we 
may have sharing issues.  Of course with only three the issue is very 
pressing since we will get sharing with just a few devices.  Currently 
if two assigned devices share a guest interrupt, or if an emulated 
device shares an interrupt with an assigned device, things will break.


They need to be fixed independently.
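
To make the two issues concrete, here is a small Python model (a sketch, not KVM code). pci_gsi_count only does the arithmetic on the gsi ranges mentioned above. SharedLine and NaiveLine are hypothetical names: the first models a level-triggered line as the OR of all sources wired to it, the second shows one generic way sharing can go wrong when sources are not tracked separately. Whether this is the exact failure in the current assignment code is an assumption; the message above only says that things will break.

```python
def pci_gsi_count(first, last):
    """Number of interrupt lines in the inclusive gsi range first..last."""
    return last - first + 1


class SharedLine:
    """Level-triggered line: high while at least one source asserts it."""

    def __init__(self):
        self.asserting = set()

    def set_level(self, source, level):
        if level:
            self.asserting.add(source)
        else:
            self.asserting.discard(source)

    @property
    def level(self):
        return bool(self.asserting)


class NaiveLine:
    """Last writer wins: sources are not tracked separately."""

    def __init__(self):
        self.level = False

    def set_level(self, source, level):
        self.level = bool(level)


def replay(line):
    """Both devices assert the line, then only the assigned one deasserts."""
    line.set_level("emulated", 1)
    line.set_level("assigned", 1)
    line.set_level("assigned", 0)
    return line.level


print(pci_gsi_count(16, 23), pci_gsi_count(16, 47))  # 8 32
print(replay(SharedLine()), replay(NaiveLine()))     # True False
```

In the naive model the emulated device is still waiting for service while the line reads low, which is why adding more lines (issue 1) does not remove the need to handle sharing correctly (issue 2).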

--
error compiling committee.c: too many arguments to function



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Yang, Sheng
On Wednesday 24 September 2008 17:22:53 Avi Kivity wrote:
 Yang, Sheng wrote:
  We only have three pci interrupts at this point (though this could be
  easily extended); if you start the guest with a non-trivial number of
  devices, you will have shared guest interrupts.
 
  (of course, when I pointed this out during review, people said it could
  be done later, then forgot all about it)
 
  .
 
  I think it's a performance issue, not break it? How about do it like Xen
  side? Try best to avoid the share, extended the pci interrupts, improve
  hash algorithm. Is there anything else we can do?

 Two separate issues:

 1. only three guest pci interrupts

 That's a performance issue, not correctness.  can be fixed by using gsi
 16-23 in APIC mode, and by adding another IOAPIC (so we can use gsi
 16-47).  Anthony Xu posted some patches for this, not sure where this
 stands, but it was the right approach.

 2. shared guest pci interrupts

 That's a correctness issue.  No matter how many interrupts we have, we
 may have sharing issues.  Of course with only three the issue is very
 pressing since we will get sharing with just a few devices.  Currently
 if two assigned devices share a guest interrupts, or if an emulated
 device shares an interrupt with an assigned device, things will break.

 They need to be fixed independently.

About the second issue, I don't understand how it would break... Would you 
please give more details on this? Is it a QEMU bug or an IOAPIC bug?

-- 
regards
Yang, Sheng



Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Amit Shah
* On Wednesday 24 Sep 2008 14:08:14 Han, Weidong wrote:
 Amit Shah wrote:
  * On Wednesday 24 Sep 2008 13:21:25 Han, Weidong wrote:
  Amit Shah wrote:
  - Add dummy driver to hide/unbind passthrough device from
  host kernel
 
  This isn't needed; we currently don't assign the device to the guest
  if we find that a driver is already loaded. I intend to change it to
  failing guest start altogether in case we find a module already
  using a device. When a guest exits, we release all the structures
  and hence even unloading kvm is not needed to reclaim the device on
  the host side.
 
  This task needn't target 2.6.28. For long term, we need it to make
  device assignment more user friendly.
 
  How is the current scheme not user friendly? Or, how will adding a
  dummy driver be more user friendly?

 We had some discussion on this few months ago. Currently, users need to
 remove device driver before assignment. If there are more than one same
 type devices, removing driver makes them cannot work at the same time,
 even though user just want to assign one of them to guest. Note that not
 all drivers support unbind function. If we can provide a mechanism to
 hide single device independently, e.g, implement a dummy driver to own
 devices that user want to assign to guest. I think it's more friendly to
 end user than remove/unbind driver manually.

This needs a change in the driver core and it definitely won't be solved by 
having a dummy device. We have to have a way to signal to modules that a 
particular device will now be owned by a different module, even if the 
current module thinks it is the sole owner.


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Amit Shah
* On Wednesday 24 Sep 2008 14:16:47 Avi Kivity wrote:
 Amit Shah wrote:
  I'd say we have about 3 weeks to get things in.

 How do you figure? 2.6.26 was released July 13, we're more than 2.5
 months later.

A week for 2.6.28 to open and two weeks for the rc1 window.

 Furthermore, I'm not queueing untested patches for 2.6.28 at this time.

Of course, I'm not advocating this! If they're tested by Intel, we can push 
them in.

Amit


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Avi Kivity

Amit Shah wrote:
 * On Wednesday 24 Sep 2008 14:16:47 Avi Kivity wrote:
  Amit Shah wrote:
   I'd say we have about 3 weeks to get things in.

  How do you figure? 2.6.26 was released July 13, we're more than 2.5
  months later.

 A week for 2.6.28 to open and two weeks for the rc1 window.

  Furthermore, I'm not queueing untested patches for 2.6.28 at this time.

 Of course, I'm not advocating this! If they're tested by Intel, we can push
 them in.

No, the patches have to be in my tree some time before the merge window 
opens.


--
error compiling committee.c: too many arguments to function



RE: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Han, Weidong
Amit Shah wrote:
 * On Wednesday 24 Sep 2008 14:08:14 Han, Weidong wrote:
 Amit Shah wrote:
 * On Wednesday 24 Sep 2008 13:21:25 Han, Weidong wrote:
 Amit Shah wrote:
 - Add dummy driver to hide/unbind passthrough device from
 host kernel
 
 This isn't needed; we currently don't assign the device to the
 guest if we find that a driver is already loaded. I intend to
 change it to failing guest start altogether in case we find a
 module already using a device. When a guest exits, we release all
 the structures and hence even unloading kvm is not needed to
 reclaim the device on the host side.
 
 This task needn't target 2.6.28. For long term, we need it to make
 device assignment more user friendly.
 
 How is the current scheme not user friendly? Or, how will adding a
 dummy driver be more user friendly?
 
 We had some discussion on this few months ago. Currently, users need
 to remove device driver before assignment. If there are more than
 one same type devices, removing driver makes them cannot work at the
 same time, even though user just want to assign one of them to
 guest. Note that not all drivers support unbind function. If we can
 provide a mechanism to hide single device independently, e.g,
 implement a dummy driver to own devices that user want to assign to
 guest. I think it's more friendly to end user than remove/unbind
 driver manually. 
 
 This needs a change in the driver core and it definitely won't be
 solved by having a dummy device. We have to have a way to signal to
 modules that a particular device will now be owned by a different
 module, even if the current module thinks it is the sole owner.

The assigned devices are owned only by the dummy driver. It works like Xen,
where pciback owns the assignable devices once the option
'pciback.hide=(bus:dev:func)' is added in grub; that means the regular driver
for device (bus:dev:func) won't be loaded. The user can then assign these
hidden devices.
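
As an illustration of the hide-list idea, a Python sketch of parsing such an option and consulting it before a host driver claims a device. This is not pciback's parser: the functions parse_hide and host_may_bind are hypothetical, and the conventional bus:dev.func notation with an optional domain, e.g. (01:00.0), is assumed because the exact syntax is not spelled out in this message.

```python
import re

# One "(bus:dev.func)" group, with an optional 4-digit PCI domain prefix.
_GROUP = re.compile(
    r"\((?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\)"
)


def parse_hide(value):
    """Parse e.g. "(01:00.0)(02:00.1)" into (domain, bus, dev, func) tuples."""
    devices = []
    pos = 0
    while pos < len(value):
        match = _GROUP.match(value, pos)
        if match is None:
            raise ValueError("bad device spec at offset %d: %r" % (pos, value))
        domain, bus, dev, func = match.groups()
        dev = int(dev, 16)
        if dev > 0x1F:  # a PCI bus has at most 32 device slots
            raise ValueError("device number out of range: %#x" % dev)
        devices.append((int(domain or "0", 16), int(bus, 16), dev, int(func)))
        pos = match.end()
    return devices


def host_may_bind(device, hidden):
    """A regular host driver should skip any device on the hidden list."""
    return device not in hidden


hidden = set(parse_hide("(01:00.0)(02:00.1)"))
print(host_may_bind((0, 1, 0, 0), hidden))  # False: reserved for assignment
print(host_may_bind((0, 1, 0, 1), hidden))  # True: sibling function stays on the host
```

The point of the per-device list is that two devices of the same type can be treated differently, which removing the whole driver cannot do.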

Randy (Weidong)


RE: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Han, Weidong
Avi Kivity wrote:
 Amit Shah wrote:
 * On Wednesday 24 Sep 2008 14:16:47 Avi Kivity wrote:
 
 Amit Shah wrote:
 
 I'd say we have about 3 weeks to get things in.
 
 How do you figure? 2.6.26 was released July 13, we're more than 2.5
 months later. 
 
 
 A week for 2.6.28 to open and two weeks for the rc1 window.
 
 
 Furthermore, I'm not queueing untested patches for 2.6.28 at this
 time. 
 
 
 Of course, I'm not advocating this! If they're tested by Intel, we
 can push them in. 
 
 
 No, the patches have to be in my tree some time before the merge
 window opens.

I agree patches need sufficient testing before being merged into mainline.
Anyway, let's try our best to improve passthrough/VT-d code quality and make
it stable asap.

Randy (Weidong)


Re: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Anthony Liguori

Avi Kivity wrote:
 Han, Weidong wrote:
  - Add dummy driver to hide/unbind passthrough device from host
  kernel

 Maybe this can be implemented at the modprobe/hotplug level.

Wouldn't you just blacklist the devices in the host and call it a day?

Regards,

Anthony Liguori



RE: Remaining passthrough/VT-d tasks list

2008-09-24 Thread Dong, Eddie
Avi Kivity wrote:
 Han, Weidong wrote:
 Hi all,
 
 The initial passthrough/VT-d patches have been in kvm,
 it's time to enhance it, and push them into 2.6.28.
 
  - Shared Interrupt support
 
 
 Shared guest interrupts is a prerequisite for merging
 into mainline. Without this, device assignment is useless
 in anything but a benchmark scenario.  I won't push
 device assignment for 2.6.28 without it. 
 
 Shared host interrupts are a different matter; which one
 did you mean? 
 
Avi:
How about we think about it another way? The top usage model of IOMMU is
SR-IOV in my mind, at least for the enterprise usage model. We are pushing
the SR-IOV patch for 2.6.28, and are continuously polishing the patch.
Even if it misses the 2.6.28 merge window (unlikely?), we should be able
to ask OSVs to take the SR-IOV patch separately before code freeze since
it is very small, but it is hard to ask them to take the whole IOMMU patches.

On the Xen side, IOMMU is there, MSI-X is there, so the SR-IOV patch is
the only piece missing to enable SR-IOV. On the KVM side, very likely we can
get the MSI patch done soon, before the Chinese holiday, and we of course will
spend tons of effort on quality too. Should we target this? If yes, we
make the MSI patch and the push for 2.6.28 our 1st priority. We would be able
to see the next major release of VMMs using KVM have HW IO virtualization
technology: close to native performance, no sacrifice of IO sharing,
minimal CPU utilization etc.
For legacy PCI passthrough support, we can continue to improve
it too.
Thanks, eddie