Re: [kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)

2008-05-06 Thread Avi Kivity
Alexander Graf wrote:
>> Marcelo Tosatti wrote:
>>> Add three PCI bridges to support 128 slots.
>>>
>>> Changes since v1:
>>> - Remove I/O address range "support" (so standard PCI I/O space is 
>>> used).
>>> - Verify that there are no special quirks for the 82801 PCI bridge.
>>> - Introduce separate flat IRQ mapping function for non-SPARC targets.
>>>
>>>
>>
>> I've cooled off on the 128 slot stuff, mainly because most real hosts
>> don't have them. An unusual configuration will likely lead to problems
>> as most guest OSes and workloads will not have been tested thoroughly
>> with them.
>
> This is more of a "let's do this conditionally" than a "let's not do 
> it" reason imho.

Yes. More precisely, let's not do it until we're sure it works and performs.

I don't think a queue-per-disk approach will perform well, since each 
queue will always be very short and so cannot amortize exit costs and 
ring management overhead.
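
To put rough numbers on that, here is a back-of-envelope model in C; 
the exit cost and batch sizes are illustrative assumptions, not 
measurements:

/* Back-of-envelope model of exit amortization.  All numbers are
 * illustrative assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
    const double exit_cost_us = 5.0;  /* assumed cost of one guest exit */
    const int requests = 1024;        /* requests to push through */

    /* Per-disk rings: depth ~4, so one kick notifies at most ~4 requests. */
    int exits_per_disk = requests / 4;
    /* A shared ring could batch, say, 32 outstanding requests per kick. */
    int exits_shared = requests / 32;

    printf("per-disk rings: %d exits, ~%.0f us of exit overhead\n",
           exits_per_disk, exits_per_disk * exit_cost_us);
    printf("shared ring:    %d exits, ~%.0f us of exit overhead\n",
           exits_shared, exits_shared * exit_cost_us);
    return 0;
}

With these assumed numbers, the short per-disk queues pay roughly eight 
times the exit overhead for the same amount of work.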

>> - it requires a large number of interrupts, which are difficult to
>> provide, and which it is hard to ensure all OSes support. MSI is
>> relatively new.
>
> We could just as well extend the device layout to have every device 
> attached to its own virtual IOAPIC pin, so we'd have something like 
> 128 / 4 = 32 IOAPICs in the system and one interrupt per device.

That's problematic for these reasons:

- how many OSes work well with 32 IOAPICs?
- at some point, you run out of interrupt vectors (~220 per cpu if the 
OS can allocate per-cpu vectors; otherwise ~220 in total)
- you will have many interrupts fired, each for a single device with a 
few requests, reducing performance (see the back-of-envelope sketch below)
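
The vector arithmetic, spelled out; the pin count and vector budget are 
the usual figures, but treat them as assumptions:

/* Rough vector-budget check for the 32-IOAPIC idea. */
#include <stdio.h>

int main(void)
{
    const int ioapics = 32;
    const int pins_per_ioapic = 24;  /* typical 82093AA-style IOAPIC */
    const int usable_vectors = 220;  /* ~256 minus exception/system vectors */

    int total_pins = ioapics * pins_per_ioapic;  /* 768 */
    printf("%d pins competing for ~%d vectors\n",
           total_pins, usable_vectors);
    if (total_pins > usable_vectors)
        printf("exhausted unless the OS allocates vectors per-cpu\n");
    return 0;
}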

>> - if only a few interrupts are available, then each interrupt requires
>> scanning a large number of queues
>
> This case should be rare, basically only arising with OSes that don't 
> support the APIC properly.
>

Hopefully.

>> The alternative approach of having the virtio block device control up to
>> 16 disks allows having those 80 disks with just 5 slots (and 5
>> interrupts). This is similar to the way traditional SCSI controllers
>> behave, and so should not surprise the guest OS.
>
> The one thing I'm actually really missing here is use cases. What are 
> we doing this for? And further along the line, are there other 
> approaches to the problems for which this was supposed to be a 
> solution? Maybe someone can raise a case where it's not virtblk / 
> virtnet.

The requirement for lots of storage is a given. There are two ways of 
doing that: paying a lot of money to EMC or NetApp for a storage 
controller, or connecting lots of disks directly and implementing the 
storage controller in the OS (which is what EMC and NetApp do anyway, 
inside their boxes). zfs is a good example of a use case, and I'd guess 
databases could use this too if they were able to supply the redundancy 
themselves.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)

2008-05-06 Thread Avi Kivity
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>  
>>> Add three PCI bridges to support 128 slots.
>>>
>>> Changes since v1:
>>> - Remove I/O address range "support" (so standard PCI I/O space is 
>>> used).
>>> - Verify that there are no special quirks for the 82801 PCI bridge.
>>> - Introduce separate flat IRQ mapping function for non-SPARC targets.
>>>
>>>   
>>
>> I've cooled off on the 128 slot stuff, mainly because most real hosts 
>> don't have them.  An unusual configuration will likely lead to 
>> problems as most guest OSes and workloads will not have been tested 
>> thoroughly with them.
>>
>> - it requires a large number of interrupts, which are difficult to 
>> provide, and which it is hard to ensure all OSes support.  MSI is 
>> relatively new.
>> - if only a few interrupts are available, then each interrupt 
>> requires scanning a large number of queues
>>
>> If we are to do this, then we need better tests than "80 disks show up".
>>
>> The alternative approach of having the virtio block device control up 
>> to 16 disks allows having those 80 disks with just 5 slots (and 5 
>> interrupts).  This is similar to the way traditional SCSI controllers 
>> behave, and so should not surprise the guest OS.
>>   
>
> If you have a single virtio-blk device that shows up as 8 functions, 
> we could achieve the same thing.  We can cheat with the interrupt 
> handlers to avoid cache line bouncing too.  

You can't cheat on all guests, and even on Linux, it's better to keep 
doing what real hardware does than to go off on a tangent that no one 
else uses.

You'll have to cheat on ->kick(), too.  Virtio needs one exit per 
O(queue depth) requests.  With one spindle per ring, it doesn't make 
sense to have a queue depth > 4 (or latency goes to hell), so you get 
many exits.
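
For reference, a sketch of the guest-side kick path and the standard 
virtio batching hint; the vring layout follows the virtio ring spec, 
and notify_host() is a hypothetical stand-in for the I/O write that 
causes the exit:

#include <stdint.h>

#define VRING_USED_F_NO_NOTIFY 1

struct vring_used {
    uint16_t flags;  /* host sets NO_NOTIFY while it is already polling */
    uint16_t idx;
    /* used-ring entries follow */
};

/* Hypothetical stand-in: in a real guest this is an I/O write that
 * traps to the host, i.e. one guest exit. */
static void notify_host(int queue_index)
{
    (void)queue_index;
}

void virtqueue_kick(struct vring_used *used, int queue_index)
{
    /* Batching only helps if there is something to batch: with a
     * per-spindle ring capped at depth ~4, nearly every request
     * still ends up paying for an exit here. */
    if (!(used->flags & VRING_USED_F_NO_NOTIFY))
        notify_host(queue_index);
}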

> Plus, we can use PCI hotplug so we don't have to invent a new 
> hotplug mechanism.

You can plug disks into a Fibre Channel mesh, so presumably that works 
on real hardware somehow.

>
> I'm inclined to think that ring sharing isn't as useful as it seems as 
> long as we don't have indirect scatter gather lists.

I agree, but I think that indirect sg is very important for storage 
(see the sketch after this list):
- a long sg list is cheap from the disk's point of view (the seeks are 
what's expensive)
- it is important to keep the queue depth meaningful and small 
(O(spindles * 3)), as it drastically affects latency
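
A sketch of one way indirect descriptors could look: a ring slot points 
at an external table of descriptors, so a long sg list costs a single 
ring entry.  The descriptor layout mirrors virtio's; the INDIRECT flag 
value is an assumption for illustration, not an existing virtio feature:

#include <stdint.h>

#define VRING_DESC_F_NEXT     1  /* buffer chains to 'next' */
#define VRING_DESC_F_WRITE    2  /* device writes this buffer */
#define VRING_DESC_F_INDIRECT 4  /* assumed: 'addr' points at a table */

struct vring_desc {
    uint64_t addr;   /* guest-physical address of buffer (or table) */
    uint32_t len;    /* buffer length, or table size in bytes */
    uint16_t flags;
    uint16_t next;
};

/* Make ring slot 'd' refer to a table of 'n' descriptors describing a
 * long sg list: the request stays large, the queue depth stays small. */
static void desc_set_indirect(struct vring_desc *d,
                              uint64_t table_gpa, unsigned n)
{
    d->addr  = table_gpa;
    d->len   = n * sizeof(struct vring_desc);
    d->flags = VRING_DESC_F_INDIRECT;
    d->next  = 0;
}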

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)

2008-05-05 Thread Alexander Graf

On May 4, 2008, at 9:56 AM, Avi Kivity wrote:

> Marcelo Tosatti wrote:
>> Add three PCI bridges to support 128 slots.
>>
>> Changes since v1:
>> - Remove I/O address range "support" (so standard PCI I/O space is  
>> used).
>> - Verify that there are no special quirks for the 82801 PCI bridge.
>> - Introduce separate flat IRQ mapping function for non-SPARC targets.
>>
>>
>
> I've cooled off on the 128 slot stuff, mainly because most real hosts
> don't have them.  An unusual configuration will likely lead to  
> problems
> as most guest OSes and workloads will not have been tested thoroughly
> with them.

This is more of a "let's do this conditionally" than a "let's not do  
it" reason imho.

> - it requires a large number of interrupts, which are difficult to
> provide, and which it is hard to ensure all OSes support.  MSI is
> relatively new.

We could just as well extend the device layout to have every device 
attached to its own virtual IOAPIC pin, so we'd have something like 
128 / 4 = 32 IOAPICs in the system and one interrupt per device.
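
A sketch of the flat mapping this implies; the 4-devices-per-IOAPIC 
split just follows the 128 / 4 arithmetic above, and all names are 
illustrative:

struct ioapic_pin { int ioapic; int pin; };

static struct ioapic_pin pin_for_slot(int slot)  /* slot 0..127 */
{
    struct ioapic_pin p;
    p.ioapic = slot / 4;  /* 128 slots / 4 = 32 IOAPICs */
    p.pin    = slot % 4;  /* a dedicated pin per device */
    return p;
}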

> - if only a few interrupts are available, then each interrupt requires
> scanning a large number of queues

This case should be rare, basically only arising with OSes that don't 
support the APIC properly.

> If we are to do this, then we need better tests than "80 disks show  
> up".

True.

> The alternative approach of having the virtio block device control  
> up to
> 16 disks allows having those 80 disks with just 5 slots (and 5
> interrupts).  This is similar to the way traditional SCSI controllers
> behave, and so should not surprise the guest OS.

The one thing I'm actually really missing here is use cases. What are  
we doing this for? And further along the line, are there other  
approaches to the problems for which this was supposed to be a  
solution? Maybe someone can raise a case where it's not virtblk /  
virtnet.

Alex



Re: [kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)

2008-05-05 Thread Anthony Liguori
Avi Kivity wrote:
> Marcelo Tosatti wrote:
>   
>> Add three PCI bridges to support 128 slots.
>>
>> Changes since v1:
>> - Remove I/O address range "support" (so standard PCI I/O space is used).
>> - Verify that there are no special quirks for the 82801 PCI bridge.
>> - Introduce separate flat IRQ mapping function for non-SPARC targets.
>>
>>   
>> 
>
> I've cooled off on the 128 slot stuff, mainly because most real hosts 
> don't have them.  An unusual configuration will likely lead to problems 
> as most guest OSes and workloads will not have been tested thoroughly 
> with them.
>
> - it requires a large number of interrupts, which are difficult to 
> provide, and which it is hard to ensure all OSes support.  MSI is 
> relatively new.
> - if only a few interrupts are available, then each interrupt requires 
> scanning a large number of queues
>
> If we are to do this, then we need better tests than "80 disks show up".
>
> The alternative approach of having the virtio block device control up to 
> 16 disks allows having those 80 disks with just 5 slots (and 5 
> interrupts).  This is similar to the way traditional SCSI controllers 
> behave, and so should not surprise the guest OS.
>   

If you have a single virtio-blk device that shows up as 8 functions, we 
could achieve the same thing.  We can cheat with the interrupt handlers 
to avoid cache line bouncing too.  Plus, we can use PCI hotplug so we 
don't have to invent a new hotplug mechanism.
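
The function encoding makes the arithmetic concrete: 8 functions per 
slot means 80 disks fit in 10 slots.  PCI_DEVFN is the standard 
config-space encoding; devfn_for_disk is a made-up helper for 
illustration:

/* Standard config-space encoding: 5-bit slot, 3-bit function. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical helper: e.g. disk 37 lands at slot 4, function 5. */
static int devfn_for_disk(int disk)
{
    return PCI_DEVFN(disk / 8, disk % 8);
}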

I'm inclined to think that ring sharing isn't as useful as it seems as 
long as we don't have indirect scatter gather lists.

Regards,

Anthony Liguori






Re: [kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)

2008-05-04 Thread Avi Kivity
Marcelo Tosatti wrote:
> Add three PCI bridges to support 128 slots.
>
> Changes since v1:
> - Remove I/O address range "support" (so standard PCI I/O space is used).
> - Verify that there are no special quirks for the 82801 PCI bridge.
> - Introduce separate flat IRQ mapping function for non-SPARC targets.
>
>   

I've cooled off on the 128 slot stuff, mainly because most real hosts 
don't have them.  An unusual configuration will likely lead to problems 
as most guest OSes and workloads will not have been tested thoroughly 
with them.

- it requires a large number of interrupts, which are difficult to 
provide, and which it is hard to ensure all OSes support.  MSI is 
relatively new.
- if only a few interrupts are available, then each interrupt requires 
scanning a large number of queues

If we are to do this, then we need better tests than "80 disks show up".

The alternative approach of having the virtio block device control up to 
16 disks allows having those 80 disks with just 5 slots (and 5 
interrupts).  This is similar to the way traditional SCSI controllers 
behave, and so should not surprise the guest OS.
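
As a sketch of what such a request header might look like: a target 
field selects one of up to 16 disks behind a single slot, much as a 
SCSI controller addresses LUNs.  This illustrates the proposal, not an 
existing virtio layout:

#include <stdint.h>

#define VIRTIO_BLK_T_IN  0   /* read */
#define VIRTIO_BLK_T_OUT 1   /* write */

struct multi_blk_req_hdr {
    uint32_t type;    /* VIRTIO_BLK_T_IN / VIRTIO_BLK_T_OUT */
    uint32_t target;  /* assumed: disk 0..15 behind this device */
    uint64_t sector;  /* start sector on that disk */
};

/* 80 disks / 16 per device = 5 slots, and 5 interrupts. */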

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] [patch 0/3] QEMU/KVM: add support for 128 PCI slots (v2)

2008-05-02 Thread Marcelo Tosatti
Add three PCI bridges to support 128 slots.

Changes since v1:
- Remove I/O address range "support" (so standard PCI I/O space is used).
- Verify that there are no special quirks for the 82801 PCI bridge.
- Introduce separate flat IRQ mapping function for non-SPARC targets 
(see the sketch below).
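
A minimal sketch of what a flat mapping might look like, in QEMU's 
map_irq callback style; the function name and round-robin policy are 
assumptions about the patch, not its actual code:

/* Assumes QEMU's PCIDevice type and pci_map_irq_fn signature. */
static int pci_flat_map_irq(PCIDevice *pci_dev, int irq_num)
{
    /* Spread slots across the four PIRQ links, ignoring which
     * bridge the device sits behind. */
    return ((pci_dev->devfn >> 3) + irq_num) & 3;
}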

