Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
Marcel,

The findings are pretty consistent with what I identified, although it looks
like SeaBIOS fares better than UEFI. Thanks for the heads-up, I will reply on
the thread itself.

Ray K

-----Original Message-----
From: Marcel Apfelbaum [mailto:mar...@redhat.com]
Sent: Wednesday, August 9, 2017 3:53 AM
To: Kinsella, Ray <ray.kinse...@intel.com>; Kevin O'Connor <ke...@koconnor.net>
Cc: Tan, Jianfeng <jianfeng@intel.com>; seab...@seabios.org; Michael Tsirkin <m...@redhat.com>; qemu-devel@nongnu.org; Gerd Hoffmann <kra...@redhat.com>
Subject: Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices

On 07/08/2017 22:00, Kinsella, Ray wrote:
> Hi Marcel,

Hi Ray,

Please have a look at this thread; I think Laszlo and Paolo found the root cause:

https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg01368.html

It seems hot-plugging the devices would not help.

Thanks,
Marcel

> Yup - I am using SeaBIOS by default.
> I took all the measurements from the kernel time reported in syslog,
> as SeaBIOS wasn't exhibiting any obvious scaling problem.
>
> Ray K
>
> -----Original Message-----
> From: Marcel Apfelbaum [mailto:mar...@redhat.com]
> Sent: Wednesday, August 2, 2017 5:43 AM
> To: Kinsella, Ray <ray.kinse...@intel.com>; Kevin O'Connor <ke...@koconnor.net>
> Cc: Tan, Jianfeng <jianfeng@intel.com>; seab...@seabios.org; Michael Tsirkin <m...@redhat.com>; qemu-devel@nongnu.org; Gerd Hoffmann <kra...@redhat.com>
> Subject: Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
>
> It is an issue worth looking into. One more question: are all the
> measurements from OS boot? Do you use SeaBIOS?
> No problems with the firmware?
>
> Thanks,
> Marcel
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
On 07/08/2017 22:00, Kinsella, Ray wrote:
> Hi Marcel,

Hi Ray,

Please have a look at this thread; I think Laszlo and Paolo found the root cause:

https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg01368.html

It seems hot-plugging the devices would not help.

Thanks,
Marcel

> Yup - I am using SeaBIOS by default.
> I took all the measurements from the kernel time reported in syslog,
> as SeaBIOS wasn't exhibiting any obvious scaling problem.
>
> Ray K
>
> -----Original Message-----
> From: Marcel Apfelbaum [mailto:mar...@redhat.com]
> Sent: Wednesday, August 2, 2017 5:43 AM
> To: Kinsella, Ray <ray.kinse...@intel.com>; Kevin O'Connor <ke...@koconnor.net>
> Cc: Tan, Jianfeng <jianfeng@intel.com>; seab...@seabios.org; Michael Tsirkin <m...@redhat.com>; qemu-devel@nongnu.org; Gerd Hoffmann <kra...@redhat.com>
> Subject: Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
>
> It is an issue worth looking into. One more question: are all the
> measurements from OS boot? Do you use SeaBIOS?
> No problems with the firmware?
>
> Thanks,
> Marcel
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
Hi Marcel,

Yup - I am using SeaBIOS by default.
I took all the measurements from the kernel time reported in syslog,
as SeaBIOS wasn't exhibiting any obvious scaling problem.

Ray K

-----Original Message-----
From: Marcel Apfelbaum [mailto:mar...@redhat.com]
Sent: Wednesday, August 2, 2017 5:43 AM
To: Kinsella, Ray <ray.kinse...@intel.com>; Kevin O'Connor <ke...@koconnor.net>
Cc: Tan, Jianfeng <jianfeng@intel.com>; seab...@seabios.org; Michael Tsirkin <m...@redhat.com>; qemu-devel@nongnu.org; Gerd Hoffmann <kra...@redhat.com>
Subject: Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices

It is an issue worth looking into. One more question: are all the
measurements from OS boot? Do you use SeaBIOS?
No problems with the firmware?

Thanks,
Marcel
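For reference, an illustrative sketch (not Ray's exact commands) of how per-phase
durations like the ones quoted elsewhere in this thread can be derived from the
kernel timestamps in syslog/dmesg; the grep patterns here are assumptions based
on the log lines shown later in the thread:

  # first and last root-port enumeration messages; the difference between
  # the two bracketed timestamps is roughly the enumeration time
  dmesg | grep 'PCI bridge to \[bus' | head -n 1
  dmesg | grep 'PCI bridge to \[bus' | tail -n 1

  # the total figure is simply the timestamp of the last message printed
  # before the login prompt appears
  dmesg | tail -n 1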
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
On 25/07/2017 21:00, Kinsella, Ray wrote:
> Hi Marcel,

Hi Ray,

> On 24/07/2017 00:14, Marcel Apfelbaum wrote:
>> On 24/07/2017 7:53, Kinsella, Ray wrote:
>> Even if I am not aware of how much time it would take to init a
>> bare-metal PCIe Root Port, it seems too much.
>
> So I repeated the testing for 64, 128, 256 and 512 ports. I ensured the
> configuration was sane, and that 128 had twice the number of root ports
> and virtio-net-pci devices as 64. I got the following results, shown in
> seconds. As you can see it is non-linear but not exponential; there is
> something that is not scaling well.
>
>                     64    128    256    512
>  PCIe Root Ports    14     72    430   2672
>  ACPI                4     35    342   3863
>  Loading Drivers     1      1     31    621
>  Total Boot         34    137    890   7516
>
> (I did try to test 1024 devices, but it just dies silently)
>
> Ray K

It is an issue worth looking into. One more question: are all the
measurements from OS boot? Do you use SeaBIOS?
No problems with the firmware?

Thanks,
Marcel
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
Hi Marcel,

On 24/07/2017 00:14, Marcel Apfelbaum wrote:
> On 24/07/2017 7:53, Kinsella, Ray wrote:
> Even if I am not aware of how much time it would take to init a
> bare-metal PCIe Root Port, it seems too much.

So I repeated the testing for 64, 128, 256 and 512 ports. I ensured the
configuration was sane, and that 128 had twice the number of root ports and
virtio-net-pci devices as 64. I got the following results, shown in seconds.
As you can see it is non-linear but not exponential; there is something that
is not scaling well.

                   64    128    256    512
 PCIe Root Ports   14     72    430   2672
 ACPI               4     35    342   3863
 Loading Drivers    1      1     31    621
 Total Boot        34    137    890   7516

(I did try to test 1024 devices, but it just dies silently)

Ray K
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
On 24/07/2017 7:53, Kinsella, Ray wrote:

Hi Ray,

Thank you for the details,

> So as it turns out, at 512 devices it is nothing to do with SeaBIOS; it
> was the kernel again. It is taking quite a while to start up, a little
> over two hours (7489 seconds). The main culprits appear to be
> enumerating/initializing the PCI Express ports and enabling interrupts.
>
> The PCI Express Root Ports are taking a long time to enumerate/initialize:
> 42 minutes in total (2579 seconds), 64 ports in total, roughly 40 seconds each.

Even if I am not aware of how much time it would take to init a bare-metal
PCIe Root Port, it seems too much.

> [   50.612822] pci_bus 0000:80: root bus resource [bus 80-c1]
> [  172.345361] pci 0000:80:00.0: PCI bridge to [bus 81]
> ...
> [ 2724.734240] pci 0000:80:08.0: PCI bridge to [bus c1]
> [ 2751.154702] ACPI: Enabled 2 GPEs in block 00 to 3F
>
> I assume the 1 hour (3827 seconds) below is being spent enabling interrupts.

Assuming you are referring to legacy interrupts, maybe it is possible to
disable them and use only MSI/MSI-X for the PCIe Root Ports (based on user
input; we can't disable INTx for all the ports).

> [ 2899.394288] ACPI: PCI Interrupt Link [GSIG] enabled at IRQ 22
> [ 2899.531324] ACPI: PCI Interrupt Link [GSIH] enabled at IRQ 23
> [ 2899.534778] ACPI: PCI Interrupt Link [GSIE] enabled at IRQ 20
> [ 6726.914388] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> [ 6726.937932] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> [ 6726.964699] Linux agpgart interface v0.103
>
> Then finally there is another 20 minutes to account for in the boot.
>
> [ 7489.202589] virtio_net virtio515 enp193s0f0: renamed from eth513
>
> Poky (Yocto Project Reference Distro) 2.3 qemux86-64 ttyS0
> qemux86-64 login: root
>
> I will remove the virtio-net-pci devices and hotplug them instead.
> In theory it should improve boot time, at the expense of incurring some
> of these costs at runtime.

I would appreciate it if you can share the results.

Thanks,
Marcel

> Ray K
>
> -----Original Message-----
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Sunday, July 23, 2017 1:05 PM
> To: Marcel Apfelbaum; Kinsella, Ray
> Cc: qemu-devel@nongnu.org; seab...@seabios.org; Gerd Hoffmann; Michael Tsirkin
> Subject: Re: >256 Virtio-net-pci hotplug Devices
>
> On Sun, Jul 23, 2017 at 07:28:01PM +0300, Marcel Apfelbaum wrote:
>> On 22/07/2017 2:57, Kinsella, Ray wrote:
>>> When scaling up to 512 virtio-net devices SeaBIOS appears to really
>>> slow down when configuring PCI config space - haven't managed to get
>>> this to work yet.
>
> If there is a slowdown in SeaBIOS, it would help to produce a log with
> timing information - see:
> https://www.seabios.org/Debugging#Timing_debug_messages
>
> It may also help to increase the debug level in SeaBIOS to get more
> fine-grained timing reports.
>
> -Kevin
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
So as it turns out, at 512 devices it is nothing to do with SeaBIOS; it was
the kernel again. It is taking quite a while to start up, a little over two
hours (7489 seconds). The main culprits appear to be enumerating/initializing
the PCI Express ports and enabling interrupts.

The PCI Express Root Ports are taking a long time to enumerate/initialize:
42 minutes in total (2579 seconds), 64 ports in total, roughly 40 seconds each.

[   50.612822] pci_bus 0000:80: root bus resource [bus 80-c1]
[  172.345361] pci 0000:80:00.0: PCI bridge to [bus 81]
...
[ 2724.734240] pci 0000:80:08.0: PCI bridge to [bus c1]
[ 2751.154702] ACPI: Enabled 2 GPEs in block 00 to 3F

I assume the 1 hour (3827 seconds) below is being spent enabling interrupts.

[ 2899.394288] ACPI: PCI Interrupt Link [GSIG] enabled at IRQ 22
[ 2899.531324] ACPI: PCI Interrupt Link [GSIH] enabled at IRQ 23
[ 2899.534778] ACPI: PCI Interrupt Link [GSIE] enabled at IRQ 20
[ 6726.914388] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 6726.937932] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 6726.964699] Linux agpgart interface v0.103

Then finally there is another 20 minutes to account for in the boot.

[ 7489.202589] virtio_net virtio515 enp193s0f0: renamed from eth513

Poky (Yocto Project Reference Distro) 2.3 qemux86-64 ttyS0
qemux86-64 login: root

I will remove the virtio-net-pci devices and hotplug them instead. In theory
it should improve boot time, at the expense of incurring some of these costs
at runtime.

Ray K

-----Original Message-----
From: Kevin O'Connor [mailto:ke...@koconnor.net]
Sent: Sunday, July 23, 2017 1:05 PM
To: Marcel Apfelbaum; Kinsella, Ray
Cc: qemu-devel@nongnu.org; seab...@seabios.org; Gerd Hoffmann; Michael Tsirkin
Subject: Re: >256 Virtio-net-pci hotplug Devices

On Sun, Jul 23, 2017 at 07:28:01PM +0300, Marcel Apfelbaum wrote:
> On 22/07/2017 2:57, Kinsella, Ray wrote:
> > When scaling up to 512 virtio-net devices SeaBIOS appears to really
> > slow down when configuring PCI config space - haven't managed to get
> > this to work yet.

If there is a slowdown in SeaBIOS, it would help to produce a log with
timing information - see:
https://www.seabios.org/Debugging#Timing_debug_messages

It may also help to increase the debug level in SeaBIOS to get more
fine-grained timing reports.

-Kevin
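For illustration, hot-plugging one virtio-net device into a pre-created PCIe
root port is typically done from the QEMU monitor (or the QMP equivalents)
along these lines; the ids "rp37", "hostnet37" and "net37" are made-up names
for this sketch, assuming a spare root port with id "rp37" was cold-plugged
at VM start:

  (qemu) netdev_add tap,id=hostnet37,vhost=on
  (qemu) device_add virtio-net-pci,netdev=hostnet37,id=net37,bus=rp37,disable-legacy=on,disable-modern=off

The device can later be removed again with "device_del net37" followed by
"netdev_del hostnet37".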
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
On Sun, Jul 23, 2017 at 07:28:01PM +0300, Marcel Apfelbaum wrote:
> On 22/07/2017 2:57, Kinsella, Ray wrote:
> > When scaling up to 512 virtio-net devices SeaBIOS appears to really
> > slow down when configuring PCI config space - haven't managed to get
> > this to work yet.

If there is a slowdown in SeaBIOS, it would help to produce a log with
timing information - see:
https://www.seabios.org/Debugging#Timing_debug_messages

It may also help to increase the debug level in SeaBIOS to get more
fine-grained timing reports.

-Kevin
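As a concrete illustration (not part of the original exchange), the usual way
to capture the SeaBIOS debug log from a QEMU guest is to route the debug I/O
port 0x402 to the host with an isa-debugcon device; the path and VM options
here are placeholders:

  # capture SeaBIOS debug output to a file on the host
  qemu-system-x86_64 -machine q35 -m 1G \
      -chardev file,id=seabios,path=/tmp/seabios.log \
      -device isa-debugcon,iobase=0x402,chardev=seabios

  # for more verbose output, rebuild SeaBIOS with a higher debug level
  # (e.g. CONFIG_DEBUG_LEVEL=8 in .config) and point QEMU at the result:
  #   -bios /path/to/seabios/out/bios.bin

The wiki page Kevin references describes timestamping this stream on the host
(for example with the scripts/readserial.py helper shipped in the SeaBIOS
tree) so that per-phase timings can be read off the log.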
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
On 22/07/2017 2:57, Kinsella, Ray wrote:
> Hi Marcel

Hi Ray,

> On 21/07/2017 01:33, Marcel Apfelbaum wrote:
>> On 20/07/2017 3:44, Kinsella, Ray wrote:
>>
>> That's strange. Please ensure the virtio devices are working in
>> virtio 1.0 mode (disable-modern=0,disable-legacy=1).
>> Let us know any problems you see.
>>
>>> Not sure what yet, I will try scaling it with hotplugging tomorrow.
>>
>> Updates?
>
> I have managed to scale it to 128 devices. The kernel does complain
> about IO address space exhaustion.
>
> [   83.697956] pci 0000:80:00.0: BAR 13: no space for [io size 0x1000]
> [   83.700958] pci 0000:80:00.0: BAR 13: failed to assign [io size 0x1000]
> [   83.701689] pci 0000:80:00.1: BAR 13: no space for [io size 0x1000]
> [   83.702378] pci 0000:80:00.1: BAR 13: failed to assign [io size 0x1000]
> [   83.703093] pci 0000:80:00.2: BAR 13: no space for [io size 0x1000]
>
> I was surprised that I am running out of IO address space, as I am
> disabling legacy virtio. I assumed that this would remove the need for
> SeaBIOS to allocate the PCI Express Root Port IO address space.

Indeed, SeaBIOS does not reserve IO ports in this case, but the Linux kernel
still decides "it knows better" and tries to allocate IO resources anyway.
It does not affect the "modern" virtio-net devices because they don't need
IO ports anyway.

One way to work around the error message is to have the PCIe Root Port
expose the corresponding IO headers as read-only, since IO support is
optional. I tried this some time ago, I'll get back to it.

> In any case - it doesn't stop the virtio-net devices coming up and
> working as expected.

Right.

> [  668.692081] virtio_net virtio103 enp141s0f4: renamed from eth101
> [  668.707114] virtio_net virtio130 enp144s0f7: renamed from eth128
> [  668.719795] virtio_net virtio129 enp144s0f6: renamed from eth127
>
> I encountered some issues in vhost due to open file exhaustion, but
> resolved these with 'ulimit' in the usual way - burned a lot of time
> on that today.
>
> When scaling up to 512 virtio-net devices SeaBIOS appears to really
> slow down when configuring PCI config space - haven't managed to get
> this to work yet.

Adding the SeaBIOS mailing list and maintainers, maybe there is a known
issue with configuring 500+ PCI devices.

>> Not really. All you have to do is to add a property to the pxb-pcie/pxb
>> devices: pci_domain=x; then update the ACPI table to include the pxb
>> domain. You also have to tweak the pxb-pcie/pxb devices a little to not
>> share the bus numbers if pci_domain > 0.
>
> Thanks for the information, will add to the list.

It is also on my todo list :)

Thanks,
Marcel

> Ray K
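As an aside, and purely as illustrative commands not taken from the thread,
whether the kernel actually assigned an I/O window behind a given root port
can be checked from inside the guest; the bus address 80:00.0 matches the
log excerpts above:

  # show the I/O window assigned (or not) behind one root port
  lspci -vv -s 80:00.0 | grep -i 'I/O behind bridge'

  # overall I/O port allocations as seen by the kernel
  cat /proc/ioports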
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
Hi Marcel

On 21/07/2017 01:33, Marcel Apfelbaum wrote:
> On 20/07/2017 3:44, Kinsella, Ray wrote:
>
> That's strange. Please ensure the virtio devices are working in
> virtio 1.0 mode (disable-modern=0,disable-legacy=1).
> Let us know any problems you see.
>
>> Not sure what yet, I will try scaling it with hotplugging tomorrow.
>
> Updates?

I have managed to scale it to 128 devices. The kernel does complain about
IO address space exhaustion.

[   83.697956] pci 0000:80:00.0: BAR 13: no space for [io size 0x1000]
[   83.700958] pci 0000:80:00.0: BAR 13: failed to assign [io size 0x1000]
[   83.701689] pci 0000:80:00.1: BAR 13: no space for [io size 0x1000]
[   83.702378] pci 0000:80:00.1: BAR 13: failed to assign [io size 0x1000]
[   83.703093] pci 0000:80:00.2: BAR 13: no space for [io size 0x1000]

I was surprised that I am running out of IO address space, as I am disabling
legacy virtio. I assumed that this would remove the need for SeaBIOS to
allocate the PCI Express Root Port IO address space.

In any case - it doesn't stop the virtio-net devices coming up and working
as expected.

[  668.692081] virtio_net virtio103 enp141s0f4: renamed from eth101
[  668.707114] virtio_net virtio130 enp144s0f7: renamed from eth128
[  668.719795] virtio_net virtio129 enp144s0f6: renamed from eth127

I encountered some issues in vhost due to open file exhaustion, but resolved
these with 'ulimit' in the usual way - burned a lot of time on that today.

When scaling up to 512 virtio-net devices SeaBIOS appears to really slow
down when configuring PCI config space - haven't managed to get this to
work yet.

> Not really. All you have to do is to add a property to the pxb-pcie/pxb
> devices: pci_domain=x; then update the ACPI table to include the pxb
> domain. You also have to tweak the pxb-pcie/pxb devices a little to not
> share the bus numbers if pci_domain > 0.

Thanks for the information, will add to the list.

Ray K
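For reference, a sketch of the workaround Ray mentions (the limit value is
illustrative): each vhost-net backend holds several file descriptors, so with
hundreds of devices the per-process open-file limit of the shell that launches
QEMU has to be raised first:

  # raise the open-file limit in the shell that will exec QEMU
  ulimit -n 65536

  # verify the soft and hard limits
  ulimit -Sn
  ulimit -Hn

A persistent limit can instead be configured in /etc/security/limits.conf.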
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
On 20/07/2017 3:44, Kinsella, Ray wrote:
> Hi Marcel,

Hi Ray,

>> You can use multi-function PCIe Root Ports, this will give you 8 ports
>> per slot; if you have 16 empty slots (I think we have more) you reach
>> 128 root ports. Then you can use multi-function virtio-net-pci devices,
>> this will give you 8 functions per port, so you reach the target of
>> 1024 devices. You lose hot-plug granularity since you can only hot-plug
>> an 8-function group, but maybe that is OK, depending on your scenario.
>
> Thanks for the advice. Losing the hotplug granularity is something I
> think I can live with. It would mean I would have to track how many
> ports are allocated to a VM, and create 8 new ports when 1 is required,
> caching the other 7 for when they are needed.
>
>> Even so, you can use one cold-plugged pxb-pcie if you don't have enough
>> empty slots on pcie.0, in order to reach the maximum number of PCIe
>> Root Ports (256), which is the maximum for a single PCI domain.
>
> Took your advice, see the attached cfg; it works exactly as you
> indicated. If you are interested, you can use it from your VM by adding
> -readconfig to your qemu cmd line.
>
> I can currently only manage to start a VM with around 50 coldplugged
> virtio devices before something breaks.

That's strange. Please ensure the virtio devices are working in virtio 1.0
mode (disable-modern=0,disable-legacy=1).
Let us know any problems you see.

> Not sure what yet, I will try scaling it with hotplugging tomorrow.

Updates?

>> If you need granularity per single device (1000+ hot-pluggable), you
>> could enhance the pxb-pcie to support multiple pci domains.
>
> Do you think there would be much work in this?

Not really. All you have to do is to add a property to the pxb-pcie/pxb
devices: pci_domain=x; then update the ACPI table to include the pxb domain.
You also have to tweak the pxb-pcie/pxb devices a little to not share the
bus numbers if pci_domain > 0.

Thanks,
Marcel

> Thanks,
> Ray K
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
Hi Marcel,

> You can use multi-function PCIe Root Ports, this will give you 8 ports
> per slot; if you have 16 empty slots (I think we have more) you reach
> 128 root ports. Then you can use multi-function virtio-net-pci devices,
> this will give you 8 functions per port, so you reach the target of
> 1024 devices. You lose hot-plug granularity since you can only hot-plug
> an 8-function group, but maybe that is OK, depending on your scenario.

Thanks for the advice. Losing the hotplug granularity is something I think
I can live with. It would mean I would have to track how many ports are
allocated to a VM, and create 8 new ports when 1 is required, caching the
other 7 for when they are needed.

> Even so, you can use one cold-plugged pxb-pcie if you don't have enough
> empty slots on pcie.0, in order to reach the maximum number of PCIe Root
> Ports (256), which is the maximum for a single PCI domain.

Took your advice, see the attached cfg; it works exactly as you indicated.
If you are interested, you can use it from your VM by adding -readconfig to
your qemu cmd line.

I can currently only manage to start a VM with around 50 coldplugged virtio
devices before something breaks. Not sure what yet, I will try scaling it
with hotplugging tomorrow.

> If you need granularity per single device (1000+ hot-pluggable), you
> could enhance the pxb-pcie to support multiple pci domains.

Do you think there would be much work in this?

Thanks,
Ray K

[Attachment: test.cfg.gz (application/gzip)]
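The attached test.cfg is not reproduced in the archive; a minimal sketch of
what a -readconfig layout along these lines might look like (section names,
slot numbers, chassis/port values and the netdev are illustrative assumptions,
not taken from the attachment):

  # ports.cfg - use with: qemu-system-x86_64 -machine q35 -readconfig ports.cfg
  [device "rp00"]
    driver = "pcie-root-port"
    bus = "pcie.0"
    addr = "02.0"
    port = "1"
    chassis = "1"
    multifunction = "on"

  [device "rp01"]
    driver = "pcie-root-port"
    bus = "pcie.0"
    addr = "02.1"
    port = "2"
    chassis = "2"

  [netdev "hostnet00"]
    type = "tap"
    vhost = "on"

  [device "vnet00"]
    driver = "virtio-net-pci"
    bus = "rp00"
    netdev = "hostnet00"
    disable-legacy = "on"
    disable-modern = "off"

The same pattern repeats for the remaining functions of the slot (addr
"02.2" through "02.7") and for further slots, with unique chassis and port
values per root port.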
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
On 18/07/2017 0:50, Kinsella, Ray wrote:
> Hi folks,

Hi Ray,

> I am trying to create a VM that supports hot-plugging a large number of
> virtio-net-pci devices, up to 1000 devices initially.
>
> From the docs (see below) and from playing with QEMU, it looks like there
> are two options, both with limitations.
>
> PCI Express switch
> It looks like using a PCI Express switch hierarchy is not an option due
> to bus exhaustion. Each downstream port creates a separate bus and each
> downstream port only supports hot-plugging a single device. So this gives
> us a max of 256-ish bus/device pairs.

Right.

> PCI Root Ports
> The other option is to use a flatter hierarchy, with a number of
> multi-function PCI Root Ports hanging off 'pcie.0'. However each
> 'PCI Root Port' can support hot-plugging a single device. So this method
> really becomes a function of how many free addresses we have on 'pcie.0'.
> If we make room for say 16 multifunction devices, we get 16*8 ... 128.
> So ultimately, this approach will give us a similar number to using a
> switch.
>
> Is there another method?

You can use multi-function PCIe Root Ports, this will give you 8 ports per
slot; if you have 16 empty slots (I think we have more) you reach 128 root
ports. Then you can use multi-function virtio-net-pci devices, this will
give you 8 functions per port, so you reach the target of 1024 devices.

You lose hot-plug granularity since you can only hot-plug an 8-function
group, but maybe that is OK, depending on your scenario.

> ( pxb-pcie doesn't support hotplug for instance, and only a single PCIe
>   domain is supported by qemu )

Even so, you can use one cold-plugged pxb-pcie if you don't have enough
empty slots on pcie.0, in order to reach the maximum number of PCIe Root
Ports (256), which is the maximum for a single PCI domain.

If you need granularity per single device (1000+ hot-pluggable), you could
enhance the pxb-pcie to support multiple pci domains.

Thanks,
Marcel

> Thanks,
> Ray K
>
> pcie.txt ( https://github.com/qemu/qemu/blob/master/docs/pcie.txt )
> Q35 preso ( http://wiki.qemu.org/images/4/4e/Q35.pdf )
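To make the suggested topology concrete, here is a minimal command-line
sketch, assuming illustrative ids, slot addresses, chassis and port numbers
that are not taken from the thread: a cold-plugged pxb-pcie providing an
extra root bus, two of the eight root-port functions packed into one slot
(the remaining functions 0x2.0x2 through 0x2.0x7 follow the same pattern),
and virtio-net-pci devices in virtio 1.0 mode behind them:

  # tap netdevs assume a working /etc/qemu-ifup or an explicit ifname/script
  qemu-system-x86_64 -machine q35 -m 4G -nographic \
    -device pxb-pcie,id=pxb1,bus_nr=128,bus=pcie.0 \
    -device pcie-root-port,id=rp0,bus=pxb1,addr=0x2.0x0,chassis=1,port=1,multifunction=on \
    -device pcie-root-port,id=rp1,bus=pxb1,addr=0x2.0x1,chassis=2,port=2 \
    -netdev tap,id=net0,vhost=on \
    -device virtio-net-pci,netdev=net0,bus=rp0,disable-legacy=on,disable-modern=off \
    -netdev tap,id=net1,vhost=on \
    -device virtio-net-pci,netdev=net1,bus=rp1,disable-legacy=on,disable-modern=off

Each root port needs a unique chassis (and port) value so that ACPI hotplug
slots do not clash; extra devices can later be hot-plugged into any root
port left empty at start-up.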
[Qemu-devel] >256 Virtio-net-pci hotplug Devices
Hi folks,

I am trying to create a VM that supports hot-plugging a large number of
virtio-net-pci devices, up to 1000 devices initially.

From the docs (see below) and from playing with QEMU, it looks like there
are two options, both with limitations.

PCI Express switch
It looks like using a PCI Express switch hierarchy is not an option due to
bus exhaustion. Each downstream port creates a separate bus and each
downstream port only supports hot-plugging a single device. So this gives
us a max of 256-ish bus/device pairs.

PCI Root Ports
The other option is to use a flatter hierarchy, with a number of
multi-function PCI Root Ports hanging off 'pcie.0'. However each
'PCI Root Port' can support hot-plugging a single device. So this method
really becomes a function of how many free addresses we have on 'pcie.0'.
If we make room for say 16 multifunction devices, we get 16*8 ... 128.
So ultimately, this approach will give us a similar number to using a
switch.

Is there another method?

( pxb-pcie doesn't support hotplug for instance, and only a single PCIe
  domain is supported by qemu )

Thanks,
Ray K

pcie.txt ( https://github.com/qemu/qemu/blob/master/docs/pcie.txt )
Q35 preso ( http://wiki.qemu.org/images/4/4e/Q35.pdf )