On Thu, Nov 05, 2020 at 09:20:45PM +0000, Ashish Kalra wrote:
> On Thu, Nov 05, 2020 at 03:20:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> > > On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > .
> > > > > > Right, so I am wondering if we can do this better.
> > > > > >
> > > > > > That is, you are never going to get any 32-bit devices with SEV,
> > > > > > right? That is, there is nothing that bounds you to always use the
> > > > > > memory below 4GB?
> > > > > >
> > > > >
> > > > > We do support 32-bit PCIe passthrough devices with SEV.
> > > >
> > > > Ewww.. Which devices would this be?
> > >
> > > That will be difficult to predict as customers could be doing
> > > passthrough of all kinds of devices.
> >
> > But SEV is not on some 1990 hardware. It has PCIe; there are no PCI
> > slots in there.
> >
> > Is it really possible to have a PCIe device that can't do more than
> > 32-bit DMA?
> >
> > > > >
> > > > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce
> > > > > buffering when there is I/O pressure, because we do need to support
> > > > > device passthrough of 32-bit devices.
> > > >
> > > > Presumably there is just a handful of them?
> > > >
> > > Again, it will be incorrect to assume this.
> > >
> > > > >
> > > > > Considering this, we believe that this patch needs to adjust/extend
> > > > > boot-allocation of SWIOTLB, and we want to keep it simple to do this
> > > > > within a range determined by the amount of allocated guest memory.
> > > >
> > > > I would prefer to not have to revert this in a year as customers
> > > > complain about "I paid $$$ and I am wasting half a gig on something
> > > > I am not using" and giving customers knobs to tweak this instead of
> > > > doing the right thing from the start.
> > >
> > > Currently, we face a lot of situations where we have to tell our
> > > internal teams/external customers to explicitly increase the SWIOTLB
> > > buffer via the swiotlb parameter on the kernel command line, especially
> > > to get better I/O performance numbers with SEV.
> >
> > Presumably these are 64-bit?
> >
> > And what devices do you speak of that are actually affected by
> > this performance? Increasing the SWIOTLB just means we have more
> > memory, which in my mind means you can have _more_ devices in the guest
> > that won't handle the fact that DMA mapping returns an error.
> >
> > Not necessarily that one device suddenly can go faster.
> >
> > >
> > > So having this SWIOTLB size adjustment done implicitly (even using
> > > static logic) is a great win-win situation. In other words, having even
> > > a simple and static default increase of SWIOTLB buffer size for SEV is
> > > really useful for us.
> > >
> > > We can always think of adding all kinds of heuristics to this, but that
> > > just adds too much complexity without any predictable performance gain.
> > >
> > > And to add, the patch extends the SWIOTLB size via an architecture-
> > > specific callback; currently it is simple and static logic that is
> > > SEV/x86 specific, but there is always an option to tweak/extend it with
> > > additional logic in the future.
> >
> > Right, and that is what I would like to talk about, as I think you
> > are going to disappear (aka, busy with other stuff) after this patch goes
> > in.
> >
> > I need to understand this more than "performance" and "internal teams"
> > requirements to come up with a better way going forward, as surely other
> > platforms will hit the same issue anyhow.
> >
> > Let's break this down:
> >
> > How does the performance improve for one single device if you increase
> > the SWIOTLB?
> > Is there a specific device/driver that you can talk about that improves
> > with this patch?
>
> Yes, these are mainly for multi-queue devices such as NICs or even
> multi-queue virtio.
>
> This basically improves performance with concurrent DMA, hence,
> basically multi-queue devices.
OK, and for a _1GB_ guest - how many CPUs do these "internal
teams/external customers" use? Please, let's use real use-cases.