On Fri, Mar 10, 2023 at 01:04:12PM -0600, Eric Blake wrote:
> How many of these timing numbers can be repeated with TLS in the mix?
While I have been playing with TLS and kTLS recently, it's not
especially important to v2v since all NBD traffic goes over Unix domain
sockets only (i.e. it's used as a kind of interprocess communication).
I could certainly provide benchmarks, although as I'm going on holiday
shortly it may be a little while.

> > Curl local server test (./multi-conn.pl curlhttp)
> > =================================================
> >
> >   Localhost Apache serving a file over http
> >                 |
> >                 | http
> >                 v
> >   nbdkit-curl-plugin or qemu-nbd
> >                 |
> >                 | nbd+unix
> >                 v
> >   qemu-img convert or nbdcopy
> >
> > We download an image from a local web server through
> > nbdkit-curl-plugin or qemu-nbd using the curl block driver, over NBD.
> > The image is copied to /dev/null.
> >
> >   server          client        multi-conn
> >   ---------------------------------------------------------------
> >   qemu-nbd        nbdcopy       1             8.88s
> >   qemu-nbd        nbdcopy       2             8.64s
> >   qemu-nbd        nbdcopy       4             8.37s
> >   qemu-nbd        qemu-img      [u/s]         6.47s
>
> Do we have any good feel for why qemu-img is faster than nbdcopy in
> the baseline?  But improving that is orthogonal to this series.

I do not, but we have in the past found that results can be very
sensitive to request size.  By default (and also in all of these
tests) nbdcopy uses a request size of 256K, and qemu-img uses a
request size of 2M.

> >   qemu-nbd        qemu-img      1             6.56s
> >   qemu-nbd        qemu-img      2             6.63s
> >   qemu-nbd        qemu-img      4             6.50s
> >   nbdkit          nbdcopy       1            12.15s
>
> I'm assuming this is nbdkit with your recent in-progress patches to
> have the curl plugin serve parallel requests.  But another place where
> we can investigate why nbdkit is not as performant as qemu-nbd at
> utilizing curl.
>
> >   nbdkit          nbdcopy       2             7.05s (72.36% better)
> >   nbdkit          nbdcopy       4             3.54s (242.90% better)
>
> That one is impressive!
>
> >   nbdkit          qemu-img      [u/s]         6.90s
> >   nbdkit          qemu-img      1             7.00s
>
> Minimal penalty for adding the code but not utilizing it...
[u/s] and qemu-img with multi-conn:1 ought to be identical actually.
After all, the only difference should be the restructuring of the code
to add the intermediate NBDConnState struct.  In this case it's
probably just measurement error.

> >   nbdkit          qemu-img      2             3.85s (79.15% better)
> >   nbdkit          qemu-img      4             3.85s (79.15% better)
>
> ...and definitely shows its worth.
>
> >
> > Curl local file test (./multi-conn.pl curlfile)
> > ===============================================
> >
> >   nbdkit-curl-plugin using file:/// URI
> >                 |
> >                 | nbd+unix
> >                 v
> >   qemu-img convert or nbdcopy
> >
> > We download from a file:/// URI.  This test is designed to exercise
> > NBD and some curl internal paths without the overhead from an
> > external server.  qemu-nbd doesn't support file:/// URIs so we
> > cannot duplicate the test for qemu as server.
> >
> >   server          client        multi-conn
> >   ---------------------------------------------------------------
> >   nbdkit          nbdcopy       1            31.32s
> >   nbdkit          nbdcopy       2            20.29s (54.38% better)
> >   nbdkit          nbdcopy       4            13.22s (136.91% better)
> >   nbdkit          qemu-img      [u/s]        31.55s
>
> Here, the baseline is already comparable; both nbdcopy and qemu-img
> are parsing the image off nbdkit in about the same amount of time.
>
> >   nbdkit          qemu-img      1            31.70s
>
> And again, minimal penalty for having the new code in place but not
> exploiting it.
>
> >   nbdkit          qemu-img      2            21.60s (46.07% better)
> >   nbdkit          qemu-img      4            13.88s (127.25% better)
>
> Plus an obvious benefit when the parallel sockets matter.
>
> >
> > Curl remote server test (./multi-conn.pl curlremote)
> > ====================================================
> >
> >   nbdkit-curl-plugin using http://remote/*.qcow2 URI
> >                 |
> >                 | nbd+unix
> >                 v
> >   qemu-img convert
> >
> > We download from a remote qcow2 file to a local raw file, converting
> > between formats during copying.
> >
> >   qemu-nbd using http://remote/*.qcow2 URI
> >                 |
> >                 | nbd+unix
> >                 v
> >   qemu-img convert
> >
> > Similarly, replacing nbdkit with qemu-nbd (treating the remote file
> > as if it is raw, so the conversion is still done by qemu-img).
> >
> > Additionally we compare downloading the file with wget (note this
> > doesn't include the time for conversion, but that should only be a
> > few seconds).
> >
> >   server          client        multi-conn
> >   ---------------------------------------------------------------
> >   -               wget          1            58.19s
> >   nbdkit          qemu-img      [u/s]        68.29s (17.36% worse)
> >   nbdkit          qemu-img      1            67.85s (16.60% worse)
> >   nbdkit          qemu-img      2            58.17s
>
> Comparable to wget on paper, but a win in practice (since the wget
> step also has to add a post-download qemu-img local conversion step).

Yes, correct.  Best case that would be another ~2-3 seconds on this
machine.

> >   nbdkit          qemu-img      4            59.80s
> >   nbdkit          qemu-img      6            59.15s
> >   nbdkit          qemu-img      8            59.52s
> >
> >   qemu-nbd        qemu-img      [u/s]       202.55s
> >   qemu-nbd        qemu-img      1           204.61s
> >   qemu-nbd        qemu-img      2           196.73s
> >   qemu-nbd        qemu-img      4           179.53s (12.83% better)
> >   qemu-nbd        qemu-img      6           181.70s (11.48% better)
> >   qemu-nbd        qemu-img      8           181.05s (11.88% better)
> >
> > Less dramatic results here, but still nothing horrible.
> >
> >
> > Local file test (./multi-conn.pl file)
> > ======================================
> >
> >   qemu-nbd or nbdkit serving a large local file
> >                 |
> >                 | nbd+unix
> >                 v
> >   qemu-img convert or nbdcopy
> >
> > We download a local file over NBD.  The image is copied to
> > /dev/null.
> >
> >   server          client        multi-conn
> >   ---------------------------------------------------------------
> >   qemu-nbd        nbdcopy       1            15.50s
> >   qemu-nbd        nbdcopy       2            14.36s
> >   qemu-nbd        nbdcopy       4            14.32s
> >   qemu-nbd        qemu-img      [u/s]        10.16s
>
> Once again, we're seeing qemu-img baseline faster than nbdcopy as
> client.  But throwing more sockets at either client does improve
> performance, except for...
>
> >   qemu-nbd        qemu-img      1            11.17s (10.01% worse)
>
> ...this one looks bad.  Is it a case of this series adding more mutex
> work (qemu-img is making parallel requests; each request then
> contends for the mutex only to learn that it will be using the same
> NBD connection)?  And your comments about smarter round-robin schemes
> mean there may still be room to avoid this much of a penalty.

This was reproducible and I don't have a good explanation for it.  As
far as I know just adding the NBDConnState struct should not add any
overhead.  The only locking is the call to choose_connection, and
that's just the access to an atomic variable which I can't imagine
could cause such a difference.

> >   qemu-nbd        qemu-img      2            10.35s
> >   qemu-nbd        qemu-img      4            10.39s
> >   nbdkit          nbdcopy       1             9.10s
>
> This one is interesting: nbdkit as server performs better than
> qemu-nbd.
>
> >   nbdkit          nbdcopy       2             8.25s
> >   nbdkit          nbdcopy       4             8.60s
> >   nbdkit          qemu-img      [u/s]         8.64s
> >   nbdkit          qemu-img      1             9.38s
> >   nbdkit          qemu-img      2             8.69s
> >   nbdkit          qemu-img      4             8.87s
> >
> >
> > Null test (./multi-conn.pl null)
> > ================================
> >
> >   qemu-nbd with null-co driver or nbdkit-null-plugin + noextents filter
> >                 |
> >                 | nbd+unix
> >                 v
> >   qemu-img convert or nbdcopy
> >
> > This is like the local file test above, but without needing a file.
> > Instead all zeroes (fully allocated) are downloaded over NBD.
>
> And I'm sure that if you allowed block status to show the holes, the
> performance would be a lot faster, but that would be testing something
> completely differently ;)
>
> >   server          client        multi-conn
> >   ---------------------------------------------------------------
> >   qemu-nbd        nbdcopy       1            14.86s
> >   qemu-nbd        nbdcopy       2            17.08s (14.90% worse)
> >   qemu-nbd        nbdcopy       4            17.89s (20.37% worse)
>
> Oh, that's weird.  I wonder if qemu's null-co driver has some poor
> mutex behavior when being hit by parallel I/O.  Seems like
> investigating that can be separate from this series, though.
Yes, I noticed in other tests that null-co has some odd behaviour, but
I couldn't understand it from looking at the code, which seems very
simple.  It does a memset; maybe that is expensive because it uses
newly allocated buffers every time, or something like that?

> >   qemu-nbd        qemu-img      [u/s]        13.29s
>
> And another point where qemu-img is faster than nbdcopy as a
> single-client baseline.
>
> >   qemu-nbd        qemu-img      1            13.31s
> >   qemu-nbd        qemu-img      2            13.00s
> >   qemu-nbd        qemu-img      4            12.62s
> >   nbdkit          nbdcopy       1            15.06s
> >   nbdkit          nbdcopy       2            12.21s (23.32% better)
> >   nbdkit          nbdcopy       4            11.67s (29.10% better)
> >   nbdkit          qemu-img      [u/s]        17.13s
> >   nbdkit          qemu-img      1            17.11s
> >   nbdkit          qemu-img      2            16.82s
> >   nbdkit          qemu-img      4            18.81s
>
> Overall, I'm looking forward to seeing this go in (8.1 material;
> we're too close to 8.0)

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org