Re: [Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Richard W.M. Jones
On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:
> The rhv-upload plugin translates every NBD command to an HTTP request,
> which is translated back to an NBD command on the imageio server. The
> HTTP client and server, and the NBD client on the imageio server side,
> are synchronous and implemented in Python, so they have high overhead
> per request. To get good performance we need to use a larger request
> size.
> 
> Testing shows that a request size of 8 MiB is best, speeding up the
> copy disk phase from 14.7 seconds to 7.7 seconds (1.9x faster).

Unfortunately this will break VDDK since it cannot handle very large
requests (I think 4M is about the max without reconfiguring the
server).  Also larger requests have adverse performance effects in
other configurations, although I understand this patch tries to
restrict the change to when the output mode is rhv-upload.

We need to think of some other approach, but I'm not sure what it is.
I'd really like to be able to talk to imageio's NBD server directly!

Other relevant commits:
https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6
https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165

[...]
> This is an ugly hack; the preferred request size should be a function of
> the output module that only output_rhv_upload will override, but I don't
> know how to implement this with the current code.

Just add a new value to output/output.ml{,i}.  There is no
superclassing (this is not OO) so you'll have to add the value to
every output module implementation, defaulting to None.

However I'd like to think of another approach first.

 - Have nbdcopy split and combine requests so request size for input
   and output can be different?  Sounds complicated but might be
   necessary one day to support minimum block size.

 - More efficient Python plugin that might combine requests?  Also
   complicated ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Nir Soffer
On Sun, Feb 13, 2022 at 11:41 AM Richard W.M. Jones  wrote:
>
> On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:
> > The rhv-upload plugin translates every NBD command to an HTTP request,
> > which is translated back to an NBD command on the imageio server. The
> > HTTP client and server, and the NBD client on the imageio server side,
> > are synchronous and implemented in Python, so they have high overhead
> > per request. To get good performance we need to use a larger request
> > size.
> >
> > Testing shows that a request size of 8 MiB is best, speeding up the
> > copy disk phase from 14.7 seconds to 7.7 seconds (1.9x faster).
>
> Unfortunately this will break VDDK since it cannot handle very large
> requests (I think 4M is about the max without reconfiguring the
> server).

Are you sure it will break VDDK?

The request size limit is in the VDDK API, not in the nbdkit plugin.
When you issue an 8M request, the VDDK plugin should allocate an 8M
buffer and issue multiple calls to VDDK APIs, using the VDDK maximum
request size to fill the buffer.

If the VDDK plugin does not do this, it is a bug in the plugin, since
the plugin must respect the underlying API.

If 8M does break the vddk plugin, we can use 4M; it is only a little
slower than 8M but still much faster than 256k.
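The splitting described above can be sketched in a few lines of Python. This is illustrative only: `split_read` and the 4M limit are assumptions for the sketch, not nbdkit-vddk-plugin code.

```python
API_LIMIT = 4 << 20  # assumed per-call limit of the backend API

def split_read(offset, count, limit=API_LIMIT):
    """Return the (offset, count) of each backend call needed to
    satisfy one large NBD read."""
    calls = []
    while count > 0:
        n = min(count, limit)
        calls.append((offset, n))
        offset += n
        count -= n
    return calls

# An 8M NBD request becomes two 4M backend calls:
print(split_read(0, 8 << 20))  # -> [(0, 4194304), (4194304, 4194304)]
```

The same loop works for writes; the plugin pays only the cost of one extra buffer and a few extra calls per large request.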

> Also larger requests have adverse performance effects in
> other configurations, although I understand this patch tries to
> restrict the change to when the output mode is rhv-upload.

Yes, this affects only -o rhv-upload.

> We need to think of some other approach, but I'm not sure what it is.
> I'd really like to be able to talk to imageio's NBD server directly!

We have an RFE to implement a local NBD socket, which should be easy,
but the only attention we got so far was an attempt to close it ;-)

Even if we have a local-only NBD socket, let's say in oVirt 4.6, it will
not help existing users.

> Other relevant commits:
> https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6
> https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165

This looks like a nicer way to implement this change.

>
> [...]
> > This is an ugly hack; the preferred request size should be a function of
> > the output module that only output_rhv_upload will override, but I don't
> > know how to implement this with the current code.
>
> Just add a new value to output/output.ml{,i}.  There is no
> superclassing (this is not OO) so you'll have to add the value to
> every output module implementation, defaulting to None.

Sounds reasonable, we have only a few outputs.

> However I'd like to think of another approach first.
>
>  - Have nbdcopy split and combine requests so request size for input
>and output can be different?  Sounds complicated but might be
>necessary one day to support minimum block size.

I think this is not needed for the case of large request sizes,
since on the NBD client/server side it is easy to handle large requests
regardless of the underlying API limits.

>  - More efficient Python plugin that might combine requests?  Also
>complicated ...

This will be slower and more complicated.

First we need to solve the issue of getting the right connection to handle
the NBD command; currently every command is handled by a random connection
from the pool.

Assuming we solve connection pooling, we can keep a buffer for the next
request and fill this buffer on every pwrite(). When the buffer is full,
or when receiving a zero()/flush()/close() request, we send the buffer.
This introduces an additional copy which will slow down the transfer.

Finally, there is the issue of reporting errors: since we buffer NBD
command data, we cannot report errors for every command; we need to
report errors later, either in the middle of a command (when the buffer
becomes full) or when the next command is received.

So this will be slower, very hard to implement, and hard to debug.
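For illustration, the write-combining scheme described above can be sketched like this. This is hypothetical code, not the rhv-upload plugin; `send` stands in for the real HTTP transfer, and the late-error problem is visible in where `send` is called.

```python
class CombiningWriter:
    """Combine many small writes into fewer large ones (sketch).

    Note the error-reporting problem described above: send() only
    runs when the buffer fills or on flush(), so a failure surfaces
    on a later command than the one that supplied the data.
    """

    def __init__(self, send, buf_size=4 << 20):
        self.send = send              # callable doing the real transfer
        self.buf_size = buf_size
        self.pending = bytearray()

    def pwrite(self, data):
        self.pending += data
        if len(self.pending) >= self.buf_size:
            self.flush()              # error (if any) reported here, late

    def flush(self):
        if self.pending:
            self.send(bytes(self.pending))
            self.pending.clear()
```

A zero()/close() handler would call `flush()` first, as the text above describes.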

The way we solved this in the imageio client (copying data between NBD
and HTTP backends) is to implement the copy loop in the HTTP client.

In imageio we have this (simplified) copy loop:

    for extent in extents:
        if zero:
            dst.zero(extent)
        else:
            copy(src, dst, extent)

copy() checks whether src or dst can do an efficient copy, and if not
it falls back to a generic copy:

    if hasattr(dst, "read_from"):
        dst.read_from(src, extent)
    elif hasattr(src, "write_to"):
        src.write_to(dst, extent)
    else:
        for chunk in extent:
            src.read(chunk)
            dst.write(chunk)

The http backend implements read_from like this:

    send put request headers
    for chunk in extent:
        src.read(chunk)
        socket.write(chunk)

In this way we can use the most efficient request size for the NBD
input (256k) and for the HTTP socket (256k is good), while minimizing
the number of HTTP requests.
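The dispatch above can be made runnable with stub backends. Everything here is invented for illustration except the `read_from`/`write_to`/`copy` names, which come from the snippet; real imageio backends are of course not this simple.

```python
class HTTPDest:
    """Stub destination supporting the efficient read_from path."""
    def __init__(self):
        self.body = b""
    def read_from(self, src, extent):
        # one PUT request; many small reads fill its body
        for chunk in src.chunks(extent):
            self.body += chunk

class NBDSource:
    """Stub source yielding fixed-size chunks of an extent."""
    def __init__(self, step=4):
        self.step = step
    def chunks(self, extent):
        for off in range(0, extent, self.step):
            yield b"x" * min(self.step, extent - off)

def copy(src, dst, extent):
    if hasattr(dst, "read_from"):
        dst.read_from(src, extent)
    elif hasattr(src, "write_to"):
        src.write_to(dst, extent)
    else:
        raise NotImplementedError("generic chunked copy goes here")

dst = HTTPDest()
copy(NBDSource(), dst, 10)
assert len(dst.body) == 10   # ten bytes arrived in one "request"
```

The point is that the small read size and the single large HTTP request are decoupled inside `read_from`.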

Of course this will be harder with the async API, but it is possible.
We can keep the pread requests in a FIFO

Re: [Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Richard W.M. Jones
On Sun, Feb 13, 2022 at 05:13:37PM +0200, Nir Soffer wrote:
> On Sun, Feb 13, 2022 at 11:41 AM Richard W.M. Jones  wrote:
> >
> > On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:
> > > The rhv-upload plugin translates every NBD command to an HTTP request,
> > > which is translated back to an NBD command on the imageio server. The
> > > HTTP client and server, and the NBD client on the imageio server side,
> > > are synchronous and implemented in Python, so they have high overhead
> > > per request. To get good performance we need to use a larger request
> > > size.
> > >
> > > Testing shows that a request size of 8 MiB is best, speeding up the
> > > copy disk phase from 14.7 seconds to 7.7 seconds (1.9x faster).
> >
> > Unfortunately this will break VDDK since it cannot handle very large
> > requests (I think 4M is about the max without reconfiguring the
> > server).
> 
> Are you sure it will break VDDK?
> 
> The request size limit is in the VDDK API, not in the nbdkit plugin.
> When you issue an 8M request, the VDDK plugin should allocate an 8M
> buffer and issue multiple calls to VDDK APIs, using the VDDK maximum
> request size to fill the buffer.
> 
> If the VDDK plugin does not do this, it is a bug in the plugin, since
> the plugin must respect the underlying API.

nbdkit-vddk-plugin doesn't do this at the moment.

With VDDK it's kind of complicated because it's actually a maximum
memory setting for a particular daemon on the server side (hostd), so
the limit is roughly

  number of parallel requests x size of each request <= memory limit - overhead

Also this can be adjusted on the server side by the administrator (and
we recommend virt-v2v users do this) but there's no way to query the
setting from the client.  More here:

https://vdc-download.vmware.com/vmwb-repository/dcr-public/48f32c72-7f7f-49da-9887-63b7ae1fd6f0/b820bcb4-f4bd-4141-bb9e-68ca9aded55d/nbd-perf.html
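Plugging numbers into the constraint above shows how request size and parallelism trade off. All values here are made up for the sketch; as noted, the real hostd limit is server-configured and cannot be queried from the client.

```python
# All numbers are illustrative, not real hostd defaults.
memory_limit = 64 << 20       # assumed hostd memory limit
overhead = 16 << 20           # assumed fixed overhead
parallel_requests = 8         # e.g. 2 connections x 4 in-flight requests

budget = memory_limit - overhead
max_request_size = budget // parallel_requests
print(max_request_size // (1 << 20), "MiB per request")  # 6 MiB here
```

With these assumed numbers an 8 MiB request would exceed the budget, which is why the client cannot safely pick a large size unilaterally.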

> If 8M does break the vddk plugin, we can use 4M; it is only a little
> slower than 8M but still much faster than 256k.
> 
> > Also larger requests have adverse performance effects in
> > other configurations, although I understand this patch tries to
> > restrict the change to when the output mode is rhv-upload.
> 
> Yes, this affects only -o rhv-upload.
> 
> > We need to think of some other approach, but I'm not sure what it is.
> > I'd really like to be able to talk to imageio's NBD server directly!
> 
> We have an RFE to implement a local NBD socket, which should be easy,
> but the only attention we got so far was an attempt to close it ;-)
> 
> Even if we have a local-only NBD socket, let's say in oVirt 4.6, it will
> not help existing users.
> 
> > Other relevant commits:
> > https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6
> > https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165
> 
> This looks like a nicer way to implement this change.
> 
> >
> > [...]
> > > This is an ugly hack; the preferred request size should be a function of
> > > the output module that only output_rhv_upload will override, but I don't
> > > know how to implement this with the current code.
> >
> > Just add a new value to output/output.ml{,i}.  There is no
> > superclassing (this is not OO) so you'll have to add the value to
> > every output module implementation, defaulting to None.
> 
> Sounds reasonable, we have only a few outputs.
> 
> > However I'd like to think of another approach first.
> >
> >  - Have nbdcopy split and combine requests so request size for input
> >and output can be different?  Sounds complicated but might be
> >necessary one day to support minimum block size.
> 
> I think this is not needed for the case of large request sizes,
> since on the NBD client/server side it is easy to handle large requests
> regardless of the underlying API limits.
> 
> >  - More efficient Python plugin that might combine requests?  Also
> >complicated ...
> 
> This will be slower and more complicated.
> 
> First we need to solve the issue of getting the right connection to handle
> the NBD command; currently every command is handled by a random connection
> from the pool.
> 
> Assuming we solve connection pooling, we can keep a buffer for the next
> request and fill this buffer on every pwrite(). When the buffer is full,
> or when receiving a zero()/flush()/close() request, we send the buffer.
> This introduces an additional copy which will slow down the transfer.
> 
> Finally, there is the issue of reporting errors: since we buffer NBD
> command data, we cannot report errors for every command; we need to
> report errors later, either in the middle of a command (when the buffer
> becomes full) or when the next command is received.
> 
> So this will be slower, very hard to implement, and hard to debug.
> 
> The way we solved this in the imageio client (copying data between NBD
> and HTTP backends) is to implement the copy loop in the HTTP client.
> 
> In imageio we have this (simplified) copy loop:
> 
> for extent in extents:
> [...]

Re: [Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Nir Soffer
On Sun, Feb 13, 2022 at 5:13 PM Nir Soffer  wrote:
>
> On Sun, Feb 13, 2022 at 11:41 AM Richard W.M. Jones  wrote:
> >
> > On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:
> > > The rhv-upload plugin translates every NBD command to an HTTP request,
> > > which is translated back to an NBD command on the imageio server. The
> > > HTTP client and server, and the NBD client on the imageio server side,
> > > are synchronous and implemented in Python, so they have high overhead
> > > per request. To get good performance we need to use a larger request
> > > size.
> > >
> > > Testing shows that a request size of 8 MiB is best, speeding up the
> > > copy disk phase from 14.7 seconds to 7.7 seconds (1.9x faster).
> >
> > Unfortunately this will break VDDK since it cannot handle very large
> > requests (I think 4M is about the max without reconfiguring the
> > server).
>
> Are you sure it will break VDDK?
>
> The request size limit is in the VDDK API, not in the nbdkit plugin.
> When you issue an 8M request, the VDDK plugin should allocate an 8M
> buffer and issue multiple calls to VDDK APIs, using the VDDK maximum
> request size to fill the buffer.
>
> If the VDDK plugin does not do this, it is a bug in the plugin, since
> the plugin must respect the underlying API.
>
> If 8M does break the vddk plugin, we can use 4M; it is only a little
> slower than 8M but still much faster than 256k.
>
> > Also larger requests have adverse performance effects in
> > other configurations, although I understand this patch tries to
> > restrict the change to when the output mode is rhv-upload.
>
> Yes, this affects only -o rhv-upload.
>
> > We need to think of some other approach, but I'm not sure what it is.
> > I'd really like to be able to talk to imageio's NBD server directly!
>
> We have an RFE to implement a local NBD socket, which should be easy,
> but the only attention we got so far was an attempt to close it ;-)
>
> Even if we have a local-only NBD socket, let's say in oVirt 4.6, it will
> not help existing users.
>
> > Other relevant commits:
> > https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6
> > https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165
>
> This looks like a nicer way to implement this change.
>
> >
> > [...]
> > > This is an ugly hack; the preferred request size should be a function of
> > > the output module that only output_rhv_upload will override, but I don't
> > > know how to implement this with the current code.
> >
> > Just add a new value to output/output.ml{,i}.  There is no
> > superclassing (this is not OO) so you'll have to add the value to
> > every output module implementation, defaulting to None.
>
> Sounds reasonable, we have only a few outputs.
>
> > However I'd like to think of another approach first.
> >
> >  - Have nbdcopy split and combine requests so request size for input
> >and output can be different?  Sounds complicated but might be
> >necessary one day to support minimum block size.
>
> I think this is not needed for the case of large request sizes,
> since on the NBD client/server side it is easy to handle large requests
> regardless of the underlying API limits.
>
> >  - More efficient Python plugin that might combine requests?  Also
> >complicated ...
>
> This will be slower and more complicated.
>
> First we need to solve the issue of getting the right connection to handle
> the NBD command; currently every command is handled by a random connection
> from the pool.
>
> Assuming we solve connection pooling, we can keep a buffer for the next
> request and fill this buffer on every pwrite(). When the buffer is full,
> or when receiving a zero()/flush()/close() request, we send the buffer.
> This introduces an additional copy which will slow down the transfer.
>
> Finally, there is the issue of reporting errors: since we buffer NBD
> command data, we cannot report errors for every command; we need to
> report errors later, either in the middle of a command (when the buffer
> becomes full) or when the next command is received.
>
> So this will be slower, very hard to implement, and hard to debug.
>
> The way we solved this in the imageio client (copying data between NBD
> and HTTP backends) is to implement the copy loop in the HTTP client.
>
> In imageio we have this (simplified) copy loop:
>
> for extent in extents:
>     if zero:
>         dst.zero(extent)
>     else:
>         copy(src, dst, extent)
>
> copy() checks whether src or dst can do an efficient copy, and if not
> it falls back to a generic copy:
>
> if hasattr(dst, "read_from"):
>     dst.read_from(src, extent)
> elif hasattr(src, "write_to"):
>     src.write_to(dst, extent)
> else:
>     for chunk in extent:
>         src.read(chunk)
>         dst.write(chunk)
>
> The http backend implements read_from like this:
>
> send put request headers
> for chunk in extent:
>     src.read(chunk)
>     socket.write(chunk)

O

[Libguestfs] [PATCH v2] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Nir Soffer
Output modules can now specify request_size to override the default
request size in nbdcopy.

The rhv-upload plugin translates every NBD command to an HTTP request,
which is translated back to an NBD command on the imageio server. The
HTTP client and server, and the NBD client on the imageio server side,
are synchronous and implemented in Python, so they have high overhead
per request. To get good performance we need to use a larger request
size.

Testing shows that a request size of 4 MiB speeds up the copy disk phase
from 14.7 seconds to 7.9 seconds (1.8x faster). A request size of 8 MiB
is a little faster but is not compatible with VDDK input.

Here are stats extracted from imageio log when importing Fedora 35 image
with 3 GiB of random data. For each copy, we have 4 connection stats.

Before:

connection 1 ops, 14.767843 s
dispatch 4023 ops, 11.427662 s
zero 38 ops, 0.053840 s, 327.91 MiB, 5.95 GiB/s
write 3981 ops, 8.975877 s, 988.61 MiB, 110.14 MiB/s
flush 4 ops, 0.001023 s

connection 1 ops, 14.770026 s
dispatch 4006 ops, 11.408732 s
zero 37 ops, 0.057205 s, 633.21 MiB, 10.81 GiB/s
write 3965 ops, 8.907420 s, 986.65 MiB, 110.77 MiB/s
flush 4 ops, 0.000280 s

connection 1 ops, 14.768180 s
dispatch 4057 ops, 11.430712 s
zero 42 ops, 0.030011 s, 470.47 MiB, 15.31 GiB/s
write 4011 ops, 9.002055 s, 996.98 MiB, 110.75 MiB/s
flush 4 ops, 0.000261 s

connection 1 ops, 14.770744 s
dispatch 4037 ops, 11.462050 s
zero 45 ops, 0.026668 s, 750.82 MiB, 27.49 GiB/s
write 3988 ops, 9.002721 s, 989.36 MiB, 109.90 MiB/s
flush 4 ops, 0.000282 s

After:

connection 1 ops, 7.940377 s
dispatch 323 ops, 6.695582 s
zero 37 ops, 0.079958 s, 641.12 MiB, 7.83 GiB/s
write 282 ops, 6.300242 s, 1.01 GiB, 164.54 MiB/s
flush 4 ops, 0.000537 s

connection 1 ops, 7.908156 s
dispatch 305 ops, 6.643475 s
zero 36 ops, 0.144166 s, 509.43 MiB, 3.45 GiB/s
write 265 ops, 6.179187 s, 941.23 MiB, 152.32 MiB/s
flush 4 ops, 0.000324 s

connection 1 ops, 7.942349 s
dispatch 325 ops, 6.744800 s
zero 45 ops, 0.185335 s, 622.19 MiB, 3.28 GiB/s
write 276 ops, 6.236819 s, 995.45 MiB, 159.61 MiB/s
flush 4 ops, 0.000369 s

connection 1 ops, 7.955572 s
dispatch 317 ops, 6.721212 s
zero 43 ops, 0.135771 s, 409.68 MiB, 2.95 GiB/s
write 270 ops, 6.326366 s, 988.26 MiB, 156.21 MiB/s
flush 4 ops, 0.001439 s
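As a sanity check, the headline numbers can be recomputed from the first connection's stats above (values copied from the logs; nothing here is new measurement):

```python
# Values copied from the connection stats above (first connection).
before_s, after_s = 14.767843, 7.940377
before_ops, after_ops = 4023, 323        # dispatch ops

speedup = before_s / after_s             # ~1.86x, the "1.8x" above
fewer = before_ops / after_ops           # ~12x fewer requests
print(f"{speedup:.2f}x faster, {fewer:.1f}x fewer requests")
```

The roughly 12x drop in request count is what pays for the speedup, since each request carries fixed Python/HTTP overhead.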
---

Changes since v1:

- Decrease request size to 4 MiB for compatibility with VDDK input.
  (Richard)
- Reimplement in a nicer way based on
  
https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165
  (Richard)

v1 was here:
https://listman.redhat.com/archives/libguestfs/2022-February/msg00183.html

 output/output.ml|  1 +
 output/output.mli   |  2 ++
 output/output_disk.ml   |  2 ++
 output/output_glance.ml |  2 ++
 output/output_json.ml   |  2 ++
 output/output_libvirt.ml|  2 ++
 output/output_null.ml   |  2 ++
 output/output_openstack.ml  |  2 +-
 output/output_qemu.ml   |  2 ++
 output/output_rhv.ml|  2 ++
 output/output_rhv_upload.ml |  7 +++
 output/output_vdsm.ml   |  2 ++
 v2v/v2v.ml  | 11 ---
 13 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/output/output.ml b/output/output.ml
index 786ee5d5..7256b547 100644
--- a/output/output.ml
+++ b/output/output.ml
@@ -39,20 +39,21 @@ type options = {
 module type OUTPUT = sig
   type poptions
   type t
   val to_string : options -> string
   val query_output_options : unit -> unit
   val parse_options : options -> Types.source -> poptions
   val setup : string -> poptions -> Types.source -> t
   val finalize : string -> poptions -> t ->
  Types.source -> Types.inspect -> Types.target_meta ->
  unit
+  val request_size : int option
 end
 
 let error_option_cannot_be_used_in_output_mode mode opt =
   error (f_"-o %s: %s option cannot be used in this output mode") mode opt
 
 let get_disks dir =
   let rec loop acc i =
 let socket = sprintf "%s/in%d" dir i in
 if Sys.file_exists socket then (
   let size = Utils.with_nbd_connect_unix ~socket NBD.get_size in
diff --git a/output/output.mli b/output/output.mli
index eed204ed..8e3efd8e 100644
--- a/output/output.mli
+++ b/output/output.mli
@@ -52,20 +52,22 @@ module type OUTPUT = sig
 
   Set up the output mode.  Sets up a disk pipeline
   [dir // "outX"] for each output disk. *)
 
   val finalize : string -> poptions -> t ->
  Types.source -> Types.inspect -> Types.target_meta ->
  unit
   (** [finalize dir poptions t inspect target_meta]
 
   Finalizes the conversion and writes metadata. *)
+
+  val request_size : int option
 end
 
 (** Helper functions for output modes. *)
 
 val error_option_cannot_be_used_in_output_mode : string -> string -> unit
 (** [error_option_cannot_be_used_in_output_mode mode option]
 prints error message that option cannot be used in this output mode. *)
 
 val get_disks : string -> (int * int64) list
 (** Examines the v2v d