On Wed, Oct 25, 2023 at 02:30:01PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > On Wed, Oct 25, 2023 at 11:32:00AM -0300, Fabiano Rosas wrote:
> >> Daniel P. Berrangé <berra...@redhat.com> writes:
> >>
> >> > On Tue, Oct 24, 2023 at 04:32:10PM -0300, Fabiano Rosas wrote:
> >> >> Markus Armbruster <arm...@redhat.com> writes:
> >> >>
> >> >> > Fabiano Rosas <faro...@suse.de> writes:
> >> >> >
> >> >> >> Add the direct-io migration parameter that tells the migration
> >> >> >> code to use O_DIRECT when opening the migration stream file
> >> >> >> whenever possible.
> >> >> >>
> >> >> >> This is currently only used for the secondary channels of
> >> >> >> fixed-ram migration, which can guarantee that writes are page
> >> >> >> aligned.
> >> >> >>
> >> >> >> However the parameter could be made to affect other types of
> >> >> >> file-based migrations in the future.
> >> >> >>
> >> >> >> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> >> >> >
> >> >> > When would you want to enable @direct-io, and when would you
> >> >> > want to leave it disabled?
> >> >>
> >> >> That depends on a performance analysis. You'd generally leave it
> >> >> disabled unless there's some indication that the operating system
> >> >> is having trouble draining the page cache.
> >> >
> >> > That's not the usage model I would suggest.
> >>
> >> Hehe, I took a shot at answering but I really wanted to say "ask
> >> Daniel".
> >>
> >> > The biggest value of the page cache comes when it holds data that
> >> > will be repeatedly accessed.
> >> >
> >> > When you are saving/restoring a guest to file, that data is used
> >> > once only (assuming there's a large gap between save & restore).
> >> > By using the page cache to save a big guest we essentially purge
> >> > the page cache of most of its existing data that is likely to be
> >> > reaccessed, to fill it up with data never to be reaccessed.
> >> >
> >> > I usually describe save/restore operations as trashing the page
> >> > cache.
> >> >
> >> > IMHO, mgmt apps should request O_DIRECT always, unless they expect
> >> > the save/restore operation to run in quick succession, or if they
> >> > know that the host has oodles of free RAM such that existing data
> >> > in the page cache won't be trashed, or
> >>
> >> Thanks, I'll try to incorporate this into some kind of doc in the
> >> next version.
> >>
> >> > if the host FS does not support O_DIRECT of course.
> >>
> >> Should we try to probe for this and inform the user?
> >
> > qemu_open_internal will already produce a nice error message. If it
> > gets EINVAL when using O_DIRECT, it'll retry without O_DIRECT, and if
> > that works, it'll report "filesystem does not support O_DIRECT".
> >
> > Having said that, I see a problem with /dev/fdset handling, because
> > we're only validating O_ACCMODE and that excludes O_DIRECT.
> >
> > If the mgmt app passes an FD with O_DIRECT already set, then it
> > won't work for VMstate saving, which is unaligned.
> >
> > If the mgmt app passes an FD without O_DIRECT set, then we are
> > not setting O_DIRECT for the multifd RAM threads.
>
> Worse, the fds get dup'ed, so even without O_DIRECT, if we enable it
> for the secondary channels the main channel will break on unaligned
> writes.
>
> For now I can only think of requiring two fds: one for the main
> channel and a second one for the rest of the channels, and validating
> the fd flags to make sure O_DIRECT is only allowed to be set on the
> second fd.
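The flag sharing behind that breakage is easy to demonstrate: dup(2)
makes both fds refer to the same open file description, and the file
status flags (O_DIRECT among them) live on the description, not on the
fd. A minimal standalone sketch (Linux-only; the "demo.bin" path is
purely illustrative):

  /* Demo: dup'ed fds share file status flags, so enabling O_DIRECT
   * on one side also flips it on the other. */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
      int fd = open("demo.bin", O_CREAT | O_WRONLY | O_TRUNC, 0600);
      if (fd < 0) {
          perror("open");
          return EXIT_FAILURE;
      }

      int fd2 = dup(fd);  /* both fds now share one file description */

      /* Enable O_DIRECT on the duplicate only... */
      int flags = fcntl(fd2, F_GETFL);
      if (fcntl(fd2, F_SETFL, flags | O_DIRECT) < 0) {
          perror("fcntl(F_SETFL)");
      }

      /* ...and observe that the original fd picked it up as well. */
      printf("fd has O_DIRECT: %s\n",
             (fcntl(fd, F_GETFL) & O_DIRECT) ? "yes" : "no");

      close(fd2);
      close(fd);
      return EXIT_SUCCESS;
  }

On a filesystem that supports O_DIRECT this prints "fd has O_DIRECT:
yes", i.e. the main channel would inherit the flag and break on its
unaligned VMstate writes.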
In this new model I think there's no reason for libvirt to set O_DIRECT
for its own initial I/O. So we could just totally ignore O_DIRECT when
initially opening the QIOChannelFile. Then provide a method on
QIOChannelFile to enable O_DIRECT on the fly, which can be called for
the multifd threads setup?

With regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
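As a rough illustration of that suggestion (the helper name below is
hypothetical, not an existing QIOChannelFile API), the on-the-fly
toggle could be little more than an fcntl() wrapper, with the caveat
from the dup(2) demo above that the fd must not share its open file
description with the main channel:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdbool.h>

  /* Hypothetical helper sketching the suggested QIOChannelFile method:
   * leave O_DIRECT alone at open time, then flip it per channel during
   * multifd setup.  Only safe if this fd's open file description is
   * not shared with the main channel. */
  static bool channel_file_set_direct_io(int fd, bool enabled)
  {
      int flags = fcntl(fd, F_GETFL);
      if (flags < 0) {
          return false;
      }
      flags = enabled ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
      /* EINVAL from F_SETFL typically means the filesystem does not
       * support O_DIRECT. */
      return fcntl(fd, F_SETFL, flags) == 0;
  }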