On Fri, May 03, 2024 at 06:19:30PM -0300, Fabiano Rosas wrote: > Peter Xu <pet...@redhat.com> writes: > > > On Fri, Apr 26, 2024 at 11:20:40AM -0300, Fabiano Rosas wrote: > >> We're about to enable the use of O_DIRECT in the migration code and > >> due to the alignment restrictions imposed by filesystems we need to > >> make sure the flag is only used when doing aligned IO. > >> > >> The migration will do parallel IO to different regions of a file, so > >> we need to use more than one file descriptor. Those cannot be obtained > >> by duplicating (dup()) since duplicated file descriptors share the > >> file status flags, including O_DIRECT. If one migration channel does > >> unaligned IO while another sets O_DIRECT to do aligned IO, the > >> filesystem would fail the unaligned operation. > >> > >> The add-fd QMP command along with the fdset code are specifically > >> designed to allow the user to pass a set of file descriptors with > >> different access flags into QEMU to be later fetched by code that > >> needs to alternate between those flags when doing IO. > >> > >> Extend the fdset matching to behave the same with the O_DIRECT flag. > >> > >> Signed-off-by: Fabiano Rosas <faro...@suse.de> > >> --- > >> monitor/fds.c | 7 ++++++- > >> 1 file changed, 6 insertions(+), 1 deletion(-) > >> > >> diff --git a/monitor/fds.c b/monitor/fds.c > >> index 4ec3b7eea9..62e324fcec 100644 > >> --- a/monitor/fds.c > >> +++ b/monitor/fds.c > >> @@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int > >> flags) > >> int fd = -1; > >> int dup_fd; > >> int mon_fd_flags; > >> + int mask = O_ACCMODE; > >> + > >> +#ifdef O_DIRECT > >> + mask |= O_DIRECT; > >> +#endif > >> > >> if (mon_fdset->id != fdset_id) { > >> continue; > >> @@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int > >> flags) > >> return -1; > >> } > >> > >> - if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) { > >> + if ((flags & mask) == (mon_fd_flags & mask)) { > >> fd = mon_fdset_fd->fd; > >> break; > >> } > > > > I think I see what you wanted to do, picking out the right fd out of two > > when qemu_open_old(), which makes sense. > > > > However what happens if the mgmt app only passes in 1 fd to the fdset? The > > issue is we have a "fallback dup()" plan right after this chunk of code: > > > > I'm validating the fdset at file_parse_fdset() beforehand. If there's > anything else than 2 fds then we'll error out: > > if (nfds != 2) { > error_setg(errp, "Outgoing migration needs two fds in the fdset, " > "got %d", nfds); > qmp_remove_fd(*id, false, -1, NULL); > *id = -1; > return false; > } > > > dup_fd = qemu_dup_flags(fd, flags); > > if (dup_fd == -1) { > > return -1; > > } > > > > mon_fdset_fd_dup = g_malloc0(sizeof(*mon_fdset_fd_dup)); > > mon_fdset_fd_dup->fd = dup_fd; > > QLIST_INSERT_HEAD(&mon_fdset->dup_fds, mon_fdset_fd_dup, next); > > > > I think it means even if the mgmt app only passes in 1 fd (rather than 2, > > one with O_DIRECT, one without), QEMU can always successfully call > > qemu_open_old() twice for each case, even though silently the two FDs will > > actually impact on each other. This doesn't look ideal if it's true. > > > > But I also must confess I don't really understand this code at all: we > > dup(), then we try F_SETFL on all the possible flags got passed in. > > However AFAICT due to the fact that dup()ed FDs will share "struct file" it > > means mostly all flags will be shared, except close-on-exec. I don't ever > > see anything protecting that F_SETFL to only touch close-on-exec, I think > > it means it'll silently change file status flags for the other fd which we > > dup()ed from. Does it mean that we have issue already with such dup() > > usage? > > I think you're right, but I also think there's a requirement even from > this code that the fds in the fdset cannot be dup()ed. I don't see it > enforced anywhere, but maybe that's a consequence of the larger use-case > for which this feature was introduced.
I think that's the thing we need to figure out for add-fd usages. The bad thing is there're too many qemu_open_internal() users... so we can't easily tell what we're looking for. May need some time reading the code or the history.. pretty sad. I hope someone can chim in. > > For our scenario, the open() man page says one can use kcmp() to compare > the fds and determine if they are a result of dup(). Maybe we should do > that extra check? We're defining a pretty rigid interface between QEMU > and the management layer, so not likely to break once it's written. I'm > also not sure how bad would it be to call syscall() directly from QEMU > (kcmp has no libc wrapper). That should be all fine, see: $ git grep " syscall(" | wc -l 28 And if we want we can also do fcntl(F_GETFL) on both fds later, making sure they have proper flags (one must have O_DIRECT, one must not). -- Peter Xu