Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 09.07.2012 19:35, schrieb Corey Bryant: On 07/09/2012 11:46 AM, Kevin Wolf wrote: Am 09.07.2012 17:05, schrieb Corey Bryant: I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) - add-fd monitor function initializes the monitor inuse flag for the fdset to true - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 What was again the reason why we keep removed fds in the fdset at all? Because if refcount is 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close. The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately. fds in an fd set really need to be kept around until zero block devices reference them. At that point, if '(refcount == 0 (!inuse || remove))' is true, then we'll officially close the fd. Block devices don't reference an fd in the fdset. There are two references in a block device. The first one is obviously the file descriptor they are using; it is a fd dup()ed from an fd in the fdset, but it's now independent of it. The other reference is the file name that is kept in the BlockDriverState, and it always points to /dev/fdset/X, that is, the whole fdset instead of a single fd. What happens if you remove a file descriptor from an fdset that is in use, is that you can't reopen the fdset with the flags of the removed file descriptor any more. Which I believe is exactly the expected behaviour. libvirt would use this to revoke r/w access, for example (and which behaviour you already provide by checking removed in qemu_open). Are there any other use cases where it makes a difference whether a file descriptor is kept in the fdset with removed=1 or whether it's actually removed from the fdset? I think I might have confused remove-fd and close-fdset in earlier emails in this thread, so I hope this isn't inconsistent with what I said before. Ok no problem. 6. qemu_close (need to replace all close calls in block layer with qemu_close) - qemu_close decrements refcount for fdset - qemu_close closes all fds that have (refcount == 0 (!inuse || remove)) - qemu_close frees the fdset if no fds remain in it 7. disconnecting the QMP monitor - monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset And close all fds with refcount == 0. Yes, this makes sense. It also makes sense to close removed fds with refcount == 0 in the remove-fd function. Basically this will be the same thing we do in qemu_close. We'll close any fds that evaulate the following as true: (refcount == 0 (!inuse || remove)) Yes, whatever condition we'll come up with, but it should be the same and checked in all places where its value might change. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Thu, 05 Jul 2012 11:06:56 -0400 Corey Bryant cor...@linux.vnet.ibm.com wrote: On 07/04/2012 04:09 AM, Kevin Wolf wrote: Am 03.07.2012 20:21, schrieb Corey Bryant: On 07/03/2012 02:00 PM, Eric Blake wrote: On 07/03/2012 11:46 AM, Corey Bryant wrote: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. This doesn't sound ideal. Yes, it's less than ideal. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd. I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect. In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset... You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP. Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call. I'm not sure this is a good idea because we will leak fds if the client forgets about the fds when re-connecting (ie. it was restarted) or if a different client connects to QMP. If we really want to do that, I think that the right way of doing this is to add a command for clients to re-again ownership of the fds on reconnection. But to be honest, I don't fully understand why this is needed. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 06.07.2012 19:40, schrieb Corey Bryant: On 07/06/2012 05:11 AM, Kevin Wolf wrote: Am 05.07.2012 19:00, schrieb Eric Blake: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference. Does it matter? I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2 It also decreases the reference count because the monitor doesn't use it any more. 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1) So here the refcount becomes 0 and the fdset is closed. 7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed The only question that is a bit unclear to me is whether a remove-fd on one monitor drops the refcount only for this monitor or for all monitors. However, both options can be implemented without additional flags or counters. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/09/2012 10:05 AM, Luiz Capitulino wrote: On Thu, 05 Jul 2012 11:06:56 -0400 Corey Bryant cor...@linux.vnet.ibm.com wrote: On 07/04/2012 04:09 AM, Kevin Wolf wrote: Am 03.07.2012 20:21, schrieb Corey Bryant: On 07/03/2012 02:00 PM, Eric Blake wrote: On 07/03/2012 11:46 AM, Corey Bryant wrote: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. This doesn't sound ideal. Yes, it's less than ideal. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd. I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect. In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset... You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP. Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call. I'm not sure this is a good idea because we will leak fds if the client forgets about the fds when re-connecting (ie. it was restarted) or if a different client connects to QMP. If we really want to do that, I think that the right way of doing this is to add a command for clients to re-again ownership of the fds on reconnection. But to be honest, I don't fully understand why this is needed. I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) - add-fd monitor function initializes the monitor inuse flag for the fdset to true - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 6. qemu_close (need to replace all close calls in block layer with qemu_close) - qemu_close decrements refcount for fdset - qemu_close closes all fds that have (refcount == 0 (!inuse || remove)) - qemu_close frees the fdset if no fds remain in it 7. disconnecting the QMP monitor - monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset 8. connecting the QMP monitor - monitor connect visits all fdsets on monitor and turns on monitor in-use flag for fdset 9.
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/09/2012 10:04 AM, Kevin Wolf wrote: Am 06.07.2012 19:40, schrieb Corey Bryant: On 07/06/2012 05:11 AM, Kevin Wolf wrote: Am 05.07.2012 19:00, schrieb Eric Blake: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference. Does it matter? I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2 It also decreases the reference count because the monitor doesn't use it any more. I don't think that will work because refcount is for the entire fdset. So we can't decrement the refcount for every fd that is removed from the fdset. I think it is much simpler if we only increment refcount for an fdset on qemu_open, and only decrement refcount on qemu_close. 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1) So here the refcount becomes 0 and the fdset is closed. 7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed The only question that is a bit unclear to me is whether a remove-fd on one monitor drops the refcount only for this monitor or for all monitors. However, both options can be implemented without additional flags or counters. Before we go back and forth on this thread, would you mind taking a look at the last email I sent to Luiz? It includes all the design points that I'm currently working from. I think it's a good level set and we can work off that thread if there are still any issues. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 09.07.2012 17:23, schrieb Corey Bryant: I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2 It also decreases the reference count because the monitor doesn't use it any more. I don't think that will work because refcount is for the entire fdset. So we can't decrement the refcount for every fd that is removed from the fdset. I think it is much simpler if we only increment refcount for an fdset on qemu_open, and only decrement refcount on qemu_close. Ah right... So this would only work if we had explicit fdset-create/close commands, where the former would increase the refcount and the latter decrease it (fdset-open would be optional but I like symmetry) Maybe we need (or want) that anyway, but I need to think more about it first. 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1) So here the refcount becomes 0 and the fdset is closed. 7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed The only question that is a bit unclear to me is whether a remove-fd on one monitor drops the refcount only for this monitor or for all monitors. However, both options can be implemented without additional flags or counters. Before we go back and forth on this thread, would you mind taking a look at the last email I sent to Luiz? It includes all the design points that I'm currently working from. I think it's a good level set and we can work off that thread if there are still any issues. Ok, I'll have a look. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 09.07.2012 17:05, schrieb Corey Bryant: I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) - add-fd monitor function initializes the monitor inuse flag for the fdset to true - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 What was again the reason why we keep removed fds in the fdset at all? The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately. I think I might have confused remove-fd and close-fdset in earlier emails in this thread, so I hope this isn't inconsistent with what I said before. 6. qemu_close (need to replace all close calls in block layer with qemu_close) - qemu_close decrements refcount for fdset - qemu_close closes all fds that have (refcount == 0 (!inuse || remove)) - qemu_close frees the fdset if no fds remain in it 7. disconnecting the QMP monitor - monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset And close all fds with refcount == 0. 8. connecting the QMP monitor - monitor connect visits all fdsets on monitor and turns on monitor in-use flag for fdset 9. query-fdsets - returns all fdsets and fds that don't have remove flag on Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Mon, 09 Jul 2012 17:46:00 +0200 Kevin Wolf kw...@redhat.com wrote: Am 09.07.2012 17:05, schrieb Corey Bryant: I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) The fdset should be specified by the client, like: { execute: add-fd-set, arguments: { set-name: /dev/fdset/1 } } - add-fd monitor function initializes the monitor inuse flag for the fdset to true Why do we need the inuse flag? - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 What was again the reason why we keep removed fds in the fdset at all? The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately. Agreed. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/09/2012 11:46 AM, Kevin Wolf wrote: Am 09.07.2012 17:05, schrieb Corey Bryant: I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) - add-fd monitor function initializes the monitor inuse flag for the fdset to true - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 What was again the reason why we keep removed fds in the fdset at all? Because if refcount is 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close. The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately. fds in an fd set really need to be kept around until zero block devices reference them. At that point, if '(refcount == 0 (!inuse || remove))' is true, then we'll officially close the fd. I think I might have confused remove-fd and close-fdset in earlier emails in this thread, so I hope this isn't inconsistent with what I said before. Ok no problem. 6. qemu_close (need to replace all close calls in block layer with qemu_close) - qemu_close decrements refcount for fdset - qemu_close closes all fds that have (refcount == 0 (!inuse || remove)) - qemu_close frees the fdset if no fds remain in it 7. disconnecting the QMP monitor - monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset And close all fds with refcount == 0. Yes, this makes sense. It also makes sense to close removed fds with refcount == 0 in the remove-fd function. Basically this will be the same thing we do in qemu_close. We'll close any fds that evaulate the following as true: (refcount == 0 (!inuse || remove)) 8. connecting the QMP monitor - monitor connect visits all fdsets on monitor and turns on monitor in-use flag for fdset 9. query-fdsets - returns all fdsets and fds that don't have remove flag on -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Mon, 09 Jul 2012 13:35:19 -0400 Corey Bryant cor...@linux.vnet.ibm.com wrote: On 07/09/2012 11:46 AM, Kevin Wolf wrote: Am 09.07.2012 17:05, schrieb Corey Bryant: I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) - add-fd monitor function initializes the monitor inuse flag for the fdset to true - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 What was again the reason why we keep removed fds in the fdset at all? Because if refcount is 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close. But then the refcount is associated with the set, not with any particular fd. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/09/2012 12:18 PM, Luiz Capitulino wrote: On Mon, 09 Jul 2012 17:46:00 +0200 Kevin Wolf kw...@redhat.com wrote: Am 09.07.2012 17:05, schrieb Corey Bryant: I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) The fdset should be specified by the client, like: { execute: add-fd-set, arguments: { set-name: /dev/fdset/1 } } We could make the fdset name configurable. Then we wouldn't force clients into using the file=/dev/fdset/1 alias on commands like device_add. The risk with this is that clients need to be careful and use a unique name that doesn't conflict with any existing file names. The way it is currently, if add-fd is not given an fdset name, it will generate a new fdset and add the fd to it. add-fd always returns the fdset (int) and received fd (int) on success. (e.g. fdset=1 corresponds to file=/dev/fdset/1). Then the 2nd time you want to add an fd to this set, you have to specify fdset=1 on add-fd. I'll do whatever you all prefer. I think there are advantages in both approaches, however I'm leaning toward the current approach and forcing use of /dev/fdset/1 to keep all fd set names consistent. - add-fd monitor function initializes the monitor inuse flag for the fdset to true Why do we need the inuse flag? This helps to prevent fd leakage. Let's say the client adds fd=3 to /dev/fdset/1 and then the QMP monitor disconnects. Since the following evaluates to true when the monitor disconnects, the fd will be closed: (refcount == 0 (!inuse || remove)) Note: refcount is incremented for the fdset on qemu_open and decremented on qemu_close, and no commands caused it to be incremented from zero in this example. - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 What was again the reason why we keep removed fds in the fdset at all? The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately. Agreed. Please take a look at my recent reply to Kevin about this and let me know if it clears things up. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/09/2012 01:48 PM, Luiz Capitulino wrote: On Mon, 09 Jul 2012 13:35:19 -0400 Corey Bryant cor...@linux.vnet.ibm.com wrote: On 07/09/2012 11:46 AM, Kevin Wolf wrote: Am 09.07.2012 17:05, schrieb Corey Bryant: I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) - add-fd monitor function initializes the monitor inuse flag for the fdset to true - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 What was again the reason why we keep removed fds in the fdset at all? Because if refcount is 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close. But then the refcount is associated with the set, not with any particular fd. Exactly, yes that's what we're doing. Sorry, I thought that was clear in the design overview I sent earlier today. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/09/2012 11:05 AM, Corey Bryant wrote: On 07/09/2012 10:05 AM, Luiz Capitulino wrote: On Thu, 05 Jul 2012 11:06:56 -0400 Corey Bryant cor...@linux.vnet.ibm.com wrote: On 07/04/2012 04:09 AM, Kevin Wolf wrote: Am 03.07.2012 20:21, schrieb Corey Bryant: On 07/03/2012 02:00 PM, Eric Blake wrote: On 07/03/2012 11:46 AM, Corey Bryant wrote: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. This doesn't sound ideal. Yes, it's less than ideal. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd. I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect. In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset... You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP. Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call. I'm not sure this is a good idea because we will leak fds if the client forgets about the fds when re-connecting (ie. it was restarted) or if a different client connects to QMP. If we really want to do that, I think that the right way of doing this is to add a command for clients to re-again ownership of the fds on reconnection. But to be honest, I don't fully understand why this is needed. I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: --- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -- 1. add-fd - fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) - add-fd monitor function initializes the monitor inuse flag for the fdset to true - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 - fd is passed via SCM rights - add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) - add-fd monitor function initializes the remove flag for the fd to false - add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit - qemu_open performs reopen by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false - qemu_open increments refcount for the fdset - Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 - remove-fd monitor function fails if fdset doesn't exist - remove-fd monitor function turns on remove flag for fd=4 6. qemu_close (need to replace all close calls in block layer with qemu_close) - qemu_close decrements refcount for fdset - qemu_close closes all fds that have (refcount == 0 (!inuse || remove)) - qemu_close frees the fdset if no fds remain in it 7. disconnecting the QMP monitor - monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset 8. connecting the QMP monitor - monitor connect visits all fdsets on monitor and turns on monitor in-use flag for
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 05.07.2012 19:00, schrieb Eric Blake: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 05.07.2012 18:37, schrieb Corey Bryant: There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour. If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either. Yeah this is a problem. Could we introduce another flag to cover this? Adding more refcounts or flags is not a problem, but it doesn't solve it either. The hard question is when to set that flag. I believe it may be easier to just change the block layer so that it opens files only once during bdrv_open(). Can this fix be delivered after the fd passing patch series? Sure, we can't fix everything at once. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/06/2012 05:11 AM, Kevin Wolf wrote: Am 05.07.2012 19:00, schrieb Eric Blake: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin I think we need the granularity of having an in-use flag per fd. Here's an example of why: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open as part of fdset1 6. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed and fdset1 is freed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu turns on the remove flag for fd=4; refcount is 0 so all fds in fdset1 are closed I think we need the granularity of having an in-use flag per fd. If we track monitor in-use in the reference count, then we are tracking it for the fdset and I think this could cause leaks. In the following example, we have a refcount for the fdset, an in-use flag per fd, and a remove flag per fd. We're only incrementing/decrementing refcount in qemu_open/qemu_close. In the following example, we have a refcount for the fdset, and a remove flag per fd. We're incrementing refcount when the fdset is first created, when QMP re-connects, and in qemu_open. We're decrementing refcount when QMP disconnects, and in qemu_close. 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client crashes, so all tracked fdsets are visited; fd=4 has not yet been passed to 'remove-fd', so its remove flag is off; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Ugh... please disregard this. I hit send accidentally. On 07/06/2012 01:14 PM, Corey Bryant wrote: On 07/06/2012 05:11 AM, Kevin Wolf wrote: Am 05.07.2012 19:00, schrieb Eric Blake: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin I think we need the granularity of having an in-use flag per fd. Here's an example of why: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open as part of fdset1 6. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed and fdset1 is freed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu turns on the remove flag for fd=4; refcount is 0 so all fds in fdset1 are closed I think we need the granularity of having an in-use flag per fd. If we track monitor in-use in the reference count, then we are tracking it for the fdset and I think this could cause leaks. In the following example, we have a refcount for the fdset, an in-use flag per fd, and a remove flag per fd. We're only incrementing/decrementing refcount in qemu_open/qemu_close. In the following example, we have a refcount for the fdset, and a remove flag per fd. We're incrementing refcount when the fdset is first created, when QMP re-connects, and in qemu_open. We're decrementing refcount when QMP disconnects, and in qemu_close. 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client crashes, so all tracked fdsets are visited; fd=4 has not yet been passed to 'remove-fd', so its remove flag is off; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/06/2012 05:11 AM, Kevin Wolf wrote: Am 05.07.2012 19:00, schrieb Eric Blake: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference. I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1) 7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed In the following case, we have an in-use and remove flag per fd and we only increment/decrement refcount on qemu_open/qemu_close: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's remove flag is initialized to off and in-use flag is initialized to on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; fd=4's in-use flag is turned off; fd=4 is left open because it is still in-use by the block device (refcount is still 1) 4. client re-establishes QMP connection, refcount for fdset1 is still 1; fd=4's in-use flag is turned on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 1 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 0; fd=4 is closed because (refcount == 0 (!inuse || removed)) is true 7. Sometime later.. QMP disconnects -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/06/2012 01:40 PM, Corey Bryant wrote: On 07/06/2012 05:11 AM, Kevin Wolf wrote: Am 05.07.2012 19:00, schrieb Eric Blake: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference. I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1) 7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed In the following case, we have an in-use and remove flag per fd and we only increment/decrement refcount on qemu_open/qemu_close: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's remove flag is initialized to off and in-use flag is initialized to on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; fd=4's in-use flag is turned off; fd=4 is left open because it is still in-use by the block device (refcount is still 1) 4. client re-establishes QMP connection, refcount for fdset1 is still 1; fd=4's in-use flag is turned on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 1 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 0; fd=4 is closed because (refcount == 0 (!inuse || removed)) is true 7. Sometime later.. QMP disconnects I have one modification to this. It looks the inuse flag could be at the fdset level, rather than an inuse flag per fd. Although.. you did mention multiple monitors, so perhaps inuse should be a counter? -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/04/2012 04:00 AM, Kevin Wolf wrote: Am 03.07.2012 19:03, schrieb Eric Blake: 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit - qemu_open reopens /dev/fdset/1 by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 - caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here. Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove? Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number. Good point. So pass-fd or whatever it will be called needs to returnn both fdset number and the individual fd number. 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail. I'm not sure I understand what you mean. pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd. I think that also means that re-establishing the client QMP connection would increment This is an interesting idea. I've never thought about what to do with a reconnect. It felt a bit odd at first, but now your proposal totally makes sense to me. For some examples: 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example: 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open. If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more. If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue? 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is in use by the block device 4. client crashes, so all tracked fds are visited; but fd=4 is already marked as not in use by the monitor, so its refcount is unchanged 1. client
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 05.07.2012 16:22, schrieb Corey Bryant: For some examples: 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example: 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open. If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more. If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue? Come on, requiring realistic examples isn't fair. ;-) Swap steps 2 and 3 in the example, then step 3 is just the preparation for a command that uses the new fd. The problem stays the same. Or a live commit operation could be in flight so that qemu must be able to switch on completion without libvirt interaction. There are enough reasons that an fd in an fdset exists and is not opened, but I can't think of one why it should be dropped. In use by whom? If it's still in use in qemu (as in in-use flag would be set) and we have a refcount of zero, then that's a bug. In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd. I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions. There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour. If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either. Yeah this is a problem. Could we introduce another flag to cover this? Adding more refcounts or flags is not a problem, but it doesn't solve it either. The hard question is when to set that flag. I believe it may be easier to just change the block layer so that it opens files only once during bdrv_open(). Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/04/2012 04:09 AM, Kevin Wolf wrote: Am 03.07.2012 20:21, schrieb Corey Bryant: On 07/03/2012 02:00 PM, Eric Blake wrote: On 07/03/2012 11:46 AM, Corey Bryant wrote: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. This doesn't sound ideal. Yes, it's less than ideal. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd. I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect. In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset... You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP. Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call. I'll go with this approach until someone objects. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/05/2012 10:51 AM, Kevin Wolf wrote: Am 05.07.2012 16:22, schrieb Corey Bryant: For some examples: 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example: 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open. If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more. If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue? Come on, requiring realistic examples isn't fair. ;-) Heh :) Swap steps 2 and 3 in the example, then step 3 is just the preparation for a command that uses the new fd. The problem stays the same. Or a live commit operation could be in flight so that qemu must be able to switch on completion without libvirt interaction. Good point. I thought it would be useful to run through the examples again with each fdset having a refcount, and each fd having an in-use flag. One difference though is that refcount is only incremented/decremented when qemu_open/qemu_close are called. I think this works and covers Kevin's concerns. Actually it might be a bit simpler too. 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 0 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in-use by the monitor; refcount is 0 so all fds in fdset1 are closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and fd=4 is closed since the refcount is 0; fd=5 remains open 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/05/2012 12:35 PM, Corey Bryant wrote: On 07/05/2012 10:51 AM, Kevin Wolf wrote: Am 05.07.2012 16:22, schrieb Corey Bryant: For some examples: 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example: 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open. If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more. If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue? Come on, requiring realistic examples isn't fair. ;-) Heh :) Swap steps 2 and 3 in the example, then step 3 is just the preparation for a command that uses the new fd. The problem stays the same. Or a live commit operation could be in flight so that qemu must be able to switch on completion without libvirt interaction. Good point. I thought it would be useful to run through the examples again with each fdset having a refcount, and each fd having an in-use flag. One difference though is that refcount is only incremented/decremented when qemu_open/qemu_close are called. I think this works and covers Kevin's concerns. Actually it might be a bit simpler too. 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 0 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in-use by the monitor; refcount is 0 so all fds in fdset1 are closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and fd=4 is closed since the refcount is 0; fd=5 remains open 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 This says that an in-use fd comes back into use after a crash... 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). An fd is then closed when 'refcount == 0 (!inuse || removed)'. On monitor disconnect, the inuse flag is cleared, and on reconnect, it is set; but that does not impact the removed flag. And the 'query-fds' command would omit any fd with the 'removed' flag, even when the fd is still kept open internally thanks to the refcount being nonzero. But I'm definitely agreeing that tying the refcount to the set rather than to individual fds within the set makes sense; you still avoid the fd leak in that all fds in the set are closed when both the monitor has disavowed the set and when qemu_close() has finished using any of the fds. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/05/2012 01:00 PM, Eric Blake wrote: On 07/05/2012 10:35 AM, Corey Bryant wrote: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 This says that an in-use fd comes back into use after a crash... 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. Ah, yeah I missed that. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). An fd is then closed when 'refcount == 0 (!inuse || removed)'. On monitor disconnect, the inuse flag is cleared, and on reconnect, it is set; but that does not impact the removed flag. And the 'query-fds' command would omit any fd with the 'removed' flag, even when the fd is still kept open internally thanks to the refcount being nonzero. I agree. Having the 2 flags makes sense and solves the issues you mentioned above. -- Regards, Corey But I'm definitely agreeing that tying the refcount to the set rather than to individual fds within the set makes sense; you still avoid the fd leak in that all fds in the set are closed when both the monitor has disavowed the set and when qemu_close() has finished using any of the fds. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 03.07.2012 19:03, schrieb Eric Blake: 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit - qemu_open reopens /dev/fdset/1 by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 - caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here. Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove? Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number. Good point. So pass-fd or whatever it will be called needs to returnn both fdset number and the individual fd number. 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail. I'm not sure I understand what you mean. pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd. I think that also means that re-establishing the client QMP connection would increment This is an interesting idea. I've never thought about what to do with a reconnect. It felt a bit odd at first, but now your proposal totally makes sense to me. For some examples: 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example: 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open. If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more. 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is in use by the block device 4. client crashes, so all tracked fds are visited; but fd=4 is already marked as not in use by the monitor, so its refcount is unchanged 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 03.07.2012 20:21, schrieb Corey Bryant: On 07/03/2012 02:00 PM, Eric Blake wrote: On 07/03/2012 11:46 AM, Corey Bryant wrote: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. This doesn't sound ideal. Yes, it's less than ideal. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd. I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect. In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset... You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Mon, Jul 02, 2012 at 04:31:09PM -0600, Eric Blake wrote: On 07/02/2012 04:02 PM, Corey Bryant wrote: Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset. I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd; If libvirt has only granted read-only access to the file with sVirt, then passing a O_RDWR file handle to QEMU will result in an SELinux denial, even if QEMU doesn't try to do I/O on it. So this is out of the question. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 03.07.2012 00:31, schrieb Eric Blake: On 07/02/2012 04:02 PM, Corey Bryant wrote: Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset. It doesn't have to be up front. There's no reason not to have monitor commands that add or remove fds from a given fdset later. I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd; making qemu insist on an O_RDONLY fd makes life harder for libvirt. On the other hand, I can see from a safety standpoint that passing in an O_RDWR fd opens up more potential for errors than passing in an O_RDONLY fd, Yes, this is exactly my consideration. Having a read-only file opened as O_RDWR gives us the chance to make stupid mistakes. but if you don't know up front whether you will ever need to write into a file, then it would be better to pass in O_RDONLY. At least I don't see it as a security risk: passing in O_RDWR but setting SELinux permissions on the fd to only allow read() but not write() via the labeling of the fd may be possible, so that libvirt could still prevent accidental writes into an O_RDWR file that starts life only needing to service reads. But this would assume that libvirt knows exactly when a reopen happens, for each current and future qemu version. I wouldn't want to tie qemu's internals so closely to the management application, even if libvirt could probably get it reasonably right (allow rw before sending a monitor command; revoke rw when receiving a QMP event that the commit has completed). It wouldn't work if we had a qemu-initiated ro-rw transition, but I don't think we have one at the moment. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' } This still doesn't solve Dan's complaint that things are not atomic; if the monitor dies between the pass-fds and the later use of the fdset, we would need a counterpart monitor command to query what fdsets are currently known by qemu. If you want a query-fdsets, that should be easy enough. Actually, we might be able to even make the command transactionable. This would of course require blockdev-add to be transactionable as well to be of any use. And like you pointed out, you didn't make it clear how a timeout mechanism would be implemented to auto-close fds that are not consumed in a fixed amount of time - would that be another optional parameter to 'pass-fds'? Do you think a timeout is helpful? Can't we just drop libvirt's reference automatically if the monitor connection gets closed? The BH/timer thing Corey mentioned is more about the qemu internal problem that during a reopen there may be a short period where the old fd is closed, but the new one not yet opened, so the fdset might need to survive a short period with refcount 0. Or do we need a way to initially create a set with only one O_RDONLY fd, but later pass in a new O_RDWR fd but associate it with the existing set rather than creating a new set (akin to the 'force' option of 'pass-fd' in proposal two)? As I said above, yes, I think this makes a lot sense. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/02/2012 06:31 PM, Eric Blake wrote: On 07/02/2012 04:02 PM, Corey Bryant wrote: Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset. I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd; making qemu insist on an O_RDONLY fd makes life harder for libvirt. On the other hand, I can see from a safety standpoint that passing in an O_RDWR fd opens up more potential for errors than passing in an O_RDONLY fd, but if you don't know up front whether you will ever need to write into a file, then it would be better to pass in O_RDONLY. At least I don't see it as a security risk: passing in O_RDWR but setting SELinux permissions on the fd to only allow read() but not write() via the labeling of the fd may be possible, so that libvirt could still prevent accidental writes into an O_RDWR file that starts life only needing to service reads. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' } This still doesn't solve Dan's complaint that things are not atomic; if the monitor dies between the pass-fds and the later use of the fdset, we would need a counterpart monitor command to query what fdsets are currently known by qemu. And like you pointed out, you didn't make it clear how a timeout mechanism would be implemented to auto-close fds that are not consumed in a fixed amount of time - would that be another optional parameter to 'pass-fds'? Yes see the description of the close bool on pass-fds below which determines how the fds are closed. It's not an atomic operation, but it does handle Dan's concern of preventing fd leaks. If libvirt did crash after the pass-fd, then couldn't it just abandon the fd set (which will get closed) and libvirt could send a new fd set and work with that? Or do we need a way to initially create a set with only one O_RDONLY fd, but later pass in a new O_RDWR fd but associate it with the existing set rather than creating a new set (akin to the 'force' option of 'pass-fd' in proposal two)? Yeah I see what you're saying. I was also wondering if it would make sense for qemu to consume the passed fds and use those (ie. no dup()). This would prevent the issues of having to close the fds that linger on the monitor. But I don't know if it's realistic for libvirt to know how many open calls qemu will need to make (this would need 1 fd passed per open call). close-fds: { 'command': 'close-fds', 'data': {'fdsetname': 'str' } where: @fdsetname - the file descriptor set name @close - *if true QEMU closes the monitor fds when they expire. if false, fds remain on the monitor until close-fds command. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage if close==true specified CON: Determining when to close fds when close==true may be tricky USAGE: 1. pass-fd FDSET={M} - returns a string /dev/fdset/1) that refers to the passed set of fds. 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the list that has access flags matching the qemu_open() action flags. 3. block-commit -- qemu_open() reopens /dev/fdset/1 by using the first fd from the list that has access flags matching the qemu_open() action flags. 4. The monitor fds are closed: A) *If @close == true, qemu closes the set of fds when the timer expires B) If @close == false, libvirt must issue close-fds command to close the set of fds *How to solve this has yet to be determined. Kevin mentioned potentially using a bottom-half or a timer in qemu_close(). Still, it is certainly an option worth thinking about, and I'm hoping we can come to a solid design that everyone agrees provides the desired security and flexibility without having to recode every single existing command. I agree. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/02/2012 06:02 PM, Corey Bryant wrote: On 06/26/2012 06:54 PM, Eric Blake wrote: On 06/26/2012 04:28 PM, Corey Bryant wrote: With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} - returns a string /dev/fd/N showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd /dev/fd/N In fact, there are more steps: 4. use it successfully 5. close_fd /dev/fd/N I think it would well be possible that qemu just closes the fd when it's not used internally any more. pass-fd could have a flag indicating qemu to do that. It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals: Proposal One: enhance every command to take an fd:name protocol. PRO: Successful use of a name removes the fd from the 'getfd' list. CON: Lots of commands to change. CON: The command sequence is not atomic. CON: Block layer does not have a file name tied to the fd, so the reopen operation also needs to learn the fd:name protocol, but since we're already touching the command it's not much more work. USAGE: 1. getfd FDSET={M}, ties new fd to name 2. drive_add fd=name - looks up the fd by name from the list 3a. on success, fd is no longer in the list, drive_add consumed it 3b. on failure, libvirt must call 'closefd name' to avoid a leak 4. getfd FDSET={M2}, ties new fd to newname 5. block-commit fd=newname - looks up fd by name from the list 6a. on success, fd is no longer in the list, block-commit consumed it 6b. on failure, libvirt must call 'closefd newname' to avoid a leak Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then update that name to the new fd in advance of any operation that needs to do a reopen. PRO: All other commands work without impact by using qemu_open(), by getting a clone of the permanent name. CON: The permanent name must remain open as long as qemu might need to reopen it, and reopening needs the pass-fd force option. CON: The command sequence is not atomic. USAGE: 1. pass_fd FDSET={M} - returns an integer N (or a string /dev/fd/N) showing QEMU's permanent name of the FD 2. drive_add file=permanent name means that qemu_open() will clone the fd, and drive_add is now using yet another FD while N remains open; meanwhile, the block layer knows that the drive is tied to the permanent name 3. pass-fd force FDSET={M2} - replaces fd N with the passed M2, and still returns /dev/fd/N 4. block-commit - looks up the block-device name (/dev/fd/N), which maps back to the permanent fd, and gets a copy of the newly passed M2 5. on completion (success or failure), libvirt must call 'closefd name' to avoid a leak Proposal Three: Use thread-local fds passed alongside any command, rather than passing via a separate command PRO: All commands can now atomically handle fd passing PRO: Commands that only need a one-shot open can now use fd CON: Commands that need to reopen still need modification to allow a name override during the reopen 1. drive_add nfds=1 file=/dev/fdlist/1 FDSET={M} - on success, the fd is used as the block file, on failure it is atomically closed, so there is no leak either way. However, the block file has no permanent name. 2. block-commit nfds=1 file-override=/dev/fdlist/1 FDSET={M2} - file override must be passed again (since we have no guarantee that the /dev/fdlist/1 of _this_ command matches the command-local naming used in the previous command), but the fd is again used atomically Under proposal 3, there is no need to dup fd's, merely only to check that qemu_open(/dev/fdlist/n, flags) has compatible flags with the fd received via FDSET. And the fact that things are made atomic means that libvirt no longer has to worry about calling closefd, nor does it have to worry about being interrupted between two monitor commands and not knowing what state qemu is in. But it does mean teaching every possible command that wants to do a reopen to learn how to use fd overrides rather than to reuse the permanent name that was originally in place on the first open. Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns':
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 03.07.2012 17:40, schrieb Corey Bryant: Thanks again for taking time to discuss this at today's QEMU community call. Here's the proposal we discussed at the call. Please let me know if I missed anything or if there are any issues with this design. Proposal Five: New monitor commands enable adding/removing an fd to/from a set. The set of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage (fds are associated with monitor connection and, if not in use, closed when monitor disconnects) PRO: Security-wise this is ok since libvirt can manage the set of fd's (ie. it can add/remove an O_RDWR fd to/from the set as needed). CON: Not atomic (e.g. doesn't add an fd with single drive_add command). USAGE: 1. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd (e.g fd=4) to caller. libvirt in-use flag turned on for fd. I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd? If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet. 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit - qemu_open reopens /dev/fdset/1 by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 - caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here. 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail. More functional details: -If libvirt crashes it can call query-fd /dev/fdset/1 to determine which fds are open in the set. We also need a query-fdsets command that lists all fdsets that exist. If we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset. -If monitor connection closes, qemu will close fds that have a refcount of zero. Do we also need a qemu in-use flag in case refcount is zero and fd is still in use? In use by whom? If it's still in use in qemu (as in in-use flag would be set) and we have a refcount of zero, then that's a bug. -This support requires introduction of qemu_close function that will be called everywhere in block layer that close is currently called. Notes: -Patch series 1 will include support for all of the above. This will be my initial focus. -Patch series 2 will include command line support that enables association of command line fd with a monitor set. This will be my secondary focus, most likely after patch series 1 is applied. Thanks, this is a good and as far as I can tell complete summary of what we discussed. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/03/2012 11:59 AM, Kevin Wolf wrote: Am 03.07.2012 17:40, schrieb Corey Bryant: Thanks again for taking time to discuss this at today's QEMU community call. Here's the proposal we discussed at the call. Please let me know if I missed anything or if there are any issues with this design. Proposal Five: New monitor commands enable adding/removing an fd to/from a set. The set of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage (fds are associated with monitor connection and, if not in use, closed when monitor disconnects) PRO: Security-wise this is ok since libvirt can manage the set of fd's (ie. it can add/remove an O_RDWR fd to/from the set as needed). CON: Not atomic (e.g. doesn't add an fd with single drive_add command). USAGE: 1. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd (e.g fd=4) to caller. libvirt in-use flag turned on for fd. I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd? If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet. That's fine. QEMU can return the fdset number or a string (/dev/fdset/1) if none is specified. And an fdset will need to be specified if adding to an existing set. I think libvirt will need the fd returned by add-fd so that it can evaluate fds returned by query-fd. It's also useful for remove-fd. 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit - qemu_open reopens /dev/fdset/1 by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 - caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here. Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove? 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail. I'm not sure I understand what you mean. More functional details: -If libvirt crashes it can call query-fd /dev/fdset/1 to determine which fds are open in the set. We also need a query-fdsets command that lists all fdsets that exist. If we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset. Yes, good point. And maybe we don't need 2 commands. query-fdsets could return all the sets and all the fds that are in those sets. -If monitor connection closes, qemu will close fds that have a refcount of zero. Do we also need a qemu in-use flag in case refcount is zero and fd is still in use? In use by whom? If it's still in use in qemu (as in in-use flag would be set) and we have a refcount of zero, then that's a bug. In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd. -This support requires introduction of qemu_close function that will be called everywhere in block layer that close is currently called. Notes: -Patch series 1 will include support for all of the above. This will be my initial focus. -Patch series 2 will include command line support that enables association of command line fd with a monitor set. This will be my secondary focus, most likely after patch series 1 is applied. Thanks, this is a good and as far as I can tell complete summary of what we discussed. Kevin Definitely! Thank you for all the input. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/03/2012 10:25 AM, Corey Bryant wrote: I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd? If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet. That's fine. QEMU can return the fdset number or a string (/dev/fdset/1) if none is specified. And an fdset will need to be specified if adding to an existing set. I think libvirt will need the fd returned by add-fd so that it can evaluate fds returned by query-fd. It's also useful for remove-fd. Correct - since we will be adding a remove-fd, then that command needs to know both the fdset name and the individual fd within the set to be removed. 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit - qemu_open reopens /dev/fdset/1 by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 - caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here. Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove? Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number. 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail. I'm not sure I understand what you mean. pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd. I think that also means that re-establishing the client QMP connection would increment For some examples: 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is in use by the block device 4. client crashes, so all tracked fds are visited; but fd=4 is already marked as not in use by the monitor, so its refcount is unchanged 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in use by the monitor, refcount is now decremented to 0 and fd=4
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/03/2012 01:03 PM, Eric Blake wrote: On 07/03/2012 10:25 AM, Corey Bryant wrote: I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd? If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet. That's fine. QEMU can return the fdset number or a string (/dev/fdset/1) if none is specified. And an fdset will need to be specified if adding to an existing set. I think libvirt will need the fd returned by add-fd so that it can evaluate fds returned by query-fd. It's also useful for remove-fd. Correct - since we will be adding a remove-fd, then that command needs to know both the fdset name and the individual fd within the set to be removed. Ok 2. drive_add file=/dev/fdset/1 - qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} - qemu adds fd to set named /dev/fdset/1 - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit - qemu_open reopens /dev/fdset/1 by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 - caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here. Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove? Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number. I understand what you were saying now, although I guess it's not applicable at this point. I'll plan on returning the fd from add-fd and passing the fd on the remove-fd command. 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail. I'm not sure I understand what you mean. pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd. I think that also means that re-establishing the client QMP connection would increment For some examples: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it Just a note that the fd above also hasn't yet been referenced by a drive-add/device-add, so it will be closed in step 2. 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor And in step 4 the QMP connection will increment the refcount +1 for all fds that persisted through the QMP disconnect. (?) 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/03/2012 11:46 AM, Corey Bryant wrote: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd. 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor And in step 4 the QMP connection will increment the refcount +1 for all fds that persisted through the QMP disconnect. (?) I'm not sure we need the refcount increment on reconnect. 'query-fds' should provide enough information for the new monitor to know what fdsets are still in use by qemu, even though they are no longer available to 'remove-fd' from the monitor, and if the monitor is worried about keeping the fd set alive it can call 'add-fd' to again associate a new fd with the set. The lifetime of a set is thus as long as any of its associated fds have a non-zero refcount. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/03/2012 02:00 PM, Eric Blake wrote: On 07/03/2012 11:46 AM, Corey Bryant wrote: Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects. Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. This doesn't sound ideal. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd. I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect. 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor And in step 4 the QMP connection will increment the refcount +1 for all fds that persisted through the QMP disconnect. (?) I'm not sure we need the refcount increment on reconnect. 'query-fds' should provide enough information for the new monitor to know what fdsets are still in use by qemu, even though they are no longer available to 'remove-fd' from the monitor, and if the monitor is worried about keeping the fd set alive it can call 'add-fd' to again associate a new fd with the set. The lifetime of a set is thus as long as any of its associated fds have a non-zero refcount. This sounds good to me. And qemu_open will need to make sure the monitor in_use flag is true before dup'ing an fd. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 06/26/2012 06:54 PM, Eric Blake wrote: On 06/26/2012 04:28 PM, Corey Bryant wrote: With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} - returns a string /dev/fd/N showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd /dev/fd/N In fact, there are more steps: 4. use it successfully 5. close_fd /dev/fd/N I think it would well be possible that qemu just closes the fd when it's not used internally any more. pass-fd could have a flag indicating qemu to do that. It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals: Proposal One: enhance every command to take an fd:name protocol. PRO: Successful use of a name removes the fd from the 'getfd' list. CON: Lots of commands to change. CON: The command sequence is not atomic. CON: Block layer does not have a file name tied to the fd, so the reopen operation also needs to learn the fd:name protocol, but since we're already touching the command it's not much more work. USAGE: 1. getfd FDSET={M}, ties new fd to name 2. drive_add fd=name - looks up the fd by name from the list 3a. on success, fd is no longer in the list, drive_add consumed it 3b. on failure, libvirt must call 'closefd name' to avoid a leak 4. getfd FDSET={M2}, ties new fd to newname 5. block-commit fd=newname - looks up fd by name from the list 6a. on success, fd is no longer in the list, block-commit consumed it 6b. on failure, libvirt must call 'closefd newname' to avoid a leak Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then update that name to the new fd in advance of any operation that needs to do a reopen. PRO: All other commands work without impact by using qemu_open(), by getting a clone of the permanent name. CON: The permanent name must remain open as long as qemu might need to reopen it, and reopening needs the pass-fd force option. CON: The command sequence is not atomic. USAGE: 1. pass_fd FDSET={M} - returns an integer N (or a string /dev/fd/N) showing QEMU's permanent name of the FD 2. drive_add file=permanent name means that qemu_open() will clone the fd, and drive_add is now using yet another FD while N remains open; meanwhile, the block layer knows that the drive is tied to the permanent name 3. pass-fd force FDSET={M2} - replaces fd N with the passed M2, and still returns /dev/fd/N 4. block-commit - looks up the block-device name (/dev/fd/N), which maps back to the permanent fd, and gets a copy of the newly passed M2 5. on completion (success or failure), libvirt must call 'closefd name' to avoid a leak Proposal Three: Use thread-local fds passed alongside any command, rather than passing via a separate command PRO: All commands can now atomically handle fd passing PRO: Commands that only need a one-shot open can now use fd CON: Commands that need to reopen still need modification to allow a name override during the reopen 1. drive_add nfds=1 file=/dev/fdlist/1 FDSET={M} - on success, the fd is used as the block file, on failure it is atomically closed, so there is no leak either way. However, the block file has no permanent name. 2. block-commit nfds=1 file-override=/dev/fdlist/1 FDSET={M2} - file override must be passed again (since we have no guarantee that the /dev/fdlist/1 of _this_ command matches the command-local naming used in the previous command), but the fd is again used atomically Under proposal 3, there is no need to dup fd's, merely only to check that qemu_open(/dev/fdlist/n, flags) has compatible flags with the fd received via FDSET. And the fact that things are made atomic means that libvirt no longer has to worry about calling closefd, nor does it have to worry about being interrupted between two monitor commands and not knowing what state qemu is in. But it does mean teaching every possible command that wants to do a reopen to learn how to use fd overrides rather than to reuse the permanent name that was originally in place on the first open. Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' } close-fds: { 'command':
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 07/02/2012 04:02 PM, Corey Bryant wrote: Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset. I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd; making qemu insist on an O_RDONLY fd makes life harder for libvirt. On the other hand, I can see from a safety standpoint that passing in an O_RDWR fd opens up more potential for errors than passing in an O_RDONLY fd, but if you don't know up front whether you will ever need to write into a file, then it would be better to pass in O_RDONLY. At least I don't see it as a security risk: passing in O_RDWR but setting SELinux permissions on the fd to only allow read() but not write() via the labeling of the fd may be possible, so that libvirt could still prevent accidental writes into an O_RDWR file that starts life only needing to service reads. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' } This still doesn't solve Dan's complaint that things are not atomic; if the monitor dies between the pass-fds and the later use of the fdset, we would need a counterpart monitor command to query what fdsets are currently known by qemu. And like you pointed out, you didn't make it clear how a timeout mechanism would be implemented to auto-close fds that are not consumed in a fixed amount of time - would that be another optional parameter to 'pass-fds'? Or do we need a way to initially create a set with only one O_RDONLY fd, but later pass in a new O_RDWR fd but associate it with the existing set rather than creating a new set (akin to the 'force' option of 'pass-fd' in proposal two)? close-fds: { 'command': 'close-fds', 'data': {'fdsetname': 'str' } where: @fdsetname - the file descriptor set name @close - *if true QEMU closes the monitor fds when they expire. if false, fds remain on the monitor until close-fds command. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage if close==true specified CON: Determining when to close fds when close==true may be tricky USAGE: 1. pass-fd FDSET={M} - returns a string /dev/fdset/1) that refers to the passed set of fds. 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the list that has access flags matching the qemu_open() action flags. 3. block-commit -- qemu_open() reopens /dev/fdset/1 by using the first fd from the list that has access flags matching the qemu_open() action flags. 4. The monitor fds are closed: A) *If @close == true, qemu closes the set of fds when the timer expires B) If @close == false, libvirt must issue close-fds command to close the set of fds *How to solve this has yet to be determined. Kevin mentioned potentially using a bottom-half or a timer in qemu_close(). Still, it is certainly an option worth thinking about, and I'm hoping we can come to a solid design that everyone agrees provides the desired security and flexibility without having to recode every single existing command. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Wed, 27 Jun 2012 10:58:01 +0200 Kevin Wolf kw...@redhat.com wrote: Am 26.06.2012 20:40, schrieb Corey Bryant: Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support /dev/fd/N for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, X, closefd). Thanks for the PoC. Two other required updates that I can think of would be: 1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. Nevermind my comment. I see that your PoC supports passing nfds for any QMP command. I'm curious what Kevin's thoughts are on this and the overall approach. I'm not against introducing this nfd thing as a general feature for QMP commands instead of a separate pass-fd command. It's not obvious to me that everyone would agree with that, so let's CC Luiz at least. I don't have objections either, but I want to point out two things. First, we've agreed on adding new commands instead of extending existing ones. I feel that not everyone is agreement with this and maybe we could revisit this topic, but in principle new commands should be added. Secondly, this is just a small suggestion, but we're in a point in time where several block commands are being added. I think it's a good moment to think hard how the ideal block API would look like and try to design new commands based on that. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 06/27/2012 05:51 AM, Kevin Wolf wrote: Am 27.06.2012 11:20, schrieb Daniel P. Berrange: On Wed, Jun 27, 2012 at 11:16:32AM +0200, Kevin Wolf wrote: The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands. This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns. What I don't understand though is that when you're reopening FDs, with the pass-fd command approach, you're merely dup'ing the original FDs that was passed in from the client. Why can't you alternatively just dup the FD you already have. That's easy: Because I don't know that I'm dealing with an FD. I think originally the whole point in this /dev/fd/ thing was that it would transparently enable the functionality for all parts of qemu: The block layer, of course, but also new char devices, migration targets, screenshot files or whatever else can deal with files. In this design the block layer doesn't even know that it's not a regular file. If we now decided to go for a design where the /dev/fd path isn't enough, but changes are needed to each subsystem where it should work, then this point doesn't apply any more and I cast my vote for abandoning everything starting with /dev/ and introducing a proper fd: protocol in the block layer, so that the block layer at least has a chance to know that it has to do treat this file specially. I don't see why we need to keep the original FD around forever. If the QMP command handler nees the temporary /dev/fdlist/N file to exist for longer than the duration of the command, it can simply dup() it to get a permanent copy of its own. If we're not going to make it transparent and if we don't care about QMP command bloat, we can choose completely different designs, yes. And then qemu could probably try to internally work around a suboptimal API. Kevin Are there any more thoughts on how to move forward with this? It seems like we're at a stand-still with the discussion. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 27.06.2012 00:28, schrieb Corey Bryant: On 06/26/2012 04:50 PM, Luiz Capitulino wrote: On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf kw...@redhat.com wrote: Am 26.06.2012 11:10, schrieb Daniel P. Berrange: I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue. With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} - returns a string /dev/fd/N showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd /dev/fd/N In fact, there are more steps: 4. use it successfully 5. close_fd /dev/fd/N I think it would well be possible that qemu just closes the fd when it's not used internally any more. pass-fd could have a flag indicating qemu to do that. It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. No, libvirt doesn't know it, because the original file descriptor is still needed when qemu decides to reopen the file. So I think qemu needs some kind of refcounting anyway. One of the references is held by libvirt and it can drop it with close_fd, and the other one would be for the BlockDriverState or whatever you use the FD with. (There's a tricky part: When do you actually close the FD? If libvirt has dropped its reference and qemu reopens, for example because it has just probed the image format, we have a short time where the refcount would be 0, but we can't drop it anyway.) Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 26.06.2012 20:40, schrieb Corey Bryant: Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support /dev/fd/N for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, X, closefd). Thanks for the PoC. Two other required updates that I can think of would be: 1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. Nevermind my comment. I see that your PoC supports passing nfds for any QMP command. I'm curious what Kevin's thoughts are on this and the overall approach. I'm not against introducing this nfd thing as a general feature for QMP commands instead of a separate pass-fd command. It's not obvious to me that everyone would agree with that, so let's CC Luiz at least. The that I'm unsure about is what we should do with qemu reopening the file. If you close the fd immediately, you obviously can't do that any more. Even worse, libvirt doesn't have a unique ID for each passed file descriptor any more, so even though we have introduced a QMP feature for file descriptor passing, we would still need to touch all commands to allow assigning a new fd. I think having one stable original fd that libvirt can refer to is much nicer. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 27.06.2012 00:54, schrieb Eric Blake: It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals: Thanks for the summary. In fact, after having read this, I start thinking that we're oversimplifying things because we're only thinking about one use case, block-commit. There are more: Live snapshot doesn't only put a new image on top, it must also make the old top-level image read-only. This isn't bad per se, but it shows that QMP commands can easily become bloated if you decide to change every command. The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands. This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Wed, Jun 27, 2012 at 11:16:32AM +0200, Kevin Wolf wrote: Am 27.06.2012 00:54, schrieb Eric Blake: It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals: Thanks for the summary. In fact, after having read this, I start thinking that we're oversimplifying things because we're only thinking about one use case, block-commit. There are more: Live snapshot doesn't only put a new image on top, it must also make the old top-level image read-only. This isn't bad per se, but it shows that QMP commands can easily become bloated if you decide to change every command. The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands. This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns. What I don't understand though is that when you're reopening FDs, with the pass-fd command approach, you're merely dup'ing the original FDs that was passed in from the client. Why can't you alternatively just dup the FD you already have. I don't see why we need to keep the original FD around forever. If the QMP command handler nees the temporary /dev/fdlist/N file to exist for longer than the duration of the command, it can simply dup() it to get a permanent copy of its own. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 27.06.2012 11:20, schrieb Daniel P. Berrange: On Wed, Jun 27, 2012 at 11:16:32AM +0200, Kevin Wolf wrote: The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands. This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns. What I don't understand though is that when you're reopening FDs, with the pass-fd command approach, you're merely dup'ing the original FDs that was passed in from the client. Why can't you alternatively just dup the FD you already have. That's easy: Because I don't know that I'm dealing with an FD. I think originally the whole point in this /dev/fd/ thing was that it would transparently enable the functionality for all parts of qemu: The block layer, of course, but also new char devices, migration targets, screenshot files or whatever else can deal with files. In this design the block layer doesn't even know that it's not a regular file. If we now decided to go for a design where the /dev/fd path isn't enough, but changes are needed to each subsystem where it should work, then this point doesn't apply any more and I cast my vote for abandoning everything starting with /dev/ and introducing a proper fd: protocol in the block layer, so that the block layer at least has a chance to know that it has to do treat this file specially. I don't see why we need to keep the original FD around forever. If the QMP command handler nees the temporary /dev/fdlist/N file to exist for longer than the duration of the command, it can simply dup() it to get a permanent copy of its own. If we're not going to make it transparent and if we don't care about QMP command bloat, we can choose completely different designs, yes. And then qemu could probably try to internally work around a suboptimal API. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 06/27/2012 04:43 AM, Kevin Wolf wrote: Am 27.06.2012 00:28, schrieb Corey Bryant: On 06/26/2012 04:50 PM, Luiz Capitulino wrote: On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf kw...@redhat.com wrote: Am 26.06.2012 11:10, schrieb Daniel P. Berrange: I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue. With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} - returns a string /dev/fd/N showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd /dev/fd/N In fact, there are more steps: 4. use it successfully 5. close_fd /dev/fd/N I think it would well be possible that qemu just closes the fd when it's not used internally any more. pass-fd could have a flag indicating qemu to do that. It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. No, libvirt doesn't know it, because the original file descriptor is still needed when qemu decides to reopen the file. So I think qemu needs some kind of refcounting anyway. One of the references is held by libvirt and it can drop it with close_fd, and the other one would be for the BlockDriverState or whatever you use the FD with. (There's a tricky part: When do you actually close the FD? If libvirt has dropped its reference and qemu reopens, for example because it has just probed the image format, we have a short time where the refcount would be 0, but we can't drop it anyway.) Kevin Yes, the refcount getting to 0 while the fd is still in use is a tough one. It just seems like the management app would be better equipped to understand when a drive is no longer needed. Isn't this what drive_del is for? If qemu is never told that the drive is no longer needed, then the fd remains on the monitor, and it's not a leak. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Am 26.06.2012 11:10, schrieb Daniel P. Berrange: I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue. With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} - returns a string /dev/fd/N showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd /dev/fd/N In fact, there are more steps: 4. use it successfully 5. close_fd /dev/fd/N I think it would well be possible that qemu just closes the fd when it's not used internally any more. The point of close_fd is more that libvirt still keeps a handle and can make use of this fd. One example where this is necessary is reopening the FD of a backing file in order to perform a live commit operation. Kevin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: So now from a client's POV you'd have a flow like * drive_add file=/dev/fd/N FDSET={N} IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS? Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received. And in QEMU you'd have something like * handle_monitor_command - recvmsg all FDs, and stash them in a thread local FDContext context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from FDContext - If success remove /dev/fd/N from FDContext - close() all FDs left in FDContext The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour. Wouldn't this leak fds if libvirt crashed part way through sending the set of fds? No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: So now from a client's POV you'd have a flow like * drive_add file=/dev/fd/N FDSET={N} IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS? Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received. And in QEMU you'd have something like * handle_monitor_command - recvmsg all FDs, and stash them in a thread local FDContext context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from FDContext - If success remove /dev/fd/N from FDContext - close() all FDs left in FDContext The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour. Wouldn't this leak fds if libvirt crashed part way through sending the set of fds? No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command. Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support /dev/fd/N for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, X, closefd). From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001 From: Daniel P. Berrange berra...@redhat.com Date: Tue, 26 Jun 2012 15:48:13 +0100 Subject: [PATCH] Support passing FDs with the monitor command that uses them Add support for a syntax /dev/fdlist/N, where 'N' refers to the N'th FD that was passed along with the currently processing command in this thread. The QMP client sends { execute: drive_add, arguments: { filename: /dev/fdlist/0, backingfilename: /dev/fdlist/1 }, nfds: 2 } followed by nfds * SCM_RIGHTS packets The FDs received along with the drive_add command are placed in a thread-local list, then the QMP handler function is invoked. The qemu_open() function will resolve any path /dev/fdlist/0 to use an FD stored in the thread-local list, and dup() it. One the QMP handler function is finished, all FDs in the thread local list are closed. THe reason for choice of /dev/fdlist/N, is to allow the /dev/fd/N syntax to be used for FDs that are passed to QEMU at startup time, which will be in the global FD number namespace, not the thread-local index. The reason for using a thread local is to ensure the FDs are only made available to the currently executing monitor command handler in this thread, and not any other threads. The advantage over using a separate 'passfd' command, is that this guarentees all FDs are closed in QEMU once the QMP command handler is finished. There is no risk of leaks being triggered by the client either intentionally, or via it crashing. NB: this is a proof of concept demonstration of the overall architectural design. The patch would need more work before it was suitable for merging - Pick a better place/file for the fdlist APIs - Do proper error reporting in various places - Check for possibility that qemu_chr_fe_get_msgfd can block or return EAGAIN. diff --git a/monitor.c b/monitor.c index f6107ba..a3a4bd4 100644 --- a/monitor.c +++ b/monitor.c @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) const mon_cmd_t *cmd; const char *cmd_name; Monitor *mon = cur_mon; +int nfds; +size_t i; args = input = NULL; @@ -4535,6 +4537,17 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) goto err_out; } +nfds = qdict_get_int(input, nfds); +for (i = 0 ; i nfds ; i++) { + int fd = qemu_chr_fe_get_msgfd(mon-chr); + if (fd == -1) { + qerror_report(QERR_FD_NOT_SUPPLIED); + goto err_out; + } + + qemu_fdlist_add(fd); +} + if (handler_is_async(cmd)) { err = qmp_async_cmd_handler(mon, cmd, args); if (err) { @@ -4552,6 +4565,7 @@ err_out: out: QDECREF(input); QDECREF(args); +qemu_fdlist_closeall(); } /** diff --git a/osdep.c b/osdep.c index 03817f0..0849cb6 100644 --- a/osdep.c +++ b/osdep.c @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice) #endif } +typedef
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 06/26/2012 11:03 AM, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: So now from a client's POV you'd have a flow like * drive_add file=/dev/fd/N FDSET={N} IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS? Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received. And in QEMU you'd have something like * handle_monitor_command - recvmsg all FDs, and stash them in a thread local FDContext context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from FDContext - If success remove /dev/fd/N from FDContext - close() all FDs left in FDContext The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour. Wouldn't this leak fds if libvirt crashed part way through sending the set of fds? No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command. Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support /dev/fd/N for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, X, closefd). Thanks for the PoC. Two other required updates that I can think of would be: 1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. 2) Support re-opening files with different access modes (O_RDONLY, O_WRONLY, or O_RDWR). This would be similar to the force option for pass-fd. -- Regards, Corey From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001 From: Daniel P. Berrange berra...@redhat.com Date: Tue, 26 Jun 2012 15:48:13 +0100 Subject: [PATCH] Support passing FDs with the monitor command that uses them Add support for a syntax /dev/fdlist/N, where 'N' refers to the N'th FD that was passed along with the currently processing command in this thread. The QMP client sends { execute: drive_add, arguments: { filename: /dev/fdlist/0, backingfilename: /dev/fdlist/1 }, nfds: 2 } followed by nfds * SCM_RIGHTS packets The FDs received along with the drive_add command are placed in a thread-local list, then the QMP handler function is invoked. The qemu_open() function will resolve any path /dev/fdlist/0 to use an FD stored in the thread-local list, and dup() it. One the QMP handler function is finished, all FDs in the thread local list are closed. THe reason for choice of /dev/fdlist/N, is to allow the /dev/fd/N syntax to be used for FDs that are passed to QEMU at startup time, which will be in the global FD number namespace, not the thread-local index. The reason for using a thread local is to ensure the FDs are only made available to the currently executing monitor command handler in this thread, and not any other threads. The advantage over using a separate 'passfd' command, is that this guarentees all FDs are closed in QEMU once the QMP command handler is finished. There is no risk of leaks being triggered by the client either intentionally, or via it crashing. NB: this is a proof of concept demonstration of the overall architectural design. The patch would need more work before it was suitable for merging - Pick a better place/file for the fdlist APIs - Do proper error reporting in various places - Check for possibility that qemu_chr_fe_get_msgfd can block or return EAGAIN. diff --git a/monitor.c b/monitor.c index f6107ba..a3a4bd4 100644 --- a/monitor.c +++ b/monitor.c @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) const mon_cmd_t *cmd; const char *cmd_name; Monitor *mon = cur_mon; +int nfds; +size_t i; args = input = NULL; @@ -4535,6 +4537,17 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) goto err_out; } +nfds = qdict_get_int(input, nfds); +for (i = 0 ; i nfds ; i++) { + int fd = qemu_chr_fe_get_msgfd(mon-chr); + if (fd == -1) { + qerror_report(QERR_FD_NOT_SUPPLIED); + goto err_out; + } + + qemu_fdlist_add(fd); +}
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 06/26/2012 11:37 AM, Corey Bryant wrote: On 06/26/2012 11:03 AM, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: So now from a client's POV you'd have a flow like * drive_add file=/dev/fd/N FDSET={N} IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS? Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received. And in QEMU you'd have something like * handle_monitor_command - recvmsg all FDs, and stash them in a thread local FDContext context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from FDContext - If success remove /dev/fd/N from FDContext - close() all FDs left in FDContext The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour. Wouldn't this leak fds if libvirt crashed part way through sending the set of fds? No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command. Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support /dev/fd/N for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, X, closefd). Thanks for the PoC. Two other required updates that I can think of would be: 1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. Nevermind my comment. I see that your PoC supports passing nfds for any QMP command. I'm curious what Kevin's thoughts are on this and the overall approach. 2) Support re-opening files with different access modes (O_RDONLY, O_WRONLY, or O_RDWR). This would be similar to the force option for pass-fd. I'm still not quite sure how we'd go about this. We need a way to determine the existing QEMU fd that is to be re-associated with the new fd, when using a /dev/fdlist/0 filename. In this new approach, 0 corresponds to an index, not an fd. The prior approach of using the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made this easy. From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001 From: Daniel P. Berrange berra...@redhat.com Date: Tue, 26 Jun 2012 15:48:13 +0100 Subject: [PATCH] Support passing FDs with the monitor command that uses them Add support for a syntax /dev/fdlist/N, where 'N' refers to the N'th FD that was passed along with the currently processing command in this thread. The QMP client sends { execute: drive_add, arguments: { filename: /dev/fdlist/0, backingfilename: /dev/fdlist/1 }, nfds: 2 } followed by nfds * SCM_RIGHTS packets The FDs received along with the drive_add command are placed in a thread-local list, then the QMP handler function is invoked. The qemu_open() function will resolve any path /dev/fdlist/0 to use an FD stored in the thread-local list, and dup() it. One the QMP handler function is finished, all FDs in the thread local list are closed. THe reason for choice of /dev/fdlist/N, is to allow the /dev/fd/N syntax to be used for FDs that are passed to QEMU at startup time, which will be in the global FD number namespace, not the thread-local index. The reason for using a thread local is to ensure the FDs are only made available to the currently executing monitor command handler in this thread, and not any other threads. The advantage over using a separate 'passfd' command, is that this guarentees all FDs are closed in QEMU once the QMP command handler is finished. There is no risk of leaks being triggered by the client either intentionally, or via it crashing. NB: this is a proof of concept demonstration of the overall architectural design. The patch would need more work before it was suitable for merging - Pick a better place/file for the fdlist APIs - Do proper error reporting in various places - Check for possibility that qemu_chr_fe_get_msgfd can block or return EAGAIN. diff --git a/monitor.c b/monitor.c index f6107ba..a3a4bd4 100644 --- a/monitor.c +++ b/monitor.c @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens)
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Tue, Jun 26, 2012 at 02:40:03PM -0400, Corey Bryant wrote: On 06/26/2012 11:37 AM, Corey Bryant wrote: On 06/26/2012 11:03 AM, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: So now from a client's POV you'd have a flow like * drive_add file=/dev/fd/N FDSET={N} IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS? Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received. And in QEMU you'd have something like * handle_monitor_command - recvmsg all FDs, and stash them in a thread local FDContext context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from FDContext - If success remove /dev/fd/N from FDContext - close() all FDs left in FDContext The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour. Wouldn't this leak fds if libvirt crashed part way through sending the set of fds? No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command. Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support /dev/fd/N for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, X, closefd). Thanks for the PoC. Two other required updates that I can think of would be: 1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. Nevermind my comment. I see that your PoC supports passing nfds for any QMP command. I'm curious what Kevin's thoughts are on this and the overall approach. 2) Support re-opening files with different access modes (O_RDONLY, O_WRONLY, or O_RDWR). This would be similar to the force option for pass-fd. I'm still not quite sure how we'd go about this. We need a way to determine the existing QEMU fd that is to be re-associated with the new fd, when using a /dev/fdlist/0 filename. In this new approach, 0 corresponds to an index, not an fd. The prior approach of using the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made this easy. Hmm, I'm not sure I understand what the use case is for the 'force' parameter ? In my proof of concept I left out the block of code from qemu_open() you had for dup'ing the FD with various different flags set, but that was just for brevity. I think it ought to fit in the same way, unless you're refering to a different area of the code. @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) { int ret; int mode = 0; +const char *p; + +/* Attempt dup of fd for pre-opened file */ +if (strstart(name, /dev/fdlist/, p)) { +int idx; +int fd; + +idx = qemu_parse_fd(p); +if (idx == -1) { +return -1; +} + +fd = qemu_fdlist_dup(idx); +if (fd == -1) { +return -1; +} + +/* XXX verify flags match */ eg, this part of the code you had some work related to 'flags' +return fd; +} if (flags O_CREAT) { va_list ap; Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf kw...@redhat.com wrote: Am 26.06.2012 11:10, schrieb Daniel P. Berrange: I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue. With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} - returns a string /dev/fd/N showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd /dev/fd/N In fact, there are more steps: 4. use it successfully 5. close_fd /dev/fd/N I think it would well be possible that qemu just closes the fd when it's not used internally any more. pass-fd could have a flag indicating qemu to do that. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 06/26/2012 04:50 PM, Luiz Capitulino wrote: On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf kw...@redhat.com wrote: Am 26.06.2012 11:10, schrieb Daniel P. Berrange: I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue. With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} - returns a string /dev/fd/N showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd /dev/fd/N In fact, there are more steps: 4. use it successfully 5. close_fd /dev/fd/N I think it would well be possible that qemu just closes the fd when it's not used internally any more. pass-fd could have a flag indicating qemu to do that. It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
On 06/26/2012 04:42 PM, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 02:40:03PM -0400, Corey Bryant wrote: On 06/26/2012 11:37 AM, Corey Bryant wrote: On 06/26/2012 11:03 AM, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: So now from a client's POV you'd have a flow like * drive_add file=/dev/fd/N FDSET={N} IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS? Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received. And in QEMU you'd have something like * handle_monitor_command - recvmsg all FDs, and stash them in a thread local FDContext context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from FDContext - If success remove /dev/fd/N from FDContext - close() all FDs left in FDContext The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour. Wouldn't this leak fds if libvirt crashed part way through sending the set of fds? No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command. Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support /dev/fd/N for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, X, closefd). Thanks for the PoC. Two other required updates that I can think of would be: 1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. Nevermind my comment. I see that your PoC supports passing nfds for any QMP command. I'm curious what Kevin's thoughts are on this and the overall approach. 2) Support re-opening files with different access modes (O_RDONLY, O_WRONLY, or O_RDWR). This would be similar to the force option for pass-fd. I'm still not quite sure how we'd go about this. We need a way to determine the existing QEMU fd that is to be re-associated with the new fd, when using a /dev/fdlist/0 filename. In this new approach, 0 corresponds to an index, not an fd. The prior approach of using the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made this easy. Hmm, I'm not sure I understand what the use case is for the 'force' parameter ? In my proof of concept I left out the block of code from qemu_open() you had for dup'ing the FD with various different flags set, but that was just for brevity. I think it ought to fit in the same way, unless you're refering to a different area of the code. The access modes (O_RDONLY, O_WRONLY, or O_RDWR) can only be set on the open() call, which is performed by libvirt. fcntl(F_SETFL) can't change them. So the point of pass-fd's force option is to allow libvirt to open() the same file with a new access mode, pass it to qemu, and copy it to the existing qemu fd. For example: fd1 = open(/path/to/file.img, O_RDONLY) fd2 = open(/path/to/file.img, O_RDWR) pass-fd disk0 (fd1) returns fd=3 (/dev/fd/3 in qemu has O_RDONLY) drive_add file:/dev/fd/3 pass-fd disk0 (fd2) returns fd=3 (/dev/fd/3 in qemu now has O_RDWR) This is required because the access_mode flags used by qemu_open() are the same as the passed fd's access_mode flags. And qemu will re-open files with different access modes. @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) { int ret; int mode = 0; +const char *p; + +/* Attempt dup of fd for pre-opened file */ +if (strstart(name, /dev/fdlist/, p)) { +int idx; +int fd; + +idx = qemu_parse_fd(p); +if (idx == -1) { +return -1; +} + +fd = qemu_fdlist_dup(idx); +if (fd == -1) { +return -1; +} + +/* XXX verify flags match */ eg, this part of the code you had some work related to 'flags' Yes, no problem and thanks again for the code. :) I'm hoping we can get some more input soon so I can get moving in one direction or another. +return fd; +} if (flags O_CREAT) { va_list ap; Daniel -- Regards, Corey -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list