Re: [RFC PATCH 09/21] contrib/gitdm: Add Nutanix to the domain map
On 04/10/20, 11:35 PM, "Philippe Mathieu-Daudé" wrote: There is a number of contributors from this domain, add its own entry to the gitdm domain map. Cc: Ani Sinha Cc: David Vrabel Cc: Felipe Franciosi Cc: Jonathan Davies Cc: Malcolm Crossley Cc: Mike Cui Cc: Peter Turschmid Cc: Prerna Saxena Cc: Raphael Norwitz Cc: Swapnil Ingle Cc: Ani Sinha Signed-off-by: Philippe Mathieu-Daudé --- One Reviewed-by/Ack-by from someone from this domain should be sufficient to get this patch merged. Ani, can you confirm the a...@anisinha.ca email? Should it go into 'individual contributors' instead? --- contrib/gitdm/domain-map| 1 + contrib/gitdm/group-map-nutanix | 2 ++ gitdm.config| 1 + 3 files changed, 4 insertions(+) create mode 100644 contrib/gitdm/group-map-nutanix diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map index 4850eab4c4..39251fd97c 100644 --- a/contrib/gitdm/domain-map +++ b/contrib/gitdm/domain-map @@ -24,6 +24,7 @@ linaro.org Linaro codesourcery.com Mentor Graphics microsoft.com Microsoft nokia.com Nokia +nutanix.com Nutanix oracle.com Oracle proxmox.com Proxmox redhat.com Red Hat diff --git a/contrib/gitdm/group-map-nutanix b/contrib/gitdm/group-map-nutanix new file mode 100644 index 00..a3f11425b3 --- /dev/null +++ b/contrib/gitdm/group-map-nutanix @@ -0,0 +1,2 @@ +raphael.s.norw...@gmail.com +a...@anisinha.ca diff --git a/gitdm.config b/gitdm.config index c01c219078..4f821ab8ba 100644 --- a/gitdm.config +++ b/gitdm.config @@ -37,6 +37,7 @@ GroupMap contrib/gitdm/group-map-cadence Cadence Design Systems GroupMap contrib/gitdm/group-map-codeweavers CodeWeavers GroupMap contrib/gitdm/group-map-ibm IBM GroupMap contrib/gitdm/group-map-janustech Janus Technologies +GroupMap contrib/gitdm/group-map-nutanix Nutanix -- 2.26.2 LGTM. Raphael is still a part of Nutanix. I see Ani has already responded about him not being with the company anymore, so you might want to add him to the individual contributors' list. Regards, Prerna
Re: [Qemu-devel] [PATCH 2/2] vhost-user: only seek a reply if needed in set_mem_table
Hi Maxime,

On 08/09/16 2:04 pm, "Maxime Coquelin" wrote:

>The goal of this patch is to only request a sync (reply_ack,
>or get_features) in set_mem_table only when necessary.
>
>It should not be necessary the first time we set the table,
>or when we add a new regions which hadn't been merged with an
>existing ones.

I don’t think so. This patch is not helping us solve the issue. The hang introduced by the original use of get_features() in set_mem_table was traced down to the use of TCG mode for the vhost-user test. This has now been fixed via:

- commit cdafe929615ec5eca71bcd5a3d12bab5678e5886
Author: Eduardo Habkost
Date: Fri Sep 2 15:59:43 2016 -0300

vhost-user-test: Use libqos instead of pxe-virtio.rom

vhost-user-test relies on iPXE just to initialize the virtio-net device, and doesn't do any actual packet tx/rx testing. In addition to that, the test relies on TCG, which is imcompatible with vhost. The test only worked by accident: a bug the memory backend initialization made memory regions not have the DIRTY_MEMORY_CODE bit set in dirty_log_mask.

This changes vhost-user-test to initialize the virtio-net device using libqos, and not use TCG nor pxe-virtio.rom.

Signed-off-by: Eduardo Habkost
---

So I think the original hang has been fixed with Patch 1/2 of this series alone.

Regarding Patch 2/2: This patch seems to filter responses from set_mem_table only for certain updates of memory regions. It violates the definition of the REPLY_ACK feature. This feature expects the client to send a response for every call of set_mem_table. And here, qemu exits the set_mem_table() function in some cases without even waiting for the reply that is going to come in.

As for the use of this approach with get_features, we have already debated that on the list before:
https://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg00689.html

To quote: "I do not entirely agree with that. The first set_mem_table command is not much different from subsequent set_mem_table calls."

Regards,
Prerna
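To make the REPLY_ACK objection above a little more concrete, here is a minimal sketch in C. It is not QEMU code: SketchChannel, sketch_backend_ack() and sketch_master_wait() are hypothetical names, and the request ids are arbitrary numbers. It only assumes what the thread states, namely that a backend which negotiated REPLY_ACK acknowledges every flagged request, and that the master checks the type of the reply it reads (as process_message_reply() does in patch 1/2). Under those assumptions, an acknowledgement that the master never waits for would still be sitting in the channel when the next reply is read.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUEUE_MAX 8

/* Hypothetical reply channel: a FIFO of request ids the backend has acked. */
typedef struct SketchChannel {
    uint32_t acked[SKETCH_QUEUE_MAX];
    size_t head;   /* next ack the master will read */
    size_t tail;   /* next free slot for the backend */
} SketchChannel;

/* Backend side: with REPLY_ACK negotiated, every flagged request is acked. */
static void sketch_backend_ack(SketchChannel *ch, uint32_t request)
{
    if (ch->tail < SKETCH_QUEUE_MAX) {
        ch->acked[ch->tail++] = request;
    }
}

/*
 * Master side: read the next ack and compare its type with the request we
 * are waiting for. Returns 0 on a match, -1 if the channel is empty or the
 * ack belongs to a different request.
 */
static int sketch_master_wait(SketchChannel *ch, uint32_t expected)
{
    if (ch->head == ch->tail) {
        return -1;
    }
    return ch->acked[ch->head++] == expected ? 0 : -1;
}
```

If the master returns from one flagged request without calling sketch_master_wait(), the first wait for a later request reads the stale acknowledgement and fails the type check. Whether QEMU would hit exactly this path depends on which later command reads from the socket, so treat this as an illustration of the concern rather than a reproduction.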
Re: [Qemu-devel] [PATCH] Revert "vhost-user: Attempt to fix a race with set_mem_table."
On 16/08/16 2:39 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Mon, Aug 15, 2016 at 04:15:08PM +0100, Peter Maydell wrote: >> On 15 August 2016 at 14:35, Michael S. Tsirkin <m...@redhat.com> wrote: >> > This reverts commit 28ed5ef16384f12500abd3647973ee21b03cbe23. >> > >> > I still think it's the right thing to do, but >> > tests have been failing sporadically. >> > >> > Revert for now, and hope to fix it before the release. >> > >> > Cc: Prerna Saxena <prerna.sax...@nutanix.com> >> > Cc: Peter Maydell <peter.mayd...@linaro.org> >> > Cc: Marc-André Lureau <mlur...@redhat.com> >> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> >> > --- >> >> Applied, thanks. I found my clang-on-x86-64 Linux Ubuntu xenial >> build would hang in vhost-user/read-guest-mem after 10 or >> so iterations, but with this revert applied it seems fine, >> so I think this commit was definitely the culprit. >> >> -- PMM > >That's nice for the RC, but I think we do want to >have the underlying issue fixed down the road. >Prerna, Marc André - any chance you could try >reproducing on an ubuntu guest? > >In particular to make sure the issue whatever it is >will not tigger once clients negotiate the >new feature bit. Sure, I’ll give it a try. Prerna
Re: [Qemu-devel] [PATCH] Revert "vhost-user: Attempt to fix a race with set_mem_table."
Ack. You beat me to the patch by a few minutes :) Prerna On 15/08/16 7:05 pm, "Michael S. Tsirkin" <m...@redhat.com> wrote: >This reverts commit 28ed5ef16384f12500abd3647973ee21b03cbe23. > >I still think it's the right thing to do, but >tests have been failing sporadically. > >Revert for now, and hope to fix it before the release. > >Cc: Prerna Saxena <prerna.sax...@nutanix.com> >Cc: Peter Maydell <peter.mayd...@linaro.org> >Cc: Marc-André Lureau <mlur...@redhat.com> >Signed-off-by: Michael S. Tsirkin <m...@redhat.com> >--- > hw/virtio/vhost-user.c | 127 +++-- > 1 file changed, 60 insertions(+), 67 deletions(-) > >diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >index 1a7d53c..b57454a 100644 >--- a/hw/virtio/vhost-user.c >+++ b/hw/virtio/vhost-user.c >@@ -263,6 +263,66 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, >uint64_t base, > return 0; > } > >+static int vhost_user_set_mem_table(struct vhost_dev *dev, >+struct vhost_memory *mem) >+{ >+int fds[VHOST_MEMORY_MAX_NREGIONS]; >+int i, fd; >+size_t fd_num = 0; >+bool reply_supported = virtio_has_feature(dev->protocol_features, >+ >VHOST_USER_PROTOCOL_F_REPLY_ACK); >+ >+VhostUserMsg msg = { >+.request = VHOST_USER_SET_MEM_TABLE, >+.flags = VHOST_USER_VERSION, >+}; >+ >+if (reply_supported) { >+msg.flags |= VHOST_USER_NEED_REPLY_MASK; >+} >+ >+for (i = 0; i < dev->mem->nregions; ++i) { >+struct vhost_memory_region *reg = dev->mem->regions + i; >+ram_addr_t offset; >+MemoryRegion *mr; >+ >+assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); >+mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, >+ ); >+fd = memory_region_get_fd(mr); >+if (fd > 0) { >+msg.payload.memory.regions[fd_num].userspace_addr = >reg->userspace_addr; >+msg.payload.memory.regions[fd_num].memory_size = >reg->memory_size; >+msg.payload.memory.regions[fd_num].guest_phys_addr = >reg->guest_phys_addr; >+msg.payload.memory.regions[fd_num].mmap_offset = offset; >+assert(fd_num < 
VHOST_MEMORY_MAX_NREGIONS); >+fds[fd_num++] = fd; >+} >+} >+ >+msg.payload.memory.nregions = fd_num; >+ >+if (!fd_num) { >+error_report("Failed initializing vhost-user memory map, " >+ "consider using -object memory-backend-file share=on"); >+return -1; >+} >+ >+msg.size = sizeof(msg.payload.memory.nregions); >+msg.size += sizeof(msg.payload.memory.padding); >+msg.size += fd_num * sizeof(VhostUserMemoryRegion); >+ >+if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { >+return -1; >+} >+ >+if (reply_supported) { >+return process_message_reply(dev, msg.request); >+} >+ >+return 0; >+} >+ > static int vhost_user_set_vring_addr(struct vhost_dev *dev, > struct vhost_vring_addr *addr) > { >@@ -477,73 +537,6 @@ static int vhost_user_get_features(struct vhost_dev *dev, >uint64_t *features) > return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); > } > >-static int vhost_user_set_mem_table(struct vhost_dev *dev, >-struct vhost_memory *mem) >-{ >-int fds[VHOST_MEMORY_MAX_NREGIONS]; >-int i, fd; >-size_t fd_num = 0; >-uint64_t features; >-bool reply_supported = virtio_has_feature(dev->protocol_features, >- >VHOST_USER_PROTOCOL_F_REPLY_ACK); >- >-VhostUserMsg msg = { >-.request = VHOST_USER_SET_MEM_TABLE, >-.flags = VHOST_USER_VERSION, >-}; >- >-if (reply_supported) { >-msg.flags |= VHOST_USER_NEED_REPLY_MASK; >-} >- >-for (i = 0; i < dev->mem->nregions; ++i) { >-struct vhost_memory_region *reg = dev->mem->regions + i; >-ram_addr_t offset; >-MemoryRegion *mr; >- >-assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); >-mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, >- &offset); >-fd = memory_region_get_fd(mr); >-if (fd > 0) { >-msg.payload.memory.regions[fd_num].userspace_addr
Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
On 14/08/16 8:21 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Fri, Aug 12, 2016 at 07:16:34AM +, Prerna Saxena wrote: >> >> On 12/08/16 12:08 pm, "Fam Zheng" <f...@redhat.com> wrote: >> >> >> >> >> >> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote: >> >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> >> >> The set_mem_table command currently does not seek a reply. Hence, there is >> >> no easy way for a remote application to notify to QEMU when it finished >> >> setting up memory, or if there were errors doing so. >> >> >> >> As an example: >> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net >> >> application). SET_MEM_TABLE does not require a reply according to the >> >> spec. >> >> (2) Qemu commits the memory to the guest. >> >> (3) Guest issues an I/O operation over a new memory region which was >> >> configured on (1). >> >> (4) The application has not yet remapped the memory, but it sees the I/O >> >> request. >> >> (5) The application cannot satisfy the request because it does not know >> >> about those GPAs. >> >> >> >> While a guaranteed fix would require a protocol extension (committed >> >> separately), >> >> a best-effort workaround for existing applications is to send a >> >> GET_FEATURES >> >> message before completing the vhost_user_set_mem_table() call. >> >> Since GET_FEATURES requires a reply, an application that processes >> >> vhost-user >> >> messages synchronously would probably have completed the SET_MEM_TABLE >> >> before replying. >> >> >> >> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> >> >> Reviewed-by: Michael S. Tsirkin <m...@redhat.com> >> >> Signed-off-by: Michael S. Tsirkin <m...@redhat.com> >> > >> >Sporadic hangs are seen with test-vhost-user after this patch: >> > >> >https://travis-ci.org/qemu/qemu/builds >> > >> >Reverting seems to fix it for me. >> > >> >Is this a known problem? >> > >> >Fam >> >> Hi Fam, >> Thanks for reporting the sporadic hangs. 
I had seen ‘make check’ pass on my
>> Centos 6 environment, so missed this.
>> I am setting up the docker test env to repro this, but I think I can guess
>> the problem :
>>
>> In tests/vhost-user-test.c:
>>
>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>> {
>> ..[snip]..
>>
>>     case VHOST_USER_SET_MEM_TABLE:
>>        /* received the mem table */
>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds,
>> G_N_ELEMENTS(s->fds));
>>
>>
>>        /* signal the test that it can continue */
>>        g_cond_signal(&s->data_cond);
>>        break;
>> ..[snip]..
>> }
>>
>>
>> The test seems to be marked complete as soon as mem_table is copied.
>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost
>> command implementation with qemu. SET_MEM_TABLE now sends out a new message
>> GET_FEATURES, and the call is only completed once it receives features from
>> the remote application. (or the test framework, as is the case here.)
>
>Hmm but why does it matter that data_cond is woken up?

Michael, sorry, I didn’t quite understand that. Could you please explain?

>
>
>> While the test itself can be modified (Do not signal completion until we’ve
>> sent a follow-up response to GET_FEATURES), I am now wondering if this patch
>> may break existing vhost applications too ? If so, reverting it possibly
>> better.
>
>What bothers me is that the new feature might cause the same
>issue once we enable it in the test.

No, it won’t. The new feature is a protocol extension, and only works if it has been negotiated. If not negotiated, that part of the code is never executed.

>
>How about a patch to tests/vhost-user-test.c adding the new
>protocol feature? I would be quite interested to see what
>is going on with it.

Yes, that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
> > >> What confuses me is why it doesn’t fail all the time, but only about 20% to >> 30% time as Fam reports. > >And succeeds every time on my systems :( +1 to that :( I have had no luck repro’ing it. > >> >> Thoughts : Michael, Fam, MarcAndre ? >> >> Regards, >> Prerna
Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
On 12/08/16 12:08 pm, "Fam Zheng" <f...@redhat.com> wrote:

>On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>> From: Prerna Saxena <prerna.sax...@nutanix.com>
>>
>> The set_mem_table command currently does not seek a reply. Hence, there is
>> no easy way for a remote application to notify to QEMU when it finished
>> setting up memory, or if there were errors doing so.
>>
>> As an example:
>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>> application). SET_MEM_TABLE does not require a reply according to the spec.
>> (2) Qemu commits the memory to the guest.
>> (3) Guest issues an I/O operation over a new memory region which was
>> configured on (1).
>> (4) The application has not yet remapped the memory, but it sees the I/O
>> request.
>> (5) The application cannot satisfy the request because it does not know
>> about those GPAs.
>>
>> While a guaranteed fix would require a protocol extension (committed
>> separately),
>> a best-effort workaround for existing applications is to send a GET_FEATURES
>> message before completing the vhost_user_set_mem_table() call.
>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>> messages synchronously would probably have completed the SET_MEM_TABLE
>> before replying.
>>
>> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com>
>> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
>Sporadic hangs are seen with test-vhost-user after this patch:
>
>https://travis-ci.org/qemu/qemu/builds
>
>Reverting seems to fix it for me.
>
>Is this a known problem?
>
>Fam

Hi Fam,
Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my CentOS 6 environment, so I missed this.
I am setting up the docker test env to repro this, but I think I can guess the problem:

In tests/vhost-user-test.c:

static void chr_read(void *opaque, const uint8_t *buf, int size)
{
..[snip]..

    case VHOST_USER_SET_MEM_TABLE:
        /* received the mem table */
        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));

        /* signal the test that it can continue */
        g_cond_signal(&s->data_cond);
        break;
..[snip]..
}

The test seems to be marked complete as soon as mem_table is copied. However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation in qemu. SET_MEM_TABLE now sends out a new message, GET_FEATURES, and the call is only completed once it receives features from the remote application (or the test framework, as is the case here).

While the test itself can be modified (do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too? If so, reverting it is possibly better.

What confuses me is why it doesn’t fail all the time, but only about 20% to 30% of the time, as Fam reports.

Thoughts: Michael, Fam, MarcAndre?

Regards,
Prerna
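As a rough illustration of the ordering assumption behind the GET_FEATURES workaround discussed in this message, here is a minimal sketch in C. It is a toy model, not the test code or a real backend: SketchBackend, sketch_req and sketch_run_in_order() are hypothetical names. It assumes a backend that handles queued requests strictly one after another, which is the only case the commit message claims to help.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds; only the two relevant to the workaround. */
enum sketch_req { SKETCH_SET_MEM_TABLE, SKETCH_GET_FEATURES };

typedef struct SketchBackend {
    bool mem_table_mapped;  /* true once SET_MEM_TABLE has been handled */
    int replies_sent;       /* number of GET_FEATURES replies produced */
} SketchBackend;

/*
 * Handle the queued requests strictly in order. Returns whether the memory
 * table was already mapped at the moment the last GET_FEATURES reply was
 * produced (false if no GET_FEATURES was seen).
 */
static bool sketch_run_in_order(SketchBackend *b,
                                const enum sketch_req *queue, size_t n)
{
    bool mapped_at_reply = false;

    for (size_t i = 0; i < n; i++) {
        switch (queue[i]) {
        case SKETCH_SET_MEM_TABLE:
            b->mem_table_mapped = true;   /* stands in for the real remap */
            break;
        case SKETCH_GET_FEATURES:
            mapped_at_reply = b->mem_table_mapped;
            b->replies_sent++;
            break;
        }
    }
    return mapped_at_reply;
}
```

With the queue { SET_MEM_TABLE, GET_FEATURES } the function returns true, which is the property QEMU relies on when it blocks on the GET_FEATURES reply. A backend that handles messages out of order, or a test harness that signals completion before answering GET_FEATURES, falls outside this model.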
Re: [Qemu-devel] [PATCH for-2.7 v5.1 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
On 04/08/16 9:41 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Sat, Jul 30, 2016 at 06:38:23AM +, Prerna Saxena wrote: >> >> >> >> >> >> On 30/07/16 2:19 am, "Eric Blake" <ebl...@redhat.com> wrote: >> >> >On 07/28/2016 01:07 AM, Prerna Saxena wrote: >> >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> >> >> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. >> >> >> > >> >> + >> >> +With this protocol extension negotiated, the sender (QEMU) can set the >> >> +"need_reply" [Bit 3] flag to any command. This indicates that >> >> +the client MUST respond with a Payload VhostUserMsg indicating success or >> >> +failure. The payload should be set to zero on success or non-zero on >> >> failure. >> >> +(Unless the message already has an explicit reply body) >> > >> >Rather than make this parenthetical, I would go with: >> > >> >The payload should be set to zero on success or non-zero on failure, >> >unless the message already has an explicit reply body. >> >> Hi Eric, >> Thank you for taking a look, but I think you possibly missed the latest >> patchset posted last night. >> This had already been incorporated in v6 that I’d posted last night before >> your message. >> See https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06772.html >> >> >> > >> >> + >> >> +This indicates to QEMU that the requested operation has deterministically >> >> +been met or not. Today, QEMU is expected to terminate the main vhost-user >> > >> >Reads awkwardly; maybe: >> > >> >The response payload gives QEMU a deterministic indication of the result >> >of the command. >> >> Hmm, it is more of personal taste, so I’ll refrain from commenting either >> way. > >I prefer Eric's form too. "that ... or not" isn't very clear. Done. > >> > >> >> +loop upon receiving such errors. In future, qemu could be taught to be >> >> more >> >> +resilient for selective requests. 
>> >> + >> >> +For the message types that already solicit a reply from the client, the >> >> +presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set >> >> brings >> >> +no behaviourial change. (See the 'Communication' section for details.) >> > >> >s/behaviourial/behavioural/ (or if the document widely favors US >> >spelling, behavioral) >> >> >> The last 3 iterations of this patchset have only seen review comments >> focussed on documentation suggestions and indentation of code, but nothing >> on the idea/code itself. This gives me hope that the patch is possibly close >> to merging within 2.7 timeframe :-) >> May I request the maintainers to please correct this tiny spelling typo as >> this is checked in? >> >> Regards, >> Prerna > >Probably easier to post v7 with above minor things. Posted a v7 which incorporates all suggestions made by Eric. https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg01027.html Regards,
[Qemu-devel] [PATCH for-2.7 v7 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
From: Prerna Saxena <prerna.sax...@nutanix.com> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. If negotiated, client applications should send a u64 payload in response to any message that contains the "need_reply" bit set on the message flags. Setting the payload to "zero" indicates the command finished successfully. Likewise, setting it to "non-zero" indicates an error. Currently implemented only for SET_MEM_TABLE. Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 26 ++ hw/virtio/vhost-user.c| 32 2 files changed, 58 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..7890d71 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for + details. * Size - 32-bit size of the payload @@ -126,6 +128,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension. ] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +258,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +469,24 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: +--- +The original vhost-user specification only demands replies for certain +commands. 
This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. + +With this protocol extension negotiated, the sender (QEMU) can set the +"need_reply" [Bit 3] flag to any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure, +unless the message already has an explicit reply body. + +The response payload gives QEMU a deterministic indication of the result +of the command. Today, QEMU is expected to terminate the main vhost-user +loop upon receiving such errors. In future, qemu could be taught to be more +resilient for selective requests. + +For the message types that already solicit a reply from the client, the +presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings +no behavioural change. (See the 'Communication' section for details.) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 1995fd2..b57454a 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -84,6 +85,7 @@ typedef struct VhostUserMsg { #define VHOST_USER_VERSION_MASK (0x3) #define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_NEED_REPLY_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -158,6 +160,25 @@ fail: return -1; } +static int process_message_reply(struct vhost_dev *dev, + VhostUserRequest request) +{ +VhostUserMsg msg; + +if (vhost_user_read(dev, ) < 0) { +return -1; +} + +if (msg.request != request) { +error_report("Received unexpected msg type." + "Expected %d received %d", + request, msg.request); +return -1; +} + +return msg.payload.u64 ? 
-1 : 0; +} + static bool vhost_user_one_time_request(VhostUserRequest request) { switch (request) { @@ -248,11 +269,18 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, int fds[VHOST_MEMORY_MAX_NREGIONS]; int i, fd; size_t fd_num = 0; +bool reply_supported = virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_REPLY_ACK); + VhostUserMsg msg = { .request = VHOST_USER_SET_MEM_TABLE, .flags = VHOST_USER_VERSION, }; +if (reply_supported) { +
[Qemu-devel] [PATCH for-2.7 v7 0/2]vhost-user: Extend protocol to receive replies on any command.
From: Prerna Saxena <prerna.sax...@nutanix.com>

[ This series incorporates all the documentation suggestions made in review. ]

vhost-user: Extend protocol to receive replies on any command.

The current vhost-user protocol requires the client to send a reply to only a few commands. For the remaining commands, it is impossible for QEMU to know the status of the requested operation -- ie, did it succeed? If so, by what time?

This is inconvenient, and can also lead to races. As an example:
(1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net application). Note that SET_MEM_TABLE does not require a reply according to the spec.
(2) Qemu commits the memory to the guest.
(3) Guest issues an I/O operation over a new memory region which was configured on (1).
(4) The application hasn't yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (eg. vhost-net) completes the command and returns (with an error code).

Changing the behaviour of current vhost-user commands would break existing applications.

Patch 1 introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to request a reply to any message by setting the newly introduced "need_reply" flag. The application must then respond to qemu by providing a status about the requested operation.

Patch 2 adds a workaround for the race described above for clients that do not support the REPLY_ACK feature. It introduces a get_features command to be sent before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messages in order.

Changelog:
----------
Changes v6 -> v7:
1) Patch 1: In docs/specs/vhost-user.txt
   * s/behaviourial/behavioural/
   * "This indicates to QEMU that the requested operation has deterministically been met or not" -> "The response payload gives QEMU a deterministic indication of the result of the command."
2) Patch 2: Unchanged.

Changes v5.1 -> v6:
1) Patch 1: fixed some minor indentation issues and a really tiny documentation change.
2) Patch 2: unchanged.

Changes v5 -> v5.1:
1) Patch 1: no change
2) Patch 2: fixes a tiny typo I'd accidentally introduced while creating v5 from v4. The code itself is unchanged from v4.

Changes v4 -> v5:
1) Patch 1:
   * Reword 'response' to 'reply' on public demand.
   * Documentation is more concise.
2) Patch 2: unchanged

Changes v3 -> v4:
1) Rearranged code in PATCH 1 to offset compiler warnings about missing declaration of vhost_user_read(). Fixed by moving process_message_reply() after definition of vhost_user_read()
2) Fixed minor suggestions in writeup for this protocol extension.

Changes v2 -> v3:
1) Swapped the patch numbers 1 & 2 from the previous series.
2) Patch 1 (previously patch 2 in v2): addresses MarcAndre's review comments and renames function 'process_message_response' to 'process_message_reply'
3) Patch 2 (ie patch 1 in v2): Unchanged from v2.

Changes v1 -> v2:
1) Patch 1: Ask for get_features before returning from set_mem_table (new).
2) Patch 2:
   * Improve documentation.
   * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE.

References:
v1  : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html
v2  : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html
v3  : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg01598.html
v4  : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06173.html
v5  : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06338.html
v5.1: https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06359.html
v6  : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06772.html

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt |  26 +
 hw/virtio/vhost-user.c    | 137 +-
 2 files changed, 114 insertions(+), 49 deletions(-)

--
1.8.1.2
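For readers who want to see what the "need_reply" handshake from Patch 1 might look like on the backend side, here is a minimal sketch in C. The three mask values are the ones quoted in the series (version in the lower 2 bits, reply flag at bit 2, need_reply at bit 3). Everything else is an assumption for illustration: SketchMsg is a hypothetical, trimmed-down stand-in for VhostUserMsg, and sketch_needs_ack() and sketch_make_ack() are not part of QEMU or of any vhost-user library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag layout as described in the spec text of patch 1/2. */
#define VHOST_USER_VERSION          (0x1)
#define VHOST_USER_VERSION_MASK     (0x3)
#define VHOST_USER_REPLY_MASK       (0x1 << 2)
#define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)

/* Hypothetical header with a u64 payload only; the real message has more. */
typedef struct SketchMsg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;   /* size of the payload that follows */
    uint64_t u64;
} SketchMsg;

/* An ack is owed only if REPLY_ACK was negotiated and the sender set bit 3. */
static bool sketch_needs_ack(const SketchMsg *req, bool reply_ack_negotiated)
{
    return reply_ack_negotiated &&
           (req->flags & VHOST_USER_NEED_REPLY_MASK) != 0;
}

/*
 * Build the ack: echo the request type, mark the message as a reply, and
 * carry zero on success or non-zero on failure in the u64 payload.
 */
static SketchMsg sketch_make_ack(const SketchMsg *req, int status)
{
    SketchMsg ack = {
        .request = req->request,
        .flags = VHOST_USER_VERSION | VHOST_USER_REPLY_MASK,
        .size = sizeof(uint64_t),
        .u64 = status ? 1 : 0,
    };
    return ack;
}
```

Echoing the request type matters because process_message_reply() in Patch 1 rejects a reply whose request field differs from the one that was sent. Commands that already have an explicit reply body keep that body and are not covered by this sketch.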
[Qemu-devel] [PATCH for-2.7 v7 2/2] vhost-user: Attempt to fix a race with set_mem_table.
From: Prerna Saxena <prerna.sax...@nutanix.com> The set_mem_table command currently does not seek a reply. Hence, there is no easy way for a remote application to notify to QEMU when it finished setting up memory, or if there were errors doing so. As an example: (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net application). SET_MEM_TABLE does not require a reply according to the spec. (2) Qemu commits the memory to the guest. (3) Guest issues an I/O operation over a new memory region which was configured on (1). (4) The application has not yet remapped the memory, but it sees the I/O request. (5) The application cannot satisfy the request because it does not know about those GPAs. While a guaranteed fix would require a protocol extension (committed separately), a best-effort workaround for existing applications is to send a GET_FEATURES message before completing the vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply, an application that processes vhost-user messages synchronously would probably have completed the SET_MEM_TABLE before replying. 
Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- hw/virtio/vhost-user.c | 127 ++--- 1 file changed, 67 insertions(+), 60 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index b57454a..1a7d53c 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -263,66 +263,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table(struct vhost_dev *dev, -struct vhost_memory *mem) -{ -int fds[VHOST_MEMORY_MAX_NREGIONS]; -int i, fd; -size_t fd_num = 0; -bool reply_supported = virtio_has_feature(dev->protocol_features, - VHOST_USER_PROTOCOL_F_REPLY_ACK); - -VhostUserMsg msg = { -.request = VHOST_USER_SET_MEM_TABLE, -.flags = VHOST_USER_VERSION, -}; - -if (reply_supported) { -msg.flags |= VHOST_USER_NEED_REPLY_MASK; -} - -for (i = 0; i < dev->mem->nregions; ++i) { -struct vhost_memory_region *reg = dev->mem->regions + i; -ram_addr_t offset; -MemoryRegion *mr; - -assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); -mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - ); -fd = memory_region_get_fd(mr); -if (fd > 0) { -msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr; -msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; -msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; -msg.payload.memory.regions[fd_num].mmap_offset = offset; -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); -fds[fd_num++] = fd; -} -} - -msg.payload.memory.nregions = fd_num; - -if (!fd_num) { -error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); -return -1; -} - -msg.size = sizeof(msg.payload.memory.nregions); -msg.size += sizeof(msg.payload.memory.padding); -msg.size += fd_num * sizeof(VhostUserMemoryRegion); - -if (vhost_user_write(dev, , fds, fd_num) < 0) { -return -1; -} - -if (reply_supported) { -return process_message_reply(dev, msg.request); -} - 
-return 0; -} - static int vhost_user_set_vring_addr(struct vhost_dev *dev, struct vhost_vring_addr *addr) { @@ -537,6 +477,73 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } +static int vhost_user_set_mem_table(struct vhost_dev *dev, +struct vhost_memory *mem) +{ +int fds[VHOST_MEMORY_MAX_NREGIONS]; +int i, fd; +size_t fd_num = 0; +uint64_t features; +bool reply_supported = virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_REPLY_ACK); + +VhostUserMsg msg = { +.request = VHOST_USER_SET_MEM_TABLE, +.flags = VHOST_USER_VERSION, +}; + +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_REPLY_MASK; +} + +for (i = 0; i < dev->mem->nregions; ++i) { +struct vhost_memory_region *reg = dev->mem->regions + i; +ram_addr_t offset; +MemoryRegion *mr; + +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + ); +fd = memory_
Re: [Qemu-devel] [PATCH for-2.7 v5.1 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
On 30/07/16 2:19 am, "Eric Blake" <ebl...@redhat.com> wrote: >On 07/28/2016 01:07 AM, Prerna Saxena wrote: >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. >> > >> + >> +With this protocol extension negotiated, the sender (QEMU) can set the >> +"need_reply" [Bit 3] flag to any command. This indicates that >> +the client MUST respond with a Payload VhostUserMsg indicating success or >> +failure. The payload should be set to zero on success or non-zero on >> failure. >> +(Unless the message already has an explicit reply body) > >Rather than make this parenthetical, I would go with: > >The payload should be set to zero on success or non-zero on failure, >unless the message already has an explicit reply body. Hi Eric, Thank you for taking a look, but I think you possibly missed the latest patchset posted last night. This had already been incorporated in v6 that I’d posted last night before your message. See https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06772.html > >> + >> +This indicates to QEMU that the requested operation has deterministically >> +been met or not. Today, QEMU is expected to terminate the main vhost-user > >Reads awkwardly; maybe: > >The response payload gives QEMU a deterministic indication of the result >of the command. Hmm, it is more of personal taste, so I’ll refrain from commenting either way. > >> +loop upon receiving such errors. In future, qemu could be taught to be more >> +resilient for selective requests. >> + >> +For the message types that already solicit a reply from the client, the >> +presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set >> brings >> +no behaviourial change. (See the 'Communication' section for details.) 
> >s/behaviourial/behavioural/ (or if the document widely favors US >spelling, behavioral)

The last 3 iterations of this patchset have only seen review comments focused on documentation suggestions and code indentation, but nothing on the idea or the code itself. This gives me hope that the patch is close to merging within the 2.7 timeframe :-) May I request the maintainers to please correct this tiny spelling typo when this is checked in?

Regards, Prerna
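For readers following the wording discussion above, here is a minimal sketch of the reply rule the quoted spec text describes: when REPLY_ACK has been negotiated and the need_reply bit is set, the client answers with a u64 payload of zero on success or non-zero on failure. The mask values are the ones from the patch; the struct and the function names (sketch_msg, sketch_make_ack, sketch_check_ack) are hypothetical and simplified, and the struct is not the packed wire layout. Messages that already carry an explicit reply body are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as described in the patch: version in bits 0-1,
 * reply flag in bit 2, need_reply flag in bit 3. */
#define SKETCH_VERSION          0x1u
#define SKETCH_VERSION_MASK     0x3u
#define SKETCH_REPLY_MASK       (0x1u << 2)
#define SKETCH_NEED_REPLY_MASK  (0x1u << 3)

/* Simplified, hypothetical message: three header fields plus a u64 payload. */
struct sketch_msg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;   /* payload size in bytes */
    uint64_t u64;    /* status payload: 0 = success, non-zero = failure */
};

/* Client side: fill in an acknowledgement if one was asked for.
 * Returns false when no reply is due (feature not negotiated or bit unset). */
static bool sketch_make_ack(const struct sketch_msg *req,
                            bool reply_ack_negotiated, int status,
                            struct sketch_msg *reply)
{
    if (!reply_ack_negotiated || !(req->flags & SKETCH_NEED_REPLY_MASK)) {
        return false;
    }
    reply->request = req->request;
    reply->flags = (req->flags & SKETCH_VERSION_MASK) | SKETCH_REPLY_MASK;
    reply->size = sizeof(reply->u64);
    reply->u64 = status ? 1 : 0;
    return true;
}

/* Sender side, mirroring the checks in process_message_reply():
 * the reply must match the request type, and a non-zero payload is an error. */
static int sketch_check_ack(const struct sketch_msg *reply,
                            uint32_t expected_request)
{
    if (reply->request != expected_request) {
        return -1;
    }
    return reply->u64 ? -1 : 0;
}
```

A caller would build a request with SKETCH_VERSION | SKETCH_NEED_REPLY_MASK in its flags, pass it to sketch_make_ack() on the client side, and feed the resulting reply to sketch_check_ack() on the sender side.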
[Qemu-devel] [PATCH for-2.7 v6 2/2] vhost-user: Attempt to fix a race with set_mem_table.
From: Prerna Saxena <prerna.sax...@nutanix.com>

The set_mem_table command currently does not seek a reply. Hence, there is no easy way for a remote application to notify QEMU when it has finished setting up memory, or if there were errors doing so.

As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net application). SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) Guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application has not yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

While a guaranteed fix would require a protocol extension (committed separately), a best-effort workaround for existing applications is to send a GET_FEATURES message before completing the vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply, an application that processes vhost-user messages synchronously would probably have completed the SET_MEM_TABLE before replying.

Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- hw/virtio/vhost-user.c | 125 ++--- 1 file changed, 67 insertions(+), 58 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 521a5db..53c37a6 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -254,64 +254,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table(struct vhost_dev *dev, -struct vhost_memory *mem) -{ -int fds[VHOST_MEMORY_MAX_NREGIONS]; -int i, fd; -size_t fd_num = 0; -bool reply_supported = virtio_has_feature(dev->protocol_features, - VHOST_USER_PROTOCOL_F_REPLY_ACK); - -VhostUserMsg msg = { -.request = VHOST_USER_SET_MEM_TABLE, -.flags = VHOST_USER_VERSION, -}; - -if (reply_supported) { -msg.flags |= VHOST_USER_NEED_REPLY_MASK; -} - -for (i = 0; i < dev->mem->nregions; ++i) { -struct vhost_memory_region *reg = dev->mem->regions + i; -ram_addr_t offset; -MemoryRegion *mr; - -assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); -mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - &offset); -fd = memory_region_get_fd(mr); -if (fd > 0) { -msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr; -msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; -msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; -msg.payload.memory.regions[fd_num].mmap_offset = offset; -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); -fds[fd_num++] = fd; -} -} - -msg.payload.memory.nregions = fd_num; - -if (!fd_num) { -error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); -return -1; -} - -msg.size = sizeof(msg.payload.memory.nregions); -msg.size += sizeof(msg.payload.memory.padding); -msg.size += fd_num * sizeof(VhostUserMemoryRegion); - -vhost_user_write(dev, &msg, fds, fd_num); - -if (reply_supported) { -return process_message_reply(dev, msg.request); -} - -return 0; -} - static 
int vhost_user_set_vring_addr(struct vhost_dev *dev, struct vhost_vring_addr *addr) { @@ -514,6 +456,73 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } +static int vhost_user_set_mem_table(struct vhost_dev *dev, +struct vhost_memory *mem) +{ +int fds[VHOST_MEMORY_MAX_NREGIONS]; +int i, fd; +size_t fd_num = 0; +uint64_t features; +bool reply_supported = virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_REPLY_ACK); + +VhostUserMsg msg = { +.request = VHOST_USER_SET_MEM_TABLE, +.flags = VHOST_USER_VERSION, +}; + +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_REPLY_MASK; +} + +for (i = 0; i < dev->mem->nregions; ++i) { +struct vhost_memory_region *reg = dev->mem->regions + i; +ram_addr_t offset; +MemoryRegion *mr; + +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + &offset); +fd = memory_region_get_fd(mr); +if (fd > 0) { +
[Qemu-devel] [PATCH for-2.7 v6 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
From: Prerna Saxena <prerna.sax...@nutanix.com> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. If negotiated, client applications should send a u64 payload in response to any message that contains the "need_reply" bit set on the message flags. Setting the payload to "zero" indicates the command finished successfully. Likewise, setting it to "non-zero" indicates an error. Currently implemented only for SET_MEM_TABLE. Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 26 ++ hw/virtio/vhost-user.c| 32 2 files changed, 58 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..57a8357 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for + details. * Size - 32-bit size of the payload @@ -126,6 +128,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension. ] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +258,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +469,24 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: +--- +The original vhost-user specification only demands replies for certain +commands. 
This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. + +With this protocol extension negotiated, the sender (QEMU) can set the +"need_reply" [Bit 3] flag to any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure, +unless the message already has an explicit reply body. + +This indicates to QEMU that the requested operation has deterministically +been met or not. Today, QEMU is expected to terminate the main vhost-user +loop upon receiving such errors. In future, qemu could be taught to be more +resilient for selective requests. + +For the message types that already solicit a reply from the client, the +presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings +no behaviourial change. (See the 'Communication' section for details.) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 495e09f..521a5db 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -84,6 +85,7 @@ typedef struct VhostUserMsg { #define VHOST_USER_VERSION_MASK (0x3) #define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_NEED_REPLY_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -158,6 +160,25 @@ fail: return -1; } +static int process_message_reply(struct vhost_dev *dev, + VhostUserRequest request) +{ +VhostUserMsg msg; + +if (vhost_user_read(dev, &msg) < 0) { +return -1; +} + +if (msg.request != request) { +error_report("Received unexpected msg type." + "Expected %d received %d", + request, msg.request); +return -1; +} + +return msg.payload.u64 ? 
-1 : 0; +} + static bool vhost_user_one_time_request(VhostUserRequest request) { switch (request) { @@ -239,11 +260,18 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, int fds[VHOST_MEMORY_MAX_NREGIONS]; int i, fd; size_t fd_num = 0; +bool reply_supported = virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_REPLY_ACK); + VhostUserMsg msg = { .request = VHOST_USER_SET_MEM_TABLE, .flags = VHOST_USER_VERSION, }; +if (reply_supported) { +
[Qemu-devel] [PATCH for-2.7 v6 0/2] vhost-user: Extend protocol to receive replies on any command.
From: Prerna Saxena <prerna.sax...@nutanix.com>

vhost-user: Extend protocol to receive replies on any command.

The current vhost-user protocol requires the client to send a reply to only a few commands. For the remaining commands, it is impossible for QEMU to know the status of the requested operation -- i.e., did it succeed? If so, by what time?

This is inconvenient, and can also lead to races. As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net application). Note that SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) Guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application hasn't yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (e.g. vhost-net) completes the command and returns (with an error code).

Changing the behaviour of current vhost-user commands would break existing applications. Patch 1 introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to request a reply to any message by setting the newly introduced "need_reply" flag. The application must then respond to QEMU by providing a status about the requested operation.

Patch 2 adds a workaround for the race described above for clients that do not support the REPLY_ACK feature. It introduces a get_features command to be sent before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messages in order.

Changelog:
--
Changes v5.1 -> v6:
1) Patch 1 : fixed some minor indentation issues and a really tiny documentation change.
2) Patch 2 : unchanged.

Changes v5->v5.1 :
1) Patch 1 : no change
2) Patch 2 : fixes a tiny typo I'd accidentally introduced while creating v5 from v4. The code itself is unchanged from v4.

Changes v4->v5:
1) Patch 1 :
 * Reword 'response' to 'reply' on public demand.
 * Documentation is more concise.
2) Patch 2 : unchanged

Changes v3->v4:
1) Rearranged code in PATCH 1 to avoid compiler warnings about a missing declaration of vhost_user_read(). Fixed by moving process_message_reply() after the definition of vhost_user_read().
2) Addressed minor suggestions in the writeup for this protocol extension.

Changes v2->v3:
1) Swapped the patch numbers 1 & 2 from the previous series.
2) Patch 1 (previously patch 2 in v2): addresses MarcAndre's review comments and renames function 'process_message_response' to 'process_message_reply'.
3) Patch 2 (i.e. patch 1 in v2) : Unchanged from v2.

Changes v1->v2:
1) Patch 1 : Ask for get_features before returning from set_mem_table (new).
2) Patch 2 :
 * Improve documentation.
 * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE.

References:
v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html
v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html
v3 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg01598.html
v4 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06173.html
v5 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06338.html
v5.1: https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06359.html

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt | 26 +
 hw/virtio/vhost-user.c| 135 ++
 2 files changed, 114 insertions(+), 47 deletions(-)

-- 
1.8.1.2
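As a rough illustration of how the two patches in this series fit together, the sketch below picks a synchronisation strategy from the negotiated protocol features: wait for an explicit acknowledgement when REPLY_ACK (bit 3) was negotiated, otherwise fall back to the GET_FEATURES round trip. The feature bit numbers come from the patch; the enum and function names (sketch_sync, sketch_has_feature, sketch_pick_sync) are hypothetical, and the exact control flow inside QEMU may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Protocol feature bit numbers as listed in the patch. */
enum {
    SKETCH_F_MQ        = 0,
    SKETCH_F_LOG_SHMFD = 1,
    SKETCH_F_RARP      = 2,
    SKETCH_F_REPLY_ACK = 3
};

/* Hypothetical names for the two ways of waiting on SET_MEM_TABLE. */
enum sketch_sync {
    SKETCH_SYNC_REPLY_ACK,    /* patch 1: set need_reply, wait for the ack */
    SKETCH_SYNC_GET_FEATURES  /* patch 2: best-effort GET_FEATURES barrier */
};

/* Test a single bit in a 64-bit feature mask. */
static bool sketch_has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features & (UINT64_C(1) << bit)) != 0;
}

/* Choose how to synchronise with the backend after SET_MEM_TABLE. */
static enum sketch_sync sketch_pick_sync(uint64_t protocol_features)
{
    return sketch_has_feature(protocol_features, SKETCH_F_REPLY_ACK)
               ? SKETCH_SYNC_REPLY_ACK
               : SKETCH_SYNC_GET_FEATURES;
}
```

A backend that advertises only MQ, LOG_SHMFD and RARP (mask 0x7) would take the GET_FEATURES path; one that also advertises bit 3 would take the acknowledgement path.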
Re: [Qemu-devel] [PATCH v4 2/2] vhost-user: Attempt to fix a race with set_mem_table.
On 27/07/16 7:00 pm, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Wed, Jul 27, 2016 at 02:52:37AM -0700, Prerna Saxena wrote: >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> The set_mem_table command currently does not seek a reply. Hence, there is >> no easy way for a remote application to notify to QEMU when it finished >> setting up memory, or if there were errors doing so. >> >> As an example: >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net >> application). SET_MEM_TABLE does not require a reply according to the spec. >> (2) Qemu commits the memory to the guest. >> (3) Guest issues an I/O operation over a new memory region which was >> configured on (1). >> (4) The application has not yet remapped the memory, but it sees the I/O >> request. >> (5) The application cannot satisfy the request because it does not know >> about those GPAs. >> >> While a guaranteed fix would require a protocol extension (committed >> separately), >> a best-effort workaround for existing applications is to send a GET_FEATURES >> message before completing the vhost_user_set_mem_table() call. >> Since GET_FEATURES requires a reply, an application that processes vhost-user >> messages synchronously would probably have completed the SET_MEM_TABLE >> before replying. >> >> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> > >Could you pls reorder patchset so this is 1/2? >1/1 is still under review but I'd like to make sure >we have some kind of fix in place for 2.7. Hi Michael, The review comments for patch 1 were around documentation and the choice of name of flag. There has been no recommendation/comment on the code itself. I have fixed all of that and posted a new patch series. (Version v5.1) Hope both the patches make it in time for 2.7. Thanks, once again, for reviewing this. Regards, Prerna
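To make the best-effort nature of the workaround concrete, here is a small simulation, under the assumption stated in the commit message that the backend handles messages strictly in arrival order. Everything in it (sketch_backend, sketch_send, sketch_backend_step, sketch_set_mem_table_with_barrier) is hypothetical and only models the ordering argument; it does not touch real sockets or QEMU code, and the request codes are illustrative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Request codes used by this sketch (illustrative subset). */
enum sketch_request { SKETCH_GET_FEATURES = 1, SKETCH_SET_MEM_TABLE = 5 };

#define SKETCH_QUEUE_LEN 8

/* Hypothetical backend model: a FIFO of pending requests handled one at a time. */
struct sketch_backend {
    enum sketch_request queue[SKETCH_QUEUE_LEN];
    size_t head, tail;
    bool mem_table_applied;  /* set once SET_MEM_TABLE has been handled */
    bool reply_ready;        /* set once GET_FEATURES has been answered */
    uint64_t reply_features; /* payload of the GET_FEATURES reply */
};

/* Queue a request for the backend. */
static void sketch_send(struct sketch_backend *b, enum sketch_request r)
{
    assert(b->tail < SKETCH_QUEUE_LEN);
    b->queue[b->tail++] = r;
}

/* Handle exactly one queued request, in arrival order.
 * Returns false when nothing is pending. */
static bool sketch_backend_step(struct sketch_backend *b)
{
    if (b->head == b->tail) {
        return false;
    }
    switch (b->queue[b->head++]) {
    case SKETCH_SET_MEM_TABLE:
        b->mem_table_applied = true;  /* stands in for remapping guest memory */
        break;
    case SKETCH_GET_FEATURES:
        b->reply_features = 0;
        b->reply_ready = true;
        break;
    }
    return true;
}

/* Sender side: issue SET_MEM_TABLE, then GET_FEATURES, and wait for the reply.
 * Returns whether the memory table had been applied when the reply arrived. */
static bool sketch_set_mem_table_with_barrier(struct sketch_backend *b)
{
    sketch_send(b, SKETCH_SET_MEM_TABLE);
    sketch_send(b, SKETCH_GET_FEATURES);
    while (!b->reply_ready) {
        if (!sketch_backend_step(b)) {
            return false;  /* backend ran dry without replying */
        }
    }
    return b->mem_table_applied;
}
```

With an in-order backend the GET_FEATURES reply can only arrive after SET_MEM_TABLE has been handled, which is why the round trip acts as a barrier. A backend that reorders or defers the memory remapping would break that assumption, which is the case the REPLY_ACK extension is meant to cover.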
[Qemu-devel] [PATCH for-2.7 v5.1 2/2] vhost-user: Attempt to fix a race with set_mem_table.
From: Prerna Saxena <prerna.sax...@nutanix.com>

The set_mem_table command currently does not seek a reply. Hence, there is no easy way for a remote application to notify QEMU when it has finished setting up memory, or if there were errors doing so.

As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net application). SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) Guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application has not yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

While a guaranteed fix would require a protocol extension (committed separately), a best-effort workaround for existing applications is to send a GET_FEATURES message before completing the vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply, an application that processes vhost-user messages synchronously would probably have completed the SET_MEM_TABLE before replying.

Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- hw/virtio/vhost-user.c | 123 ++--- 1 file changed, 65 insertions(+), 58 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 86e7ae0..d0dafa0 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -254,64 +254,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table(struct vhost_dev *dev, -struct vhost_memory *mem) -{ -int fds[VHOST_MEMORY_MAX_NREGIONS]; -int i, fd; -size_t fd_num = 0; -bool reply_supported = virtio_has_feature(dev->protocol_features, -VHOST_USER_PROTOCOL_F_REPLY_ACK); - -VhostUserMsg msg = { -.request = VHOST_USER_SET_MEM_TABLE, -.flags = VHOST_USER_VERSION, -}; - -if (reply_supported) { -msg.flags |= VHOST_USER_NEED_REPLY_MASK; -} - -for (i = 0; i < dev->mem->nregions; ++i) { -struct vhost_memory_region *reg = dev->mem->regions + i; -ram_addr_t offset; -MemoryRegion *mr; - -assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); -mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - &offset); -fd = memory_region_get_fd(mr); -if (fd > 0) { -msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr; -msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; -msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; -msg.payload.memory.regions[fd_num].mmap_offset = offset; -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); -fds[fd_num++] = fd; -} -} - -msg.payload.memory.nregions = fd_num; - -if (!fd_num) { -error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); -return -1; -} - -msg.size = sizeof(msg.payload.memory.nregions); -msg.size += sizeof(msg.payload.memory.padding); -msg.size += fd_num * sizeof(VhostUserMemoryRegion); - -vhost_user_write(dev, &msg, fds, fd_num); - -if (reply_supported) { -return process_message_reply(dev, msg.request); -} - -return 0; -} - static 
int vhost_user_set_vring_addr(struct vhost_dev *dev, struct vhost_vring_addr *addr) { @@ -514,6 +456,71 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } +static int vhost_user_set_mem_table(struct vhost_dev *dev, +struct vhost_memory *mem) +{ +int fds[VHOST_MEMORY_MAX_NREGIONS]; +int i, fd; +size_t fd_num = 0; +uint64_t features; +bool reply_supported = virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_REPLY_ACK); + +VhostUserMsg msg = { +.request = VHOST_USER_SET_MEM_TABLE, +.flags = VHOST_USER_VERSION, +}; + +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_REPLY_MASK; +} + +for (i = 0; i < dev->mem->nregions; ++i) { +struct vhost_memory_region *reg = dev->mem->regions + i; +ram_addr_t offset; +MemoryRegion *mr; + +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + &offset); +fd = memory_region_get_fd(mr); +if (fd > 0) { +msg.payload.memory.regions[fd
[Qemu-devel] [PATCH for-2.7 v5.1 0/2] vhost-user: Extend protocol to receive replies on any command.
From: Prerna Saxena <prerna.sax...@nutanix.com>

vhost-user: Extend protocol to receive replies on any command.

The current vhost-user protocol requires the client to send a reply to only a few commands. For the remaining commands, it is impossible for QEMU to know the status of the requested operation -- i.e., did it succeed? If so, by what time?

This is inconvenient, and can also lead to races. As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net application). Note that SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) Guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application hasn't yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (e.g. vhost-net) completes the command and returns (with an error code).

Changing the behaviour of current vhost-user commands would break existing applications. Patch 1 introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to request a reply to any message by setting the newly introduced "need_reply" flag. The application must then respond to QEMU by providing a status about the requested operation.

Patch 2 adds a workaround for the race described above for clients that do not support the REPLY_ACK feature. It introduces a get_features command to be sent before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messages in order.

Changelog:
--
Changes v5->v5.1 :
1) Patch 1 : no change
2) Patch 2 : fixes a tiny typo I'd accidentally introduced while creating v5 from v4. The code itself is unchanged from v4.

Changes v4->v5:
1) Patch 1 :
 * Reword 'response' to 'reply' on public demand.
 * Documentation is more concise.
2) Patch 2 : unchanged

Changes v3->v4:
1) Rearranged code in PATCH 1 to avoid compiler warnings about a missing declaration of vhost_user_read(). Fixed by moving process_message_reply() after the definition of vhost_user_read().
2) Addressed minor suggestions in the writeup for this protocol extension.

Changes v2->v3:
1) Swapped the patch numbers 1 & 2 from the previous series.
2) Patch 1 (previously patch 2 in v2): addresses MarcAndre's review comments and renames function 'process_message_response' to 'process_message_reply'.
3) Patch 2 (i.e. patch 1 in v2) : Unchanged from v2.

Changes v1->v2:
1) Patch 1 : Ask for get_features before returning from set_mem_table (new).
2) Patch 2 :
 * Improve documentation.
 * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE.

References:
v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html
v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html
v3 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg01598.html
v4 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06173.html

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt | 44 +++
 hw/virtio/vhost-user.c| 133 ++
 2 files changed, 130 insertions(+), 47 deletions(-)

-- 
1.8.1.2
[Qemu-devel] [PATCH for-2.7 v5.1 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
From: Prerna Saxena <prerna.sax...@nutanix.com> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. If negotiated, client applications should send a u64 payload in response to any message that contains the "need_reply" bit set on the message flags. Setting the payload to "zero" indicates the command finished successfully. Likewise, setting it to "non-zero" indicates an error. Currently implemented only for SET_MEM_TABLE. Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 26 ++ hw/virtio/vhost-user.c| 32 2 files changed, 58 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..54b5c8f 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for + details. * Size - 32-bit size of the payload @@ -126,6 +128,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension. ] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +258,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +469,24 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: +--- +The original vhost-user specification only demands replies for certain +commands. 
This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. + +With this protocol extension negotiated, the sender (QEMU) can set the +"need_reply" [Bit 3] flag to any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure. +(Unless the message already has an explicit reply body) + +This indicates to QEMU that the requested operation has deterministically +been met or not. Today, QEMU is expected to terminate the main vhost-user +loop upon receiving such errors. In future, qemu could be taught to be more +resilient for selective requests. + +For the message types that already solicit a reply from the client, the +presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings +no behaviourial change. (See the 'Communication' section for details.) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 495e09f..86e7ae0 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -84,6 +85,7 @@ typedef struct VhostUserMsg { #define VHOST_USER_VERSION_MASK (0x3) #define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_NEED_REPLY_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -158,6 +160,25 @@ fail: return -1; } +static int process_message_reply(struct vhost_dev *dev, +VhostUserRequest request) +{ +VhostUserMsg msg; + +if (vhost_user_read(dev, &msg) < 0) { +return 0; +} + +if (msg.request != request) { +error_report("Received unexpected msg type." +"Expected %d received %d", +request, msg.request); +return -1; +} + +return msg.payload.u64 ? 
-1 : 0; +} + static bool vhost_user_one_time_request(VhostUserRequest request) { switch (request) { @@ -239,11 +260,18 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, int fds[VHOST_MEMORY_MAX_NREGIONS]; int i, fd; size_t fd_num = 0; +bool reply_supported = virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_REPLY_ACK); + VhostUserMsg msg = { .request = VHOST_USER_SET_MEM_TABLE, .flags = VHOST_USER_VERSION, }; +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_REPLY_MASK; +} + for
Re: [Qemu-devel] [PATCH v4 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
On 27/07/16 6:58 pm, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Wed, Jul 27, 2016 at 12:56:18PM +, Prerna Saxena wrote: >> Hi Marc, >> Thanks, please find my reply inline. >> >> >> >> >> >> On 27/07/16 4:35 pm, "Marc-André Lureau" <marcandre.lur...@gmail.com> wrote: >> >> >Hi >> > >> >On Wed, Jul 27, 2016 at 1:52 PM, Prerna Saxena <saxenap@gmail.com> >> >wrote: >> >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> >> >> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. >> >> >> >> If negotiated, client applications should send a u64 payload in >> >> response to any message that contains the "need_response" bit set >> >> on the message flags. Setting the payload to "zero" indicates the >> >> command finished successfully. Likewise, setting it to "non-zero" >> >> indicates an error. >> >> >> >> Currently implemented only for SET_MEM_TABLE. >> >> >> >> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> >> >> --- >> >> docs/specs/vhost-user.txt | 41 + >> >> hw/virtio/vhost-user.c| 32 >> >> 2 files changed, 73 insertions(+) >> >> >> >> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt >> >> index 777c49c..57df586 100644 >> >> --- a/docs/specs/vhost-user.txt >> >> +++ b/docs/specs/vhost-user.txt >> >> @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: >> >> * Flags: 32-bit bit field: >> >> - Lower 2 bits are the version (currently 0x01) >> >> - Bit 2 is the reply flag - needs to be sent on each reply from the >> >> slave >> >> + - Bit 3 is the need_response flag - see >> >> VHOST_USER_PROTOCOL_F_REPLY_ACK for >> >> + details. >> > >> >Why need_response and not "need reply"? >> >> (I’d already pointed this out earlier, but looks like I was possibly not >> very clear.) >> Before deciding on the right name for Bit 3, let us see the nomenclature for >> Bit 2 above : "Bit 2 is the reply flag - needs to be sent on each reply from >> the slave”. >> So we already have a _reply_ flag in use. 
If we name Bit 3 as the >> _need_reply_ flag, don’t you think it would be ultra-confusing ? I found it >> confusing when I reviewed the documentation with this different term. >> So I chose the name need_response with much deliberation -- it conveys the >> essence of what this flag means to achieve, but without adding to confusion. > >I don't see confusion, I think I agree with Marc André.

All right. Posted a new series with the reworded terminology and updated (more concise) documentation.

Regards, Prerna
[Qemu-devel] [PATCH for-2.7 v5 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
From: Prerna Saxena <prerna.sax...@nutanix.com> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. If negotiated, client applications should send a u64 payload in response to any message that contains the "need_reply" bit set on the message flags. Setting the payload to "zero" indicates the command finished successfully. Likewise, setting it to "non-zero" indicates an error. Currently implemented only for SET_MEM_TABLE. Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 26 ++ hw/virtio/vhost-user.c| 32 2 files changed, 58 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..54b5c8f 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for + details. * Size - 32-bit size of the payload @@ -126,6 +128,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension. ] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +258,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +469,24 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: +--- +The original vhost-user specification only demands replies for certain +commands. 
This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. + +With this protocol extension negotiated, the sender (QEMU) can set the +"need_reply" [Bit 3] flag to any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure. +(Unless the message already has an explicit reply body) + +This indicates to QEMU that the requested operation has deterministically +been met or not. Today, QEMU is expected to terminate the main vhost-user +loop upon receiving such errors. In future, qemu could be taught to be more +resilient for selective requests. + +For the message types that already solicit a reply from the client, the +presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings +no behaviourial change. (See the 'Communication' section for details.) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 495e09f..86e7ae0 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -84,6 +85,7 @@ typedef struct VhostUserMsg { #define VHOST_USER_VERSION_MASK (0x3) #define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_NEED_REPLY_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -158,6 +160,25 @@ fail: return -1; } +static int process_message_reply(struct vhost_dev *dev, +VhostUserRequest request) +{ +VhostUserMsg msg; + +if (vhost_user_read(dev, ) < 0) { +return 0; +} + +if (msg.request != request) { +error_report("Received unexpected msg type." +"Expected %d received %d", +request, msg.request); +return -1; +} + +return msg.payload.u64 ? 
-1 : 0; +} + static bool vhost_user_one_time_request(VhostUserRequest request) { switch (request) { @@ -239,11 +260,18 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, int fds[VHOST_MEMORY_MAX_NREGIONS]; int i, fd; size_t fd_num = 0; +bool reply_supported = virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_REPLY_ACK); + VhostUserMsg msg = { .request = VHOST_USER_SET_MEM_TABLE, .flags = VHOST_USER_VERSION, }; +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_REPLY_MASK; +} + for
[Qemu-devel] [PATCH for-2.7 v5 0/2] vhost-user: Extend protocol to receive replies on any command.
From: Prerna Saxena <prerna.sax...@nutanix.com>

vhost-user: Extend protocol to receive replies on any command.

The current vhost-user protocol requires the client to reply to only a few commands. For the remaining commands, QEMU has no way of knowing the status of the requested operation: did it succeed, and if so, by what time?

This is inconvenient, and can also lead to races. As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g. a vhost-user net application). Note that SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) The guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application hasn't yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (e.g. vhost-net) completes the command and returns (with an error code).

Changing the behaviour of current vhost-user commands would break existing applications. Patch 1 introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to request a reply to any message by setting the newly introduced "need_reply" flag. The application must then respond to QEMU with the status of the requested operation.

Patch 2 adds a workaround for the race described above for clients that do not support the REPLY_ACK feature. It sends a get_features command before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messages in order.

Changelog:
--
Changes v4->v5:
1) Patch 1 :
   * Reword 'response' to 'reply' on public demand.
   * Documentation is more concise.
   Patch 2 : unchanged

Changes v3->v4:
1) Rearranged code in PATCH 1 to fix compiler warnings about a missing declaration of vhost_user_read(). Fixed by moving process_message_reply() after the definition of vhost_user_read().
2) Fixed minor suggestions in writeup for this protocol extension.

Changes v2->v3:
1) Swapped the patch numbers 1 & 2 from the previous series.
2) Patch 1 (previously patch 2 in v2): addresses MarcAndre's review comments and renames function 'process_message_response' to 'process_message_reply'.
3) Patch 2 (i.e. patch 1 in v2) : Unchanged from v2.

Changes v1->v2:
1) Patch 1 : Ask for get_features before returning from set_mem_table (new).
2) Patch 2 :
   * Improve documentation.
   * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE.

References:
v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html
v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html
v3 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg01598.html
v4 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06173.html

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt |  44 +++
 hw/virtio/vhost-user.c    | 133 ++
 2 files changed, 130 insertions(+), 47 deletions(-)

--
1.8.1.2

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt |  26 +
 hw/virtio/vhost-user.c    | 132 +-
 2 files changed, 111 insertions(+), 47 deletions(-)

--
1.8.1.2
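For readers implementing the client side, the REPLY_ACK behaviour described in this cover letter might look roughly like the sketch below. It is an illustration only, not code from the series: struct sketch_msg and sketch_build_ack() are hypothetical names, the message layout is simplified to a single u64 payload, and clearing the need_reply bit while setting the reply bit in the answer is an assumption. The sketch also ignores the case of messages that already carry an explicit reply body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Flag layout as described in the patch: bits 0-1 version,
 * bit 2 reply, bit 3 need_reply. */
#define VHOST_USER_VERSION_MASK    (0x3)
#define VHOST_USER_REPLY_MASK      (0x1 << 2)
#define VHOST_USER_NEED_REPLY_MASK (0x1 << 3)

/* Hypothetical, simplified message; the real VhostUserMsg carries a
 * larger payload union. */
struct sketch_msg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    uint64_t payload_u64;
};

/* Decide whether a status reply is owed for 'req' and, if so, fill
 * '*reply'. 'status' is the result of handling the request:
 * zero means success, anything else means failure. */
static bool sketch_build_ack(const struct sketch_msg *req,
                             bool reply_ack_negotiated,
                             int status,
                             struct sketch_msg *reply)
{
    assert(req != NULL && reply != NULL);

    /* No ack unless the feature was negotiated AND the sender asked. */
    if (!reply_ack_negotiated ||
        !(req->flags & VHOST_USER_NEED_REPLY_MASK)) {
        return false;
    }

    memset(reply, 0, sizeof(*reply));
    reply->request = req->request;               /* echo the request type */
    reply->flags = (req->flags & VHOST_USER_VERSION_MASK) |
                   VHOST_USER_REPLY_MASK;        /* assumption: need_reply cleared */
    reply->size = (uint32_t)sizeof(reply->payload_u64);
    reply->payload_u64 = status ? 1 : 0;         /* zero = success, non-zero = failure */
    return true;
}
```

A backend would call sketch_build_ack() only after it has finished handling the request (for SET_MEM_TABLE, after remapping the memory), so that the ack really does tell QEMU the operation is complete.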
[Qemu-devel] [PATCH for-2.7 v5 2/2] vhost-user: Attempt to fix a race with set_mem_table.
From: Prerna Saxena <prerna.sax...@nutanix.com>

The set_mem_table command currently does not seek a reply. Hence, there is no easy way for a remote application to notify QEMU when it has finished setting up memory, or if there were errors doing so.

As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g. a vhost-user net application). SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) The guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application has not yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

While a guaranteed fix would require a protocol extension (committed separately), a best-effort workaround for existing applications is to send a GET_FEATURES message before completing the vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply, an application that processes vhost-user messages synchronously would probably have completed the SET_MEM_TABLE before replying.
Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- hw/virtio/vhost-user.c | 122 ++--- 1 file changed, 64 insertions(+), 58 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 86e7ae0..2fc7f25 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -254,64 +254,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table(struct vhost_dev *dev, -struct vhost_memory *mem) -{ -int fds[VHOST_MEMORY_MAX_NREGIONS]; -int i, fd; -size_t fd_num = 0; -bool reply_supported = virtio_has_feature(dev->protocol_features, -VHOST_USER_PROTOCOL_F_REPLY_ACK); - -VhostUserMsg msg = { -.request = VHOST_USER_SET_MEM_TABLE, -.flags = VHOST_USER_VERSION, -}; - -if (reply_supported) { -msg.flags |= VHOST_USER_NEED_REPLY_MASK; -} - -for (i = 0; i < dev->mem->nregions; ++i) { -struct vhost_memory_region *reg = dev->mem->regions + i; -ram_addr_t offset; -MemoryRegion *mr; - -assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); -mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - ); -fd = memory_region_get_fd(mr); -if (fd > 0) { -msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr; -msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; -msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; -msg.payload.memory.regions[fd_num].mmap_offset = offset; -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); -fds[fd_num++] = fd; -} -} - -msg.payload.memory.nregions = fd_num; - -if (!fd_num) { -error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); -return -1; -} - -msg.size = sizeof(msg.payload.memory.nregions); -msg.size += sizeof(msg.payload.memory.padding); -msg.size += fd_num * sizeof(VhostUserMemoryRegion); - -vhost_user_write(dev, , fds, fd_num); - -if (reply_supported) { -return process_message_reply(dev, msg.request); -} - -return 0; -} - static 
int vhost_user_set_vring_addr(struct vhost_dev *dev, struct vhost_vring_addr *addr) { @@ -514,6 +456,70 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } +static int vhost_user_set_mem_table(struct vhost_dev *dev, +struct vhost_memory *mem) +{ +int fds[VHOST_MEMORY_MAX_NREGIONS]; +int i, fd; +size_t fd_num = 0; +bool reply_supported = virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_REPLY_ACK); + +VhostUserMsg msg = { +.request = VHOST_USER_SET_MEM_TABLE, +.flags = VHOST_USER_VERSION, +}; + +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_REPLY_MASK; +} + +for (i = 0; i < dev->mem->nregions; ++i) { +struct vhost_memory_region *reg = dev->mem->regions + i; +ram_addr_t offset; +MemoryRegion *mr; + +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + ); +fd = memory_region_get_fd(mr); +if (fd > 0) { +msg.payload.memory.regions[fd_num].userspace_addr
Re: [Qemu-devel] [PATCH v4 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
Hi Marc, Thanks, please find my reply inline. On 27/07/16 4:35 pm, "Marc-André Lureau" <marcandre.lur...@gmail.com> wrote: >Hi > >On Wed, Jul 27, 2016 at 1:52 PM, Prerna Saxena <saxenap@gmail.com> wrote: >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. >> >> If negotiated, client applications should send a u64 payload in >> response to any message that contains the "need_response" bit set >> on the message flags. Setting the payload to "zero" indicates the >> command finished successfully. Likewise, setting it to "non-zero" >> indicates an error. >> >> Currently implemented only for SET_MEM_TABLE. >> >> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> >> --- >> docs/specs/vhost-user.txt | 41 + >> hw/virtio/vhost-user.c| 32 >> 2 files changed, 73 insertions(+) >> >> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt >> index 777c49c..57df586 100644 >> --- a/docs/specs/vhost-user.txt >> +++ b/docs/specs/vhost-user.txt >> @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: >> * Flags: 32-bit bit field: >> - Lower 2 bits are the version (currently 0x01) >> - Bit 2 is the reply flag - needs to be sent on each reply from the slave >> + - Bit 3 is the need_response flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK >> for >> + details. > >Why need_response and not "need reply"? (I’d already pointed this out earlier, but looks like I was possibly not very clear.) Before deciding on the right name for Bit 3, let us see the nomenclature for Bit 2 above : "Bit 2 is the reply flag - needs to be sent on each reply from the slave”. So we already have a _reply_ flag in use. If the name Bit 3 as the _need_reply_ flag, don’t you think it would be ultra-confusing ? I found it confusing when I reviewed the documentation with this different term. So I chose the name need_response with much deliberation — it conveys the essence of what this flag means to achieve, but without adding to confusion. 
> >btw, I wonder if it would be worth to introduce an enum at this point > >> * Size - 32-bit size of the payload >> >> >> @@ -126,6 +128,8 @@ the ones that do: >> * VHOST_GET_VRING_BASE >> * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) >> >> +[ Also see the section on REPLY_ACK protocol extension. ] >> + >> There are several messages that the master sends with file descriptors >> passed >> in the ancillary data: >> >> @@ -254,6 +258,7 @@ Protocol features >> #define VHOST_USER_PROTOCOL_F_MQ 0 >> #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 >> #define VHOST_USER_PROTOCOL_F_RARP 2 >> +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 >> >> Message types >> - >> @@ -464,3 +469,39 @@ Message types >>is present in VHOST_USER_GET_PROTOCOL_FEATURES. >>The first 6 bytes of the payload contain the mac address of the guest >> to >>allow the vhost user backend to construct and broadcast the fake RARP. >> + >> +VHOST_USER_PROTOCOL_F_REPLY_ACK: >> +--- >> +The original vhost-user specification only demands responses for certain > >responses/replies If you feel strongly about it, will change it here. > >> +commands. This differs from the vhost protocol implementation where commands >> +are sent over an ioctl() call and block until the client has completed. >> + >> +With this protocol extension negotiated, the sender (QEMU) can set the newly >> +introduced "need_response" [Bit 3] flag to any command. This indicates that > >need reply, you can remove the "newly introduced" (it's not going to >be so new after a while) * need_reply = no I don’t agree, for reasons cited earlier. * remove the “newly introduced” phrase = agree, will do. > >> +the client MUST respond with a Payload VhostUserMsg indicating success or > >I would put right here for clarity: > >...MUST respond with a Payload VhostUserMsg (unless the message has >already an explicit reply body)... > >alternatively, I would forbid using the bit 3 on commands that have >already an explicit reply. 
I don’t currently have any code that raises an error for such cases. The implementation silently ignores it. > >> +failure. The payload should be set to zero on success or non-zero on >> failure. >> +In other words, r
Re: [Qemu-devel] [PATCH v3 0/2] vhost-user: Extend protocol to receive replies on any command.
On 27/07/16 9:51 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Mon, Jul 25, 2016 at 02:27:18PM +0400, Marc-André Lureau wrote: >> Hi >> >> On Mon, Jul 25, 2016 at 10:41 AM, Prerna <saxenap@gmail.com> wrote: >> > >> > >> > On Thu, Jul 7, 2016 at 12:04 PM, Prerna Saxena <saxenap@gmail.com> >> > wrote: >> >> >> >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> >> >> The current vhost-user protocol requires the client to send responses to >> >> only a >> >> few commands. For the remaining commands, it is impossible for QEMU to >> >> know the >> >> status of the requested operation -- ie, did it succeed? If so, by what >> >> time? >> >> >> >> This is inconvenient, and can also lead to races. As an example: >> >> [..snip..] >> >> >> >> References: >> >> v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html >> >> v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html >> >> >> >> >> >> Prerna Saxena (2): >> >> vhost-user: Introduce a new protocol feature REPLY_ACK. >> >> vhost-user: Attempt to fix a race with set_mem_table. >> >> >> >> docs/specs/vhost-user.txt | 44 +++ >> >> hw/virtio/vhost-user.c| 133 >> >> ++ >> >> 2 files changed, 130 insertions(+), 47 deletions(-) >> >> >> > >> > Ping ! >> > Michael, MarcAndre, Did you have a chance to look at this patch series? >> > >> >> That's not going to make it in 2.7 I am afraid. > >It's a bugfix so - depends on how quickly can comments be addressed. > >-- >MST Thanks Michael, Marc, I just posted a v4 addressing the review comments. Both make-check and compilation run to completion. Marc, I addressed part of your suggestion on documentation. However, I have been reminded in the past about being more verbose while describing the change : <https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07428.html> Hope this patch series is in time for 2.7 :-) Regards, Prerna
[Qemu-devel] [PATCH v4 0/2] vhost-user: Extend protocol to receive replies on any command.
From: Prerna Saxena <prerna.sax...@nutanix.com>

vhost-user: Extend protocol to receive replies on any command.

The current vhost-user protocol requires the client to send responses to only a few commands. For the remaining commands, QEMU has no way of knowing the status of the requested operation: did it succeed, and if so, by what time?

This is inconvenient, and can also lead to races. As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g. a vhost-user net application). Note that SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) The guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application hasn't yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (e.g. vhost-net) completes the command and returns (with an error code).

Changing the behaviour of current vhost-user commands would break existing applications. Patch 1 introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to request a response to any message by setting the newly introduced "need_response" flag. The application must then respond to QEMU with the status of the requested operation.

Patch 2 adds a workaround for the race described above for clients that do not support the REPLY_ACK feature. It sends a get_features command before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messages in order.

Changelog:
--
Changes v3->v4:
1) Rearranged code in PATCH 1 to fix compiler warnings about a missing declaration of vhost_user_read(). Fixed by moving process_message_reply() after the definition of vhost_user_read().
2) Fixed minor suggestions in writeup for this protocol extension.

Changes v2->v3:
1) Swapped the patch numbers 1 & 2 from the previous series.
2) Patch 1 (previously patch 2 in v2): addresses MarcAndre's review comments and renames function 'process_message_response' to 'process_message_reply'.
3) Patch 2 (i.e. patch 1 in v2) : Unchanged from v2.

Changes v1->v2:
1) Patch 1 : Ask for get_features before returning from set_mem_table (new).
2) Patch 2 :
   * Improve documentation.
   * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE.

References:
v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html
v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html
v3 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg01598.html

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt |  44 +++
 hw/virtio/vhost-user.c    | 133 ++
 2 files changed, 130 insertions(+), 47 deletions(-)

--
1.8.1.2

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt |  41 ++
 hw/virtio/vhost-user.c    | 133 ++
 2 files changed, 127 insertions(+), 47 deletions(-)

--
1.8.1.2
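The reasoning behind the Patch 2 workaround can be modelled with a toy backend that handles messages strictly in arrival order. This is a sketch under that assumption only; enum sketch_req, struct sketch_backend and sketch_process_in_order() are hypothetical names and do not appear in the series. It shows that when SET_MEM_TABLE is queued ahead of GET_FEATURES, the memory table has already been applied by the time the GET_FEATURES reply is produced, and that the guarantee disappears if ordering is not preserved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds for the model; only the two messages
 * involved in the workaround are represented. */
enum sketch_req {
    SKETCH_SET_MEM_TABLE,   /* no reply required by the original spec */
    SKETCH_GET_FEATURES     /* always answered with a reply */
};

/* Toy backend state. */
struct sketch_backend {
    bool mem_table_applied;     /* SET_MEM_TABLE has been handled */
    bool replied;               /* a GET_FEATURES reply was sent */
    bool applied_before_reply;  /* memory was mapped when that reply went out */
};

/* Handle each queued message in arrival order, which is the behaviour
 * the workaround relies on. */
static void sketch_process_in_order(struct sketch_backend *b,
                                    const enum sketch_req *queue,
                                    size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (queue[i]) {
        case SKETCH_SET_MEM_TABLE:
            b->mem_table_applied = true;   /* remap memory, send nothing */
            break;
        case SKETCH_GET_FEATURES:
            if (!b->replied) {
                /* Record what QEMU can infer when the first reply arrives. */
                b->applied_before_reply = b->mem_table_applied;
                b->replied = true;
            }
            break;
        }
    }
}
```

A backend that handles messages concurrently or out of order gives no such guarantee, which is why the cover letter calls this a workaround rather than a complete fix.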
[Qemu-devel] [PATCH v4 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
From: Prerna Saxena <prerna.sax...@nutanix.com> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. If negotiated, client applications should send a u64 payload in response to any message that contains the "need_response" bit set on the message flags. Setting the payload to "zero" indicates the command finished successfully. Likewise, setting it to "non-zero" indicates an error. Currently implemented only for SET_MEM_TABLE. Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 41 + hw/virtio/vhost-user.c| 32 2 files changed, 73 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..57df586 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_response flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for + details. * Size - 32-bit size of the payload @@ -126,6 +128,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension. ] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +258,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +469,39 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: +--- +The original vhost-user specification only demands responses for certain +commands. 
This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. + +With this protocol extension negotiated, the sender (QEMU) can set the newly +introduced "need_response" [Bit 3] flag to any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure. +In other words, response must be in the following format : + + +| request | flags | size | payload | + + + * Request: 32-bit type of the request + * Flags: 32-bit bit field: + * Size: size of the payload ( see below) + * Payload : a u64 integer, where a non-zero value indicates a failure. + +This indicates to QEMU that the requested operation has deterministically +been met or not. Today, QEMU is expected to terminate the main vhost-user +loop upon receiving such errors. In future, qemu could be taught to be more +resilient for selective requests. + +Note that as per the original vhost-user protocol, the following four messages +anyway require distinct responses from the vhost-user client process: + * VHOST_GET_FEATURES + * VHOST_GET_PROTOCOL_FEATURES + * VHOST_GET_VRING_BASE + * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) + +For these message types, the presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or +need_response bit being set brings no behaviourial change. 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 495e09f..0cdb918 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -84,6 +85,7 @@ typedef struct VhostUserMsg { #define VHOST_USER_VERSION_MASK (0x3) #define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_NEED_RESPONSE_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -158,6 +160,25 @@ fail: return -1; } +static int process_message_reply(struct vhost_dev *dev, +VhostUserRequest request) +{ +VhostUserMsg msg; + +if (vhost_user_read(dev, ) < 0) { +return 0; +} + +if (msg.request != request) { +error_report("Received unexpected msg type." +"Expected %d received %d", +request, msg.request); +return -1; +} + +return msg.payload.u64 ? -1 : 0; +} + static bool vhost_user_one_time_reques
[Qemu-devel] [PATCH v4 2/2] vhost-user: Attempt to fix a race with set_mem_table.
From: Prerna Saxena <prerna.sax...@nutanix.com>

The set_mem_table command currently does not seek a reply. Hence, there is no easy way for a remote application to notify QEMU when it has finished setting up memory, or if there were errors doing so.

As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g. a vhost-user net application). SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) The guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application has not yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

While a guaranteed fix would require a protocol extension (committed separately), a best-effort workaround for existing applications is to send a GET_FEATURES message before completing the vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply, an application that processes vhost-user messages synchronously would probably have completed the SET_MEM_TABLE before replying.
Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- hw/virtio/vhost-user.c | 123 ++--- 1 file changed, 65 insertions(+), 58 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 0cdb918..f96607e 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -254,64 +254,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table(struct vhost_dev *dev, -struct vhost_memory *mem) -{ -int fds[VHOST_MEMORY_MAX_NREGIONS]; -int i, fd; -size_t fd_num = 0; -bool reply_supported = virtio_has_feature(dev->protocol_features, -VHOST_USER_PROTOCOL_F_REPLY_ACK); - -VhostUserMsg msg = { -.request = VHOST_USER_SET_MEM_TABLE, -.flags = VHOST_USER_VERSION, -}; - -if (reply_supported) { -msg.flags |= VHOST_USER_NEED_RESPONSE_MASK; -} - -for (i = 0; i < dev->mem->nregions; ++i) { -struct vhost_memory_region *reg = dev->mem->regions + i; -ram_addr_t offset; -MemoryRegion *mr; - -assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); -mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - ); -fd = memory_region_get_fd(mr); -if (fd > 0) { -msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr; -msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; -msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; -msg.payload.memory.regions[fd_num].mmap_offset = offset; -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); -fds[fd_num++] = fd; -} -} - -msg.payload.memory.nregions = fd_num; - -if (!fd_num) { -error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); -return -1; -} - -msg.size = sizeof(msg.payload.memory.nregions); -msg.size += sizeof(msg.payload.memory.padding); -msg.size += fd_num * sizeof(VhostUserMemoryRegion); - -vhost_user_write(dev, , fds, fd_num); - -if (reply_supported) { -return process_message_reply(dev, msg.request); -} - -return 0; -} - static 
int vhost_user_set_vring_addr(struct vhost_dev *dev, struct vhost_vring_addr *addr) { @@ -514,6 +456,71 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } +static int vhost_user_set_mem_table(struct vhost_dev *dev, +struct vhost_memory *mem) +{ +int fds[VHOST_MEMORY_MAX_NREGIONS]; +int i, fd; +size_t fd_num = 0; +uint64_t features; +bool reply_supported = virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_REPLY_ACK); + +VhostUserMsg msg = { +.request = VHOST_USER_SET_MEM_TABLE, +.flags = VHOST_USER_VERSION, +}; + +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_RESPONSE_MASK; +} + +for (i = 0; i < dev->mem->nregions; ++i) { +struct vhost_memory_region *reg = dev->mem->regions + i; +ram_addr_t offset; +MemoryRegion *mr; + +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + ); +fd = memory_region_get_fd(mr); +if (fd > 0) { +msg.payload.memo
Re: [Qemu-devel] [PATCH v3 0/2] vhost-user: Extend protocol to receive replies on any command.
Hi Marc, Thank you for taking a look. On 25/07/16 3:57 pm, "Marc-André Lureau" <marcandre.lur...@gmail.com> wrote: >Hi > >On Mon, Jul 25, 2016 at 10:41 AM, Prerna <saxenap@gmail.com> wrote: >> >> >> On Thu, Jul 7, 2016 at 12:04 PM, Prerna Saxena <saxenap....@gmail.com> >> wrote: >>> >>> From: Prerna Saxena <prerna.sax...@nutanix.com> >>> >>> The current vhost-user protocol requires the client to send responses to >>> only a >>> few commands. For the remaining commands, it is impossible for QEMU to >>> know the >>> status of the requested operation -- ie, did it succeed? If so, by what >>> time? >>> >>> This is inconvenient, and can also lead to races. As an example: >>> [..snip..] >>> >>> References: >>> v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html >>> v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html >>> >>> >>> Prerna Saxena (2): >>> vhost-user: Introduce a new protocol feature REPLY_ACK. >>> vhost-user: Attempt to fix a race with set_mem_table. >>> >>> docs/specs/vhost-user.txt | 44 +++ >>> hw/virtio/vhost-user.c| 133 >>> ++ >>> 2 files changed, 130 insertions(+), 47 deletions(-) >>> >> >> Ping ! >> Michael, MarcAndre, Did you have a chance to look at this patch series? >> > >That's not going to make it in 2.7 I am afraid. 
Beside the second >patch that I think is somewhat superflous or worse, as I said in >previous review (so I won't ack it, but Michael liked it and he is the >maintainer) > >It fails to compile, easy to fix by moving process_message_reply after >vhost_user_read: > >/home/elmarco/src/qemu/hw/virtio/vhost-user.c: In function >‘process_message_reply’: >/home/elmarco/src/qemu/hw/virtio/vhost-user.c:117:9: warning: implicit >declaration of function ‘vhost_user_read’ >[-Wimplicit-function-declaration] > if (vhost_user_read(dev, ) < 0) { > ^~~ >/home/elmarco/src/qemu/hw/virtio/vhost-user.c:117:5: warning: nested >extern declaration of ‘vhost_user_read’ [-Wnested-externs] > if (vhost_user_read(dev, ) < 0) { > ^~ >/home/elmarco/src/qemu/hw/virtio/vhost-user.c: At top level: >/home/elmarco/src/qemu/hw/virtio/vhost-user.c:136:12: error: static >declaration of ‘vhost_user_read’ follows non-static declaration > static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg) >^~~ >/home/elmarco/src/qemu/hw/virtio/vhost-user.c:117:9: note: previous >implicit declaration of ‘vhost_user_read’ was here > if (vhost_user_read(dev, ) < 0) { > ^~~ I really need to check on this. I am pretty positive I had verified this before posting, but its been a while since these patches were posted. > >Secondly, make check just hangs in /x86_64/vhost-user/read-guest-mem >(a sign that backward compatibility is broken). > >There is still many "response" wording, where "reply" should be used >for more consistency (VHOST_USER_NEED_RESPONSE_MASK and in the doc) Right. There is a reason I havent reworded it here. We already have a VHOST_USER_REPLY_MASK flag that assumes that the incoming message is a reply to an already-sent vhost command. Use of the word ‘REPLY’ in this context would have caused some confusion. > >Regarding the doc, I would simplify it a bit: > >VHOST_USER_PROTOCOL_F_REPLY_ACK: >--- >The original vhost-user specification only demands replies for certain >commands. 
This differs from the vhost protocol implementation where commands >are sent over an ioctl() call and block until the client has completed. > >With this protocol extension negotiated, the sender (QEMU) can set the newly >introduced "need_reply" [Bit 3] flag to any command. This indicates that >the client MUST reply with a Payload VhostUserMsg indicating success or >failure. The payload should be set to zero on success or non-zero on failure. >In other words, reply message must be in the following format : > > >| request | flags | size | payload | > > > * Request: 32-bit type of the request > * Flags: 32-bit bit field: > * Size: size of the payload ( see below) > * Payload : a u64 integer, where a non-zero value indicates a failure. > >This indicates to QEMU that the requested operation has >deterministically been met or not. Today, QEMU is expected to terminate >the main vhost-user loop upon receiving such errors. In future, qemu could >be taught to be more resilient for selective requests. > >Note that for messages that already require distinct replies, the presence of >need_reply bit being set brings no behavioural change. > >-- >Marc-André Lureau Regards, Prerna
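To make the reply format discussed above concrete, here is a minimal sketch in C of how a backend might build the acknowledgement. It is an illustration under stated assumptions, not code from this series: the struct is a simplified stand-in for VhostUserMsg, and the names `toy_msg`, `TOY_*` and `build_ack` are hypothetical. Only the bit positions (version in the low 2 bits, reply flag at bit 2, need_reply at bit 3) and the u64 payload convention are taken from the text of the patch. Clearing the need_reply bit in the reply is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag layout as described in the patch: version in bits 0-1,
 * reply flag in bit 2, need_reply in bit 3. Names are hypothetical. */
#define TOY_VERSION          0x1u
#define TOY_VERSION_MASK     0x3u
#define TOY_REPLY_MASK       (0x1u << 2)
#define TOY_NEED_REPLY_MASK  (0x1u << 3)

/* Simplified message: three 32-bit header fields plus a u64 payload. */
struct toy_msg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;      /* size of the payload that follows */
    uint64_t payload;   /* 0 = success, non-zero = failure */
};

/* Fill *reply for *req. Returns 1 if an ack must be sent, 0 if the
 * sender did not ask for one. status is 0 on success. */
static int build_ack(const struct toy_msg *req, uint64_t status,
                     struct toy_msg *reply)
{
    assert(req != NULL && reply != NULL);

    if (!(req->flags & TOY_NEED_REPLY_MASK)) {
        return 0;   /* legacy behaviour: stay silent */
    }

    memset(reply, 0, sizeof(*reply));
    reply->request = req->request;                /* echo the request type */
    reply->flags = TOY_VERSION | TOY_REPLY_MASK;  /* mark as a reply */
    reply->size = sizeof(reply->payload);
    reply->payload = status;
    return 1;
}
```

A backend would call `build_ack()` after handling a request and write the reply to the socket only when it returns 1, so senders that never set the bit see no change in behaviour.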
[Qemu-devel] [PATCH v3 2/2] vhost-user: Attempt to fix a race with set_mem_table.
From: Prerna Saxena <prerna.sax...@nutanix.com> The set_mem_table command currently does not seek a reply. Hence, there is no easy way for a remote application to notify to QEMU when it finished setting up memory, or if there were errors doing so. As an example: (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net application). SET_MEM_TABLE does not require a reply according to the spec. (2) Qemu commits the memory to the guest. (3) Guest issues an I/O operation over a new memory region which was configured on (1). (4) The application has not yet remapped the memory, but it sees the I/O request. (5) The application cannot satisfy the request because it does not know about those GPAs. While a guaranteed fix would require a protocol extension (committed separately), a best-effort workaround for existing applications is to send a GET_FEATURES message before completing the vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply, an application that processes vhost-user messages synchronously would probably have completed the SET_MEM_TABLE before replying. 
Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- hw/virtio/vhost-user.c | 123 ++--- 1 file changed, 65 insertions(+), 58 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 899f354..a3a114d 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -254,64 +254,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table(struct vhost_dev *dev, -struct vhost_memory *mem) -{ -int fds[VHOST_MEMORY_MAX_NREGIONS]; -int i, fd; -size_t fd_num = 0; -bool reply_supported = virtio_has_feature(dev->protocol_features, -VHOST_USER_PROTOCOL_F_REPLY_ACK); - -VhostUserMsg msg = { -.request = VHOST_USER_SET_MEM_TABLE, -.flags = VHOST_USER_VERSION, -}; - -if (reply_supported) { -msg.flags |= VHOST_USER_NEED_RESPONSE_MASK; -} - -for (i = 0; i < dev->mem->nregions; ++i) { -struct vhost_memory_region *reg = dev->mem->regions + i; -ram_addr_t offset; -MemoryRegion *mr; - -assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); -mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - ); -fd = memory_region_get_fd(mr); -if (fd > 0) { -msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr; -msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; -msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; -msg.payload.memory.regions[fd_num].mmap_offset = offset; -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); -fds[fd_num++] = fd; -} -} - -msg.payload.memory.nregions = fd_num; - -if (!fd_num) { -error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); -return -1; -} - -msg.size = sizeof(msg.payload.memory.nregions); -msg.size += sizeof(msg.payload.memory.padding); -msg.size += fd_num * sizeof(VhostUserMemoryRegion); - -vhost_user_write(dev, , fds, fd_num); - -if (reply_supported) { -return process_message_reply(dev, msg.request); -} - -return 0; -} - static 
int vhost_user_set_vring_addr(struct vhost_dev *dev, struct vhost_vring_addr *addr) { @@ -514,6 +456,71 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } +static int vhost_user_set_mem_table(struct vhost_dev *dev, +struct vhost_memory *mem) +{ +int fds[VHOST_MEMORY_MAX_NREGIONS]; +int i, fd; +size_t fd_num = 0; +uint64_t features; +bool reply_supported = virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_REPLY_ACK); + +VhostUserMsg msg = { +.request = VHOST_USER_SET_MEM_TABLE, +.flags = VHOST_USER_VERSION, +}; + +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_RESPONSE_MASK; +} + +for (i = 0; i < dev->mem->nregions; ++i) { +struct vhost_memory_region *reg = dev->mem->regions + i; +ram_addr_t offset; +MemoryRegion *mr; + +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + ); +fd = memory_region_get_fd(mr); +if (fd > 0) { +msg.payload.memo
[Qemu-devel] [PATCH v3 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
From: Prerna Saxena <prerna.sax...@nutanix.com> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. If negotiated, client applications should send a u64 payload in response to any message that contains the "need_response" bit set on the message flags. Setting the payload to "zero" indicates the command finished successfully. Likewise, setting it to "non-zero" indicates an error. Currently implemented only for SET_MEM_TABLE. Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 44 hw/virtio/vhost-user.c| 32 2 files changed, 76 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..26dbe71 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_response flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for + details. * Size - 32-bit size of the payload @@ -126,6 +128,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension. ] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +258,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +469,42 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: +--- +The original vhost-user specification only demands responses for certain +commands. 
This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. + +With this protocol extension negotiated, the sender (QEMU) can set the newly +introduced "need_response" [Bit 3] flag to any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure. +In other words, response must be in the following format: + + +| request | flags | size | payload | + + + * Request: 32-bit type of the request + * Flags: 32-bit bit field: + * Size: size of the payload (see below) + * Payload: a u64 integer, where a non-zero value indicates a failure. + +This aids debugging the application's responses from QEMU. More +importantly, it indicates to QEMU that the requested operation has +deterministically (not) been met. Today, QEMU is expected to terminate +the main vhost-user loop upon receiving such errors. In future, qemu could +be taught to be more resilient for selective requests. + +Note that as per the original vhost-user protocol, the following four messages +anyway require distinct responses from the vhost-user client process: + * VHOST_GET_FEATURES + * VHOST_GET_PROTOCOL_FEATURES + * VHOST_GET_VRING_BASE + * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) + +For these message types, the presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or +need_response bit being set brings no behavioural change. +The response from the client is identical whether or not the REPLY_ACK feature +has been negotiated. 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 495e09f..899f354 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -84,6 +85,7 @@ typedef struct VhostUserMsg { #define VHOST_USER_VERSION_MASK (0x3) #define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_NEED_RESPONSE_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -107,6 +109,25 @@ static VhostUserMsg m __attribute__ ((unused)); /* The version of the protocol we support */ #define VHOST_USER_VERSION(0x1) +static int process_message_reply(struct vhost_dev *dev, +VhostUserRequest request) +{ +VhostUserMsg msg; + +if (vhost_user_read(dev, ) < 0) { +return 0; +} + +if (msg.request != request) { +error_report
[Qemu-devel] [PATCH v3 0/2] vhost-user: Extend protocol to receive replies on any command.
From: Prerna Saxena <prerna.sax...@nutanix.com> The current vhost-user protocol requires the client to send responses to only a few commands. For the remaining commands, it is impossible for QEMU to know the status of the requested operation -- ie, did it succeed? If so, by what time? This is inconvenient, and can also lead to races. As an example: (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net application). Note that SET_MEM_TABLE does not require a reply according to the spec. (2) Qemu commits the memory to the guest. (3) Guest issues an I/O operation over a new memory region which was configured on (1). (4) The application hasn't yet remapped the memory, but it sees the I/O request. (5) The application cannot satisfy the request because it does not know about those GPAs. Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (eg. vhost-net) completes the command and returns (with an error code). Changing the behaviour of current vhost-user commands would break existing applications. Patch 1 introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to request a response to any message by setting the newly introduced "need_response" flag. The application must then respond to qemu by providing a status about the requested operation. Patch 2 adds a workaround for the race described above for clients that do not support the REPLY_ACK feature. It introduces a get_features command to be sent before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messages in order. Changelog: -- Changes v2->v3: 1) Swapped the patch numbers 1 & 2 from the previous series. 
2) Patch 1 (previously patch 2 in v2): addresses MarcAndre's review comments and renames function 'process_message_response' to 'process_message_reply' 3) Patch 2 (ie patch 1 in v2) : Unchanged from v2. Changes v1->v2: 1) Patch 1 : Ask for get_features before returning from set_mem_table(new). 2) Patch 2 : * Improve documentation. * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE. References: v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html Prerna Saxena (2): vhost-user: Introduce a new protocol feature REPLY_ACK. vhost-user: Attempt to fix a race with set_mem_table. docs/specs/vhost-user.txt | 44 +++ hw/virtio/vhost-user.c| 133 ++ 2 files changed, 130 insertions(+), 47 deletions(-) -- 1.8.1.2
Re: [Qemu-devel] [PATCH v2 0/2]vhost-user: Extend protocol to seek response for any command.
Hi Michael, Thank you for taking a look. On 04/07/16 5:29 pm, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Fri, Jul 01, 2016 at 02:46:20AM -0700, Prerna Saxena wrote: >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> The current vhost-user protocol requires the client to send responses to >> only a >> few commands. For the remaining commands, it is impossible for QEMU to know >> the >> status of the requested operation -- ie, did it succeed? If so, by what time? >> >> This is inconvenient, and can also lead to races. As an example: >> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net >> application). >> Note that SET_MEM_TABLE does not require a reply according to the spec. >> (2) Qemu commits the memory to the guest. >> (3) Guest issues an I/O operation over a new memory region which was >> configured on (1). >> (4) The application hasn't yet remapped the memory, but it sees the I/O >> request. >> (5) The application cannot satisfy the request because it does not know >> about those GPAs. >> >> Note that the kernel implementation does not suffer from this limitation >> since messages are sent via an ioctl(). The ioctl() blocks until the backend >> (eg. vhost-net) completes the command and returns (with an error code). >> >> Changing the behaviour of current vhost-user commands would break existing >> applications. >> To work around this race, Patch 1 adds a get_features command to be sent >> before returning from set_mem_table. While this is not a complete fix, it >> will help client applications that strictly process messages in order. >> >> The second patch introduces a protocol extension, >> VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to >> request a response to any message by setting the newly introduced >> "need_response" flag. The application must then respond to qemu by providing >> a status about the requested operation. 
> > >OK this all looks very reasonable (and I do like patch 1 too) >but there's one source of waste here: we do not need to >synchronize when we set up device the first time >when hdev->memory_changed is false. > >I think we should test that and skip synch in both patches >unless hdev->memory_changed is set. I do not entirely agree with that. The first set_mem_table command is not much different from subsequent set_mem_table calls. For all cases, there is a fair chance that the vhost-user application may, for some reason, not be able to map the guest memory. This protocol extension provides a mechanism for such errors to be propagated back to QEMU. It is up to QEMU to acknowledge the failure (by terminating itself or failing the device) or ignore it. However, in the absence of such a mechanism, it would be really bad for QEMU to believe that the vhost application is all set to process guest requests when reality is quite the opposite. Also, as pointed out before, QEMU needs to have a notion of _when_ the memory mapping was finished, so that it may proceed to pass on actual requests to the vhost user application. The race described in the cover letter (above) can potentially happen even at first-time initialization. This protocol extension is an attempt to bridge the subtle behavioural difference between vhost-user and vhost-kernel. Patch 1, in my opinion, makes the code less intuitive. This is because we are sending a GET_FEATURES vhost message from inside the handler for another vhost command, SET_MEM_TABLE. However, if you think it better to have both Patch 1 & 2, I’ll be happy to post both. Regards, Prerna > > >> Changelog: >> - >> Changes since v1: >> Patch 1 : Ask for get_features before returning from set_mem_table(new). >> Patch 2 : * Improve documentation. >> * Abstract out commonly used operations in the form of a function, >> process_message_response(). Also implement this only for SET_MEM_TABLE. 
>> >> Prerna Saxena (2): >> vhost-user: Attempt to prevent a race on set_mem_table. >> vhost-user : Introduce a new feature VHOST_USER_PROTOCOL_F_REPLY_ACK. >> >> docs/specs/vhost-user.txt | 40 >> hw/virtio/vhost-user.c| 157 >> -- >> 2 files changed, 150 insertions(+), 47 deletions(-) >> >> -- >> 1.8.1.2
Re: [Qemu-devel] [PATCH v2 0/2]vhost-user: Extend protocol to seek response for any command.
Hi Marc-Andre, Thank you for taking a look. On 03/07/16 5:17 pm, "Marc-André Lureau" <marcandre.lur...@gmail.com> wrote: >Hi > >On Fri, Jul 1, 2016 at 11:46 AM, Prerna Saxena <saxenap@gmail.com> wrote: >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> The current vhost-user protocol requires the client to send responses to >> only a >> few commands. For the remaining commands, it is impossible for QEMU to know >> the >> status of the requested operation -- ie, did it succeed? If so, by what time? >> >> This is inconvenient, and can also lead to races. As an example: >> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net >> application). >> Note that SET_MEM_TABLE does not require a reply according to the spec. >> (2) Qemu commits the memory to the guest. >> (3) Guest issues an I/O operation over a new memory region which was >> configured on (1). >> (4) The application hasn't yet remapped the memory, but it sees the I/O >> request. >> (5) The application cannot satisfy the request because it does not know >> about those GPAs. >> >> Note that the kernel implementation does not suffer from this limitation >> since messages are sent via an ioctl(). The ioctl() blocks until the backend >> (eg. vhost-net) completes the command and returns (with an error code). >> >> Changing the behaviour of current vhost-user commands would break existing >> applications. >> To work around this race, Patch 1 adds a get_features command to be sent >> before returning from set_mem_table. While this is not a complete fix, it >> will help client applications that strictly process messages in order. >> >> The second patch introduces a protocol extension, >> VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to >> request a response to any message by setting the newly introduced >> "need_response" flag. The application must then respond to qemu by providing >> a status about the requested operation. 
>> >> Changelog: >> - >> Changes since v1: >> Patch 1 : Ask for get_features before returning from set_mem_table(new). >> Patch 2 : * Improve documentation. >> * Abstract out commonly used operations in the form of a function, >> process_message_response(). Also implement this only for SET_MEM_TABLE. >> > >Overall, that looks good to me. > >Why do we have both "response" and "reply" which basically means the >same thing, right? I would rather stick with "reply". All right, will rename this function to process_message_reply(). > >I am not convinced the first patch is needed, imho it is a >workaround/hack, the solution is given with the patch 2 only. Great, I’ll post a v3 with just Patch 2. Regards, Prerna > >> Prerna Saxena (2): >> vhost-user: Attempt to prevent a race on set_mem_table. >> vhost-user : Introduce a new feature VHOST_USER_PROTOCOL_F_REPLY_ACK. >> >> docs/specs/vhost-user.txt | 40 >> hw/virtio/vhost-user.c| 157 >> -- >> 2 files changed, 150 insertions(+), 47 deletions(-) >> >> -- >> 1.8.1.2 >> > > > >-- >Marc-André Lureau >
[Qemu-devel] [PATCH 2/2] vhost-user : Introduce a new protocol feature REPLY_ACK.
From: Prerna Saxena <prerna.sax...@nutanix.com> This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK. If negotiated, client applications should send a u64 payload in response to any message that contains the "need_response" bit set on the message flags. Setting the payload to "zero" indicates the command finished successfully. Likewise, setting it to "non-zero" indicates an error. Currently implemented only for SET_MEM_TABLE. Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 44 hw/virtio/vhost-user.c| 32 2 files changed, 76 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..26dbe71 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,8 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_response flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for + details. * Size - 32-bit size of the payload @@ -126,6 +128,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension. ] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +258,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +469,42 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: +--- +The original vhost-user specification only demands responses for certain +commands. 
This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. + +With this protocol extension negotiated, the sender (QEMU) can set the newly +introduced "need_response" [Bit 3] flag to any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure. +In other words, response must be in the following format: + + +| request | flags | size | payload | + + + * Request: 32-bit type of the request + * Flags: 32-bit bit field: + * Size: size of the payload (see below) + * Payload: a u64 integer, where a non-zero value indicates a failure. + +This aids debugging the application's responses from QEMU. More +importantly, it indicates to QEMU that the requested operation has +deterministically (not) been met. Today, QEMU is expected to terminate +the main vhost-user loop upon receiving such errors. In future, qemu could +be taught to be more resilient for selective requests. + +Note that as per the original vhost-user protocol, the following four messages +anyway require distinct responses from the vhost-user client process: + * VHOST_GET_FEATURES + * VHOST_GET_PROTOCOL_FEATURES + * VHOST_GET_VRING_BASE + * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) + +For these message types, the presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or +need_response bit being set brings no behavioural change. +The response from the client is identical whether or not the REPLY_ACK feature +has been negotiated. 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 858a1bb..bff229e 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -84,6 +85,7 @@ typedef struct VhostUserMsg { #define VHOST_USER_VERSION_MASK (0x3) #define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_NEED_RESPONSE_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -107,6 +109,25 @@ static VhostUserMsg m __attribute__ ((unused)); /* The version of the protocol we support */ #define VHOST_USER_VERSION(0x1) +static int process_message_response(struct vhost_dev *dev, +VhostUserRequest request) +{ +VhostUserMsg msg; + +if (vhost_user_read(dev, ) < 0) { +return 0; +} + +if (msg.reques
[Qemu-devel] [PATCH v2 0/2]vhost-user: Extend protocol to seek response for any command.
From: Prerna Saxena <prerna.sax...@nutanix.com> The current vhost-user protocol requires the client to send responses to only a few commands. For the remaining commands, it is impossible for QEMU to know the status of the requested operation -- ie, did it succeed? If so, by what time? This is inconvenient, and can also lead to races. As an example: (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net application). Note that SET_MEM_TABLE does not require a reply according to the spec. (2) Qemu commits the memory to the guest. (3) Guest issues an I/O operation over a new memory region which was configured on (1). (4) The application hasn't yet remapped the memory, but it sees the I/O request. (5) The application cannot satisfy the request because it does not know about those GPAs. Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (eg. vhost-net) completes the command and returns (with an error code). Changing the behaviour of current vhost-user commands would break existing applications. To work around this race, Patch 1 adds a get_features command to be sent before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messages in order. The second patch introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to request a response to any message by setting the newly introduced "need_response" flag. The application must then respond to qemu by providing a status about the requested operation. Changelog: - Changes since v1: Patch 1 : Ask for get_features before returning from set_mem_table(new). Patch 2 : * Improve documentation. * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE. Prerna Saxena (2): vhost-user: Attempt to prevent a race on set_mem_table. 
vhost-user : Introduce a new feature VHOST_USER_PROTOCOL_F_REPLY_ACK. docs/specs/vhost-user.txt | 40 hw/virtio/vhost-user.c| 157 -- 2 files changed, 150 insertions(+), 47 deletions(-) -- 1.8.1.2
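The GET_FEATURES workaround described in this cover letter relies on the backend handling messages strictly in order. The toy model below sketches that reasoning in C. Everything in it is hypothetical (`toy_backend`, `toy_send`, `toy_backend_step`, `toy_set_mem_table_with_barrier`, and the request codes, which should be treated as placeholders). It models the socket as a simple in-memory FIFO and is not how QEMU or any real backend is structured.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative request codes; treat the values as placeholders. */
enum toy_request { TOY_GET_FEATURES = 1, TOY_SET_MEM_TABLE = 5 };

#define TOY_QUEUE_LEN 8

/* A toy backend that consumes queued messages strictly in order. */
struct toy_backend {
    enum toy_request queue[TOY_QUEUE_LEN];
    size_t head, tail;
    int mem_table_ready;   /* set once SET_MEM_TABLE has been handled */
    int replies_sent;      /* bumped for every GET_FEATURES reply */
};

static void toy_send(struct toy_backend *b, enum toy_request r)
{
    assert(b->tail < TOY_QUEUE_LEN);   /* sketch: no wrap-around */
    b->queue[b->tail++] = r;
}

/* Handle exactly one pending message; returns 0 when the queue is empty. */
static int toy_backend_step(struct toy_backend *b)
{
    if (b->head == b->tail) {
        return 0;
    }
    switch (b->queue[b->head++]) {
    case TOY_SET_MEM_TABLE:
        b->mem_table_ready = 1;   /* stands in for remapping guest memory */
        break;
    case TOY_GET_FEATURES:
        b->replies_sent++;        /* this request always gets a reply */
        break;
    }
    return 1;
}

/* Sender side: SET_MEM_TABLE followed by GET_FEATURES as a barrier.
 * Returns 0 once the GET_FEATURES reply has been observed. */
static int toy_set_mem_table_with_barrier(struct toy_backend *b)
{
    int before = b->replies_sent;

    toy_send(b, TOY_SET_MEM_TABLE);
    toy_send(b, TOY_GET_FEATURES);
    while (b->replies_sent == before) {
        if (!toy_backend_step(b)) {
            return -1;   /* queue drained without a reply */
        }
    }
    return 0;
}
```

Because the model dequeues in order, the reply to GET_FEATURES cannot appear before SET_MEM_TABLE has been handled. A backend that reorders or defers work would break that assumption, which is why the cover letter calls this "not a complete fix".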
[Qemu-devel] [PATCH 1/2] vhost-user: Attempt to fix a race with set_mem_table.
From: Prerna Saxena <prerna.sax...@nutanix.com> The set_mem_table command currently does not seek a reply. Hence, there is no easy way for a remote application to notify QEMU when it has finished setting up memory, or if there were errors doing so. As an example: (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net application). SET_MEM_TABLE does not require a reply according to the spec. (2) Qemu commits the memory to the guest. (3) Guest issues an I/O operation over a new memory region which was configured on (1). (4) The application has not yet remapped the memory, but it sees the I/O request. (5) The application cannot satisfy the request because it does not know about those GPAs. While a guaranteed fix would require a protocol extension (committed separately), a best-effort workaround for existing applications is to send a GET_FEATURES message before completing the vhost_user_set_mem_table() call. Since GET_FEATURES requires a reply, an application that processes vhost-user messages synchronously would probably have completed the SET_MEM_TABLE before replying. For a vhost-user application that processes messages strictly in order, a response to GET_FEATURES will ensure that the application has finished processing the previous set_mem request too. 
Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- hw/virtio/vhost-user.c | 104 +++-- 1 file changed, 57 insertions(+), 47 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 495e09f..858a1bb 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -233,53 +233,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table(struct vhost_dev *dev, -struct vhost_memory *mem) -{ -int fds[VHOST_MEMORY_MAX_NREGIONS]; -int i, fd; -size_t fd_num = 0; -VhostUserMsg msg = { -.request = VHOST_USER_SET_MEM_TABLE, -.flags = VHOST_USER_VERSION, -}; - -for (i = 0; i < dev->mem->nregions; ++i) { -struct vhost_memory_region *reg = dev->mem->regions + i; -ram_addr_t offset; -MemoryRegion *mr; - -assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); -mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - ); -fd = memory_region_get_fd(mr); -if (fd > 0) { -msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr; -msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; -msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; -msg.payload.memory.regions[fd_num].mmap_offset = offset; -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); -fds[fd_num++] = fd; -} -} - -msg.payload.memory.nregions = fd_num; - -if (!fd_num) { -error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); -return -1; -} - -msg.size = sizeof(msg.payload.memory.nregions); -msg.size += sizeof(msg.payload.memory.padding); -msg.size += fd_num * sizeof(VhostUserMemoryRegion); - -vhost_user_write(dev, , fds, fd_num); - -return 0; -} - static int vhost_user_set_vring_addr(struct vhost_dev *dev, struct vhost_vring_addr *addr) { @@ -482,6 +435,63 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } 
+static int vhost_user_set_mem_table(struct vhost_dev *dev, +struct vhost_memory *mem) +{ +int fds[VHOST_MEMORY_MAX_NREGIONS]; +int i, fd; +size_t fd_num = 0; +uint64_t features; +VhostUserMsg msg = { +.request = VHOST_USER_SET_MEM_TABLE, +.flags = VHOST_USER_VERSION, +}; + +for (i = 0; i < dev->mem->nregions; ++i) { +struct vhost_memory_region *reg = dev->mem->regions + i; +ram_addr_t offset; +MemoryRegion *mr; + +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + ); +fd = memory_region_get_fd(mr); +if (fd > 0) { +msg.payload.memory.regions[fd_num].userspace_addr \ += reg->userspace_addr; +msg.payload.memory.regions[fd_num].memory_size \ += reg->memory_size; +msg.payload.memory.regions[fd_num].guest_phys_addr \ += reg->guest_
Re: [Qemu-devel] [PATCH 0/1] vhost-user: Add a protocol extension for client responses to vhost commands.
On 26/06/16 8:15 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Sat, Jun 25, 2016 at 03:13:54AM +, Prerna Saxena wrote: >> >> >> >> >> >> On 25/06/16 4:43 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: >> >> >On Fri, Jun 24, 2016 at 05:39:31PM +, Prerna Saxena wrote: >> >> >> >> >> >> On 24/06/16 9:15 pm, "Felipe Franciosi" <fel...@nutanix.com> wrote: >> >> >> >> >We talked to MST on IRC a while back and he brainstormed the idea of >> >> >doing this per-message. >> >> >(I even recall proposing to call this feature REPLY_ALL and he suggested >> >> >REPLY_ANY due to that.) >> >> > >> >> >I agree with doing it per message, as the protocol itself should be >> >> >flexible in that sense. >> >> >(Even if qemu today will probably want to ask for a reply in all >> >> >messages.) >> >> >> >> In fact, the current implementation does exactly this. If >> >> VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, the current QEMU patch >> >> sets the NEED_RESPONSE flag bit for all outgoing messages, basically >> >> forcing the vhost-user application to respond to all messages. >> > >> > >> >This seems unnecessary. Let's only do that for messages that actually >> >need to be synchronous. >> >> It would be nice to distinguish the vhost-user protocol itself from its QEMU >> implementation. >> The protocol should, in theory, have provision for an implementation (such >> as QEMU’s vhost-user implementation) to seek a response for _any_ command. >> However, we can choose to be selective in our QEMU implementation and just >> have limited commands currently send a response, such as SET_MEM_TABLE. >> >> In other words, we will still require the NEED_RESPONSE flag bit defined, >> but we can just set it to 1 for the SET_MEM_TABLE command in our QEMU >> implementation. All other vhost-user commands are sent from QEMU setting >> this to 0, so the application does not send an ack. >> >> Michael, does that correctly summarize what you were meaning to suggest >> here? 
>> >> Regards, >> Prerna > >Exactly. Thanks for your response. I will rework and send out a patch to that end. Regards, Prerna > >> >> > >> >> > >> >> >On 24/06/2016, 14:59, "Qemu-devel on behalf of Marc-André Lureau" >> >> ><qemu-devel-bounces+felipe=nutanix@nongnu.org on behalf of >> >> >marcandre.lur...@gmail.com> wrote: >> >> > >> >> >Hi >> >> > >> >> >On Fri, Jun 24, 2016 at 10:17 AM, Prerna Saxena <saxenap@gmail.com> >> >> >wrote: >> >> >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> >> >> >> >> The current vhost-user protocol requires the client to send responses >> >> >> to only few commands. For the remaining commands, it is impossible for >> >> >> QEMU to know the status of the requested operation -- ie, did it >> >> >> succeed at all, and if so, at what time. >> >> >> >> >> >> This is inconvenient, and can also lead to races. As an example: >> >> >> >> >> >> (1) qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net >> >> >> application) and SET_MEM_TABLE doesn't require a reply according to >> >> >> the spec. >> >> >> (2) qemu commits the memory to the guest. >> >> >> (3) guest issues an I/O operation over a new memory region which was >> >> >> configured on (1) >> >> >> (4) The application hasn't yet remapped the memory, but it sees the >> >> >> I/O request. >> >> >> (5) The application cannot satisfy the request because it doesn't know >> >> >> about those GPAs >> >> >> >> >> >> Note that the kernel implementation does not suffer from this >> >> >> limitation since messages are sent via an ioctl(). The ioctl() blocks >> >> >> until the backend (eg. vhost-net) completes the command and returns >> >> >> (with an error code). >> >> >> >> >> >> Changing the behaviour of current vhost-user commands would break >> >> >> existing applications. This patch introduces a protocol extension, >> >> >> VHOST_USER_PROTOCOL_F_REPLY_ACK. 
This feature, if negotiated, allows >> >> >> QEMU to annotate messages to the application that it seeks a response >> >> >> for. The application must then respond to qemu by providing a status >> >> >> about the requested operation. >> >> > >> >> >I like the idea, as I encountered a similar issue in my >> >> >"vhost-user-gpu" development (which I worked around by sending a dump >> >> >GET_FEATURES.. to sync things). But I question the need to have a flag >> >> >per message. I think if the protocol feature is negociated, all >> >> >messages should have a reply. Why do you want it to be per-message? >> >> > >> >> >thanks >> >> > >> >> >-- >> >> >Marc-André Lureau >> >> > >> >> > >> >> >
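To make the outcome of this exchange concrete, here is a minimal sketch of the sender-side policy agreed on above: the flag bit stays defined for every message, but the sender sets it only for requests that need to be synchronous. The DEMO_* request names and the demo_* helpers are hypothetical and do not come from the QEMU patch; only the two flag definitions mirror the values given in the patch and the spec text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as described in the patch: version in the low 2 bits, bit 3 = need_response. */
#define VHOST_USER_VERSION            (0x1)
#define VHOST_USER_NEED_RESPONSE_MASK (0x1 << 3)

/* Hypothetical subset of request types, for illustration only. */
typedef enum {
    DEMO_SET_FEATURES,
    DEMO_SET_MEM_TABLE,
    DEMO_SET_VRING_NUM,
} DemoRequest;

/* Hypothetical sender policy: only SET_MEM_TABLE is treated as synchronous. */
static bool demo_wants_ack(DemoRequest req)
{
    return req == DEMO_SET_MEM_TABLE;
}

/* Build the flags word for an outgoing message. The need_response bit is
 * set only if the feature was negotiated AND the policy asks for an ack. */
static uint32_t demo_build_flags(DemoRequest req, bool reply_ack_negotiated)
{
    uint32_t flags = VHOST_USER_VERSION;

    if (reply_ack_negotiated && demo_wants_ack(req)) {
        flags |= VHOST_USER_NEED_RESPONSE_MASK;
    }
    return flags;
}
```

Keeping the policy in one predicate means the protocol stays general (any message may carry the bit) while the implementation can widen or narrow the set of acknowledged requests without touching the wire format.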
Re: [Qemu-devel] [PATCH 0/1] vhost-user: Add a protocol extension for client responses to vhost commands.
On 25/06/16 4:43 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: >On Fri, Jun 24, 2016 at 05:39:31PM +, Prerna Saxena wrote: >> >> >> On 24/06/16 9:15 pm, "Felipe Franciosi" <fel...@nutanix.com> wrote: >> >> >We talked to MST on IRC a while back and he brainstormed the idea of doing >> >this per-message. >> >(I even recall proposing to call this feature REPLY_ALL and he suggested >> >REPLY_ANY due to that.) >> > >> >I agree with doing it per message, as the protocol itself should be >> >flexible in that sense. >> >(Even if qemu today will probably want to ask for a reply in all messages.) >> >> In fact, the current implementation does exactly this. If >> VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, the current QEMU patch sets >> the NEED_RESPONSE flag bit for all outgoing messages — basically enforcing >> the vhost-user application to respond to all messages. > > >This seems unnecessary. Let's only do that for messages that actually >need to be synchronous. It would be nice to distinguish the vhost-user protocol itself from its QEMU implementation. The protocol should, in theory, have provision for an implementation (such as QEMU’s vhost-user implementation) to seek response for _any_ command. However, we can choose to be selective in our QEMU implementation and just have limited commands currently send a response, such as SET_MEM_TABLE. In other words, we will still require the NEED_RESPONSE flag bit defined, but we can just set it to 1 it for SET_MEM_TABLE command in our QEMU implementation. All other vhost-user commands are sent from QEMU setting this to 0, so the application does not send an ack. Michael, Does that correctly summarize what you were meaning to suggest here ? 
Regards, Prerna > >> > >> >On 24/06/2016, 14:59, "Qemu-devel on behalf of Marc-André Lureau" >> ><qemu-devel-bounces+felipe=nutanix@nongnu.org on behalf of >> >marcandre.lur...@gmail.com> wrote: >> > >> >Hi >> > >> >On Fri, Jun 24, 2016 at 10:17 AM, Prerna Saxena <saxenap@gmail.com> >> >wrote: >> >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> >> >> The current vhost-user protocol requires the client to send responses to >> >> only few commands. For the remaining commands, it is impossible for QEMU >> >> to know the status of the requested operation -- ie, did it succeed at >> >> all, and if so, at what time. >> >> >> >> This is inconvenient, and can also lead to races. As an example: >> >> >> >> (1) qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net >> >> application) and SET_MEM_TABLE doesn't require a reply according to the >> >> spec. >> >> (2) qemu commits the memory to the guest. >> >> (3) guest issues an I/O operation over a new memory region which was >> >> configured on (1) >> >> (4) The application hasn't yet remapped the memory, but it sees the I/O >> >> request. >> >> (5) The application cannot satisfy the request because it doesn't know >> >> about those GPAs >> >> >> >> Note that the kernel implementation does not suffer from this limitation >> >> since messages are sent via an ioctl(). The ioctl() blocks until the >> >> backend (eg. vhost-net) completes the command and returns (with an error >> >> code). >> >> >> >> Changing the behaviour of current vhost-user commands would break >> >> existing applications. This patch introduces a protocol extension, >> >> VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU >> >> to annotate messages to the application that it seeks a response for. The >> >> application must then respond to qemu by providing a status about the >> >> requested operation. 
>> > >> >I like the idea, as I encountered a similar issue in my >> >"vhost-user-gpu" development (which I worked around by sending a dump >> >GET_FEATURES.. to sync things). But I question the need to have a flag >> >per message. I think if the protocol feature is negociated, all >> >messages should have a reply. Why do you want it to be per-message? >> > >> >thanks >> > >> >-- >> >Marc-André Lureau >> > >> > >> >
Re: [Qemu-devel] [PATCH 0/1] vhost-user: Add a protocol extension for client responses to vhost commands.
On 24/06/16 9:15 pm, "Felipe Franciosi" <fel...@nutanix.com> wrote: >We talked to MST on IRC a while back and he brainstormed the idea of doing >this per-message. >(I even recall proposing to call this feature REPLY_ALL and he suggested >REPLY_ANY due to that.) > >I agree with doing it per message, as the protocol itself should be flexible >in that sense. >(Even if qemu today will probably want to ask for a reply in all messages.) In fact, the current implementation does exactly this. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, the current QEMU patch sets the NEED_RESPONSE flag bit for all outgoing messages — basically enforcing the vhost-user application to respond to all messages. > >On 24/06/2016, 14:59, "Qemu-devel on behalf of Marc-André Lureau" ><qemu-devel-bounces+felipe=nutanix@nongnu.org on behalf of >marcandre.lur...@gmail.com> wrote: > >Hi > >On Fri, Jun 24, 2016 at 10:17 AM, Prerna Saxena <saxenap@gmail.com> wrote: >> From: Prerna Saxena <prerna.sax...@nutanix.com> >> >> The current vhost-user protocol requires the client to send responses to >> only few commands. For the remaining commands, it is impossible for QEMU to >> know the status of the requested operation -- ie, did it succeed at all, and >> if so, at what time. >> >> This is inconvenient, and can also lead to races. As an example: >> >> (1) qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net >> application) and SET_MEM_TABLE doesn't require a reply according to the spec. >> (2) qemu commits the memory to the guest. >> (3) guest issues an I/O operation over a new memory region which was >> configured on (1) >> (4) The application hasn't yet remapped the memory, but it sees the I/O >> request. >> (5) The application cannot satisfy the request because it doesn't know about >> those GPAs >> >> Note that the kernel implementation does not suffer from this limitation >> since messages are sent via an ioctl(). The ioctl() blocks until the backend >> (eg. 
vhost-net) completes the command and returns (with an error code). >> >> Changing the behaviour of current vhost-user commands would break existing >> applications. This patch introduces a protocol extension, >> VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to >> annotate messages to the application that it seeks a response for. The >> application must then respond to qemu by providing a status about the >> requested operation. > >I like the idea, as I encountered a similar issue in my >"vhost-user-gpu" development (which I worked around by sending a dump >GET_FEATURES.. to sync things). But I question the need to have a flag >per message. I think if the protocol feature is negociated, all >messages should have a reply. Why do you want it to be per-message? > >thanks > >-- >Marc-André Lureau > > >
[Qemu-devel] [PATCH 1/1] vhost-user : Introduce a new feature VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, forces the remote vhost-user process to send a u64 reply containing a status
From: Prerna Saxena <prerna.sax...@nutanix.com> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> --- docs/specs/vhost-user.txt | 36 +++ hw/virtio/vhost-user.c| 153 +- 2 files changed, 186 insertions(+), 3 deletions(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 777c49c..e5388b2 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -37,6 +37,7 @@ consists of 3 header fields and a payload: * Flags: 32-bit bit field: - Lower 2 bits are the version (currently 0x01) - Bit 2 is the reply flag - needs to be sent on each reply from the slave + - Bit 3 is the need_response flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for details. * Size - 32-bit size of the payload @@ -126,6 +127,8 @@ the ones that do: * VHOST_GET_VRING_BASE * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +[ Also see the section on REPLY_ACK protocol extension] + There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -254,6 +257,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 Message types - @@ -464,3 +468,35 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + +VHOST_USER_PROTOCOL_F_REPLY_ACK: + +The original vhost-user specification only demands responses for certain +commands. This differs from the vhost protocol implementation where commands +are sent over an ioctl() call and block until the client has completed. Not +receiving a response for commands like VHOST_SET_MEM_TABLE makes the sender +unable to tell when the client has finished (re)mapping the GPA, or whether it +has failed altogether. 
+ +With this protocol extension negotiated, the sender can set the newly +introduced "need_response" [Bit 3] flag on any command. This indicates that +the client MUST respond with a Payload VhostUserMsg indicating success or +failure. The payload should be set to zero on success or non-zero on failure. +In other words, the response must be in the following format: + +| request | flags | size | payload | + + + * Request: 32-bit type of the original request which is being responded to. + * Flags: 32-bit bit field: (VHOST_USER_VERSION | VHOST_USER_REPLY_MASK) + * Size: size of the payload (see below) + * Payload: a u64 integer, where a non-zero value indicates a failure. + +Note that as per the original vhost-user protocol, the following four messages anyway +require distinct responses from the vhost-user client process: + * VHOST_GET_FEATURES + * VHOST_GET_PROTOCOL_FEATURES + * VHOST_GET_VRING_BASE + * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) +For these message types, the presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or +the need_response bit being set brings no behavioural change. 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 495e09f..f01ebb4 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -31,6 +31,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_REPLY_ACK = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -82,8 +83,9 @@ typedef struct VhostUserLog { typedef struct VhostUserMsg { VhostUserRequest request; -#define VHOST_USER_VERSION_MASK (0x3) -#define VHOST_USER_REPLY_MASK (0x1<<2) +#define VHOST_USER_VERSION_MASK (0x3) +#define VHOST_USER_REPLY_MASK (0x1 << 2) +#define VHOST_USER_NEED_RESPONSE_MASK (0x1 << 3) uint32_t flags; uint32_t size; /* the following payload size */ union { @@ -239,10 +241,17 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, int fds[VHOST_MEMORY_MAX_NREGIONS]; int i, fd; size_t fd_num = 0; +bool reply_supported = virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_REPLY_ACK); VhostUserMsg msg = { .request = VHOST_USER_SET_MEM_TABLE, .flags = VHOST_USER_VERSION, }; +VhostUserRequest request = msg.request; + +if (reply_supported) { +msg.flags |= VHOST_USER_NEED_RESPONSE_MASK; +} for (i = 0; i < dev->mem->nregions; ++i) { struct vhost_memory_region *reg = dev->mem->regions + i; @@ -277,6 +286,20 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, vhost_user_write(dev, , fds, fd_num); +if (reply_supported) { +
[Qemu-devel] [PATCH 0/1] vhost-user: Add a protocol extension for client responses to vhost commands.
From: Prerna Saxena <prerna.sax...@nutanix.com>

The current vhost-user protocol requires the client to send responses to only a few commands. For the remaining commands, it is impossible for QEMU to know the status of the requested operation, i.e., whether it succeeded at all and, if so, when.

This is inconvenient, and can also lead to races. As an example:

(1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net application); SET_MEM_TABLE doesn't require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) The guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application hasn't yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it doesn't know about those GPAs.

Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (e.g., vhost-net) completes the command and returns (with an error code).

Changing the behaviour of current vhost-user commands would break existing applications. This patch introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, allows QEMU to annotate messages to the application that it seeks a response for. The application must then respond to QEMU by providing a status about the requested operation.

Prerna Saxena (1):
  vhost-user : Introduce a new feature, VHOST_USER_PROTOCOL_F_REPLY_ACK. This feature, if negotiated, forces the remote vhost-user process to send a u64 reply containing a status code for each requested operation. Status codes are '0' for success, and non-zero for error.

docs/specs/vhost-user.txt | 36 +++
hw/virtio/vhost-user.c    | 153 +-
2 files changed, 186 insertions(+), 3 deletions(-)

--
1.8.1.2
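The reply format proposed in the spec text of this series can be sketched from the backend's point of view as follows. This is an illustration under stated assumptions, not the backend code from any real vhost-user application: DemoMsg is a simplified, hypothetical stand-in for the wire message (the real structure is packed and carries a union payload), and demo_make_ack is an invented helper name. The mask values are the ones defined in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_VERSION            (0x1)
#define VHOST_USER_REPLY_MASK         (0x1 << 2)
#define VHOST_USER_NEED_RESPONSE_MASK (0x1 << 3)

/* Hypothetical, simplified message layout: request | flags | size | payload. */
typedef struct DemoMsg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;     /* size of the payload that follows */
    uint64_t payload;  /* in an ack: 0 = success, non-zero = failure */
} DemoMsg;

/* Fill in an ack for 'req' if one was asked for.
 * Returns false when no ack must be sent: either the feature was not
 * negotiated or the sender did not set the need_response bit. */
static bool demo_make_ack(const DemoMsg *req, bool reply_ack_negotiated,
                          int status, DemoMsg *reply)
{
    if (!reply_ack_negotiated ||
        !(req->flags & VHOST_USER_NEED_RESPONSE_MASK)) {
        return false;
    }

    reply->request = req->request;  /* echo the request being answered */
    reply->flags = VHOST_USER_VERSION | VHOST_USER_REPLY_MASK;
    reply->size = sizeof(reply->payload);
    reply->payload = (status == 0) ? 0 : 1;
    return true;
}
```

In the race described above, the sender would block after SET_MEM_TABLE until such an ack arrives, so step (2) cannot start before the backend has finished remapping.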
[Qemu-devel] [PATCH 2/2] Debug : Add error messages before a call to debug().
Qemu code has abort() calls in various places which raises a SIGABRT; This patch adds error messages before (most)calls to abort(), so that it is easier to determine why QEMU died. Signed-off-by: Prerna Saxena <saxenap@gmail.com> --- block.c| 1 + block/block-backend.c | 4 block/curl.c | 1 + block/io.c | 1 + block/linux-aio.c | 1 + block/mirror.c | 2 ++ block/qcow2-cache.c| 1 + block/qcow2-cluster.c | 3 +++ block/qcow2-refcount.c | 7 +++ block/qcow2.c | 2 ++ blockdev.c | 3 +++ crypto/aes.c | 1 + exec.c | 4 hw/scsi/scsi-disk.c| 2 ++ hw/virtio/virtio.c | 5 - vl.c | 2 ++ 16 files changed, 39 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index d4939b4..160f277 100644 --- a/block.c +++ b/block.c @@ -3725,6 +3725,7 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs, } } +error_report("Matching context notifier not found for removal. Aborting"); abort(); } diff --git a/block/block-backend.c b/block/block-backend.c index d74f670..0aa8692 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -407,6 +407,7 @@ BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo) return blk; } } +error_report("Drive Info not found, Aborting."); abort(); } @@ -463,6 +464,8 @@ int blk_attach_dev(BlockBackend *blk, void *dev) void blk_attach_dev_nofail(BlockBackend *blk, void *dev) { if (blk_attach_dev(blk, dev) < 0) { +error_report("Attaching device model to block %s failed. Aborting", +blk->name); abort(); } } @@ -1143,6 +1146,7 @@ BlockErrorAction blk_get_error_action(BlockBackend *blk, bool is_read, case BLOCKDEV_ON_ERROR_IGNORE: return BLOCK_ERROR_ACTION_IGNORE; default: +error_report("Unrecognized Block Error Action %d. 
Aborting.",on_err); abort(); } } diff --git a/block/curl.c b/block/curl.c index 5a8f8b6..fe2225a 100644 --- a/block/curl.c +++ b/block/curl.c @@ -382,6 +382,7 @@ static void curl_multi_timeout_do(void *arg) curl_multi_check_completion(s); #else +error_report("Curl timer expired, Aborting."); abort(); #endif } diff --git a/block/io.c b/block/io.c index a7dbf85..6f45959 100644 --- a/block/io.c +++ b/block/io.c @@ -2045,6 +2045,7 @@ void bdrv_aio_cancel(BlockAIOCB *acb) } else if (acb->bs) { aio_poll(bdrv_get_aio_context(acb->bs), true); } else { +error_report("Aio context not found. Aborting."); abort(); } } diff --git a/block/linux-aio.c b/block/linux-aio.c index 805757e..38d7812 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -206,6 +206,7 @@ static void ioq_submit(struct qemu_laio_state *s) break; } if (ret < 0) { +error_report("Error %d submitting io. Aborting.", ret); abort(); } diff --git a/block/mirror.c b/block/mirror.c index c2cfc1a..600e3c2 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -389,6 +389,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) mirror_do_zero_or_discard(s, sector_num, io_sectors, true); break; default: +error_report("Unrecognized mirror option %d. 
Aborting.", +mirror_method); abort(); } assert(io_sectors); diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c index 0fe8eda..80766a2 100644 --- a/block/qcow2-cache.c +++ b/block/qcow2-cache.c @@ -334,6 +334,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c, if (min_lru_index == -1) { /* This can't happen in current synchronous code, but leave the check * here as a reminder for whoever starts using AIO with the cache */ +error_report("Invalid Index %d, Aborting", min_lru_index); abort(); } diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 31ecc10..1914d97 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -583,6 +583,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset, } break; default: +error_report("Invalid cluster type %d. Aborting.", ret); abort(); } @@ -868,6 +869,7 @@ static int count_cow_clusters(BDRVQcow2State *s, int nb_clusters, case QCOW2_CLUSTER_ZERO: break; default: +error_report("Invalid cluster type %d, Aborting.", cluster_type); abort(); } } @@ -1494,6 +1496,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset, break; default: +error_report("Invalid clus
[Qemu-devel] [PATCH 0/2] Cleanup and instrumenting qemu exits due to abort().
Today, some calls to abort() do not have a preceding error string that might hint to the end user why QEMU died. Debugging such scenarios is painful today. This patchset attempts to clean up some dead code in vvfat.c; it also aims to improve qemu error-reporting by placing error messages that precede calls to abort(). Prerna Saxena (2): Block: Cleanup vvfat.c to remove dead code. Debug : Add error messages before a call to debug(). block.c| 1 + block/block-backend.c | 4 block/curl.c | 1 + block/io.c | 1 + block/linux-aio.c | 1 + block/mirror.c | 2 ++ block/qcow2-cache.c| 1 + block/qcow2-cluster.c | 3 +++ block/qcow2-refcount.c | 7 +++ block/qcow2.c | 2 ++ block/vvfat.c | 17 +++-- blockdev.c | 3 +++ crypto/aes.c | 1 + exec.c | 4 hw/scsi/scsi-disk.c| 2 ++ hw/virtio/virtio.c | 5 - vl.c | 2 ++ 17 files changed, 42 insertions(+), 15 deletions(-) -- 1.8.1.2
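The pattern this patchset applies can be sketched outside QEMU as below. The patches themselves call QEMU's error_report(); the sketch substitutes fprintf(stderr, ...) so that it stands alone, and the DemoOnError and DemoAction types and the demo_get_error_action name are hypothetical, loosely modelled on the blk_get_error_action() hunk.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy and action enums, for illustration only. */
typedef enum { DEMO_ON_ERROR_IGNORE, DEMO_ON_ERROR_STOP } DemoOnError;
typedef enum { DEMO_ACTION_IGNORE, DEMO_ACTION_STOP } DemoAction;

static DemoAction demo_get_error_action(DemoOnError on_err)
{
    switch (on_err) {
    case DEMO_ON_ERROR_IGNORE:
        return DEMO_ACTION_IGNORE;
    case DEMO_ON_ERROR_STOP:
        return DEMO_ACTION_STOP;
    default:
        /* Say why we are dying before raising SIGABRT. */
        fprintf(stderr, "Unrecognized error policy %d. Aborting.\n",
                (int)on_err);
        abort();
        /* Nothing may follow abort(): it never returns, so any statement
         * placed here would be dead code (the subject of patch 1/2). */
    }
}
```

Including the offending value in the message, as several hunks of patch 2/2 do, is what makes the report useful when only a log file survives the crash.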
[Qemu-devel] [PATCH 1/2] Block: Cleanup vvfat.c to remove dead code.
Commit 43dc2a64 replaced assert() with abort(), but didn't remove the statements that followed these calls. So the current code still has return values set after a call to abort(). Such statements will never execute and need to be cleaned up. Signed-off-by: Prerna Saxena <saxenap@nutanix.com> --- block/vvfat.c | 17 +++-- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/block/vvfat.c b/block/vvfat.c index 6b85314..ffe739b 100644 --- a/block/vvfat.c +++ b/block/vvfat.c @@ -1747,8 +1747,7 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s, schedule_new_file(s, g_strdup(path), cluster_num); else { abort(); - return 0; - } + } } while(1) { @@ -1768,7 +1767,6 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s, * (cluster_num - mapping->begin)) { /* offset of this cluster in file chain has changed */ abort(); - copy_it = 1; } else if (offset == 0) { const char* basename = get_basename(mapping->path); @@ -1780,7 +1778,6 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s, if (mapping->first_mapping_index != first_mapping_index && mapping->info.file.offset > 0) { abort(); - copy_it = 1; } /* need to write out? 
*/ @@ -1946,8 +1943,6 @@ DLOG(fprintf(stderr, "check direntry %d:\n", i); print_direntry(direntries + i)) } } else abort(); /* cluster_count = 0; */ - - ret += cluster_count; } cluster_num = modified_fat_get(s, cluster_num); @@ -2578,10 +2573,6 @@ static int handle_commits(BDRVVVFATState* s) for (i = 0; !fail && i < s->commits.next; i++) { commit_t* commit = array_get(&(s->commits), i); switch(commit->action) { - case ACTION_RENAME: case ACTION_MKDIR: -abort(); - fail = -2; - break; case ACTION_WRITEOUT: { #ifndef NDEBUG /* these variables are only used by assert() below */ @@ -2639,6 +2630,8 @@ static int handle_commits(BDRVVVFATState* s) break; } +case ACTION_RENAME: +case ACTION_MKDIR: default: abort(); } @@ -2729,7 +2722,6 @@ static int do_commit(BDRVVVFATState* s) if (ret) { fprintf(stderr, "Error handling renames (%d)\n", ret); abort(); - return ret; } /* copy FAT (with bdrv_read) */ @@ -2740,21 +2732,18 @@ static int do_commit(BDRVVVFATState* s) if (ret) { fprintf(stderr, "Fatal: error while committing (%d)\n", ret); abort(); - return ret; } ret = handle_commits(s); if (ret) { fprintf(stderr, "Error handling commits (%d)\n", ret); abort(); - return ret; } ret = handle_deletes(s); if (ret) { fprintf(stderr, "Error deleting\n"); abort(); - return ret; } if (s->qcow->drv->bdrv_make_empty) { -- 1.8.1.2
Re: [Qemu-devel] [PATCH 2/2] [v3] target-ppc: Enhance CPU nodes of device tree to be PAPR compliant.
On 08/08/2013 04:04 PM, Andreas Färber wrote: On 08.08.2013 09:26, Prerna Saxena wrote: From: Prerna Saxena <pre...@linux.vnet.ibm.com> Date: Thu, 8 Aug 2013 06:38:03 +0530 Subject: [PATCH 2/2] Enhance CPU nodes of device tree to be PAPR compliant. This is based on patch from Andreas which enables the default CPU with KVM to show up as -cpu type, such as POWER7_V2.3@0 While this is definitely, more descriptive, PAPR mandates the device tree CPU node names to be of the form : PowerPC,<name> where <name> should not have underscores. Hence replacing the CPU model (which has underscores) with CPU alias. With this patch, the CPU nodes of device tree show up as : /proc/device-tree/cpus/PowerPC,POWER7@0/... /proc/device-tree/cpus/PowerPC,POWER7@4/... Signed-off-by: Prerna Saxena <pre...@linux.vnet.ibm.com> Not yet happy... :( --- hw/ppc/spapr.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 59e2fea..8efd84e 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -43,6 +43,7 @@ #include "hw/pci-host/spapr.h" #include "hw/ppc/xics.h" #include "hw/pci/msi.h" +#include "cpu-models.h" #include "hw/pci/pci.h" @@ -80,6 +81,8 @@ #define HTAB_SIZE(spapr)(1ULL << ((spapr)->htab_shift)) +#define PPC_DEVTREE_STR "PowerPC," + sPAPREnvironment *spapr; int spapr_allocate_irq(int hint, bool lsi) @@ -322,9 +325,16 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_property_cell(fdt, "#address-cells", 0x1))); _FDT((fdt_property_cell(fdt, "#size-cells", 0x0))); -modelname = g_strdup(cpu_model); +/* + * PAPR convention mandates that + * Device tree nodes must be named as: + * PowerPC,CPU-NAME@... 
+ * Also, CPU-NAME must not have underscores.(hence use of CPU-ALIAS) + */ + +modelname = g_strdup_printf(PPC_DEVTREE_STR "%s", cpu_model); -for (i = 0; i < strlen(modelname); i++) { +for (i = strlen(PPC_DEVTREE_STR); i < strlen(modelname); i++) { modelname[i] = toupper(modelname[i]); } One of your colleagues had brought up that the "PowerPC," prefix was not mandatory - is it *required* by the PAPR spec now, or is it just that the IBM CPUs used with PAPR happen to have such a name? I don't know what context led to this observation. However, PAPR mentions the following nomenclature guideline: The value of this property shall be of the form: “PowerPC,<name>”, where <name> is the name of the processor chip which may be displayed to the user. <name> shall not contain underscores. I think this naming guideline will hold good for all PAPR-compliant processors. @@ -1315,6 +1325,14 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) cpu_model = g_strndup(parent_name, strlen(parent_name) - strlen("-" TYPE_POWERPC_CPU)); + +for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) { +if (strcmp(ppc_cpu_aliases[i].model, cpu_model) == 0) { +g_free(cpu_model); +cpu_model = g_strndup(ppc_cpu_aliases[i].alias, +strlen(ppc_cpu_aliases[i].alias)); +} +} } /* Prepare the device tree */ This is still fixing up the name in the wrong place: -cpu POWER7_v2.3 will not get fixed, only -cpu host or KVM's default. The solution I had discussed with Alex is the following: When devices need to expose their name to firmware in a special way, we have the DeviceClass::fw_name field. All we have to do is assign it and use it instead of cpu_model if non-NULL, just like we assign DeviceClass::desc. The way to do it would be to extend the family of POWERPC_DEF* macros to specify the additional field on the relevant CPU models. Would this be the same use case as reflected by ppc_cpu_aliases.alias? If so, do we really need a separate field to convey the same information? 
Therefore my above question: Would it be sufficient to explicitly name POWER7_v2.3 "PowerPC,POWER7" etc. and to drop the upper-casing? Or would we also need to name a CPU such as MPC8572E (random Freescale CPU where I don't know the expected fw_name and that is unlikely to occur/work in sPAPR) "PowerPC,MPC8572E" if someone specified it with -cpu MPC8572E? If this is not a PAPR-compliant CPU, I don't think the PAPR naming convention is of any use. I haven't worked with non-PAPR CPUs. Is the device tree for such CPUs generated by routines in hw/ppc/spapr.c? Or do they have custom routines to generate appropriate device tree nodes? Regards, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [PATCH 0/2] [v3] target-ppc: Enhance CPU nodes of SPAPR-generated device tree
By default on KVM, or when the user asks for it via -cpu host, cpu_model will be "host" and sPAPR merely upper-cases it for the SLOF device tree. PATCH 1/2: Change the sPAPR code so that we get the underlying CPU type, e.g., POWER7_V2.3@0, in the device tree. PATCH 2/2: Make the device-tree CPU nodes PAPR-compliant. Changelog from v2: PATCH 1/2: Reworked and augmented by Andreas Färber against the original posted by Prerna. PATCH 2/2: New. Regards, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [PATCH 1/2] [v3] target-ppc: Get CPU name to correctly reflect its model in the SLOF device tree.
From: Andreas Färber <afaer...@suse.de> Date: Wed, 7 Aug 2013 14:50:41 +0530 Subject: [PATCH 1/2] By default on KVM or when user asks for it via -cpu host, cpu_model will be "host" and sPAPR merely upper-cases it for the SLOF device tree. Change it so that we get the underlying CPU type, e.g., POWER7_V2.3@0. Tested-by: Prerna Saxena <pre...@linux.vnet.ibm.com> Signed-off-by: Andreas Färber <afaer...@suse.de> --- hw/ppc/spapr.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 16bfab9..59e2fea 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1072,7 +1072,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) const char *kernel_cmdline = args->kernel_cmdline; const char *initrd_filename = args->initrd_filename; const char *boot_device = args->boot_device; -PowerPCCPU *cpu; +PowerPCCPU *cpu = NULL; CPUPPCState *env; PCIHostState *phb; int i; @@ -1307,6 +1307,16 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) register_savevm_live(NULL, "spapr/htab", -1, 1, &savevm_htab_handlers, spapr); +if (kvm_enabled() && strcmp(cpu_model, "host") == 0) { +ObjectClass *cpu_class = object_get_class(OBJECT(cpu)); +ObjectClass *parent_cpu_class = object_class_get_parent(cpu_class); + +const char *parent_name = object_class_get_name(parent_cpu_class); + +cpu_model = g_strndup(parent_name, +strlen(parent_name) - strlen("-" TYPE_POWERPC_CPU)); +} + /* Prepare the device tree */ spapr->fdt_skel = spapr_create_fdt_skel(cpu_model, initrd_base, initrd_size, -- 1.7.11.4 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
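The name derivation in this patch amounts to dropping a fixed suffix from a QOM type name. A minimal sketch in plain C follows, assuming the parent class name has the form "MODEL-powerpc64-cpu" (the suffix is inferred from the "host-powerpc64-cpu" type mentioned elsewhere in this thread). demo_strip_suffix is a hypothetical helper, not a QEMU or GLib function; unlike the g_strndup() call in the patch, it also checks that the suffix is really present.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of 'type_name' with 'suffix' removed from
 * its end, or NULL if 'type_name' does not end in 'suffix' (or on
 * allocation failure). The caller frees the result. */
static char *demo_strip_suffix(const char *type_name, const char *suffix)
{
    size_t name_len = strlen(type_name);
    size_t suffix_len = strlen(suffix);
    size_t stem_len;
    char *stem;

    if (suffix_len > name_len ||
        strcmp(type_name + name_len - suffix_len, suffix) != 0) {
        return NULL;
    }

    stem_len = name_len - suffix_len;
    stem = malloc(stem_len + 1);
    if (stem == NULL) {
        return NULL;
    }
    memcpy(stem, type_name, stem_len);
    stem[stem_len] = '\0';
    return stem;
}
```

For example, demo_strip_suffix("POWER7_v2.3-powerpc64-cpu", "-powerpc64-cpu") yields "POWER7_v2.3", which is the detailed model name, revision included, that the follow-up patch then maps to an alias.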
[Qemu-devel] [PATCH 2/2] [v3] target-ppc: Enhance CPU nodes of device tree to be PAPR compliant.
From: Prerna Saxena pre...@linux.vnet.ibm.com Date: Thu, 8 Aug 2013 06:38:03 +0530 Subject: [PATCH 2/2] Enhance CPU nodes of device tree to be PAPR compliant. This is based on patch from Andreas which enables the default CPU with KVM to show up as -cpu type, such as POWER7_V2.3@0 While this is definitely, more descriptive, PAPR mandates the device tree CPU node names to be of the form : PowerPC,name where name should not have underscores. Hence replacing the CPU model (which has underscores) with CPU alias. With this patch, the CPU nodes of device tree show up as : /proc/device-tree/cpus/PowerPC,POWER7@0/... /proc/device-tree/cpus/PowerPC,POWER7@4/... Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- hw/ppc/spapr.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 59e2fea..8efd84e 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -43,6 +43,7 @@ #include hw/pci-host/spapr.h #include hw/ppc/xics.h #include hw/pci/msi.h +#include cpu-models.h #include hw/pci/pci.h @@ -80,6 +81,8 @@ #define HTAB_SIZE(spapr)(1ULL ((spapr)-htab_shift)) +#define PPC_DEVTREE_STR PowerPC, + sPAPREnvironment *spapr; int spapr_allocate_irq(int hint, bool lsi) @@ -322,9 +325,16 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_property_cell(fdt, #address-cells, 0x1))); _FDT((fdt_property_cell(fdt, #size-cells, 0x0))); -modelname = g_strdup(cpu_model); +/* + * PAPR convention mandates that + * Device tree nodes must be named as: + * PowerPC,CPU-NAME@... 
+ * Also, CPU-NAME must not have underscores.(hence use of CPU-ALIAS) + */ + +modelname = g_strdup_printf(PPC_DEVTREE_STR %s, cpu_model); -for (i = 0; i strlen(modelname); i++) { +for (i = strlen(PPC_DEVTREE_STR); i strlen(modelname); i++) { modelname[i] = toupper(modelname[i]); } @@ -1315,6 +1325,14 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) cpu_model = g_strndup(parent_name, strlen(parent_name) - strlen(- TYPE_POWERPC_CPU)); + +for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) { +if (strcmp(ppc_cpu_aliases[i].model, cpu_model) == 0) { +g_free(cpu_model); +cpu_model = g_strndup(ppc_cpu_aliases[i].alias, +strlen(ppc_cpu_aliases[i].alias)); +} +} } /* Prepare the device tree */ -- 1.7.11.4 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
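The naming rule this patch enforces (a fixed "PowerPC," prefix, followed by an upper-cased CPU name with no underscores) can be sketched in plain C as below. This is only an illustration under stated assumptions: papr_cpu_node_name() and papr_cpu_name_is_valid() are hypothetical helpers, not QEMU functions, and the sketch uses malloc() where the patch uses GLib allocation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define PPC_DEVTREE_STR "PowerPC,"

/* Hypothetical helper: returns a malloc'ed "PowerPC,<NAME>" string
 * (caller frees), or NULL if allocation fails. Only the CPU name is
 * upper-cased; the prefix keeps its mixed case. */
char *papr_cpu_node_name(const char *cpu_alias)
{
    assert(cpu_alias != NULL);

    size_t prefix_len = strlen(PPC_DEVTREE_STR);
    size_t len = prefix_len + strlen(cpu_alias);
    char *name = malloc(len + 1);

    if (name == NULL) {
        return NULL;
    }
    memcpy(name, PPC_DEVTREE_STR, prefix_len);
    for (size_t i = prefix_len; i < len; i++) {
        name[i] = (char)toupper((unsigned char)cpu_alias[i - prefix_len]);
    }
    name[len] = '\0';
    return name;
}

/* Hypothetical check for the "no underscores" rule quoted in the patch.
 * Returns 1 if the name is acceptable, 0 otherwise. */
int papr_cpu_name_is_valid(const char *cpu_alias)
{
    assert(cpu_alias != NULL);
    return strchr(cpu_alias, '_') == NULL;
}
```

Starting the upper-casing loop after the prefix is the same choice the patch makes with strlen(PPC_DEVTREE_STR); upper-casing the whole string would turn the prefix into "POWERPC,".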
Re: [Qemu-devel] [PATCH for-next] spapr: Avoid HOST@0 CPU node name in SLOF device tree for -cpu host
On 08/01/2013 06:32 AM, Andreas Färber wrote:

By default on KVM or when user asks for it via -cpu host, cpu_model will be "host" and sPAPR merely upper-cases it for the SLOF device tree. Change it so that we get the underlying CPU type, e.g., POWER7_V2.3@0.

Reported-by: Prerna Saxena pre...@linux.vnet.ibm.com
Signed-off-by: Andreas Färber afaer...@suse.de
---

ACK. Reviewed and tested; works as expected.

I'll send out an updated follow-up patch later in the day which ensures PAPR compliance for nomenclature.

Regards,
--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
Hi Andreas, Thanks for the response. On 07/08/2013 10:15 PM, Andreas Färber wrote: Hi, Am 08.07.2013 17:49, schrieb Prerna Saxena: On 07/08/2013 02:32 PM, Andreas Färber wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Am 08.07.2013 03:09, schrieb David Gibson: On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy wrote: @@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) register_savevm_live(NULL, spapr/htab, -1, 1, savevm_htab_handlers, spapr); +/* Ensure that cpu_model is correctly reflected for a KVM guest */ +if (kvm_enabled() !strcmp(cpu_model, host)) { +asm (mfpvr %0 +: =r(pvr)); + cpu_model = ppc_cpu_alias_by_pvr(pvr); This needs to be protected by an ifdef CONFIG_KVM or similar. If the compiler optimization level is turned down, so that it doesn't recognize that the kvm_enabled() is always false, then this could attempt to compile the ppc asm instructions on an x86 (or whatever) host. This hunk can be completely replaced by QOM mechanisms - just didn't get to replying yet... Sorry I already sent out a v2, and only then saw your message. Could you pls explain how I could use QOM to replace this code block ? Well, in short the thing is it has not much to do with KVM. The KVM-specific host-powerpc64-cpu type is derived from the one you're looking for and thus you can use object_class_get_parent() to obtain the parent type and look at its name - stripping - TYPE_POWERPC_CPU from it should be much more efficient but will give you the detailed name including revision. I was planning to propose an alternative patch for that. This is what my patch does :-) +const char *ppc_cpu_alias_by_pvr(uint32_t pvr) +{ +int i; +const char *cpu_alias; +char *offset, *model; + +cpu_alias = object_class_get_name(OBJECT_CLASS +(ppc_cpu_class_by_pvr(pvr))); + [snip] Replacing a concrete model name with its simpler alias is a secondary issue (separate patch) that is not specific to KVM or -cpu host. 
Compare -cpu POWER8_v1.0 printing .../POWER8_v1.0@0/... presumably. Agree that this is not specific to KVM. That is the reason I have set it in a separate function, which can be called otherwise as well. Just to clarify your response, you want the function I coded to be split into 2 different pieces, to cater to the two specific requirements you mention ? That can be done, but not sure if it is too much code bloat. Further, Alex has already applied a patch of his working around the alias table being a rather archaic construct, not intended for frequent use. Instead of adding even more functions that iterate it, we should turn it into a hashtable for efficient lookup. Can you / Alexander Graf point me to the fix ? I can rework my patch to consume it ? (Note that the cpu_model_str field may contain more than just the model name, it is otherwise unused in softmmu and I was therefore preparing a patch to ban its use to linux-user solely, so the type name seems the most reliable indicator we have and as a bonus no PVR needed for it.) Hmm, maybe obsoleting PVR check is not such a great idea. I'm not sure if my earlier email clearly outlined the use-case this patch was attempting to fix. Here is a detailed explanation : We will still need PVR based lookups for cases such as the one I have described. As an illustration, consider running in a KVM environment where QEMU hasnt been started with a specific CPU type via -CPU PPC_MODEL. In this case, we will be required to do a PVR_based lookup only -- to make sure the guest gets initialized with the same CPU as host. The notion of _same_cpu_model_ can only be built over a PVR check. Regards, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [PATCH v2 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
Hi David, Thanks for the review feedback. I have incorporated your changes in v2 of the patch, which follows herewith. Regards, Prerna Subject: [PATCH v2] Target-ppc : Enhance the CPU node labels for the guest device tree for pseries. In absence of a -CPU parameter in the qemu command line, the nodes of KVM-enabled guest device tree look like this : /proc/device-tree/cpus/HOST@0/... /proc/device-tree/cpus/HOST@4/... This patch replaces this obscure 'HOST' label with a more descriptive label. This is gathered by first identifying the PVR of the host, and then determining the host CPU alias which corresponds to the model indicated by this PVR. Sample Final outcome for an KVM-enabled pseries guest running on POWER7: /proc/device-tree/cpus/PowerPC,POWER7@0/... /proc/device-tree/cpus/PowerPC,POWER7@4/... This also helps userspace tools like ppc64_cpu, which expect the device tree to be in this format. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Reviewed-by: David Gibson da...@gibson.dropbear.id.au --- hw/ppc/spapr.c | 18 +++--- target-ppc/cpu-qom.h| 1 + target-ppc/translate_init.c | 28 3 files changed, 44 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index fe34291..ddf263a 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -79,6 +79,7 @@ #define HTAB_SIZE(spapr)(1ULL ((spapr)-htab_shift)) +#define PPC_DEVTREE_STR PowerPC, sPAPREnvironment *spapr; int spapr_allocate_irq(int hint, bool lsi) @@ -295,9 +296,12 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_property_cell(fdt, #address-cells, 0x1))); _FDT((fdt_property_cell(fdt, #size-cells, 0x0))); -modelname = g_strdup(cpu_model); +/* device tree nodes must look like this : + * PowerPC,CPU_ALIAS@0 + */ +modelname = g_strdup_printf(PPC_DEVTREE_STR %s, cpu_model); -for (i = 0; i strlen(modelname); i++) { +for (i = strlen(PPC_DEVTREE_STR); i strlen(modelname); i++) { modelname[i] = toupper(modelname[i]); } @@ 
-735,7 +739,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) MemoryRegion *sysmem = get_system_memory(); MemoryRegion *ram = g_new(MemoryRegion, 1); hwaddr rma_alloc_size; -uint32_t initrd_base = 0; +uint32_t initrd_base = 0, pvr = 0; long kernel_size = 0, initrd_size = 0; long load_limit, rtas_limit, fw_size; char *filename; @@ -959,6 +963,14 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) spapr-entry_point = 0x100; +#ifdef CONFIG_KVM +/* Ensure that cpu_model is correctly reflected for a KVM guest */ +if (kvm_enabled() !strcmp(cpu_model, host)) { +asm (mfpvr %0 +: =r(pvr)); +cpu_model = ppc_cpu_alias_by_pvr(pvr); +} +#endif /* Prepare the device tree */ spapr-fdt_skel = spapr_create_fdt_skel(cpu_model, initrd_base, initrd_size, diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h index 84ba105..90dd1dd 100644 --- a/target-ppc/cpu-qom.h +++ b/target-ppc/cpu-qom.h @@ -99,6 +99,7 @@ static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env) #define ENV_OFFSET offsetof(PowerPCCPU, env) PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr); +const char *ppc_cpu_alias_by_pvr(uint32_t pvr); void ppc_cpu_do_interrupt(CPUState *cpu); void ppc_cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf, diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index 50e0ee5..21a7f6f 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -7913,6 +7913,34 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr) return pcc; } +const char *ppc_cpu_alias_by_pvr(uint32_t pvr) +{ +int i; +const char *cpu_alias; +char *offset, *model; + +cpu_alias = object_class_get_name(OBJECT_CLASS +(ppc_cpu_class_by_pvr(pvr))); + +/* Replace the full class name in cpu_alias with the CPU alias + * Eg, POWER7_V2.3-POWERPC64-CPU can simply be called + * POWER7 + */ + +offset = strstr(cpu_alias, - TYPE_POWERPC_CPU); +if (offset) { +model = g_strndup(cpu_alias, offset - cpu_alias); +for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) { 
+if (strcmp(ppc_cpu_aliases[i].model, model) == 0) { +g_free(model); +return ppc_cpu_aliases[i].alias; +} +} +g_free(model); +} +return NULL; +} + static gint ppc_cpu_compare_class_name(gconstpointer a, gconstpointer b) { ObjectClass *oc = (ObjectClass *)a; -- 1.7.11.7 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
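The two steps performed by ppc_cpu_alias_by_pvr() after the class lookup (strip the type suffix from the class name, then map the model to its alias) can be sketched without QEMU or GLib as below. The table, the "-powerpc64-cpu" suffix and both function names are assumptions made for illustration; the real table is ppc_cpu_aliases[] and the real suffix is built from TYPE_POWERPC_CPU.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for QEMU's alias table: { alias, model } pairs
 * terminated by a NULL entry. */
struct cpu_alias {
    const char *alias;
    const char *model;
};

static const struct cpu_alias demo_aliases[] = {
    { "POWER7", "POWER7_v2.3" },
    { "POWER8", "POWER8_v1.0" },
    { NULL, NULL }
};

/* Returns a malloc'ed copy of class_name without the trailing suffix
 * (caller frees), or NULL if class_name does not end in suffix. */
char *strip_type_suffix(const char *class_name, const char *suffix)
{
    assert(class_name != NULL && suffix != NULL);

    size_t n = strlen(class_name);
    size_t s = strlen(suffix);

    if (n < s || strcmp(class_name + n - s, suffix) != 0) {
        return NULL;
    }
    char *model = malloc(n - s + 1);
    if (model == NULL) {
        return NULL;
    }
    memcpy(model, class_name, n - s);
    model[n - s] = '\0';
    return model;
}

/* Linear scan of the table; returns a pointer into the table
 * (not to be freed), or NULL when the model has no alias. */
const char *alias_for_model(const char *model)
{
    assert(model != NULL);
    for (size_t i = 0; demo_aliases[i].model != NULL; i++) {
        if (strcmp(demo_aliases[i].model, model) == 0) {
            return demo_aliases[i].alias;
        }
    }
    return NULL;
}
```

Returning a pointer into the table, as the patch also does, means the caller only has to free the temporary model string and never the alias.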
Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
On 07/08/2013 02:32 PM, Andreas Färber wrote:

On 08.07.2013 03:09, David Gibson wrote:

On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy wrote:

@@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     register_savevm_live(NULL, "spapr/htab", -1, 1,
                          &savevm_htab_handlers, spapr);

+    /* Ensure that cpu_model is correctly reflected for a KVM guest */
+    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
+        asm ("mfpvr %0"
+            : "=r"(pvr));
+        cpu_model = ppc_cpu_alias_by_pvr(pvr);

This needs to be protected by an ifdef CONFIG_KVM or similar. If the compiler optimization level is turned down, so that it doesn't recognize that the kvm_enabled() is always false, then this could attempt to compile the ppc asm instructions on an x86 (or whatever) host.

This hunk can be completely replaced by QOM mechanisms - just didn't get to replying yet...

Hi Andreas,

Sorry, I had already sent out a v2 and only then saw your message. Could you please explain how I could use QOM to replace this code block?

Regards,
--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [PATCH] Target-ppc : Enhance the CPU node labels for guest device tree for pseries.
[PATCH] Target-ppc : Enhance the CPU node labels for the guest device tree for pseries. In absence of a -CPU parameter in the qemu command line, the nodes of KVM-enabled guest device tree look like this : /proc/device-tree/cpus/HOST@0/... /proc/device-tree/cpus/HOST@4/... This patch replaces this obscure 'HOST' label with a more descriptive label. This is gathered by first identifying the PVR of the host, and then determining the host CPU alias which corresponds to the model indicated by this PVR. Sample Final outcome for an KVM-enabled pseries guest running on POWER7: /proc/device-tree/cpus/PowerPC,POWER7@0/... /proc/device-tree/cpus/PowerPC,POWER7@4/... This also helps userspace tools like ppc64_cpu, which expect the device tree to be in this format in the guest. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- hw/ppc/spapr.c | 17 ++--- target-ppc/cpu-qom.h| 1 + target-ppc/translate_init.c | 28 3 files changed, 43 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index fe34291..e084f3f 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -79,6 +79,7 @@ #define HTAB_SIZE(spapr)(1ULL ((spapr)-htab_shift)) +#define PPC_DEVTREE_STR PowerPC, sPAPREnvironment *spapr; int spapr_allocate_irq(int hint, bool lsi) @@ -295,9 +296,12 @@ static void *spapr_create_fdt_skel(const char *cpu_model, _FDT((fdt_property_cell(fdt, #address-cells, 0x1))); _FDT((fdt_property_cell(fdt, #size-cells, 0x0))); -modelname = g_strdup(cpu_model); +/* device tree nodes must look like this : + * PowerPC,CPU_ALIAS@0 + */ +modelname = g_strdup_printf(PPC_DEVTREE_STR %s, cpu_model); -for (i = 0; i strlen(modelname); i++) { +for (i = strlen(PPC_DEVTREE_STR); i strlen(modelname); i++) { modelname[i] = toupper(modelname[i]); } @@ -735,7 +739,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) MemoryRegion *sysmem = get_system_memory(); MemoryRegion *ram = g_new(MemoryRegion, 1); hwaddr rma_alloc_size; -uint32_t initrd_base = 0; +uint32_t initrd_base = 0, pvr = 
0; long kernel_size = 0, initrd_size = 0; long load_limit, rtas_limit, fw_size; char *filename; @@ -959,6 +963,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) spapr-entry_point = 0x100; +/* Ensure that cpu_model is correctly reflected for a KVM guest */ +if (kvm_enabled() !strcmp(cpu_model, host)) { +asm (mfpvr %0 +: =r(pvr)); +cpu_model = ppc_cpu_alias_by_pvr(pvr); +} + /* Prepare the device tree */ spapr-fdt_skel = spapr_create_fdt_skel(cpu_model, initrd_base, initrd_size, diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h index 84ba105..90dd1dd 100644 --- a/target-ppc/cpu-qom.h +++ b/target-ppc/cpu-qom.h @@ -99,6 +99,7 @@ static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env) #define ENV_OFFSET offsetof(PowerPCCPU, env) PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr); +const char *ppc_cpu_alias_by_pvr(uint32_t pvr); void ppc_cpu_do_interrupt(CPUState *cpu); void ppc_cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf, diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index 50e0ee5..21a7f6f 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -7913,6 +7913,34 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr) return pcc; } +const char *ppc_cpu_alias_by_pvr(uint32_t pvr) +{ +int i; +const char *cpu_alias; +char *offset, *model; + +cpu_alias = object_class_get_name(OBJECT_CLASS +(ppc_cpu_class_by_pvr(pvr))); + +/* Replace the full class name in cpu_alias with the CPU alias + * Eg, POWER7_V2.3-POWERPC64-CPU can simply be called + * POWER7 + */ + +offset = strstr(cpu_alias, - TYPE_POWERPC_CPU); +if (offset) { +model = g_strndup(cpu_alias, offset - cpu_alias); +for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) { +if (strcmp(ppc_cpu_aliases[i].model, model) == 0) { +g_free(model); +return ppc_cpu_aliases[i].alias; +} +} +g_free(model); +} +return NULL; +} + static gint ppc_cpu_compare_class_name(gconstpointer a, gconstpointer b) { ObjectClass *oc = (ObjectClass *)a; -- 
1.7.11.7 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
Re: [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip.
Hi Andreas, Thank you for taking a look. I have incorporated your feedback into a new patch, attached herewith. Regards, Prerna Subject: [PATCH] target-ppc: Add POWER8 v1.0 CPU model This patch adds CPU PVR definition for POWER8, and enables QEMU to launch guests on POWER8 hardware. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Reviewed-by: Paul Mackerras pau...@samba.org Reviewed-by: Andreas Farber afaer...@suse.de --- target-ppc/cpu-models.c | 3 +++ target-ppc/cpu-models.h | 1 + target-ppc/translate_init.c | 34 ++ 3 files changed, 38 insertions(+) diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c index 17f56b7..72f7088 100644 --- a/target-ppc/cpu-models.c +++ b/target-ppc/cpu-models.c @@ -1145,6 +1145,8 @@ POWER7 v2.1) POWERPC_DEF(POWER7_v2.3, CPU_POWERPC_POWER7_v23, POWER7, POWER7 v2.3) +POWERPC_DEF(POWER8_v1.0, CPU_POWERPC_POWER8_v10, POWER8, +POWER8 v1.0) POWERPC_DEF(970, CPU_POWERPC_970,970, PowerPC 970) POWERPC_DEF(970fx_v1.0,CPU_POWERPC_970FX_v10, 970FX, @@ -1390,6 +1392,7 @@ const PowerPCCPUAlias ppc_cpu_aliases[] = { { Dino, POWER3 }, { POWER3+, 631 }, { POWER7, POWER7_v2.3 }, +{ POWER8, POWER8_v1.0 }, { 970fx, 970fx_v3.1 }, { 970mp, 970mp_v1.1 }, { Apache, RS64 }, diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h index a94f835..1c67a0e 100644 --- a/target-ppc/cpu-models.h +++ b/target-ppc/cpu-models.h @@ -555,6 +555,7 @@ enum { CPU_POWERPC_POWER7_v20 = 0x003F0200, CPU_POWERPC_POWER7_v21 = 0x003F0201, CPU_POWERPC_POWER7_v23 = 0x003F0203, +CPU_POWERPC_POWER8_v10 = 0x004B0100, CPU_POWERPC_970= 0x00390202, CPU_POWERPC_970FX_v10 = 0x00391100, CPU_POWERPC_970FX_v20 = 0x003C0200, diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index 71e434a..a1d8e70 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -7042,6 +7042,40 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data) pcc-l1_dcache_size = 0x8000; pcc-l1_icache_size = 0x8000; 
} + +POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(oc); +PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc); + +dc-desc = POWER8; +pcc-init_proc = init_proc_POWER7; +pcc-check_pow = check_pow_nocheck; +pcc-insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB | + PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES | + PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE | + PPC_FLOAT_STFIWX | + PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ | + PPC_MEM_SYNC | PPC_MEM_EIEIO | + PPC_MEM_TLBIE | PPC_MEM_TLBSYNC | + PPC_64B | PPC_ALTIVEC | + PPC_SEGMENT_64B | PPC_SLBI | + PPC_POPCNTB | PPC_POPCNTWD; +pcc-insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX; +pcc-msr_mask = 0x8204FF36ULL; +pcc-mmu_model = POWERPC_MMU_2_06; +#if defined(CONFIG_SOFTMMU) +pcc-handle_mmu_fault = ppc_hash64_handle_mmu_fault; +#endif +pcc-excp_model = POWERPC_EXCP_POWER7; +pcc-bus_model = PPC_FLAGS_INPUT_POWER7; +pcc-bfd_mach = bfd_mach_ppc64; +pcc-flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE | + POWERPC_FLAG_BE | POWERPC_FLAG_PMM | + POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR; +pcc-l1_dcache_size = 0x8000; +pcc-l1_icache_size = 0x8000; +} #endif /* defined (TARGET_PPC64) */ -- 1.7.11.4 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
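The PVR constants added in this patch follow the usual Processor Version Register layout: the upper 16 bits carry the processor version and the lower 16 bits the revision. A small sketch of that split, using the two values that appear in cpu-models.h above, follows. The helper names are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the cpu-models.h hunk in the patch. */
#define CPU_POWERPC_POWER7_v23 0x003F0203u
#define CPU_POWERPC_POWER8_v10 0x004B0100u

/* Hypothetical helpers: split a PVR into its two 16-bit fields. */
uint16_t pvr_version(uint32_t pvr)
{
    return (uint16_t)(pvr >> 16);
}

uint16_t pvr_revision(uint32_t pvr)
{
    return (uint16_t)(pvr & 0xFFFFu);
}

/* Two PVRs belong to the same processor family when the version
 * fields match, whatever the revision. Returns 1 or 0. */
int pvr_same_family(uint32_t a, uint32_t b)
{
    return pvr_version(a) == pvr_version(b);
}

/* Usage example: checks the fields of the constants from the patch. */
void pvr_check_examples(void)
{
    assert(pvr_version(CPU_POWERPC_POWER8_v10) == 0x004B);
    assert(pvr_revision(CPU_POWERPC_POWER8_v10) == 0x0100);
    assert(pvr_version(CPU_POWERPC_POWER7_v23) == 0x003F);
    assert(!pvr_same_family(CPU_POWERPC_POWER7_v23, CPU_POWERPC_POWER8_v10));
}
```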
[Qemu-devel] [Request for inputs]Qemu parameters that need runtime change.
Hi,

QEMU can currently be started with a long list of parameters, but only a subset of these can be changed at runtime. Changing any of the others requires restarting the QEMU instance.

I've been trying to put together a list of such parameters that would make good candidates for a runtime change. I'd like input on further parameters that belong on this list, and on whether the following would be good-to-have features:

1. Allowing a runtime change from one chardev backend to another (e.g., from a TCP socket to a Unix socket, and vice versa).
2. Changing the network interface backend (e.g., from -net user to -net tap?).

These are the ones I'm presently aware of; suggestions on what else could be done here are welcome.

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [RFC][PATCH 0/2] Allow cache settings for block devices to be changed at runtime.
The following patchset introduces these monitor changes:

1. set_cache DEVICE CACHE-SETTING
   Changes the cache setting for the block device DEVICE through the monitor.
   Available options: 'none', 'writeback', 'writethrough'.
   E.g., (qemu) set_cache ide0-hd0 none
   changes the cache setting for ide0-hd0 to 'none'.

2. info block
   Now extended to display the cache setting of each available block device.

TODOs:
------
1. Support the 'unsafe' cache mode.
2. Display the current cache setting for a device if the CACHE-SETTING option is not supplied by the user. E.g., (qemu) set_cache ide0-hd0 presently errors out. Ideally, it should display the current cache setting for the given device, ide0-hd0.

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [RFC][PATCH 1/2] Add monitor command 'set-cache' to change cache settings for a block device.
Usage : (qemu) set_cache DEVICE CACHE-MODE where CACHE-MODE can be one of writeback/ writethrough/ none. At present, the image file is closed and re-opened with appropriate flags. It might potentially cause problems if the underlying image is deleted while a running qemu instance is using it. A change in cache operations will cause the image file to be closed, and a deleted file will be gone. Suggestions to fix this ? --- blockdev.c | 76 +++ blockdev.h |1 + hmp-commands.hx | 13 + 3 files changed, 90 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 0690cc8..6735205 100644 --- a/blockdev.c +++ b/blockdev.c @@ -636,6 +636,82 @@ out: return ret; } +int do_set_cache(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *cache = qdict_get_str(qdict, cache); +BlockDriverState *bs; +BlockDriver *drv; +int ret = 0; +int bdrv_flags = 0; + +if (!cache) { + /* TODO: in the absence of a change request, + simply display current cache setting. 
+ Currently one needs 'info block' to query this */ +qerror_report(QERR_MISSING_PARAMETER, cache); +return -1; +} + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +return -1; +} + +/* Clear old flags */ +bdrv_flags = bs-open_flags; +if (bdrv_flags BDRV_O_CACHE_MASK) { +bdrv_flags = ~BDRV_O_CACHE_MASK; +} + +/* Determine flags for requested cache setting */ +if (!strcmp(cache, none)) { +bdrv_flags |= BDRV_O_NOCACHE; +} else if (!strcmp(cache, writeback)) { +bdrv_flags |= BDRV_O_CACHE_WB; +} else if (!strcmp(cache, unsafe)) { + /* TODO : Support unsafe mode */ +qerror_report(QERR_INVALID_PARAMETER_VALUE, cache, + writeback, writethrough, none); +return -1; +} else if (!strcmp(cache, writethrough)) { +/* Default setting */ +} else { +qerror_report(QERR_INVALID_PARAMETER_VALUE, cache, + 'cache' must be one of writeback, writethrough, none); +return -1; +} + +/* Verify that the cache setting specified is different from current. + * Does NOT call for error return, since the 'request' is already + * honoured. 
+ */ +if (bdrv_flags == bs-open_flags) { +qerror_report(QERR_PROPERTY_VALUE_IN_USE, device, cache, cache); +return 0; +} + +/* Quiesce IO for the given block device */ +qemu_aio_flush(); +bdrv_flush(bs); + +/* Change cache value and restart IO on the block device */ +printf(Setting cache=%s for device %s [ filename %s ], cache, device, +bs-filename ); +drv = bs-drv; +bdrv_close(bs); +ret = bdrv_open(bs, bs-filename, bdrv_flags, drv); +/* + * A failed attempt to reopen the image file must lead to 'abort()' + */ +if (ret != 0) { +abort(); +} + +return ret; +} + static int eject_device(Monitor *mon, BlockDriverState *bs, int force) { if (!force) { diff --git a/blockdev.h b/blockdev.h index 2c9e780..9f35817 100644 --- a/blockdev.h +++ b/blockdev.h @@ -63,6 +63,7 @@ int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data); +int do_set_cache(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index 372bef4..18761cf 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1066,7 +1066,20 @@ STEXI @findex watchdog_action Change watchdog action. ETEXI +{ +.name = set_cache, +.args_type = device:B,cache:s, +.params = device writeback|writethrough|none, +.help = change cache settings for device, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_set_cache, +}, +STEXI +@item set_cache +@findex set_cache +Set cache options for a block device. +ETEXI { .name = acl_show, .args_type = aclname:s, -- 1.7.2.3 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
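The flag handling in do_set_cache() (clear the old cache bits, then set the bits for the requested mode) can be sketched on its own as below. The numeric flag values and the composition of BDRV_O_CACHE_MASK are assumptions made so the sketch compiles standalone; the authoritative definitions are in QEMU's block headers. apply_cache_mode() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Assumed values for illustration only. */
#define BDRV_O_NOCACHE    0x0020
#define BDRV_O_CACHE_WB   0x0040
#define BDRV_O_NO_FLUSH   0x0200
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)

/* Hypothetical helper: computes the open flags for a requested cache
 * mode. Returns 0 and stores the result in *out, or returns -1 and
 * leaves *out untouched for an unknown or unsupported mode
 * ('unsafe' is rejected, as in the patch). */
int apply_cache_mode(int open_flags, const char *mode, int *out)
{
    assert(mode != NULL && out != NULL);

    /* Clear every cache-related bit, keep all other flags. */
    int flags = open_flags & ~BDRV_O_CACHE_MASK;

    if (strcmp(mode, "none") == 0) {
        flags |= BDRV_O_NOCACHE;
    } else if (strcmp(mode, "writeback") == 0) {
        flags |= BDRV_O_CACHE_WB;
    } else if (strcmp(mode, "writethrough") == 0) {
        /* Default mode: no cache bits set. */
    } else {
        return -1;
    }

    *out = flags;
    return 0;
}
```

Comparing the returned flags with the current ones gives the "setting unchanged" check from the patch without touching the image file.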
[Qemu-devel] [RFC][PATCH 2/2] Extend monitor command 'info block' to display cache settings for block devices.
(qemu) info block

Sample output:
ide0-hd0: type=hd removable=0 cache=none file=/tmp/abc.img ro=0 drv=qcow2 encrypted=0
---
 block.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index f7d91a2..c717888 100644
--- a/block.c
+++ b/block.c
@@ -1707,6 +1707,23 @@ static void bdrv_print_dict(QObject *obj, void *opaque)
         monitor_printf(mon, " locked=%d", qdict_get_bool(bs_dict, "locked"));
     }
 
+    if (qdict_haskey(bs_dict, "open_flags") &&
+        !strcmp(qdict_get_str(bs_dict, "type"), "hd")) {
+        int open_flags = qdict_get_int(bs_dict, "open_flags");
+        if (open_flags & BDRV_O_NOCACHE) {
+            monitor_printf(mon, " cache=none");
+        } else if (open_flags & BDRV_O_CACHE_WB) {
+            if (open_flags & BDRV_O_NO_FLUSH) {
+                monitor_printf(mon, " cache=unsafe");
+            }
+            else {
+                monitor_printf(mon, " cache=writeback");
+            }
+        } else {
+            monitor_printf(mon, " cache=writethrough");
+        }
+    }
+
     if (qdict_haskey(bs_dict, "inserted")) {
         QDict *qdict = qobject_to_qdict(qdict_get(bs_dict, "inserted"));
 
@@ -1756,9 +1773,10 @@ void bdrv_info(Monitor *mon, QObject **ret_data)
         }
 
         bs_obj = qobject_from_jsonf("{ 'device': %s, 'type': %s, "
-                                    "'removable': %i, 'locked': %i }",
+                                    "'removable': %i, 'locked': %i, "
+                                    "'open_flags': %d }",
                                     bs->device_name, type, bs->removable,
-                                    bs->locked);
+                                    bs->locked, bs->open_flags);
 
         if (bs->drv) {
             QObject *obj;
--
1.7.2.3

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
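The decoding order used by this patch (check 'none' first, then writeback with or without flushes, and fall back to writethrough) can be tried in isolation with the sketch below. The flag values are assumptions chosen for the example, and cache_mode_name() and cache_mode_is() are hypothetical helpers rather than QEMU functions.

```c
#include <assert.h>
#include <string.h>

/* Assumed values for illustration only. */
#define BDRV_O_NOCACHE  0x0020
#define BDRV_O_CACHE_WB 0x0040
#define BDRV_O_NO_FLUSH 0x0200

/* Maps open flags to the cache mode name shown by 'info block'.
 * The order of the tests matters: NOCACHE wins over the other bits,
 * and NO_FLUSH is only meaningful together with CACHE_WB. */
const char *cache_mode_name(int open_flags)
{
    if (open_flags & BDRV_O_NOCACHE) {
        return "none";
    }
    if (open_flags & BDRV_O_CACHE_WB) {
        return (open_flags & BDRV_O_NO_FLUSH) ? "unsafe" : "writeback";
    }
    return "writethrough";
}

/* Convenience predicate: returns 1 if the flags decode to 'name'. */
int cache_mode_is(int open_flags, const char *name)
{
    assert(name != NULL);
    return strcmp(cache_mode_name(open_flags), name) == 0;
}
```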
Re: [Qemu-devel] [PATCH v2] Add a DTrace tracing backend targetted for SystemTAP compatability
ACK, works well! One suggestion, though.

On 10/20/2010 07:39 PM, Daniel P. Berrange wrote:

eg, instead of

  probe process("qemu").mark("qemu_malloc") {
    printf("Malloc %d %p\n", $arg1, $arg2);
  }

The addition of qemu.stp to /usr/share/systemtap/tapset/ lets users write

  probe qemu.qemu_malloc {
    printf("Malloc %d %p\n", size, ptr);
  }

...

diff --git a/tracetool b/tracetool
index 7010858..047f16b 100755
--- a/tracetool
+++ b/tracetool
+linetos_dtrace()
+{
+    local name args arglist state
+
+    # Define prototype for probe arguments
+    cat <<EOF
+probe qemu.$name = process("qemu").mark("$name")
+{

The 'process' probes only find the binary by searching $PATH, unless the full path is specified. When qemu is compiled into a non-standard location (i.e., with --prefix), such probes would not point to the correct binary. It would be nice if tracetool could pass the full build path when defining the probe point. E.g.,

  probe qemu.qemu_malloc = process("/Path/to/build/dir/bin/qemu").mark("qemu_malloc")
  {
    ..
  }

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] Re: [RFC][PATCH 4/5] trace-event
On 10/22/2010 08:57 PM, Stefan Hajnoczi wrote: On Thu, Oct 21, 2010 at 03:10:18PM +0530, Prerna Saxena wrote: trace-event : QMP interface to change state of a trace-event. (Analogous to hmp command : trace-event ) Signed-off-by: Prerna Saxenapre...@linux.vnet.ibm.com --- qmp-commands.hx | 32 1 files changed, 32 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 7e95f4e..f2008e8 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -761,6 +761,38 @@ Example: Note: This command must be issued before issuing any other command. +EQMP + +{ +.name = trace-event, +.args_type = name:s,option:b, +.params = name on|off, +.help = changes state of a specific trace event, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_change_trace_event_state_qmp, +}, + +SQMP +trace-event +--- + +Change state of a trace-event. The name is a little odd because it has no verb. How about set-trace-event or enable-trace-event? Sure, makes sense. I'll incorporate this when I send out the next set of patches. Thanks, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] Re: [Tracing][v4 PATCH 2/2] Add documentation for QMP interfaces
set. +- status: State of trace-event [ '0': disabled; '1':enabled ] (json-int) This should be a json bool called 'enabled' or 'disabled', but what happens when a file is not defined? Changed type to json bool. The trace infrastructure sets the trace-output file to trace-PID ( created in current dir) if no explicit trace-file is specified at startup. (Users can also change the default trace-file at runtime using the hmp command 'trace-file set FILE' I'll be covering QMP interface for the same in the upcoming patchset. ) + +Example: + +- { execute: query-trace-file } +- { + return:{ + trace-file: trace-26609, + status: 1 + } + } + +EQMP -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] Re: [Tracing][RFC] QMP interface to toggle state of a trace-event
Thanks for the review! On 10/21/2010 12:53 AM, Luiz Capitulino wrote: On Wed, 20 Oct 2010 15:28:49 +0530 Prerna Saxenapre...@linux.vnet.ibm.com wrote: QMP command trace-event to toggle state of a trace-event. Illustration : - { execute: trace-event, arguments: { name: qemu_malloc, option: true} } - { return: {} } Posting this as an RFC for now. I'll post the final version as a part of the cumulative QMP patchset for tracing ( including patches for query-* commands posted earlier : http://lists.gnu.org/archive/html/qemu-devel/2010-10/msg01232.html ) Signed-off-by: Prerna Saxenapre...@linux.vnet.ibm.com --- hmp-commands.hx |2 +- monitor.c | 43 +-- qmp-commands.hx | 32 3 files changed, 70 insertions(+), 7 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index bc79b55..7613d73 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -761,6 +761,38 @@ Example: Note: This command must be issued before issuing any other command. +EQMP + +{ +.name = trace-event, +.args_type = name:s,option:b, +.params = name on|off, +.help = changes state of a specific trace event, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_change_trace_event_state_qmp, +}, + +SQMP +trace-event +--- + +Change state of a trace-event. + +Arguments: + +- name: name of trace-event (json-string) +- option: new state for the trace-event (json-bool) This should be called 'enabled'. I agree, 'enabled' is less ambiguous. Will change in the next patchset. I think you should submit a new series containing only the proposed interfaces documentation (one patch per interface) and the intro email should describe the use cases the proposed interfaces are supposed to address. I'll send out the new documentation patchset series shortly. + +Example: + +- { execute: trace-event, arguments: { name: ABC, option:false } } +- { return: {} } + +Notes: + +(1) The 'query-trace-events' command should be used to check the new state +of the trace-event. + 3. 
Query Commands = -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [RFC][PATCH 1/5] query-trace command
QMP interface query-trace to list current contents of trace-buffer. ( Analogous to hmp command : info trace ) Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- qmp-commands.hx | 51 +++ 1 files changed, 51 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..f289064 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -1539,3 +1539,54 @@ Example: EQMP +SQMP +query-trace +- + +Show current contents of trace buffer. + +Returns a json-array of json-objects containing the following data: + +- event_id: Event ID for the trace-event(json-int) +- timestamp: trace timestamp in ns (json-int) +- trace-arg: A json-object containing args logged by the trace-event: +- arg1: First trace argument (json-int) +- arg2: Second trace argument (json-int) +- arg3: Third trace argument (json-int) +- arg4: Fourth trace argument (json-int) +- arg5: Fifth trace argument (json-int) +- arg6: Sixth trace argument (json-int) + +Example: + +- { execute: query-trace } +- { + return:[ + { +event: 22, +timestamp: 129456235912365, +trace-arg:{ + arg1: 886, + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0, +} + }, + { +event: 22, +timestamp: 129456235973407, +trace-arg:{ + arg1: 886, + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0 +}, + } + ] + } + +EQMP -- 1.7.2.3 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [RFC][PATCH 2/5] query-trace-events
'query-trace-events' : QMP interface to display currently available
trace-events with their state.
( Analogous to hmp command : info trace-events )

Signed-off-by: Prerna Saxena <pre...@linux.vnet.ibm.com>
---
 qmp-commands.hx |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index f289064..e079eef 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1590,3 +1590,35 @@ Example:
    }

 EQMP
+
+SQMP
+query-trace-events
+------------------
+
+Show all available trace-events & their state.
+
+Returns a json-array of json-objects containing the following data:
+
+- "name": Name of Trace-event (json-string)
+- "event_id": Event ID of Trace-event (json-int)
+- "state": State of trace-event (json-bool)
+
+Example:
+
+-> { "execute": "query-trace-events" }
+<- {
+      "return":[
+         {
+            "name": "qemu_malloc",
+            "event_id": 0,
+            "state": false
+         },
+         {
+            "name": "qemu_realloc",
+            "event_id": 1,
+            "state": false
+         },
+      ]
+   }
+
+EQMP
--
1.7.2.3

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
[Qemu-devel] [RFC][PATCH 5/5] set-trace-file
set-trace-file : QMP command to:
- Enable/disable logging traces to file
- Set a new output file
- Flush a semi-filled trace-buffer to output file.

Signed-off-by: Prerna Saxena <pre...@linux.vnet.ibm.com>
---
 qmp-commands.hx |   41 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index f2008e8..295382f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -793,6 +793,47 @@ Notes:
 (1) The 'query-trace-events' command should be used to check the new state
 of the trace-event.

+EQMP
+
+    {
+        .name       = "set-trace-file",
+        .args_type  = "enable:-e?,flush:-f?,filename:F?",
+        .params     = "[-e] [-f] [filename]",
+        .help       = "Sets a user-specified output file to write traces to",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_set_trace_file_qmp,
+    },
+
+SQMP
+set-trace-file
+--------------
+
+Set a new output file to log trace data to.
+
+Arguments:
+
+- "filename": name of new output file to write trace data to.
+              (json-string, optional)
+- "enable": if false, traces are not written to file.
+          : Only when this is 'true' that trace buffer contents get logged
+in a file. (json-bool, optional, defaults to false)
+- "flush": if true, contents of trace buffer are immediately written to file,
+           instead of waiting for the buffer to be full.
+           (json-bool, optional, defaults to false)
+
+Example:
+1. Set a new trace-file:
+-> { "execute": "set-trace-file", "arguments": { "filename": "ABC",
+                                                  "enable":true } }
+<- { "return": {} }
+
+2. Flush the current traces to file:
+-> { "execute": "set-trace-file", "arguments": { "flush": true } }
+
+Notes:
+
+(1) The 'query-trace-file' command should be used to check active trace-file.
+
 3. Query Commands
 =================
--
1.7.2.3

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
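To make the optional-argument semantics above concrete, here is a minimal sketch in C, assuming a literal reading of the documented defaults. All names (DemoTraceFileState, DemoSetTraceFileArgs, demo_apply_set_trace_file) are hypothetical and do not come from the patch; a NULL filename stands for "argument not supplied", and "enable" and "flush" fall back to false when omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the state that set-trace-file manipulates. */
typedef struct {
    char filename[256];
    bool enabled;          /* are traces being written to the file? */
    unsigned flush_count;  /* number of flushes requested so far */
} DemoTraceFileState;

/* Hypothetical argument bundle; NULL filename means "not supplied". */
typedef struct {
    const char *filename;
    bool enable;   /* documented default: false */
    bool flush;    /* documented default: false */
} DemoSetTraceFileArgs;

/* Apply one set-trace-file request. Returns 0 on success, -1 on error. */
int demo_apply_set_trace_file(DemoTraceFileState *st,
                              const DemoSetTraceFileArgs *args)
{
    assert(st != NULL && args != NULL);
    if (args->filename != NULL) {
        if (strlen(args->filename) >= sizeof(st->filename)) {
            return -1;  /* name does not fit; leave the state untouched */
        }
        strcpy(st->filename, args->filename);
    }
    st->enabled = args->enable;  /* an omitted "enable" acts as false */
    if (args->flush) {
        st->flush_count++;
    }
    return 0;
}
```

One consequence of this literal reading: a request that carries only "flush": true, as in example 2, also switches logging off, because the omitted "enable" defaults to false. Whether the real handler instead treats an omitted "enable" as "leave unchanged" is not stated in the patch, so the sketch should not be taken as its behaviour.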
[Qemu-devel] [RFC] [PATCH 3/5] query-trace-file
'query-trace-file' : QMP interface to find currently set trace file and its
status.
(Analogous to hmp command : trace-file)

Signed-off-by: Prerna Saxena <pre...@linux.vnet.ibm.com>
---
 qmp-commands.hx |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index e079eef..7e95f4e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1622,3 +1622,27 @@ Example:
    }

 EQMP
+
+SQMP
+query-trace-file
+----------------
+
+Display the trace file name to which trace data is currently logged, and its
+status.
+
+Returns a json-object containing the following data:
+
+- "trace-file": Name + path of Trace-file (json-string)
+- "enabled": State of trace-event (json-bool)
+
+Example:
+
+-> { "execute": "query-trace-file" }
+<- {
+      "return":{
+         "trace-file": "/tmp/trace-26609",
+         "enabled": true
+      }
+   }
+
+EQMP
--
1.7.2.3

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
[Qemu-devel] [RFC][PATCH 4/5] trace-event
trace-event : QMP interface to change state of a trace-event.
(Analogous to hmp command : trace-event )

Signed-off-by: Prerna Saxena <pre...@linux.vnet.ibm.com>
---
 qmp-commands.hx |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7e95f4e..f2008e8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -761,6 +761,38 @@ Example:

 Note: This command must be issued before issuing any other command.

+EQMP
+
+    {
+        .name       = "trace-event",
+        .args_type  = "name:s,option:b",
+        .params     = "name on|off",
+        .help       = "changes state of a specific trace event",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_change_trace_event_state_qmp,
+    },
+
+SQMP
+trace-event
+-----------
+
+Change state of a trace-event.
+
+Arguments:
+
+- "name": name of trace-event (json-string)
+- "enable": New state to be set for the trace-event (json-bool)
+
+Example:
+
+-> { "execute": "trace-event", "arguments": { "name": "ABC", "enable":false } }
+<- { "return": {} }
+
+Notes:
+
+(1) The 'query-trace-events' command should be used to check the new state
+of the trace-event.
+
 3. Query Commands
 =================
--
1.7.2.3

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
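As a rough client-side illustration of the request documented above, the following C sketch builds the command string. demo_format_trace_event_cmd() is a hypothetical helper, not part of QEMU. It uses the "enable" key from the Arguments list of this patch; the .args_type line in the same patch still says "option", so the name that actually goes on the wire is an assumption here. The event name is copied as-is, without JSON escaping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a trace-event request into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 * NOTE: name is assumed to contain no characters that need JSON escaping.
 */
int demo_format_trace_event_cmd(char *buf, size_t len,
                                const char *name, bool enable)
{
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, len,
                     "{ \"execute\": \"trace-event\", "
                     "\"arguments\": { \"name\": \"%s\", \"enable\": %s } }",
                     name, enable ? "true" : "false");
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

After sending such a request, a client would issue query-trace-events to confirm the new state, as the note in the patch suggests.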
[Qemu-devel] [RFC 0/5] QMP interfaces for tracing
As suggested by Luiz, I'm posting this set of documentation patches that
describe the proposed QMP interfaces for tracing.

QMP commands :
* trace-event : to toggle the state of a trace-event.
* set-trace-file : to set a new output file for tracing; enable/disable
  writing traces to file; flush buffer contents to file.

Query Commands :
----------------
* query-trace : to list the current contents of the trace buffer that haven't
  been written to file.
* query-trace-events : to list all available trace-events and their status.
* query-trace-file : to display the currently set trace file and its status.

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
[Qemu-devel] [Tracing][RFC] QMP interface to toggle state of a trace-event
QMP command trace-event to toggle state of a trace-event. Illustration : - { execute: trace-event, arguments: { name: qemu_malloc, option: true} } - { return: {} } Posting this as an RFC for now. I'll post the final version as a part of the cumulative QMP patchset for tracing ( including patches for query-* commands posted earlier : http://lists.gnu.org/archive/html/qemu-devel/2010-10/msg01232.html ) Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- hmp-commands.hx |2 +- monitor.c | 43 +-- qmp-commands.hx | 32 3 files changed, 70 insertions(+), 7 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 81999aa..76ec2fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -149,7 +149,7 @@ ETEXI .args_type = name:s,option:b, .params = name on|off, .help = changes status of a specific trace event, -.mhandler.cmd = do_change_trace_event_state, +.mhandler.cmd = do_change_trace_event_state_hmp, }, STEXI diff --git a/monitor.c b/monitor.c index c7e1f53..0766ed3 100644 --- a/monitor.c +++ b/monitor.c @@ -545,17 +545,43 @@ static void do_help_cmd(Monitor *mon, const QDict *qdict) } #ifdef CONFIG_SIMPLE_TRACE -static void do_change_trace_event_state(Monitor *mon, const QDict *qdict) + +/** + * HMP handler to change trace event state. + * + */ +void do_change_trace_event_state_hmp(Monitor *mon, const QDict *qdict) { -const char *tp_name = qdict_get_str(qdict, name); -bool new_state = qdict_get_bool(qdict, option); -int ret = st_change_trace_event_state(tp_name, new_state); +if (!do_change_trace_event_state_generic(qdict)) { +monitor_printf(mon, unknown event name \%s\\n, + qdict_get_str(qdict, name)); +} +} -if (!ret) { -monitor_printf(mon, unknown event name \%s\\n, tp_name); +/** + * QMP handler to change trace event state. 
+ * + */ +static int do_change_trace_event_state_qmp(Monitor *mon, const QDict *qdict, + QObject **ret_data) +{ +if (!do_change_trace_event_state_generic(qdict)) { +qerror_report(QERR_INVALID_PARAMETER, qdict_get_str(qdict, name)); +return -1; } +return 0; } +/** + * Generic handler to change trace event state. + * + */ +static int do_change_trace_event_state_generic(const QDict *qdict) +{ +const char *tp_name = qdict_get_str(qdict, name); +bool new_state = qdict_get_bool(qdict, option); +return st_change_trace_event_state(tp_name, new_state); +} static void do_trace_file(Monitor *mon, const QDict *qdict) { const char *op = qdict_get_try_str(qdict, op); @@ -583,6 +609,11 @@ static void do_info_trace_file_to_qmp(Monitor *mon, QObject **ret_data) { *ret_data = st_print_file_to_qobject(); } + +#else +static int do_change_trace_event_state_qmp(Monitor *mon, const QDict *qdict, +QObject **ret_data) {} + #endif static void user_monitor_complete(void *opaque, QObject *ret_data) diff --git a/qmp-commands.hx b/qmp-commands.hx index bc79b55..7613d73 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -761,6 +761,38 @@ Example: Note: This command must be issued before issuing any other command. +EQMP + +{ +.name = trace-event, +.args_type = name:s,option:b, +.params = name on|off, +.help = changes state of a specific trace event, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_change_trace_event_state_qmp, +}, + +SQMP +trace-event +--- + +Change state of a trace-event. + +Arguments: + +- name: name of trace-event (json-string) +- option: new state for the trace-event (json-bool) + +Example: + +- { execute: trace-event, arguments: { name: ABC, option:false } } +- { return: {} } + +Notes: + +(1) The 'query-trace-events' command should be used to check the new state +of the trace-event. + 3. Query Commands = -- 1.7.2.3 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
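The split in this patch (one shared helper, with thin HMP and QMP front ends that differ only in how they report failure) can be sketched outside QEMU as follows. Everything here is a stand-in: the demo_ names, the two-entry event table and the FILE-based error output are assumptions for illustration, not the monitor API. The shared helper is defined before its callers so the file compiles without forward declarations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the table of trace events. */
typedef struct {
    const char *name;
    bool state;
} DemoTraceEvent;

DemoTraceEvent demo_trace_list[] = {
    { "qemu_malloc", false },
    { "qemu_realloc", false },
};

/* Shared core: returns true if the event was found and updated. */
bool demo_change_state(const char *name, bool new_state)
{
    size_t n = sizeof(demo_trace_list) / sizeof(demo_trace_list[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(demo_trace_list[i].name, name) == 0) {
            demo_trace_list[i].state = new_state;
            return true;
        }
    }
    return false;
}

/* HMP-style front end: reports failure as human-readable text. */
void demo_hmp_trace_event(FILE *out, const char *name, bool new_state)
{
    if (!demo_change_state(name, new_state)) {
        fprintf(out, "unknown event name \"%s\"\n", name);
    }
}

/* QMP-style front end: reports failure through its return value. */
int demo_qmp_trace_event(const char *name, bool new_state)
{
    return demo_change_state(name, new_state) ? 0 : -1;
}
```

The point of the arrangement is that the lookup and state change live in one place, so the two monitors cannot drift apart in behaviour.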
[Qemu-devel] [Tracing][v4 PATCH 0/2] QMP Query interfaces for tracing
This patch set introduces three QMP query interfaces for tracing :
* query-trace : to list current contents of trace-buffer
* query-trace-events : to list all available trace-events with their state.
* query-trace-file : to list currently set trace-file with its status.

Changelog :
-----------
Changes v3 -> v4 :
- Add 'query-trace-file' interface to query currently active trace-file.
- Cleanup.

Changes v2 -> v3 :
- Change declarations of st_print_trace_to_qlist() and
  st_print_trace_events_to_qlist() to return QList*

Changes v1 -> v2 :
- Add 'timestamp' field for query-trace output.
- Misc cleanups.

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
[Qemu-devel] [Tracing][v4 PATCH 1/2] Introduce QMP interfaces
[PATCH 1/2] Introduce QMP interfaces : - query-trace - query-trace-events - query-trace-file Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c | 53 --- simpletrace.c | 69 + simpletrace.h |5 3 files changed, 123 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index 260cc02..c7e1f53 100644 --- a/monitor.c +++ b/monitor.c @@ -578,6 +578,11 @@ static void do_trace_file(Monitor *mon, const QDict *qdict) help_cmd(mon, trace-file); } } + +static void do_info_trace_file_to_qmp(Monitor *mon, QObject **ret_data) +{ +*ret_data = st_print_file_to_qobject(); +} #endif static void user_monitor_complete(void *opaque, QObject *ret_data) @@ -945,15 +950,27 @@ static void do_info_cpu_stats(Monitor *mon) #endif #if defined(CONFIG_SIMPLE_TRACE) -static void do_info_trace(Monitor *mon) +static void do_info_trace_print(Monitor *mon, const QObject *data) { st_print_trace((FILE *)mon, monitor_fprintf); } -static void do_info_trace_events(Monitor *mon) +static void do_info_trace(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = st_print_trace_to_qlist(); +*ret_data = QOBJECT(trace_event_list); +} + +static void do_info_trace_events_print(Monitor *mon, const QObject *data) { st_print_trace_events((FILE *)mon, monitor_fprintf); } + +static void do_info_trace_events(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = st_print_trace_events_to_qlist(); +*ret_data = QOBJECT(trace_event_list); +} #endif /** @@ -2610,14 +2627,16 @@ static const mon_cmd_t info_cmds[] = { .args_type = , .params = , .help = show current contents of trace buffer, -.mhandler.info = do_info_trace, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, }, { .name = trace-events, .args_type = , .params = , .help = show available trace-events their state, -.mhandler.info = do_info_trace_events, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, }, #endif { @@ -2752,6 +2771,32 @@ static const mon_cmd_t 
qmp_query_cmds[] = { .mhandler.info_async = do_info_balloon, .flags = MONITOR_CMD_ASYNC, }, +#if defined(CONFIG_SIMPLE_TRACE) +{ +.name = trace, +.args_type = , +.params = , +.help = show current contents of trace buffer, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, +}, +{ +.name = trace-events, +.args_type = , +.params = , +.help = show available trace-events their state, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, +}, +{ +.name = trace-file, +.args_type = , +.params = , +.help = show currently active trace output file and its status, +.user_print = monitor_user_noop, +.mhandler.info_new = do_info_trace_file_to_qmp, +}, +#endif { /* NULL */ }, }; diff --git a/simpletrace.c b/simpletrace.c index deb1e07..d24d6b0 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -220,6 +220,43 @@ void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char } } +/** + * Add the current contents of trace-buffer as a QList. + * + */ +QList* st_print_trace_to_qlist(void) +{ +QObject *data; +QList *tlist; +unsigned int i; + +tlist = qlist_new(); + +for (i = 0; i trace_idx; i++) { + data = qobject_from_jsonf({ + 'timestamp': % PRId64 , + 'event': % PRId64 , + 'arg1': % PRId64 , + 'arg2': % PRId64 , + 'arg3': % PRId64 , + 'arg4': % PRId64 , + 'arg5': % PRId64 , + 'arg6': % PRId64 +}, +trace_buf[i].timestamp_ns, +trace_buf[i].event, +trace_buf[i].x1, +trace_buf[i].x2, +trace_buf[i].x3, +trace_buf[i].x4, +trace_buf[i].x5, +trace_buf[i].x6); + qlist_append_obj(tlist, data); +} + +return tlist; +} + void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { unsigned int i; @@ -230,6 +267,38
[Qemu-devel] [Tracing][v4 PATCH 2/2] Add documentation for QMP interfaces
[PATCH 2/2] Add documentation for QMP commands: - query-trace - query-trace-events - query-trace-file. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- qmp-commands.hx | 94 +++ 1 files changed, 94 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..bc79b55 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -1539,3 +1539,97 @@ Example: EQMP +SQMP +query-trace +- + +Show contents of trace buffer. + +Returns a set of json-objects containing the following data: + +- event: Event ID for the trace-event(json-int) +- timestamp: trace timestamp (json-int) +- arg1 .. arg6: Arguments logged by the trace-event (json-int) + +Example: + +- { execute: query-trace } +- { + return:{ + event: 22, + timestamp: 129456235912365, + arg1: 886 + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0, + }, + { + event: 22, + timestamp: 129456235973407, + arg1: 886, + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0 + }, + ... + } + +EQMP + +SQMP +query-trace-events +-- + +Show all available trace-events their state. + +Returns a set of json-objects containing the following data: + +- name: Name of Trace-event (json-string) +- event-id: Event ID of Trace-event (json-int) +- state: State of trace-event [ '0': inactive; '1':active ] (json-int) + +Example: + +- { execute: query-trace-events } +- { + return:{ + name: qemu_malloc, + event-id: 0 + state: 0, + }, + { + name: qemu_realloc, + event-id: 1, + state: 0 + }, + ... + } + +EQMP + +SQMP +query-trace-file + + +Display currently set trace file name and its status. + +Returns a set of json-objects containing the following data: + +- trace-file: Name of Trace-file (json-string) +- status: State of trace-event [ '0': disabled; '1':enabled ] (json-int) + +Example: + +- { execute: query-trace-file } +- { + return:{ + trace-file: trace-26609, + status: 1 + } + } + +EQMP -- 1.7.2.2 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
Re: [Qemu-devel] [Tracing][v4 PATCH 2/2] Add documentation for QMP interfaces
On 10/19/2010 11:57 AM, Prerna Saxena wrote:
> [PATCH 2/2] Add documentation for QMP commands:
> - query-trace
> - query-trace-events
> - query-trace-file.

I've been looking for ways to avoid building this documentation for other
trace backends (since these commands are only available with the 'simple'
backend). However, it looks like hxtool blindly copies all text between SQMP
and EQMP. The only option I can think of is to make hxtool a wee bit more
intelligent, so that it can parse CONFIG_* options and build documentation
accordingly.
Is there a workaround I'm missing?

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
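The idea floated above (copy the text between SQMP and EQMP, but skip sections that belong to a disabled CONFIG_* option) could look roughly like this. This is a sketch of the idea only, written in C for illustration; it is not a description of hxtool itself, and the function name, the exact guard line it matches and the single boolean switch are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy pointers to the lines that sit between "SQMP" and "EQMP" markers
 * into out[]. Lines inside a "#if defined(CONFIG_SIMPLE_TRACE)" ... "#endif"
 * region are skipped unless simple_trace is true. Nested guards are not
 * handled. Returns the number of lines copied (at most out_cap).
 */
size_t demo_extract_qmp_doc(const char *const *lines, size_t n,
                            bool simple_trace,
                            const char **out, size_t out_cap)
{
    bool in_doc = false;   /* currently between SQMP and EQMP */
    bool guarded = false;  /* currently inside the CONFIG guard */
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        const char *l = lines[i];
        if (strcmp(l, "#if defined(CONFIG_SIMPLE_TRACE)") == 0) {
            guarded = true;
        } else if (strcmp(l, "#endif") == 0) {
            guarded = false;
        } else if (strcmp(l, "SQMP") == 0) {
            in_doc = true;
        } else if (strcmp(l, "EQMP") == 0) {
            in_doc = false;
        } else if (in_doc && (!guarded || simple_trace) && count < out_cap) {
            out[count++] = l;
        }
    }
    return count;
}
```

With a single hard-coded guard this stays a toy; a real solution would need to know the build configuration and handle arbitrary CONFIG_* names.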
[Qemu-devel] [Tracing][RFC v3 PATCH 0/2] QMP Query interfaces for tracing
This patch set introduces two QMP interfaces for tracing :
* query-trace : to list current contents of trace-buffer
* query-trace-events : to list all available trace-events with their state.

Changelog :
-----------
Changes v2 -> v3 :
- Change declarations of st_print_trace_to_qlist() and
  st_print_trace_events_to_qlist() to return QList*

Changes v1 -> v2 :
- Add 'timestamp' field for query-trace output.
- Misc cleanups.

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
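The v2 -> v3 change above swaps an out-parameter for a return value. A small C sketch of the two shapes, using a hypothetical DemoList in place of QList (none of these names are from QEMU):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for QList: a tiny fixed-capacity list of ints. */
typedef struct {
    size_t len;
    int items[8];
} DemoList;

/* Return-value style (as in v3): the callee allocates and returns the list. */
DemoList *demo_make_event_ids(size_t n)
{
    DemoList *l = calloc(1, sizeof(*l));
    if (l == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n && i < 8; i++) {
        l->items[l->len++] = (int)i;
    }
    return l;
}

/*
 * Out-parameter style (as in v1/v2): the caller must pass the ADDRESS of
 * its list pointer. Passing the (NULL) pointer itself trips the assert.
 */
void demo_make_event_ids_out(DemoList **out, size_t n)
{
    assert(out != NULL);
    *out = demo_make_event_ids(n);
}
```

With the return-value form the caller simply writes `DemoList *l = demo_make_event_ids(3);` and there is no address-of step to forget, which is presumably part of why the declarations were changed.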
[Qemu-devel] [Tracing][RFC v3 PATCH 1/2] Introduce QMP interfaces : query-trace query-trace-events
[PATCH 1/2] Introduce QMP interfaces : query-trace query-trace-events. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c | 40 +++--- simpletrace.c | 58 + simpletrace.h |4 +++ 3 files changed, 98 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index fbb678d..41f3477 100644 --- a/monitor.c +++ b/monitor.c @@ -941,15 +941,27 @@ static void do_info_cpu_stats(Monitor *mon) #endif #if defined(CONFIG_SIMPLE_TRACE) -static void do_info_trace(Monitor *mon) +static void do_info_trace_print(Monitor *mon) { st_print_trace((FILE *)mon, monitor_fprintf); } -static void do_info_trace_events(Monitor *mon) +static void do_info_trace(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = st_print_trace_to_qlist(); +*ret_data = QOBJECT(trace_event_list); +} + +static void do_info_trace_events_print(Monitor *mon, const QObject *data) { st_print_trace_events((FILE *)mon, monitor_fprintf); } + +static void do_info_trace_events(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = st_print_trace_events_to_qlist(); +*ret_data = QOBJECT(trace_event_list); +} #endif /** @@ -2606,14 +2618,16 @@ static const mon_cmd_t info_cmds[] = { .args_type = , .params = , .help = show current contents of trace buffer, -.mhandler.info = do_info_trace, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, }, { .name = trace-events, .args_type = , .params = , .help = show available trace-events their state, -.mhandler.info = do_info_trace_events, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, }, #endif { @@ -2748,6 +2762,24 @@ static const mon_cmd_t qmp_query_cmds[] = { .mhandler.info_async = do_info_balloon, .flags = MONITOR_CMD_ASYNC, }, +#if defined(CONFIG_SIMPLE_TRACE) +{ +.name = trace, +.args_type = , +.params = , +.help = show current contents of trace buffer, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, +}, +{ +.name = trace-events, +.args_type = , +.params = 
, +.help = show available trace-events their state, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, +}, +#endif { /* NULL */ }, }; diff --git a/simpletrace.c b/simpletrace.c index f849e42..9d7ec68 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -220,6 +220,43 @@ void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char } } +/** + * Add the current contents of trace-buffer as a QList. + * + */ +QList* st_print_trace_to_qlist() +{ +QObject *data; +QList *tlist; +unsigned int i; + +tlist = qlist_new(); + +for (i = 0; i trace_idx; i++) { + data = qobject_from_jsonf({ + 'timestamp': % PRId64 , + 'event': % PRId64 , + 'arg1': % PRId64 , + 'arg2': % PRId64 , + 'arg3': % PRId64 , + 'arg4': % PRId64 , + 'arg5': % PRId64 , + 'arg6': % PRId64 +}, +trace_buf[i].timestamp_ns, +trace_buf[i].event, +trace_buf[i].x1, +trace_buf[i].x2, +trace_buf[i].x3, +trace_buf[i].x4, +trace_buf[i].x5, +trace_buf[i].x6); + qlist_append_obj(tlist, data); +} + +return tlist; +} + void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { unsigned int i; @@ -230,6 +267,27 @@ void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, cons } } +/** + * Add current set of trace-events as a QList. + * + */ +QList* st_print_trace_events_to_qlist() +{ +QObject *data; +QList *tlist; +unsigned int i; + +tlist = qlist_new(); + +for (i = 0; i NR_TRACE_EVENTS; i++) { + data = qobject_from_jsonf({ 'name': %s, 'event-id': %d, 'state': %d}, trace_list[i].tp_name, i, +trace_list[i].state); + qlist_append_obj(tlist, data); +} + +return tlist; +} + static TraceEvent* find_trace_event_by_name(const char
[Qemu-devel] [Tracing][RFC v3 PATCH 2/2] Add documentation for QMP commands: query-trace query-trace-events.
[PATCH 2/2] Add documentation for QMP commands: query-trace query-trace-events. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- qmp-commands.hx | 71 +++ 1 files changed, 71 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..fefc93d 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -1539,3 +1539,74 @@ Example: EQMP +SQMP +query-trace +- + +Show contents of trace buffer. + +Returns a set of json-objects containing the following data: + +- event: Event ID for the trace-event(json-int) +- timestamp: trace timestamp (json-int) +- arg1 .. arg6: Arguments logged by the trace-event (json-int) + +Example: + +- { execute: query-trace } +- { + return:{ + event: 22, + timestamp: 129456235912365, + arg1: 886 + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0, + }, + { + event: 22, + timestamp: 129456235973407, + arg1: 886, + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0 + }, + ... + } + +EQMP + +SQMP +query-trace-events +-- + +Show all available trace-events their state. + +Returns a set of json-objects containing the following data: + +- name: Name of Trace-event (json-string) +- event-id: Event ID of Trace-event (json-int) +- state: State of trace-event [ '0': inactive; '1':active ] (json-int) + +Example: + +- { execute: query-trace-events } +- { + return:{ + name: qemu_malloc, + event-id: 0 + state: 0, + }, + { + name: qemu_realloc, + event-id: 1, + state: 0 + }, + ... + } + +EQMP -- 1.7.2.2 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] Re: [Tracing][RFC v3 PATCH 0/2] QMP Query interfaces for tracing
On 10/18/2010 07:51 PM, Luiz Capitulino wrote:
> On Mon, 18 Oct 2010 11:36:55 +0530
> Prerna Saxena <pre...@linux.vnet.ibm.com> wrote:
>
>> This patch set introduces two QMP interfaces for tracing :
>> * query-trace : to list current contents of trace-buffer
>> * query-trace-events : to list all available trace-events with their state.
>
> This is in my to-review queue, but it's going to take a few days, because
> I have to take a deeper look at the tracing feature to be able to review it.

Thanks for looking..I'd look forward to your comments :-)

> Two initial questions:
>
> o This is labeled as an RFC, but you're versioning it. Should this be
>   considered for inclusion?

I'm sending out a new version with some enhancements shortly -- for inclusion.

> o Is this really useful w/o being able to set new traces?

I'm working on that as well. The query commands are the earliest
interfaces to be implemented. I will be adding interfaces to toggle the
state of trace-events, set a new trace-file, etc.

>> Changelog :
>> -----------
>> Changes v2 -> v3 :
>> - Change declarations of st_print_trace_to_qlist() and
>>   st_print_trace_events_to_qlist() to return QList*
>>
>> Changes v1 -> v2 :
>> - Add 'timestamp' field for query-trace output.
>> - Misc cleanups.

Thanks,
--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
[Qemu-devel] [Tracing] [RFC PATCH 2/2] : Documentation for QMP interfaces
[PATCH 2/2] Add documentation for QMP commands: query-trace query-trace-events. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- qmp-commands.hx | 53 + 1 files changed, 53 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..9a48984 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -1539,3 +1539,56 @@ Example: EQMP +SQMP +query-trace +- + +Show contents of trace buffer. + +Returns a set of json-objects containing the following data: + +- Event: Event ID for the trace-event(json-int) +- arg1 .. arg6: Arguments logged by the trace-event (json-int) + +Example: + +- { execute: query-trace } +- { + return:{ + Event: 22, + arg6: 0, + arg5: 0, + arg4: 0, + arg3: 0, + arg2: 80, + arg1: 886 + } + } + +EQMP + +SQMP +query-trace-events +-- + +Show all available trace-events their state. + +Returns a set of json-objects containing the following data: + +- name: Name of Trace-event (json-string) +- state: State of trace-event [ '0': inactive; '1':active ] (json-int) +- eventID: Event ID of Trace-event (json-int) + +Example: + +- { execute: query-trace-events } +- { + return:{ + name: qemu_malloc, + state: 0, + eventID: 0 + } + } + +EQMP + -- 1.7.2.2 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [Tracing][RFC v2 PATCH 0/2] QMP Query interfaces for tracing
This patch set introduces two QMP interfaces for tracing :
* query-trace : to list current contents of trace-buffer
* query-trace-events : to list all available trace-events with their state.

Changelog :
-----------
Changes from v1 :
- Add 'timestamp' field for query-trace output.
- Misc cleanups.

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
[Qemu-devel] [Tracing][RFC v2 PATCH 1/2] Introduce 'query-trace' 'query-trace-events' interfaces
[PATCH 1/2] Introduce QMP interfaces : query-trace query-trace-events Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c | 46 +--- simpletrace.c | 58 + simpletrace.h |4 +++ 3 files changed, 104 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index fbb678d..7a150ae 100644 --- a/monitor.c +++ b/monitor.c @@ -941,15 +941,33 @@ static void do_info_cpu_stats(Monitor *mon) #endif #if defined(CONFIG_SIMPLE_TRACE) -static void do_info_trace(Monitor *mon) +static void do_info_trace_print(Monitor *mon) { st_print_trace((FILE *)mon, monitor_fprintf); } -static void do_info_trace_events(Monitor *mon) +static void do_info_trace(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = NULL; + +st_print_trace_to_qlist(trace_event_list); + +*ret_data = QOBJECT(trace_event_list); +} + +static void do_info_trace_events_print(Monitor *mon, const QObject *data) { st_print_trace_events((FILE *)mon, monitor_fprintf); } + +static void do_info_trace_events(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = NULL; + +st_print_trace_events_to_qlist(trace_event_list); + +*ret_data = QOBJECT(trace_event_list); +} #endif /** @@ -2606,14 +2624,16 @@ static const mon_cmd_t info_cmds[] = { .args_type = , .params = , .help = show current contents of trace buffer, -.mhandler.info = do_info_trace, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, }, { .name = trace-events, .args_type = , .params = , .help = show available trace-events their state, -.mhandler.info = do_info_trace_events, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, }, #endif { @@ -2748,6 +2768,24 @@ static const mon_cmd_t qmp_query_cmds[] = { .mhandler.info_async = do_info_balloon, .flags = MONITOR_CMD_ASYNC, }, +#if defined(CONFIG_SIMPLE_TRACE) +{ +.name = trace, +.args_type = , +.params = , +.help = show current contents of trace buffer, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, +}, 
+{ +.name = trace-events, +.args_type = , +.params = , +.help = show available trace-events their state, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, +}, +#endif { /* NULL */ }, }; diff --git a/simpletrace.c b/simpletrace.c index f849e42..a964312 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -220,6 +220,43 @@ void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char } } +/** + * Add the current contents of trace-buffer as a QList. + * NOTE: This assumes trace_list hasnt already been allocated with a QList. + *The initialization happens here. + */ +void st_print_trace_to_qlist(QList **tlist) +{ +QObject *data; +unsigned int i; + +assert(tlist); + +*tlist = qlist_new(); + +for (i = 0; i trace_idx; i++) { + data = qobject_from_jsonf({ + 'timestamp': % PRId64 , + 'event': % PRId64 , + 'arg1': % PRId64 , + 'arg2': % PRId64 , + 'arg3': % PRId64 , + 'arg4': % PRId64 , + 'arg5': % PRId64 , + 'arg6': % PRId64 +}, +trace_buf[i].timestamp_ns, +trace_buf[i].event, +trace_buf[i].x1, +trace_buf[i].x2, +trace_buf[i].x3, +trace_buf[i].x4, +trace_buf[i].x5, +trace_buf[i].x6); + qlist_append_obj(*tlist, data); +} +} + void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { unsigned int i; @@ -230,6 +267,27 @@ void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, cons } } +/** + * Add current set of trace-events as a QList. + * NOTE: This assumes trace_list hasnt already been allocated with a QList. + *The initialization happens here. + */ +void st_print_trace_events_to_qlist(QList **tlist) +{ +QObject *data; +unsigned int i; + +assert(tlist); + +*tlist = qlist_new(); + +for (i = 0; i NR_TRACE_EVENTS; i++) { + data
[Qemu-devel] [Tracing][RFC v2 PATCH 2/2] Documentation for QMP interfaces
[PATCH 2/2] Add documentation for QMP commands: query-trace query-trace-events. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- qmp-commands.hx | 71 +++ 1 files changed, 71 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..fefc93d 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -1539,3 +1539,74 @@ Example: EQMP +SQMP +query-trace +- + +Show contents of trace buffer. + +Returns a set of json-objects containing the following data: + +- event: Event ID for the trace-event(json-int) +- timestamp: trace timestamp (json-int) +- arg1 .. arg6: Arguments logged by the trace-event (json-int) + +Example: + +- { execute: query-trace } +- { + return:{ + event: 22, + timestamp: 129456235912365, + arg1: 886 + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0, + }, + { + event: 22, + timestamp: 129456235973407, + arg1: 886, + arg2: 80, + arg3: 0, + arg4: 0, + arg5: 0, + arg6: 0 + }, + ... + } + +EQMP + +SQMP +query-trace-events +-- + +Show all available trace-events their state. + +Returns a set of json-objects containing the following data: + +- name: Name of Trace-event (json-string) +- event-id: Event ID of Trace-event (json-int) +- state: State of trace-event [ '0': inactive; '1':active ] (json-int) + +Example: + +- { execute: query-trace-events } +- { + return:{ + name: qemu_malloc, + event-id: 0 + state: 0, + }, + { + name: qemu_realloc, + event-id: 1, + state: 0 + }, + ... + } + +EQMP -- 1.7.2.2 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [Tracing] [RFC PATCH 0/2] : QMP query Interfaces for tracing
This patch set introduces two QMP interfaces for tracing :
* query-trace : to list current contents of trace-buffer
* query-trace-events : to list all available trace-events with their state.

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
[Qemu-devel] [Tracing] [RFC PATCH 1/2] : Introduce 'query-trace' 'query-trace-events' interfaces
[PATCH 1/2] Introduce QMP interfaces : query-trace query-trace-events Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c | 46 ++ simpletrace.c | 54 ++ simpletrace.h |2 ++ 3 files changed, 98 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index fbb678d..7a150ae 100644 --- a/monitor.c +++ b/monitor.c @@ -941,15 +941,33 @@ static void do_info_cpu_stats(Monitor *mon) #endif #if defined(CONFIG_SIMPLE_TRACE) -static void do_info_trace(Monitor *mon) +static void do_info_trace_print(Monitor *mon) { st_print_trace((FILE *)mon, monitor_fprintf); } -static void do_info_trace_events(Monitor *mon) +static void do_info_trace(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = NULL; + +st_print_trace_to_qlist(trace_event_list); + +*ret_data = QOBJECT(trace_event_list); +} + +static void do_info_trace_events_print(Monitor *mon, const QObject *data) { st_print_trace_events((FILE *)mon, monitor_fprintf); } + +static void do_info_trace_events(Monitor *mon, QObject **ret_data) +{ +QList *trace_event_list = NULL; + +st_print_trace_events_to_qlist(trace_event_list); + +*ret_data = QOBJECT(trace_event_list); +} #endif /** @@ -2606,14 +2624,16 @@ static const mon_cmd_t info_cmds[] = { .args_type = , .params = , .help = show current contents of trace buffer, -.mhandler.info = do_info_trace, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, }, { .name = trace-events, .args_type = , .params = , .help = show available trace-events their state, -.mhandler.info = do_info_trace_events, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, }, #endif { @@ -2748,6 +2768,24 @@ static const mon_cmd_t qmp_query_cmds[] = { .mhandler.info_async = do_info_balloon, .flags = MONITOR_CMD_ASYNC, }, +#if defined(CONFIG_SIMPLE_TRACE) +{ +.name = trace, +.args_type = , +.params = , +.help = show current contents of trace buffer, +.user_print = do_info_trace_print, +.mhandler.info_new = do_info_trace, +}, +{ 
+.name = trace-events, +.args_type = , +.params = , +.help = show available trace-events their state, +.user_print = do_info_trace_events_print, +.mhandler.info_new = do_info_trace_events, +}, +#endif { /* NULL */ }, }; diff --git a/simpletrace.c b/simpletrace.c index f849e42..d1f66b4 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -220,6 +220,39 @@ void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char } } +void st_print_trace_to_qlist(QList **tlist) +{ +QObject *data; +unsigned int i; + +if (!tlist || *tlist ) +return; + +/* NOTE : This assumes trace_list hasnt already been allocated with a QList. + *The initialization happens here. + */ +*tlist = qlist_new(); + +for (i = 0; i trace_idx; i++) { + data = qobject_from_jsonf({ + 'Event': % PRId64 , + 'arg1': % PRId64 , + 'arg2': % PRId64 , + 'arg3': % PRId64 , + 'arg4': % PRId64 , + 'arg5': % PRId64 , + 'arg6': % PRId64 +}, +trace_buf[i].event, trace_buf[i].x1, +trace_buf[i].x2, trace_buf[i].x3, +trace_buf[i].x4, trace_buf[i].x5, +trace_buf[i].x6); + qlist_append_obj(*tlist, data); +} + +return; +} + void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { unsigned int i; @@ -230,6 +263,27 @@ void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, cons } } +void st_print_trace_events_to_qlist(QList **tlist) +{ +QObject *data; +unsigned int i; + +if (!tlist || *tlist ) +return; + +/* NOTE : This assumes trace_list hasnt already been allocated with a QList. + *The initialization happens here. + */ +*tlist = qlist_new(); + +for (i = 0; i NR_TRACE_EVENTS; i++) { + data = qobject_from_jsonf({ 'name': %s, 'eventID': %d, 'state': %d }, trace_list[i].tp_name, i, trace_list[i].state); + qlist_append_obj(*tlist, data); +} + +return; +} + static TraceEvent* find_trace_event_by_name(const char *tname) { unsigned int
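A side note on the monitor.c hunks above: do_info_trace() declares `QList *trace_event_list = NULL` and passes it as-is to st_print_trace_to_qlist(), which takes a `QList **`. With the guard `if (!tlist || *tlist) return;` the callee would then see a NULL argument and return before allocating anything, so the call probably needs `&trace_event_list`. Below is a minimal sketch of the out-parameter pattern that seems intended. ExampleList and the example_* functions are hypothetical stand-ins, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QList; only the calling convention matters here. */
typedef struct ExampleList {
    int count;
} ExampleList;

/*
 * Same guard as in the patch: do nothing unless the caller passed the
 * address of a pointer that is still NULL. Returns true if *out was set.
 */
static bool example_fill_list(ExampleList **out)
{
    if (!out || *out) {
        return false;
    }
    *out = calloc(1, sizeof(**out));
    return *out != NULL;
}

static ExampleList *example_query(void)
{
    ExampleList *list = NULL;
    bool filled = example_fill_list(&list); /* address of the pointer */

    assert(filled == (list != NULL));
    return list; /* caller frees */
}
```

Calling example_fill_list(NULL), which is what passing a NULL pointer by value amounts to, leaves the caller with nothing to return.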
[Qemu-devel] [PATCH][Tracing v2] Process -trace using QemuOptsList
[PATCH] Add -trace file FILENAME switch to qemu startup command.
This processes the argument using QemuOptsList.

Signed-off-by: Prerna Saxena <pre...@linux.vnet.ibm.com>
---
 qemu-config.c |   18 ++
 qemu-config.h |    3 +++
 vl.c          |    5 -
 3 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 95abe61..9106511 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -294,6 +294,21 @@ QemuOptsList qemu_mon_opts = {
     },
 };
 
+#ifdef CONFIG_SIMPLE_TRACE
+QemuOptsList qemu_trace_opts = {
+    .name = "trace",
+    .implied_opt_name = "trace",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_trace_opts.head),
+    .desc = {
+        {
+            .name = "file",
+            .type = QEMU_OPT_STRING,
+        },
+        { /* end if list */ }
+    },
+};
+#endif
+
 QemuOptsList qemu_cpudef_opts = {
     .name = "cpudef",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_cpudef_opts.head),
@@ -352,6 +367,9 @@ static QemuOptsList *vm_config_groups[] = {
     &qemu_global_opts,
     &qemu_mon_opts,
     &qemu_cpudef_opts,
+#ifdef CONFIG_SIMPLE_TRACE
+    &qemu_trace_opts,
+#endif
     NULL,
 };
 
diff --git a/qemu-config.h b/qemu-config.h
index dca69d4..4db2fb5 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -14,6 +14,9 @@ extern QemuOptsList qemu_rtc_opts;
 extern QemuOptsList qemu_global_opts;
 extern QemuOptsList qemu_mon_opts;
 extern QemuOptsList qemu_cpudef_opts;
+#ifdef CONFIG_SIMPLE_TRACE
+extern QemuOptsList qemu_trace_opts;
+#endif
 
 QemuOptsList *qemu_find_opts(const char *group);
 int qemu_set_option(const char *str);
diff --git a/vl.c b/vl.c
index 99664e9..0ff04e9 100644
--- a/vl.c
+++ b/vl.c
@@ -2599,7 +2599,10 @@ int main(int argc, char **argv, char **envp)
                 break;
 #ifdef CONFIG_SIMPLE_TRACE
             case QEMU_OPTION_trace:
-                trace_file = optarg;
+                opts = qemu_opts_parse(&qemu_trace_opts, optarg, 0);
+                if (opts) {
+                    trace_file = qemu_opt_get(opts, "file");
+                }
                 break;
 #endif
             case QEMU_OPTION_readconfig:
--
1.7.2.1

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
Re: [Qemu-devel] [PATCH] trace: Make trace record fields 64-bit
On 08/09/2010 07:05 PM, Stefan Hajnoczi wrote: Explicitly use 64-bit fields in trace records so that timestamps and magic numbers work for 32-bit host builds. Signed-off-by: Stefan Hajnoczistefa...@linux.vnet.ibm.com --- simpletrace.c | 31 +-- simpletrace.h | 11 ++- simpletrace.py |2 +- tracetool |6 +++--- 4 files changed, 31 insertions(+), 19 deletions(-) diff --git a/simpletrace.c b/simpletrace.c index 954cc4e..01acfc5 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -9,18 +9,29 @@ */ #includestdlib.h +#includestdint.h #includestdio.h #includetime.h #include trace.h +/** Trace file header event ID */ +#define HEADER_EVENT_ID (~(uint64_t)0) /* avoids conflicting with TraceEventIDs */ + +/** Trace file magic number */ +#define HEADER_MAGIC 0xf2b177cb0aa429b4ULL + +/** Trace file version number, bump if format changes */ +#define HEADER_VERSION 0 + +/** Trace buffer entry */ typedef struct { -unsigned long event; -unsigned long timestamp_ns; -unsigned long x1; -unsigned long x2; -unsigned long x3; -unsigned long x4; -unsigned long x5; +uint64_t event; +uint64_t timestamp_ns; +uint64_t x1; +uint64_t x2; +uint64_t x3; +uint64_t x4; +uint64_t x5; } TraceRecord; enum { @@ -42,9 +53,9 @@ void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE *stream, static bool write_header(FILE *fp) { TraceRecord header = { -.event = -1UL, /* max avoids conflicting with TraceEventIDs */ -.timestamp_ns = 0xf2b177cb0aa429b4, /* magic number */ -.x1 = 0, /* bump this version number if file format changes */ +.event = HEADER_EVENT_ID, +.timestamp_ns = HEADER_MAGIC, +.x1 = HEADER_VERSION, }; return fwrite(header, sizeof header, 1, fp) == 1; diff --git a/simpletrace.h b/simpletrace.h index 6a2b8d9..f81aa8e 100644 --- a/simpletrace.h +++ b/simpletrace.h @@ -10,6 +10,7 @@ #define SIMPLETRACE_H #includestdbool.h +#includestdint.h #includestdio.h typedef unsigned int TraceEventID; It would be useful to have : typedef uint64_t TraceEventID; This ensures that the maximum number 
of trace events available on both 32 and 64 bit builds is same. @@ -20,11 +21,11 @@ typedef struct { } TraceEvent; void trace0(TraceEventID event); -void trace1(TraceEventID event, unsigned long x1); -void trace2(TraceEventID event, unsigned long x1, unsigned long x2); -void trace3(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3); -void trace4(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4); -void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4, unsigned long x5); +void trace1(TraceEventID event, uint64_t x1); +void trace2(TraceEventID event, uint64_t x1, uint64_t x2); +void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3); +void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4); +void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5); void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)); void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)); void st_change_trace_event_state(const char *tname, bool tstate); diff --git a/simpletrace.py b/simpletrace.py index 979d911..fdf0eb5 100755 --- a/simpletrace.py +++ b/simpletrace.py @@ -17,7 +17,7 @@ header_event_id = 0x header_magic= 0xf2b177cb0aa429b4 header_version = 0 -trace_fmt = 'LLL' +trace_fmt = '=QQQ' trace_len = struct.calcsize(trace_fmt) event_re = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\)\s+([^]*)') diff --git a/tracetool b/tracetool index c5a5bdc..b78cd97 100755 --- a/tracetool +++ b/tracetool @@ -151,11 +151,11 @@ EOF simple_event_num=0 } -cast_args_to_ulong() +cast_args_to_uint64_t() { local arg for arg in $(get_argnames $1); do -echo -n (unsigned long)$arg +echo -n (uint64_t)$arg Tested this on a 32 bit host. 
It throws up some warnings, and we need : echo -n (uint64_t)(uintptr_t)$arg done } @@ -173,7 +173,7 @@ linetoh_simple() trace_args=$simple_event_num if [ $argc -gt 0 ] then -trace_args=$trace_args, $(cast_args_to_ulong $1) +trace_args=$trace_args, $(cast_args_to_uint64_t $1) fi catEOF -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
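The double cast suggested for tracetool can be illustrated in plain C. Casting a pointer straight to uint64_t on a 32-bit host changes its size in a single step, which GCC reports as "cast from pointer to integer of different size". Going through uintptr_t first keeps the pointer-to-integer step size-preserving, and the second cast is an ordinary integer widening. The helper name below is hypothetical and only sketches the idea.

```c
#include <assert.h>
#include <stdint.h>

/* A pointer must fit in the 64-bit trace field for this scheme to work. */
static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
              "pointers wider than 64 bits are not handled");

/* Hypothetical helper: widen a pointer argument to a 64-bit trace field. */
static uint64_t example_ptr_to_u64(const void *p)
{
    /* Step 1: pointer -> pointer-sized integer (no size change).
     * Step 2: pointer-sized integer -> 64-bit integer (plain widening). */
    return (uint64_t)(uintptr_t)p;
}
```

One caveat: if the same cast is applied to every argument, a genuinely 64-bit integer argument would be truncated to 32 bits by the uintptr_t step on a 32-bit host, so the trick is only lossless for pointers and for values no wider than a pointer.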
[Qemu-devel] [Tracing] More Trace events
This patch adds few more trace events for tracking IO and also to trace balloon event flagged via the monitor. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- balloon.c|2 ++ ioport.c |7 +++ trace-events |8 3 files changed, 17 insertions(+), 0 deletions(-) diff --git a/balloon.c b/balloon.c index 8e0b7f1..0021fef 100644 --- a/balloon.c +++ b/balloon.c @@ -29,6 +29,7 @@ #include cpu-common.h #include kvm.h #include balloon.h +#include trace.h static QEMUBalloonEvent *qemu_balloon_event; @@ -43,6 +44,7 @@ void qemu_add_balloon_handler(QEMUBalloonEvent *func, void *opaque) int qemu_balloon(ram_addr_t target, MonitorCompletion cb, void *opaque) { if (qemu_balloon_event) { +trace_balloon_event(qemu_balloon_event_opaque, target); qemu_balloon_event(qemu_balloon_event_opaque, target, cb, opaque); return 1; } else { diff --git a/ioport.c b/ioport.c index 53dd87a..ec3dc65 100644 --- a/ioport.c +++ b/ioport.c @@ -26,6 +26,7 @@ */ #include ioport.h +#include trace.h /***/ /* IO Port */ @@ -195,18 +196,21 @@ void isa_unassign_ioport(pio_addr_t start, int length) void cpu_outb(pio_addr_t addr, uint8_t val) { LOG_IOPORT(outb: %04FMT_pioaddr %02PRIx8\n, addr, val); +trace_cpu_out(addr, val); ioport_write(0, addr, val); } void cpu_outw(pio_addr_t addr, uint16_t val) { LOG_IOPORT(outw: %04FMT_pioaddr %04PRIx16\n, addr, val); +trace_cpu_out(addr, val); ioport_write(1, addr, val); } void cpu_outl(pio_addr_t addr, uint32_t val) { LOG_IOPORT(outl: %04FMT_pioaddr %08PRIx32\n, addr, val); +trace_cpu_out(addr, val); ioport_write(2, addr, val); } @@ -214,6 +218,7 @@ uint8_t cpu_inb(pio_addr_t addr) { uint8_t val; val = ioport_read(0, addr); +trace_cpu_in(addr, val); LOG_IOPORT(inb : %04FMT_pioaddr %02PRIx8\n, addr, val); return val; } @@ -222,6 +227,7 @@ uint16_t cpu_inw(pio_addr_t addr) { uint16_t val; val = ioport_read(1, addr); +trace_cpu_in(addr, val); LOG_IOPORT(inw : %04FMT_pioaddr %04PRIx16\n, addr, val); return val; } @@ -230,6 +236,7 @@ uint32_t cpu_inl(pio_addr_t addr) 
{ uint32_t val; val = ioport_read(2, addr); +trace_cpu_in(addr, val); LOG_IOPORT(inl : %04FMT_pioaddr %08PRIx32\n, addr, val); return val; } diff --git a/trace-events b/trace-events index 80197b6..cade0b5 100644 --- a/trace-events +++ b/trace-events @@ -59,3 +59,11 @@ virtio_blk_handle_write(void *req, unsigned long sector, unsigned long nsectors) # posix-aio-compat.c paio_submit(void *acb, void *opaque, unsigned long sector_num, unsigned long nb_sectors, unsigned long type) acb %p opaque %p sector_num %lu nb_sectors %lu type %lu + +# ioport.c +cpu_in(unsigned int addr, unsigned int val) Addr %u Value %u +cpu_out(unsigned int addr, unsigned int val) Addr %u Value %u + +# balloon.c +# Since requests are raised via monitor, not many tracepoints are needed. +balloon_event(void *opaque, unsigned long addr) Opaque %p Addr %lu -- 1.6.2.5 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [PATCH v2] trace: Make trace record fields 64-bit
Explicitly use 64-bit fields in trace records so that timestamps and magic numbers work for 32-bit host builds. Changelog (from initial patch posted by Stefan): 1) TraceEventID is now uint64_t to take care of same number of tracepoints on both 32 and 64 bit builds. 2) Cast arguments to uintptr_t, and then to uint64_t to bypass warnings. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- simpletrace.c | 41 ++--- simpletrace.h | 13 +++-- simpletrace.py |2 +- tracetool |6 +++--- 4 files changed, 37 insertions(+), 25 deletions(-) diff --git a/simpletrace.c b/simpletrace.c index 954cc4e..27b0cab 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -9,18 +9,29 @@ */ #include stdlib.h +#include stdint.h #include stdio.h #include time.h #include trace.h +/** Trace file header event ID */ +#define HEADER_EVENT_ID (~(uint64_t)0) /* avoids conflicting with TraceEventIDs */ + +/** Trace file magic number */ +#define HEADER_MAGIC 0xf2b177cb0aa429b4ULL + +/** Trace file version number, bump if format changes */ +#define HEADER_VERSION 0 + +/** Trace buffer entry */ typedef struct { -unsigned long event; -unsigned long timestamp_ns; -unsigned long x1; -unsigned long x2; -unsigned long x3; -unsigned long x4; -unsigned long x5; +uint64_t event; +uint64_t timestamp_ns; +uint64_t x1; +uint64_t x2; +uint64_t x3; +uint64_t x4; +uint64_t x5; } TraceRecord; enum { @@ -42,9 +53,9 @@ void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE *stream, static bool write_header(FILE *fp) { TraceRecord header = { -.event = -1UL, /* max avoids conflicting with TraceEventIDs */ -.timestamp_ns = 0xf2b177cb0aa429b4, /* magic number */ -.x1 = 0, /* bump this version number if file format changes */ +.event = HEADER_EVENT_ID, +.timestamp_ns = HEADER_MAGIC, +.x1 = HEADER_VERSION, }; return fwrite(header, sizeof header, 1, fp) == 1; @@ -160,27 +171,27 @@ void trace0(TraceEventID event) trace(event, 0, 0, 0, 0, 0); } -void trace1(TraceEventID event, unsigned long x1) +void 
trace1(TraceEventID event, uint64_t x1) { trace(event, x1, 0, 0, 0, 0); } -void trace2(TraceEventID event, unsigned long x1, unsigned long x2) +void trace2(TraceEventID event, uint64_t x1, uint64_t x2) { trace(event, x1, x2, 0, 0, 0); } -void trace3(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3) +void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3) { trace(event, x1, x2, x3, 0, 0); } -void trace4(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4) +void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4) { trace(event, x1, x2, x3, x4, 0); } -void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4, unsigned long x5) +void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5) { trace(event, x1, x2, x3, x4, x5); } diff --git a/simpletrace.h b/simpletrace.h index 6a2b8d9..00ca439 100644 --- a/simpletrace.h +++ b/simpletrace.h @@ -10,9 +10,10 @@ #define SIMPLETRACE_H #include stdbool.h +#include stdint.h #include stdio.h -typedef unsigned int TraceEventID; +typedef uint64_t TraceEventID; typedef struct { const char *tp_name; @@ -20,11 +21,11 @@ typedef struct { } TraceEvent; void trace0(TraceEventID event); -void trace1(TraceEventID event, unsigned long x1); -void trace2(TraceEventID event, unsigned long x1, unsigned long x2); -void trace3(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3); -void trace4(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4); -void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4, unsigned long x5); +void trace1(TraceEventID event, uint64_t x1); +void trace2(TraceEventID event, uint64_t x1, uint64_t x2); +void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3); +void trace4(TraceEventID event, uint64_t x1, uint64_t x2, 
uint64_t x3, uint64_t x4); +void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5); void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)); void st_print_trace_events(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)); void st_change_trace_event_state(const char *tname, bool tstate); diff --git a/simpletrace.py b/simpletrace.py index 979d911..fdf0eb5 100755 --- a/simpletrace.py +++ b/simpletrace.py @@ -17,7 +17,7 @@ header_event_id = 0x header_magic= 0xf2b177cb0aa429b4 header_version = 0 -trace_fmt = 'LLL' +trace_fmt = '=QQQ' trace_len = struct.calcsize(trace_fmt) event_re
[Qemu-devel] [Tracing][PATCH 0/2] More Trace events
This set of patches adds trace events for tracking port I/O and balloon events flagged via the monitor.

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [Tracing][PATCH 1/2] More Trace events
[PATCH 1/2] Trace events for tracking port IO Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- ioport.c |7 +++ trace-events |4 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/ioport.c b/ioport.c index 53dd87a..ec3dc65 100644 --- a/ioport.c +++ b/ioport.c @@ -26,6 +26,7 @@ */ #include ioport.h +#include trace.h /***/ /* IO Port */ @@ -195,18 +196,21 @@ void isa_unassign_ioport(pio_addr_t start, int length) void cpu_outb(pio_addr_t addr, uint8_t val) { LOG_IOPORT(outb: %04FMT_pioaddr %02PRIx8\n, addr, val); +trace_cpu_out(addr, val); ioport_write(0, addr, val); } void cpu_outw(pio_addr_t addr, uint16_t val) { LOG_IOPORT(outw: %04FMT_pioaddr %04PRIx16\n, addr, val); +trace_cpu_out(addr, val); ioport_write(1, addr, val); } void cpu_outl(pio_addr_t addr, uint32_t val) { LOG_IOPORT(outl: %04FMT_pioaddr %08PRIx32\n, addr, val); +trace_cpu_out(addr, val); ioport_write(2, addr, val); } @@ -214,6 +218,7 @@ uint8_t cpu_inb(pio_addr_t addr) { uint8_t val; val = ioport_read(0, addr); +trace_cpu_in(addr, val); LOG_IOPORT(inb : %04FMT_pioaddr %02PRIx8\n, addr, val); return val; } @@ -222,6 +227,7 @@ uint16_t cpu_inw(pio_addr_t addr) { uint16_t val; val = ioport_read(1, addr); +trace_cpu_in(addr, val); LOG_IOPORT(inw : %04FMT_pioaddr %04PRIx16\n, addr, val); return val; } @@ -230,6 +236,7 @@ uint32_t cpu_inl(pio_addr_t addr) { uint32_t val; val = ioport_read(2, addr); +trace_cpu_in(addr, val); LOG_IOPORT(inl : %04FMT_pioaddr %08PRIx32\n, addr, val); return val; } diff --git a/trace-events b/trace-events index 80197b6..7dbd08f 100644 --- a/trace-events +++ b/trace-events @@ -59,3 +59,7 @@ virtio_blk_handle_write(void *req, unsigned long sector, unsigned long nsectors) # posix-aio-compat.c paio_submit(void *acb, void *opaque, unsigned long sector_num, unsigned long nb_sectors, unsigned long type) acb %p opaque %p sector_num %lu nb_sectors %lu type %lu + +# ioport.c +cpu_in(unsigned int addr, unsigned int val) addr %u value %u +cpu_out(unsigned int addr, 
unsigned int val) addr %u value %u -- 1.6.2.5 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [Tracing] Compilation failure
Hi Stefan,
I think this needs to be resolved.

  CC    trace.o
  CC    simpletrace.o
cc1: warnings being treated as errors
/home/prerna/qemu-testing/git/qemu/simpletrace.c: In function ‘write_header’:
/home/prerna/qemu-testing/git/qemu/simpletrace.c:46: error: integer constant is too large for ‘long’ type
/home/prerna/qemu-testing/git/qemu/simpletrace.c:46: error: large integer implicitly truncated to unsigned type
make: *** [simpletrace.o] Error 1

The error arises due to:

    TraceRecord header = {
        .event = -1UL, /* max avoids conflicting with TraceEventIDs */
        .timestamp_ns = 0xf2b177cb0aa429b4, /* magic number */

The magic-number line is the one that triggers the error.

Also, it would be better to #define the magic number as a macro, and use that instead of using the constant directly.

Regards,
--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
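For reference, here is a small sketch of one way to avoid both diagnostics. It assumes the failure comes from a host where `unsigned long` is 32 bits, so the 64-bit magic neither fits the literal's type nor the struct field. The macro and type names are hypothetical; the idea is simply to give the constant an explicit ULL suffix and to store it in a uint64_t field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro names, following the suggestion to #define the magic. */
#define EXAMPLE_HEADER_EVENT_ID (~(uint64_t)0)
#define EXAMPLE_HEADER_MAGIC    0xf2b177cb0aa429b4ULL

/* Hypothetical record layout with explicitly sized fields. */
typedef struct {
    uint64_t event;
    uint64_t timestamp_ns;
    uint64_t x1;
} ExampleRecord;

/* Three 8-byte fields: the layout no longer depends on sizeof(long). */
static_assert(sizeof(ExampleRecord) == 24,
              "record layout must be identical on 32-bit and 64-bit hosts");

static ExampleRecord example_make_header(void)
{
    ExampleRecord header = {
        .event = EXAMPLE_HEADER_EVENT_ID,     /* all bits set, 64 bits wide */
        .timestamp_ns = EXAMPLE_HEADER_MAGIC, /* fits without truncation */
        .x1 = 0,                              /* format version */
    };
    return header;
}
```

With fixed-width fields the on-disk record is the same size regardless of the host word size, which also matters for any tool that parses the trace file.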
[Qemu-devel] [Tracing][PATCH] Compilation fixes
Fix to ensure rebuild is properly triggered when switching trace backends using ./configure. Also, when using the 'ust' backend, check if the relevant headers are available at host. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- Makefile |4 ++-- configure | 20 +--- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/Makefile b/Makefile index 8831174..3bd41ce 100644 --- a/Makefile +++ b/Makefile @@ -132,10 +132,10 @@ bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS) iov.o: iov.c iov.h -trace.h: $(SRC_PATH)/trace-events +trace.h: $(SRC_PATH)/trace-events config-host.mak $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h $ $@, GEN $@) -trace.c: $(SRC_PATH)/trace-events +trace.c: $(SRC_PATH)/trace-events config-host.mak $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -c $ $@, GEN $@) trace.o: trace.c $(GENERATED_HEADERS) diff --git a/configure b/configure index fe1b027..ee9f1e3 100755 --- a/configure +++ b/configure @@ -2011,6 +2011,23 @@ if test $? -ne 0 ; then exit 1 fi +## +# For 'ust' backend, test if ust headers are present +if test $trace_backend = ust; then + cat $TMPC EOF +#include ust/tracepoint.h +#include ust/marker.h +int main(void) { return 0; } +EOF + if compile_prog ; then +LIBS=-lust $LIBS + else +echo ERROR: Trace backend 'ust' does not have relevant headers available +echoon the host. Pls choose a different backend. +exit 1 + fi +fi +## # End of CC checks # After here, no more $cc or $ld runs @@ -2392,9 +2409,6 @@ echo TRACE_BACKEND=$trace_backend $config_host_mak if test $trace_backend = simple; then echo CONFIG_SIMPLE_TRACE=y $config_host_mak fi -if test $trace_backend = ust; then - LIBS=-lust $LIBS -fi # Set the appropriate trace file. if test $trace_backend = simple; then trace_file=\$trace_file-%u\ -- 1.6.2.5 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] [Tracing][PATCH v2] Add options to specify trace file name at startup and runtime.
This patch adds an optional command line switch '-trace' to specify the filename to write traces to, when qemu starts. Eg, If compiled with the 'simple' trace backend, [t...@system]$ qemu -trace FILENAME IMAGE Allows the binary traces to be written to FILENAME instead of the option set at config-time. Also, this adds monitor sub-command 'set' to trace-file commands to dynamically change trace log file at runtime. Eg, (qemu)trace-file set FILENAME This allows one to set trace outputs to FILENAME from the default specified at startup. Changelog from v1 : - Cleanups. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c |6 ++ qemu-monitor.hx |6 +++--- qemu-options.hx | 11 +++ simpletrace.c | 41 +++-- tracetool |1 + vl.c| 20 6 files changed, 72 insertions(+), 13 deletions(-) diff --git a/monitor.c b/monitor.c index 1e35a6b..1d6c4c0 100644 --- a/monitor.c +++ b/monitor.c @@ -544,6 +544,7 @@ static void do_change_trace_event_state(Monitor *mon, const QDict *qdict) static void do_trace_file(Monitor *mon, const QDict *qdict) { const char *op = qdict_get_try_str(qdict, op); +const char *arg = qdict_get_try_str(qdict, arg); if (!op) { st_print_trace_file_status((FILE *)mon, monitor_fprintf); @@ -553,8 +554,13 @@ static void do_trace_file(Monitor *mon, const QDict *qdict) st_set_trace_file_enabled(false); } else if (!strcmp(op, flush)) { st_flush_trace_buffer(); +} else if (!strcmp(op, set)) { +if (arg) { +st_set_trace_file(arg); +} } else { monitor_printf(mon, unexpected argument \%s\\n, op); +help_cmd(mon, trace-file); } } #endif diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 25887bd..adfaf2b 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -276,9 +276,9 @@ ETEXI { .name = trace-file, -.args_type = op:s?, -.params = op [on|off|flush], -.help = open, close, or flush trace file, +.args_type = op:s?,arg:F?, +.params = on|off|flush|set [arg], +.help = open, close, or flush trace file, or set a new file name, .mhandler.cmd = do_trace_file, }, diff --git 
a/qemu-options.hx b/qemu-options.hx index d1d2272..aea9675 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2223,6 +2223,17 @@ Normally QEMU loads a configuration file from @var{sysconfdir}/qemu.conf and @var{sysconfdir}/targ...@var{arch}.conf on startup. The @code{-nodefconfig} option will prevent QEMU from loading these configuration files at startup. ETEXI +#ifdef CONFIG_SIMPLE_TRACE +DEF(trace, HAS_ARG, QEMU_OPTION_trace, +-trace\n +Specify a trace file to log traces to\n, +QEMU_ARCH_ALL) +STEXI +...@item -trace +...@findex -trace +Specify a trace file to log output traces to. +ETEXI +#endif HXCOMM This is the last statement. Insert new options before this line! STEXI diff --git a/simpletrace.c b/simpletrace.c index 71110b3..19855f4 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -20,25 +20,46 @@ enum { static TraceRecord trace_buf[TRACE_BUF_LEN]; static unsigned int trace_idx; static FILE *trace_fp; -static bool trace_file_enabled = true; +static char *trace_file_name = NULL; +static bool trace_file_enabled = false; void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { -stream_printf(stream, Trace file \ CONFIG_TRACE_FILE \ %s.\n, - getpid(), trace_file_enabled ? on : off); +stream_printf(stream, Trace file \%s\ %s.\n, + trace_file_name, trace_file_enabled ? on : off); } -static bool open_trace_file(void) +static inline bool open_trace_file(void) { -char *filename; +trace_fp = fopen(trace_file_name, w); +return trace_fp != NULL; +} + +/** + * set_trace_file : To set the name of a trace file. + * @file : pointer to the name to be set. + * If NULL, set to the default name-pid set at config time. 
+ */ +bool st_set_trace_file(const char *file) +{ +st_set_trace_file_enabled(false); -if (asprintf(filename, CONFIG_TRACE_FILE, getpid()) 0) { -return false; +free(trace_file_name); + +if (!file) { +if (asprintf(trace_file_name, CONFIG_TRACE_FILE, getpid()) 0) { +trace_file_name = NULL; + return false; +} +} else { +if (asprintf(trace_file_name, %s, file) 0) { +trace_file_name = NULL; +return false; +} } -trace_fp = fopen(filename, w); -free(filename); -return trace_fp != NULL; +st_set_trace_file_enabled(true); +return true; } static void flush_trace_file(void) diff --git a/tracetool b/tracetool index ac832af..5b979f5 100755 --- a/tracetool +++ b/tracetool @@ -158,6 +158,7 @@ void
Re: [Qemu-devel] [Tracing][PATCH] Add options to specify trace file name at startup and runtime.
On 08/03/2010 07:45 PM, Stefan Hajnoczi wrote:
> On Tue, Aug 3, 2010 at 6:37 AM, Prerna Saxena <pre...@linux.vnet.ibm.com> wrote:
>> This patch adds an optional command line switch '-trace' to specify the
>> filename to write traces to, when qemu starts.
>> Eg, If compiled with the 'simple' trace backend,
>> [t...@system]$ qemu -trace FILENAME IMAGE
>> Allows the binary traces to be written to FILENAME instead of the option
>> set at config-time.
>>
>> Also, this adds monitor sub-command 'set' to trace-file commands to
>> dynamically change trace log file at runtime.
>> Eg,
>> (qemu)trace-file set FILENAME
>> This allows one to set trace outputs to FILENAME from the default
>> specified at startup.
>>
>> Signed-off-by: Prerna Saxena <pre...@linux.vnet.ibm.com>
>> ---
>>  monitor.c       |    6 ++
>>  qemu-monitor.hx |    6 +++---
>>  qemu-options.hx |   11 +++
>>  simpletrace.c   |   41 -
>>  tracetool       |    1 +
>>  vl.c            |   22 ++
>>  6 files changed, 75 insertions(+), 12 deletions(-)
>
> Looks like a good approach. I checked that this also handles the case
> where trace events fire before the command-line option is handled and
> the trace filename is set.
>
>> diff --git a/monitor.c b/monitor.c
>> index 1e35a6b..8e2a3a6 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -544,6 +544,7 @@ static void do_change_trace_event_state(Monitor *mon, const QDict *qdict)
>>  static void do_trace_file(Monitor *mon, const QDict *qdict)
>>  {
>>      const char *op = qdict_get_try_str(qdict, "op");
>> +    const char *arg = qdict_get_try_str(qdict, "arg");
>>
>>      if (!op) {
>>          st_print_trace_file_status((FILE *)mon,monitor_fprintf);
>> @@ -553,8 +554,13 @@ static void do_trace_file(Monitor *mon, const QDict *qdict)
>>          st_set_trace_file_enabled(false);
>>      } else if (!strcmp(op, "flush")) {
>>          st_flush_trace_buffer();
>> +    } else if (!strcmp(op, "set")) {
>> +        if (arg) {
>> +            st_set_trace_file(arg);
>> +        }
>>      } else {
>>          monitor_printf(mon, "unexpected argument \"%s\"\n", op);
>> +        monitor_printf(mon, "Options are: [on | off| flush| set FILENAME]");
>
> Can we use help_cmd() here to print the help text and avoid duplicating
> the options?

Agree, changed in v2.

>>      }
>>  }
>>  #endif
>
> ...
> ...
>
>>  static bool open_trace_file(void)
>>  {
>> -    char *filename;
>> +    trace_fp = fopen(trace_file_name, "w");
>> +    return trace_fp != NULL;
>> +}
>
> This could be inlined now. The function is only used by one caller.

Done in v2.

>> -    if (asprintf(&filename, CONFIG_TRACE_FILE, getpid()) < 0) {
>> -        return false;
>> +/**
>> + * set_trace_file : To set the name of a trace file.
>> + * @file : pointer to the name to be set.
>> + *         If NULL, set to the default name-pid set at config time.
>> + */
>> +bool st_set_trace_file(const char *file)
>> +{
>> +    if (trace_file_enabled) {
>> +        st_set_trace_file_enabled(false);
>>      }
>
> No need for an if statement. If trace_file_enabled is already false,
> then st_set_trace_file_enabled() is a nop.

Agree this is unnecessary. Changed in v2.

>> -    trace_fp = fopen(filename, "w");
>> -    free(filename);
>> -    return trace_fp != NULL;
>> +    if (trace_file_name) {
>> +        free(trace_file_name);
>> +    }
>
> No need for an if statement. free(NULL) is a nop.

Changed in v2.

>> +
>> +    if (!file) {
>> +        if (asprintf(&trace_file_name, CONFIG_TRACE_FILE, getpid()) < 0) {
>> +            return false;
>> +        }
>> +    } else {
>> +        if (asprintf(&trace_file_name, "%s", file) < 0) {
>> +            return false;
>> +        }
>> +    }
>
> When asprintf() fails, the value of the string pointer is undefined
> according to the man page. That can result in double frees. It would
> be safest to set trace_file_name = NULL on failure.

Done.

> ...
> ...
>
>> @@ -2590,6 +2597,12 @@ int main(int argc, char **argv, char **envp)
>>                  }
>>                  xen_mode = XEN_ATTACH;
>>                  break;
>> +#ifdef CONFIG_SIMPLE_TRACE
>> +            case QEMU_OPTION_trace:
>> +                trace_file = (char *) qemu_malloc(strlen(optarg) + 1);
>> +                strcpy(trace_file, optarg);
>> +                break;
>> +#endif
>
> Malloc isn't necessary, just hold the optarg pointer like gdbstub_dev
> and other string options do.

It wouldn't be correct to use optarg directly here. If this optional
argument is not specified, st_set_file_name() is called with a NULL
argument, and the filename defaults to the config-specified name.
(This is how gdbstub_dev works too. The optional argument is copied to
gdbstub_dev if provided.)

> ...

Thanks,
--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
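To make the two review points about free() and asprintf() concrete, here is a small standalone sketch of the pattern. It is not the QEMU code: example_file_name and EXAMPLE_DEFAULT_FILE are hypothetical stand-ins for trace_file_name and CONFIG_TRACE_FILE, and the enable/disable calls are left out.

```c
#define _GNU_SOURCE /* asprintf() is a GNU/BSD extension */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for trace_file_name and CONFIG_TRACE_FILE. */
static char *example_file_name = NULL;
#define EXAMPLE_DEFAULT_FILE "trace-%u"

static bool example_set_file(const char *file)
{
    int rc;

    free(example_file_name);  /* free(NULL) is a no-op, so no guard is needed */
    example_file_name = NULL; /* never keep a pointer to freed memory */

    if (!file) {
        /* No name given: fall back to the default pattern plus the PID. */
        rc = asprintf(&example_file_name, EXAMPLE_DEFAULT_FILE,
                      (unsigned)getpid());
    } else {
        rc = asprintf(&example_file_name, "%s", file);
    }

    if (rc < 0) {
        /* On failure the pointer's value is unspecified; reset it so a
         * later free() cannot act on garbage or on already freed memory. */
        example_file_name = NULL;
        return false;
    }

    assert(example_file_name != NULL);
    return true;
}
```

Resetting the pointer on every failure path is what makes the unconditional free() at the top safe on the next call.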
[Qemu-devel] [Tracing][PATCH v3] Add options to specify trace file name at startup and runtime.
Stefanha, Malc, Thanks for suggestions. Resending the patch after clean-up. This patch adds an optional command line switch '-trace' to specify the filename to write traces to, when qemu starts. Eg, If compiled with the 'simple' trace backend, [t...@system]$ qemu -trace FILENAME IMAGE Allows the binary traces to be written to FILENAME instead of the option set at config-time. Also, this adds monitor sub-command 'set' to trace-file commands to dynamically change trace log file at runtime. Eg, (qemu)trace-file set FILENAME This allows one to set trace outputs to FILENAME from the default specified at startup. Changelog from v2 : - Cleanups. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c |6 ++ qemu-monitor.hx |6 +++--- qemu-options.hx | 11 +++ simpletrace.c | 41 +++-- tracetool |1 + vl.c| 18 ++ 6 files changed, 70 insertions(+), 13 deletions(-) diff --git a/monitor.c b/monitor.c index 1e35a6b..1d6c4c0 100644 --- a/monitor.c +++ b/monitor.c @@ -544,6 +544,7 @@ static void do_change_trace_event_state(Monitor *mon, const QDict *qdict) static void do_trace_file(Monitor *mon, const QDict *qdict) { const char *op = qdict_get_try_str(qdict, op); +const char *arg = qdict_get_try_str(qdict, arg); if (!op) { st_print_trace_file_status((FILE *)mon, monitor_fprintf); @@ -553,8 +554,13 @@ static void do_trace_file(Monitor *mon, const QDict *qdict) st_set_trace_file_enabled(false); } else if (!strcmp(op, flush)) { st_flush_trace_buffer(); +} else if (!strcmp(op, set)) { +if (arg) { +st_set_trace_file(arg); +} } else { monitor_printf(mon, unexpected argument \%s\\n, op); +help_cmd(mon, trace-file); } } #endif diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 25887bd..adfaf2b 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -276,9 +276,9 @@ ETEXI { .name = trace-file, -.args_type = op:s?, -.params = op [on|off|flush], -.help = open, close, or flush trace file, +.args_type = op:s?,arg:F?, +.params = on|off|flush|set [arg], +.help = open, close, or flush 
trace file, or set a new file name, .mhandler.cmd = do_trace_file, }, diff --git a/qemu-options.hx b/qemu-options.hx index d1d2272..aea9675 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2223,6 +2223,17 @@ Normally QEMU loads a configuration file from @var{sysconfdir}/qemu.conf and @var{sysconfdir}/targ...@var{arch}.conf on startup. The @code{-nodefconfig} option will prevent QEMU from loading these configuration files at startup. ETEXI +#ifdef CONFIG_SIMPLE_TRACE +DEF(trace, HAS_ARG, QEMU_OPTION_trace, +-trace\n +Specify a trace file to log traces to\n, +QEMU_ARCH_ALL) +STEXI +...@item -trace +...@findex -trace +Specify a trace file to log output traces to. +ETEXI +#endif HXCOMM This is the last statement. Insert new options before this line! STEXI diff --git a/simpletrace.c b/simpletrace.c index 71110b3..860bcf1 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -20,25 +20,46 @@ enum { static TraceRecord trace_buf[TRACE_BUF_LEN]; static unsigned int trace_idx; static FILE *trace_fp; -static bool trace_file_enabled = true; +static char *trace_file_name = NULL; +static bool trace_file_enabled = false; void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { -stream_printf(stream, Trace file \ CONFIG_TRACE_FILE \ %s.\n, - getpid(), trace_file_enabled ? on : off); +stream_printf(stream, Trace file \%s\ %s.\n, + trace_file_name, trace_file_enabled ? on : off); } -static bool open_trace_file(void) +static inline bool open_trace_file(void) { -char *filename; +trace_fp = fopen(trace_file_name, w); +return trace_fp != NULL; +} + +/** + * set_trace_file : To set the name of a trace file. + * @file : pointer to the name to be set. + * If NULL, set to the default name-pid set at config time. 
+ */ +bool st_set_trace_file(const char *file) +{ +st_set_trace_file_enabled(false); -if (asprintf(filename, CONFIG_TRACE_FILE, getpid()) 0) { -return false; +free(trace_file_name); + +if (!file) { +if (asprintf(trace_file_name, CONFIG_TRACE_FILE, getpid()) 0) { +trace_file_name = NULL; +return false; +} +} else { +if (asprintf(trace_file_name, %s, file) 0) { +trace_file_name = NULL; +return false; +} } -trace_fp = fopen(filename, w); -free(filename); -return trace_fp != NULL; +st_set_trace_file_enabled(true); +return true; } static void flush_trace_file(void) diff --git a/tracetool b/tracetool index ac832af..5b979f5
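The v3 change to st_set_trace_file() is mostly about who owns trace_file_name when a rename fails. A minimal sketch of that ownership handling follows, under stated assumptions: DEFAULT_TRACE_FILE_FMT is a stand-in for CONFIG_TRACE_FILE, set_trace_file_name() and get_trace_file_name() are hypothetical names, the pid is passed in rather than read with getpid(), and the enable/disable calls around the rename are left out. Unlike the patch, which frees the old name first and resets the pointer to NULL on failure, this variant builds the new name before releasing the old one.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for CONFIG_TRACE_FILE (an assumption, not the configured value). */
#define DEFAULT_TRACE_FILE_FMT "trace-%u"

static char *trace_file_name;

/*
 * Replace the stored trace file name.
 * file == NULL selects the default, pid-annotated name.
 * On failure the previous name is left untouched, so the pointer never
 * dangles and never refers to freed memory.
 */
bool set_trace_file_name(const char *file, unsigned int pid)
{
    char *new_name;

    if (file) {
        size_t len = strlen(file) + 1;

        new_name = malloc(len);
        if (new_name) {
            memcpy(new_name, file, len);
        }
    } else {
        /* First call only measures the formatted length (C99 snprintf). */
        int n = snprintf(NULL, 0, DEFAULT_TRACE_FILE_FMT, pid);

        if (n < 0) {
            return false;
        }
        new_name = malloc((size_t)n + 1);
        if (new_name) {
            snprintf(new_name, (size_t)n + 1, DEFAULT_TRACE_FILE_FMT, pid);
        }
    }

    if (!new_name) {
        return false;
    }

    free(trace_file_name); /* free(NULL) is a no-op, so no guard is needed */
    trace_file_name = new_name;
    return true;
}

/* Hypothetical accessor, present only so the result can be inspected. */
const char *get_trace_file_name(void)
{
    return trace_file_name;
}
```

Building the replacement first costs one extra allocation at peak, but it means a failed rename leaves the previous file name usable instead of NULL.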
[Qemu-devel] [Tracing][PATCH] Add options to specify trace file name at startup and runtime.
This patch adds an optional command line switch '-trace' to specify the filename to write traces to, when qemu starts. Eg, If compiled with the 'simple' trace backend, [t...@system]$ qemu -trace FILENAME IMAGE Allows the binary traces to be written to FILENAME instead of the option set at config-time. Also, this adds monitor sub-command 'set' to trace-file commands to dynamically change trace log file at runtime. Eg, (qemu)trace-file set FILENAME This allows one to set trace outputs to FILENAME from the default specified at startup. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c |6 ++ qemu-monitor.hx |6 +++--- qemu-options.hx | 11 +++ simpletrace.c | 41 - tracetool |1 + vl.c| 22 ++ 6 files changed, 75 insertions(+), 12 deletions(-) diff --git a/monitor.c b/monitor.c index 1e35a6b..8e2a3a6 100644 --- a/monitor.c +++ b/monitor.c @@ -544,6 +544,7 @@ static void do_change_trace_event_state(Monitor *mon, const QDict *qdict) static void do_trace_file(Monitor *mon, const QDict *qdict) { const char *op = qdict_get_try_str(qdict, op); +const char *arg = qdict_get_try_str(qdict, arg); if (!op) { st_print_trace_file_status((FILE *)mon, monitor_fprintf); @@ -553,8 +554,13 @@ static void do_trace_file(Monitor *mon, const QDict *qdict) st_set_trace_file_enabled(false); } else if (!strcmp(op, flush)) { st_flush_trace_buffer(); +} else if (!strcmp(op, set)) { +if (arg) { +st_set_trace_file(arg); +} } else { monitor_printf(mon, unexpected argument \%s\\n, op); +monitor_printf(mon, Options are: [on | off| flush| set FILENAME]); } } #endif diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 25887bd..adfaf2b 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -276,9 +276,9 @@ ETEXI { .name = trace-file, -.args_type = op:s?, -.params = op [on|off|flush], -.help = open, close, or flush trace file, +.args_type = op:s?,arg:F?, +.params = on|off|flush|set [arg], +.help = open, close, or flush trace file, or set a new file name, .mhandler.cmd = do_trace_file, }, 
diff --git a/qemu-options.hx b/qemu-options.hx index d1d2272..aea9675 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2223,6 +2223,17 @@ Normally QEMU loads a configuration file from @var{sysconfdir}/qemu.conf and @var{sysconfdir}/targ...@var{arch}.conf on startup. The @code{-nodefconfig} option will prevent QEMU from loading these configuration files at startup. ETEXI +#ifdef CONFIG_SIMPLE_TRACE +DEF(trace, HAS_ARG, QEMU_OPTION_trace, +-trace\n +Specify a trace file to log traces to\n, +QEMU_ARCH_ALL) +STEXI +...@item -trace +...@findex -trace +Specify a trace file to log output traces to. +ETEXI +#endif HXCOMM This is the last statement. Insert new options before this line! STEXI diff --git a/simpletrace.c b/simpletrace.c index 71110b3..5812fe9 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -20,25 +20,48 @@ enum { static TraceRecord trace_buf[TRACE_BUF_LEN]; static unsigned int trace_idx; static FILE *trace_fp; -static bool trace_file_enabled = true; +static char *trace_file_name = NULL; +static bool trace_file_enabled = false; void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...)) { -stream_printf(stream, Trace file \ CONFIG_TRACE_FILE \ %s.\n, - getpid(), trace_file_enabled ? on : off); +stream_printf(stream, Trace file \%s\ %s.\n, + trace_file_name, trace_file_enabled ? on : off); } static bool open_trace_file(void) { -char *filename; +trace_fp = fopen(trace_file_name, w); +return trace_fp != NULL; +} -if (asprintf(filename, CONFIG_TRACE_FILE, getpid()) 0) { -return false; +/** + * set_trace_file : To set the name of a trace file. + * @file : pointer to the name to be set. + * If NULL, set to the default name-pid set at config time. 
+ */ +bool st_set_trace_file(const char *file) +{ +if (trace_file_enabled) { +st_set_trace_file_enabled(false); } -trace_fp = fopen(filename, w); -free(filename); -return trace_fp != NULL; +if (trace_file_name) { +free(trace_file_name); +} + +if (!file) { +if (asprintf(trace_file_name, CONFIG_TRACE_FILE, getpid()) 0) { + return false; +} +} else { +if (asprintf(trace_file_name, %s, file) 0) { +return false; +} +} + +st_set_trace_file_enabled(true); +return true; } static void flush_trace_file(void) diff --git a/tracetool b/tracetool index ac832af..5b979f5 100755 --- a/tracetool +++ b/tracetool @@ -158,6 +158,7 @@ void st_print_trace_events(FILE *stream, int
[Qemu-devel] [Tracing][PATCH] Allow bulk enabling of trace events at compile time.
[PATCH] For 'simple' trace backend, allow bulk enabling/disabling of trace
events at compile time.

Trace events that are preceded by 'disable' keyword are compiled in, but
turned off by default. These can individually be turned on using the monitor.
All other trace events are enabled by default.

TODO : This could be enhanced when the trace-event namespace is partitioned
into a group and an ID within that group. In such a case, marking a group as
enabled would automatically enable all trace-events listed under it.

Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
---
 trace-events |    3 +++
 tracetool    |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/trace-events b/trace-events
index a533414..cb5ef00 100644
--- a/trace-events
+++ b/trace-events
@@ -17,6 +17,9 @@
 # Example: qemu_malloc(size_t size) "size %zu"
 #
 # The "disable" keyword will build without the trace event.
+# In case of 'simple' trace backend, it will allow the trace event to be
+# compiled, but this would be turned off by default. It can be toggled on via
+# the monitor.
 #
 # The <name> must be a valid as a C function name.
 #
diff --git a/tracetool b/tracetool
index b7a0499..98d23fb 100755
--- a/tracetool
+++ b/tracetool
@@ -73,6 +73,20 @@ get_fmt()
     echo "$fmt"
 }
 
+# Get the state of a trace event
+get_state()
+{
+    local str disable state
+    str=$(get_name "$1")
+    disable=${str##disable }
+    if [ "$disable" = "$str" ] ; then
+        state=1
+    else
+        state=0
+    fi
+    echo "$state"
+}
+
 linetoh_begin_nop()
 {
     return
@@ -155,12 +169,16 @@ cast_args_to_ulong()
 
 linetoh_simple()
 {
-    local name args argc ulong_args
+    local name args argc ulong_args state
     name=$(get_name "$1")
     args=$(get_args "$1")
     argc=$(get_argc "$1")
     ulong_args=$(cast_args_to_ulong "$1")
+    state=$(get_state "$1")
+    if [ "$state" = "0" ]; then
+        name=${name##disable }
+    fi
     cat <<EOF
 static inline void trace_$name($args)
 {
     trace$argc($simple_event_num, $ulong_args);
@@ -191,10 +209,14 @@ EOF
 
 linetoc_simple()
 {
-    local name
+    local name state
     name=$(get_name "$1")
+    state=$(get_state "$1")
+    if [ "$state" = "0" ] ; then
+        name=${name##disable }
+    fi
     cat <<EOF
-{.tp_name = "$name", .state=0},
+{.tp_name = "$name", .state=$state},
 EOF
     simple_event_num=$((simple_event_num + 1))
 }
@@ -305,7 +327,13 @@ convert()
         disable=${str%%disable *}
         echo
         if test -z "$disable"; then
-            lineto$1_nop "${str##disable }"
+            # Pass the disabled state as an arg to lineto$1_simple().
+            # For all other cases, call lineto$1_nop()
+            if [ "$backend" = "simple" ]; then
+                $process_line "$str"
+            else
+                lineto$1_nop "${str##disable }"
+            fi
         else
             $process_line "$str"
         fi
-- 
1.6.2.5

-- 
Prerna Saxena
Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
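On the C side, the effect of this patch is that a 'disable'd event still gets an entry in the generated table, only with its state cleared. The sketch below shows that shape under stated assumptions: the two events and their table entries are hypothetical, and trace_event_enabled() is an illustrative helper that is not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *tp_name;
    bool state;
} TraceEvent;

/*
 * Hypothetical table, in the shape linetoc_simple() emits, for a
 * trace-events file containing:
 *     qemu_malloc(size_t size) "size %zu"
 *     disable qemu_free(void *ptr) "ptr %p"
 * The "disable " prefix is dropped from the name and recorded as state 0.
 */
static TraceEvent trace_list[] = {
    {.tp_name = "qemu_malloc", .state = 1},
    {.tp_name = "qemu_free", .state = 0},
};

#define NR_TRACE_EVENTS (sizeof(trace_list) / sizeof(trace_list[0]))

static TraceEvent *find_trace_event_by_name(const char *tname)
{
    size_t i;

    for (i = 0; i < NR_TRACE_EVENTS; i++) {
        if (strcmp(trace_list[i].tp_name, tname) == 0) {
            return &trace_list[i];
        }
    }
    return NULL; /* unknown event name */
}

/* Runtime toggle, as the monitor command would request it. */
void change_trace_event_state(const char *tname, bool tstate)
{
    TraceEvent *tp = find_trace_event_by_name(tname);

    if (tp) {
        tp->state = tstate;
    }
}

/* Hypothetical query helper, added only so the state can be inspected. */
bool trace_event_enabled(const char *tname)
{
    TraceEvent *tp = find_trace_event_by_name(tname);

    return tp != NULL && tp->state;
}
```

Because the disabled event keeps its slot in the table, switching it on later is a plain flag write and needs no rebuild.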
[Qemu-devel] [RFC v5][PATCH][Tracing] Fix build errors for target i386-linux-user
[PATCH] Separate monitor command handler interfaces and tracing internals.

Changelog from v3:
- cleanup ( removed unnecessary references to 'rec' )

Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
---
 monitor.c     |   23 +++++++++++++++++++++++
 simpletrace.c |   50 ++++++++++++++++++++++++++++----------------------
 tracetool     |    7 +++++++
 3 files changed, 58 insertions(+), 22 deletions(-)

diff --git a/monitor.c b/monitor.c
index 433a3ec..1f89938 100644
--- a/monitor.c
+++ b/monitor.c
@@ -540,6 +540,29 @@ static void do_change_trace_event_state(Monitor *mon, const QDict *qdict)
     bool new_state = qdict_get_bool(qdict, "option");
     change_trace_event_state(tp_name, new_state);
 }
+
+void do_info_trace(Monitor *mon)
+{
+    unsigned int i;
+    char rec[MAX_TRACE_STR_LEN];
+    unsigned int trace_idx = get_trace_idx();
+
+    for (i = 0; i < trace_idx ; i++) {
+        if (format_trace_string(i, rec)) {
+            monitor_printf(mon, rec);
+        }
+    }
+}
+
+void do_info_all_trace_events(Monitor *mon)
+{
+    unsigned int i;
+
+    for (i = 0; i < NR_TRACE_EVENTS; i++) {
+        monitor_printf(mon, "%s [Event ID %u] : state %u\n",
+                                trace_list[i].tp_name, i, trace_list[i].state);
+    }
+}
 #endif
 
 static void user_monitor_complete(void *opaque, QObject *ret_data)
diff --git a/simpletrace.c b/simpletrace.c
index 57c41fc..9e3b46c 100644
--- a/simpletrace.c
+++ b/simpletrace.c
@@ -1,8 +1,8 @@
 #include <stdlib.h>
 #include <stdio.h>
-#include "monitor.h"
 #include "trace.h"
 
+/* Remember to update MAX_TRACE_STR_LEN when changing TraceRecord structure */
 typedef struct {
     unsigned long event;
     unsigned long x1;
@@ -69,27 +69,6 @@ void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned lon
     trace(event, x1, x2, x3, x4, x5);
 }
 
-void do_info_trace(Monitor *mon)
-{
-    unsigned int i;
-
-    for (i = 0; i < trace_idx ; i++) {
-        monitor_printf(mon, "Event %lu : %lx %lx %lx %lx %lx\n",
-                          trace_buf[i].event, trace_buf[i].x1, trace_buf[i].x2,
-                            trace_buf[i].x3, trace_buf[i].x4, trace_buf[i].x5);
-    }
-}
-
-void do_info_all_trace_events(Monitor *mon)
-{
-    unsigned int i;
-
-    for (i = 0; i < NR_TRACE_EVENTS; i++) {
-        monitor_printf(mon, "%s [Event ID %u] : state %u\n",
-                                trace_list[i].tp_name, i, trace_list[i].state);
-    }
-}
-
 static TraceEvent* find_trace_event_by_name(const char *tname)
 {
     unsigned int i;
@@ -115,3 +94,30 @@ void change_trace_event_state(const char *tname, bool tstate)
         tp->state = tstate;
     }
 }
+
+/**
+ * Return the current trace index.
+ *
+ */
+unsigned int get_trace_idx(void)
+{
+    return trace_idx;
+}
+
+/**
+ * returns formatted TraceRecord at a given index in the trace buffer.
+ * FORMAT : Event %lu : %lx %lx %lx %lx %lx\n
+ *
+ * @idx : index in the buffer for which trace record is returned.
+ * @trace_str : output string passed.
+ */
+char* format_trace_string(unsigned int idx, char trace_str[])
+{
+    if (idx >= TRACE_BUF_LEN) {
+        return NULL;
+    }
+    sprintf(&trace_str[0], "Event %lu : %lx %lx %lx %lx %lx\n",
+            trace_buf[idx].event, trace_buf[idx].x1, trace_buf[idx].x2,
+            trace_buf[idx].x3, trace_buf[idx].x4, trace_buf[idx].x5);
+    return &trace_str[0];
+}
diff --git a/tracetool b/tracetool
index c77280d..b7a0499 100755
--- a/tracetool
+++ b/tracetool
@@ -125,6 +125,11 @@ typedef struct {
     bool state;
 } TraceEvent;
 
+/* Max size of trace string to be displayed via the monitor.
+ * Format : Event %lu : %lx %lx %lx %lx %lx\n
+ */
+#define MAX_TRACE_STR_LEN 100
+
 void trace1(TraceEventID event, unsigned long x1);
 void trace2(TraceEventID event, unsigned long x1, unsigned long x2);
 void trace3(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3);
@@ -133,6 +138,8 @@ void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned lon
 void do_info_trace(Monitor *mon);
 void do_info_all_trace_events(Monitor *mon);
 void change_trace_event_state(const char *tname, bool tstate);
+unsigned int get_trace_idx(void);
+char* format_trace_string(unsigned int idx, char *trace_str);
 EOF
 
 simple_event_num=0
-- 
1.6.2.5

-- 
Prerna Saxena
Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
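The split this patch makes can be sketched in miniature: the tracing side owns the buffer and exposes only get_trace_idx() and format_trace_string(), and a consumer walks the records through those two calls. This is a sketch under assumptions, not the patch itself: TRACE_BUF_LEN is shrunk for illustration, format_trace_string() takes an extra length argument and uses snprintf() instead of sprintf(), trace() wraps instead of flushing to a file, and dump_trace() writing to a FILE * stands in for the monitor handler.

```c
#include <stddef.h>
#include <stdio.h>

#define TRACE_BUF_LEN 8        /* small on purpose, for illustration */
#define MAX_TRACE_STR_LEN 100

typedef struct {
    unsigned long event;
    unsigned long x1, x2, x3, x4, x5;
} TraceRecord;

/* ---- tracing internals (the simpletrace.c side) ---- */

static TraceRecord trace_buf[TRACE_BUF_LEN];
static unsigned int trace_idx;

void trace(unsigned long event, unsigned long x1, unsigned long x2,
           unsigned long x3, unsigned long x4, unsigned long x5)
{
    TraceRecord *rec = &trace_buf[trace_idx];

    rec->event = event;
    rec->x1 = x1;
    rec->x2 = x2;
    rec->x3 = x3;
    rec->x4 = x4;
    rec->x5 = x5;

    /* The real backend flushes a full buffer to a file; this sketch wraps. */
    if (++trace_idx == TRACE_BUF_LEN) {
        trace_idx = 0;
    }
}

unsigned int get_trace_idx(void)
{
    return trace_idx;
}

/* Returns trace_str on success, NULL on a bad index or if it did not fit. */
char *format_trace_string(unsigned int idx, char *trace_str, size_t len)
{
    int n;

    if (idx >= TRACE_BUF_LEN) {
        return NULL;
    }
    n = snprintf(trace_str, len, "Event %lu : %lx %lx %lx %lx %lx\n",
                 trace_buf[idx].event, trace_buf[idx].x1, trace_buf[idx].x2,
                 trace_buf[idx].x3, trace_buf[idx].x4, trace_buf[idx].x5);
    if (n < 0 || (size_t)n >= len) {
        return NULL;
    }
    return trace_str;
}

/* ---- consumer (the monitor.c side); sees no TraceRecord fields ---- */

unsigned int dump_trace(FILE *out)
{
    char rec[MAX_TRACE_STR_LEN];
    unsigned int i, printed = 0;
    unsigned int n = get_trace_idx();

    for (i = 0; i < n; i++) {
        if (format_trace_string(i, rec, sizeof(rec))) {
            fputs(rec, out); /* data is never used as a format string */
            printed++;
        }
    }
    return printed;
}
```

Printing with fputs() (or a "%s" format) rather than passing rec as the format argument keeps the consumer safe even if the record format later gains a literal percent sign.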
[Qemu-devel] [PATCH][Tracing] Specify trace file name
[PATCH] Allow users to specify a file for trace-outputs at configuration.
Also, allow trace files to be annotated by pid so each qemu instance has
unique traces.

The trace file name can be passed as a config option:
--trace-file=/path/to/file
(Default : /tmp/trace )
At runtime, the pid of the qemu process is appended to the filename so
that multiple qemu instances do not have overlapping logs.

Eg : /tmp/trace-1234 for qemu launched with pid 1234.

I have yet to test this on windows. getpid() is used at many places
in code (including vnc.c), so I'm hoping this would be okay too.

Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
---
 configure     |   20 ++++++++++++++++++++
 simpletrace.c |   13 ++++++++++++-
 tracetool     |    1 +
 vl.c          |    8 ++++++++
 4 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index 02bf602..18cb6ab 100755
--- a/configure
+++ b/configure
@@ -313,6 +313,7 @@ check_utests="no"
 user_pie="no"
 zero_malloc=""
 trace_backend="nop"
+trace_file=""
 
 # OS specific
 if check_define __linux__ ; then
@@ -517,6 +518,8 @@ for opt do
   ;;
   --trace-backend=*) trace_backend="$optarg"
   ;;
+  --trace-file=*) trace_file="$optarg"
+  ;;
   --enable-gprof) gprof="yes"
   ;;
   --static)
@@ -876,6 +879,9 @@ echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
 echo "  --enable-vhost-net       enable vhost-net acceleration support"
 echo "  --trace-backend=B        Trace backend nop simple ust"
+echo "  --trace-file=NAME        Full PATH,NAME of file to store traces"
+echo "                           Default:/tmp/trace-<pid>"
+echo "                           Default:trace-<pid> on Windows"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2132,6 +2138,7 @@ echo "fdatasync         $fdatasync"
 echo "uuid support      $uuid"
 echo "vhost-net support $vhost_net"
 echo "Trace backend     $trace_backend"
+echo "Trace Output File $trace_file-<pid>"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2387,6 +2394,19 @@ fi
 if test "$trace_backend" = "ust"; then
   LIBS="-lust $LIBS"
 fi
+# Set the appropriate trace file.
+if test "$trace_backend" = "simple"; then
+  if test "$trace_file" = "" ; then
+    if test "$mingw32" = "yes" ; then
+      trace_file="\"trace-%u\""
+    else
+      trace_file="\"/tmp/trace-%u\""
+    fi
+  else
+    trace_file="\"$trace_file-%u\""
+  fi
+fi
+echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 echo "TOOLS=$tools" >> $config_host_mak
 echo "ROMS=$roms" >> $config_host_mak
 echo "MAKE=$make" >> $config_host_mak
diff --git a/simpletrace.c b/simpletrace.c
index 57c41fc..4f3228f 100644
--- a/simpletrace.c
+++ b/simpletrace.c
@@ -20,6 +20,16 @@ static TraceRecord trace_buf[TRACE_BUF_LEN];
 static unsigned int trace_idx;
 static FILE *trace_fp;
 
+char* trace_file_name;
+
+/**
+ * Initialize trace file name.
+ */
+int init_trace_file(void)
+{
+    return asprintf(&trace_file_name, CONFIG_TRACE_FILE, getpid());
+}
+
 static void trace(TraceEventID event, unsigned long x1,
                   unsigned long x2, unsigned long x3, unsigned long x4, unsigned long x5)
 {
@@ -40,7 +50,7 @@ static void trace(TraceEventID event, unsigned long x1,
     trace_idx = 0;
 
     if (!trace_fp) {
-        trace_fp = fopen("/tmp/trace.log", "w");
+        trace_fp = fopen(trace_file_name, "w");
     }
     if (trace_fp) {
         size_t result = fwrite(trace_buf, sizeof trace_buf, 1, trace_fp);
@@ -78,6 +88,7 @@ void do_info_trace(Monitor *mon)
                           trace_buf[i].event, trace_buf[i].x1, trace_buf[i].x2,
                             trace_buf[i].x3, trace_buf[i].x4, trace_buf[i].x5);
     }
+    monitor_printf(mon, "Trace output logged at %s", trace_file_name);
 }
 
 void do_info_all_trace_events(Monitor *mon)
diff --git a/tracetool b/tracetool
index c77280d..05ece45 100755
--- a/tracetool
+++ b/tracetool
@@ -125,6 +125,7 @@ typedef struct {
     bool state;
 } TraceEvent;
 
+int init_trace_file(void);
 void trace1(TraceEventID event, unsigned long x1);
 void trace2(TraceEventID event, unsigned long x1, unsigned long x2);
 void trace3(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3);
diff --git a/vl.c b/vl.c
index 920717a..adc28ef 100644
--- a/vl.c
+++ b/vl.c
@@ -95,6 +95,10 @@ extern int madvise(caddr_t, size_t, int);
 #include <windows.h>
 #endif
 
+#ifdef CONFIG_SIMPLE_TRACE
+#include "trace.h"
+#endif
+
 #ifdef CONFIG_SDL
 #if defined(__APPLE__) || defined(main)
 #include <SDL.h>
@@ -2758,6 +2762,10 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
+    /* Init tracing, if configured */
+#ifdef CONFIG_SIMPLE_TRACE
+    init_trace_file();
+#endif
     /* init the bluetooth world */
     if (foreach_device_config(DEV_BT, bt_parse))
         exit(1);
-- 
1.6.2.5

-- 
Prerna Saxena
Linux Technology Centre, IBM Systems
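The pid-annotated name described in this patch can be sketched as follows. The assumptions are that CONFIG_TRACE_FILE carries the non-Windows default "/tmp/trace-%u", that the pid is passed in by the caller, and that format_trace_file_name() is a hypothetical helper. It uses snprintf() into a caller-supplied buffer, not the asprintf() call the patch uses.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: the non-Windows default the configure hunk would generate. */
#define CONFIG_TRACE_FILE "/tmp/trace-%u"

/*
 * Write the per-process trace file name into out.
 * Returns 0 on success, -1 if formatting failed or the name did not fit.
 */
int format_trace_file_name(char *out, size_t out_len, unsigned int pid)
{
    int n = snprintf(out, out_len, CONFIG_TRACE_FILE, pid);

    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    return 0;
}
```

With pid 1234 this yields /tmp/trace-1234, matching the example in the commit message. A real caller on a POSIX host would pass (unsigned int)getpid().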
[Qemu-devel] [RFC v4][PATCH][Tracing] Fix build errors for target i386-linux-user
[PATCH] Separate monitor command handler interfaces and tracing internals. Changelog from v3 : 1. Cleanups. Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com --- monitor.c | 23 +++ simpletrace.c | 52 ++-- tracetool |7 +++ 3 files changed, 60 insertions(+), 22 deletions(-) diff --git a/monitor.c b/monitor.c index 433a3ec..1f89938 100644 --- a/monitor.c +++ b/monitor.c @@ -540,6 +540,29 @@ static void do_change_trace_event_state(Monitor *mon, const QDict *qdict) bool new_state = qdict_get_bool(qdict, option); change_trace_event_state(tp_name, new_state); } + +void do_info_trace(Monitor *mon) +{ +unsigned int i; +char rec[MAX_TRACE_STR_LEN]; +unsigned int trace_idx = get_trace_idx(); + +for (i = 0; i trace_idx ; i++) { +if (format_trace_string(i, rec)) { +monitor_printf(mon, rec); +} +} +} + +void do_info_all_trace_events(Monitor *mon) +{ +unsigned int i; + +for (i = 0; i NR_TRACE_EVENTS; i++) { +monitor_printf(mon, %s [Event ID %u] : state %u\n, +trace_list[i].tp_name, i, trace_list[i].state); +} +} #endif static void user_monitor_complete(void *opaque, QObject *ret_data) diff --git a/simpletrace.c b/simpletrace.c index 57c41fc..78507ec 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -1,8 +1,8 @@ #include stdlib.h #include stdio.h -#include monitor.h #include trace.h +/* Remember to update MAX_TRACE_STR_LEN when changing TraceRecord structure */ typedef struct { unsigned long event; unsigned long x1; @@ -69,27 +69,6 @@ void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned lon trace(event, x1, x2, x3, x4, x5); } -void do_info_trace(Monitor *mon) -{ -unsigned int i; - -for (i = 0; i trace_idx ; i++) { -monitor_printf(mon, Event %lu : %lx %lx %lx %lx %lx\n, - trace_buf[i].event, trace_buf[i].x1, trace_buf[i].x2, -trace_buf[i].x3, trace_buf[i].x4, trace_buf[i].x5); -} -} - -void do_info_all_trace_events(Monitor *mon) -{ -unsigned int i; - -for (i = 0; i NR_TRACE_EVENTS; i++) { -monitor_printf(mon, %s [Event ID %u] : state %u\n, 
-trace_list[i].tp_name, i, trace_list[i].state); -} -} - static TraceEvent* find_trace_event_by_name(const char *tname) { unsigned int i; @@ -115,3 +94,32 @@ void change_trace_event_state(const char *tname, bool tstate) tp-state = tstate; } } + +/** + * Return the current trace index. + * + */ +unsigned int get_trace_idx(void) +{ +return trace_idx; +} + +/** + * returns formatted TraceRecord at a given index in the trace buffer. + * FORMAT : Event %lu : %lx %lx %lx %lx %lx\n + * + * @idx : index in the buffer for which trace record is returned. + * @trace_str : output string passed. + */ +char* format_trace_string(unsigned int idx, char trace_str[]) +{ +TraceRecord rec; +if (idx = TRACE_BUF_LEN) { +return NULL; +} +rec = trace_buf[idx]; +sprintf(trace_str[0], Event %lu : %lx %lx %lx %lx %lx\n, + trace_buf[idx].event, trace_buf[idx].x1, trace_buf[idx].x2, + trace_buf[idx].x3, trace_buf[idx].x4, trace_buf[idx].x5); +return trace_str[0]; +} diff --git a/tracetool b/tracetool index c77280d..b7a0499 100755 --- a/tracetool +++ b/tracetool @@ -125,6 +125,11 @@ typedef struct { bool state; } TraceEvent; +/* Max size of trace string to be displayed via the monitor. + * Format : Event %lu : %lx %lx %lx %lx %lx\n + */ +#define MAX_TRACE_STR_LEN 100 + void trace1(TraceEventID event, unsigned long x1); void trace2(TraceEventID event, unsigned long x1, unsigned long x2); void trace3(TraceEventID event, unsigned long x1, unsigned long x2, unsigned long x3); @@ -133,6 +138,8 @@ void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned lon void do_info_trace(Monitor *mon); void do_info_all_trace_events(Monitor *mon); void change_trace_event_state(const char *tname, bool tstate); +unsigned int get_trace_idx(void); +char* format_trace_string(unsigned int idx, char *trace_str); EOF simple_event_num=0 -- 1.6.2.5 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
[Qemu-devel] Re: [RFC v3][PATCH][Tracing] Fix build errors for target i386-linux-user
On 07/08/2010 07:04 PM, Stefan Hajnoczi wrote: On Thu, Jul 08, 2010 at 04:50:52PM +0530, Prerna Saxena wrote: On 07/08/2010 02:50 PM, Stefan Hajnoczi wrote: On Thu, Jul 08, 2010 at 10:58:58AM +0530, Prerna Saxena wrote: [PATCH] Separate monitor command handler interfaces and tracing internals. Signed-off-by: Prerna Saxenapre...@linux.vnet.ibm.com --- monitor.c | 23 +++ simpletrace.c | 51 +-- tracetool |7 +++ 3 files changed, 59 insertions(+), 22 deletions(-) diff --git a/monitor.c b/monitor.c index 433a3ec..1f89938 100644 --- a/monitor.c +++ b/monitor.c @@ -540,6 +540,29 @@ static void do_change_trace_event_state(Monitor *mon, const QDict *qdict) bool new_state = qdict_get_bool(qdict, option); change_trace_event_state(tp_name, new_state); } + +void do_info_trace(Monitor *mon) +{ +unsigned int i; +char rec[MAX_TRACE_STR_LEN]; +unsigned int trace_idx = get_trace_idx(); + +for (i = 0; i trace_idx ; i++) { +if (format_trace_string(i, rec)) { +monitor_printf(mon, rec); +} +} +} + +void do_info_all_trace_events(Monitor *mon) +{ +unsigned int i; + +for (i = 0; i NR_TRACE_EVENTS; i++) { +monitor_printf(mon, %s [Event ID %u] : state %u\n, +trace_list[i].tp_name, i, trace_list[i].state); +} +} #endif static void user_monitor_complete(void *opaque, QObject *ret_data) diff --git a/simpletrace.c b/simpletrace.c index 57c41fc..c7b1e7e 100644 --- a/simpletrace.c +++ b/simpletrace.c @@ -1,8 +1,8 @@ #includestdlib.h #includestdio.h -#include monitor.h #include trace.h +/* Remember to update TRACE_REC_SIZE when changing TraceRecord structure */ I can't see TRACE_REC_SIZE anywhere else in this patch. Oops. This comment must go. The connotation was for MAX_TRACE_STR_LEN to be large enough to hold the formatted string, but I'm not sure if there is a way to test that. Done in v4. 
typedef struct { unsigned long event; unsigned long x1; @@ -69,27 +69,6 @@ void trace5(TraceEventID event, unsigned long x1, unsigned long x2, unsigned lon trace(event, x1, x2, x3, x4, x5); } -void do_info_trace(Monitor *mon) -{ -unsigned int i; - -for (i = 0; i trace_idx ; i++) { -monitor_printf(mon, Event %lu : %lx %lx %lx %lx %lx\n, - trace_buf[i].event, trace_buf[i].x1, trace_buf[i].x2, -trace_buf[i].x3, trace_buf[i].x4, trace_buf[i].x5); -} -} - -void do_info_all_trace_events(Monitor *mon) -{ -unsigned int i; - -for (i = 0; i NR_TRACE_EVENTS; i++) { -monitor_printf(mon, %s [Event ID %u] : state %u\n, -trace_list[i].tp_name, i, trace_list[i].state); -} -} - static TraceEvent* find_trace_event_by_name(const char *tname) { unsigned int i; @@ -115,3 +94,31 @@ void change_trace_event_state(const char *tname, bool tstate) tp-state = tstate; } } + +/** + * Return the current trace index. + * + */ +unsigned int get_trace_idx(void) +{ +return trace_idx; +} format_trace_string() returns NULL if the index is beyond the last valid trace record. monitor.c doesn't need to know how many trace records there are ahead of time, it can just keep printing until it gets NULL. I don't feel strongly about this but wanted to mention it. format_trace_string() returns NULL when the index passed exceeds the size of trace buffer. This function is meant for printing current contents of trace buffer, which may be less than the entire buffer size. Sorry, you're right the patch will return NULL if the index exceeds the size of the trace buffer. The idea I was suggesting requires it to return NULL when the index= trace_idx. I've tried to keep this as generic as possible. get_trace_idx() can be put to use to query state of trace buffer in different scenarios. + +/** + * returns formatted TraceRecord at a given index in the trace buffer. + * FORMAT : Event %lu : %lx %lx %lx %lx %lx\n + * + * @idx : index in the buffer for which trace record is returned. + * @trace_str : output string passed. 
+ */
+char* format_trace_string(unsigned int idx, char trace_str[])
+{
+TraceRecord rec;
+if (idx= TRACE_BUF_LEN || sizeof(trace_str)= MAX_TRACE_STR_LEN) {

sizeof(trace_str) == sizeof(char *), not the size of the caller's array in bytes.

Hmm, I'll need to scrap off this check.

Done.

The fixed size limit can be eliminated using asprintf(3), which allocates a string of the right size while doing the string formatting. The caller of format_trace_string() is then responsible for freeing the string when they are done with it.

I am somehow reluctant to allocate memory here and free it somewhere else. Calls for memory leaks quite easily in case it gets missed. I'd rather use stack-allocated arrays that clean up after the call
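The sizeof point raised in this exchange, and the stack-allocated alternative to asprintf(3), can be sketched like this. All three function names are hypothetical and the record format is shortened to two fields; the idea is only that the caller, who can still see the array type, passes its size along explicitly.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_TRACE_STR_LEN 100

/*
 * Inside the callee the parameter is a plain pointer, so sizeof reports
 * the size of a pointer, not the size of the caller's array.
 */
size_t size_seen_by_callee(char *trace_str)
{
    return sizeof(trace_str);
}

/* At the call site the array type is still visible. */
size_t size_seen_by_caller(void)
{
    char rec[MAX_TRACE_STR_LEN];

    rec[0] = '\0';
    return sizeof(rec);
}

/*
 * Format into a caller-owned buffer whose length is passed explicitly.
 * Returns 0 on success, -1 if the text was truncated or formatting failed.
 * Nothing is allocated, so there is nothing for the caller to free.
 */
int format_event(char *out, size_t out_len,
                 unsigned long event, unsigned long x1)
{
    int n = snprintf(out, out_len, "Event %lu : %lx\n", event, x1);

    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

A caller would declare char rec[MAX_TRACE_STR_LEN] on its stack and call format_event(rec, sizeof(rec), ...). That keeps the cleanup automatic, as preferred above, while still catching a buffer that is too small.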