Re: [Qemu-devel] [Question] Question about ACPI table in qemu
On 11/09/2018 18:54, Li Qiang wrote:
> Hi all,
>
> I noticed that both QEMU and SeaBIOS create the ACPI tables.
> I once thought the BIOS's ACPI tables would overwrite QEMU's if SeaBIOS
> is compiled with CONFIG_ACPI.

Yes, SeaBIOS's ACPI tables are not used anymore, to remove the need to
update QEMU and SeaBIOS in sync.

> But after I read this -->
> https://lists.gnu.org/archive/html/qemu-devel/2013-02/msg04555.html
> it says:
>
> "just have QEMU pass the tables to SeaBIOS for it to copy into memory
> like it does on CSM, coreboot, and Xen"
>
> So where does this 'copy' happen? Or is there something else I missed?

The copy happens in SeaBIOS's romfile_loader_execute function (called by
qemu_platform_setup).

Thanks,

Paolo
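To make the mechanism Paolo names concrete: QEMU publishes the tables plus a "loader" script of commands over fw_cfg, and the firmware replays that script. Below is a deliberately simplified Python model of that replay loop. It is not SeaBIOS or QEMU code: `loader_execute`, the dict-based "script", and the fake address patching are invented for illustration, and the real protocol also has an ADD_CHECKSUM command and patches physical load addresses.

```python
# Illustrative sketch only -- NOT SeaBIOS/QEMU code.  Models how a
# firmware "romfile loader" copies QEMU-generated ACPI tables into
# guest memory and links them together via ADD_POINTER commands.

ALLOCATE, ADD_POINTER = "allocate", "add_pointer"

def loader_execute(script, romfiles):
    """Replay the loader script: copy files, then patch pointers."""
    memory = {}  # file name -> bytearray standing in for guest RAM
    for cmd in script:
        if cmd["op"] == ALLOCATE:
            # Firmware allocates memory and copies the romfile into it.
            memory[cmd["file"]] = bytearray(romfiles[cmd["file"]])
        elif cmd["op"] == ADD_POINTER:
            # Patch `size` bytes at `offset` in dest so they "point" at
            # src's load address (faked here with id() for illustration).
            dest, size = memory[cmd["dest_file"]], cmd["size"]
            addr = id(memory[cmd["src_file"]]) & ((1 << (8 * size)) - 1)
            dest[cmd["offset"]:cmd["offset"] + size] = addr.to_bytes(size, "little")
    return memory

romfiles = {"etc/acpi/tables": b"\x00" * 16, "etc/acpi/rsdp": b"\x00" * 8}
script = [
    {"op": ALLOCATE, "file": "etc/acpi/tables"},
    {"op": ALLOCATE, "file": "etc/acpi/rsdp"},
    {"op": ADD_POINTER, "dest_file": "etc/acpi/rsdp",
     "src_file": "etc/acpi/tables", "offset": 0, "size": 4},
]
mem = loader_execute(script, romfiles)
```

The point of the design is visible even in this toy: QEMU alone decides the table contents, while the firmware only executes generic copy/patch commands, so the two no longer need to be updated in lock-step.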
Re: [Qemu-devel] [Question] Question about the i440FX device
Thanks Paolo,

Paolo Bonzini wrote on Mon, Sep 10, 2018 at 7:11 AM:
> On 07/09/2018 08:32, Li Qiang wrote:
>> Hello all,
>>
>> I want to know why the i440FX in the following 'info qtree' output is
>> laid under the pci.0 bus. In the chip spec here:
>> --> https://wiki.qemu.org/images/b/bb/29054901.pdf
>> I don't see this device.
>>
>> Can anyone give me some hints?
>
> Hi,
>
> the device implements what is in "3.2. PCI Configuration Space Mapped
> Registers" in the i440FX spec.

Indeed, I also find the comments in 'i440fx_class_init' useful. As I
understand it, 'PCII440FXState' is for the "PCI-facing part of the host
bridge" and 'I440FXState' is for the "host-facing part".

Thanks,
Li Qiang
Re: [Qemu-devel] [Question] Question about the i440FX device
On 07/09/2018 08:32, Li Qiang wrote:
> Hello all,
>
> I want to know why the i440FX in the following 'info qtree' output is
> laid under the pci.0 bus. In the chip spec here:
> --> https://wiki.qemu.org/images/b/bb/29054901.pdf
> I don't see this device.
>
> Can anyone give me some hints?

Hi,

the device implements what is in "3.2. PCI Configuration Space Mapped
Registers" in the i440FX spec.

Paolo
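The reason the host bridge shows up under pci.0 is that its PCI-facing part answers configuration cycles at bus 0, device 0, function 0 like any other PCI device. Here is a toy Python model of the classic 0xCF8/0xCFC configuration mechanism; `TinyPCIHost` and its simplified byte-granular register addressing are invented for the example (real CONFIG_DATA access is dword-based), but the vendor/device IDs 8086:1237 are the real i440FX PMC IDs.

```python
# Toy model (not QEMU code) of PCI configuration mechanism #1, showing
# why the i440FX host bridge is itself enumerable on bus pci.0.

I440FX_CONFIG = {0x00: 0x8086,   # vendor ID: Intel
                 0x02: 0x1237}   # device ID: 82441FX PMC

class TinyPCIHost:
    def __init__(self):
        self.config_address = 0
        # (bus, dev, fn) -> config registers; the bridge occupies 00:00.0
        self.devices = {(0, 0, 0): I440FX_CONFIG}

    def out_cf8(self, value):        # write to CONFIG_ADDRESS (0xCF8)
        self.config_address = value

    def in_cfc(self):                # 16-bit read from CONFIG_DATA (0xCFC)
        a = self.config_address
        bus, dev, fn = (a >> 16) & 0xFF, (a >> 11) & 0x1F, (a >> 8) & 0x7
        reg = a & 0xFF               # simplified: no dword alignment
        # absent devices read back as all-ones, like real PCI
        return self.devices.get((bus, dev, fn), {}).get(reg, 0xFFFF)

host = TinyPCIHost()
host.out_cf8(0x80000000)             # enable | bus 0, dev 0, fn 0, reg 0x00
vendor = host.in_cfc()               # the host bridge answers for itself
host.out_cf8(0x80000002)             # reg 0x02: device ID
device = host.in_cfc()
```

So 'info qtree' listing i440FX under pci.0 matches what a guest probing slot 0 actually sees.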
Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot
On 12/25/2017 01:33 AM, He Junyan wrote:
> Hi all:
>
> I am now focusing on snapshot optimization for Intel NVDIMM kind
> memory. Different from normal memory, an NVDIMM may be 128G, 256G or
> even more for just one guest, and its speed is slower than normal
> memory. So sometimes it may take several minutes to complete just one
> snapshot saving. Even with compression enabled, the snapshot point may
> consume more than 30G of disk space. We decided to add an incremental
> kind of snapshot saving to resolve this: just store the difference
> between snapshot points to save time and disk space. But the current
> snapshot/savevm framework does not seem to support this. We need to add
> snapshot dependencies and extra operations when we LOAD and DELETE a
> snapshot point. Is it possible to modify the savevm framework and add
> some incremental snapshot support to the QCOW2 format?

In general, the list has tended to focus on external snapshots rather
than internal, where persistent bitmaps have been the proposed mechanism
for tracking incremental differences between snapshots. But yes, it is
certainly feasible that patches to improve internal snapshots to take
advantage of incremental relationships may prove useful. You will need
to document all enhancements to the qcow2 file format and get that
approved first, as interoperability demands that others reading the same
spec would be able to interpret the image you create that is utilizing
an internal snapshot with an incremental diff.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot
On Wed, 2018-01-10 at 20:17 +0000, Stefan Hajnoczi wrote:
> On Wed, Jan 10, 2018 at 8:15 PM, Dr. David Alan Gilbert wrote:
>> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>>> On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert wrote:
>>>>> Certain guest operations like rebooting or zeroing memory will
>>>>> defeat the incremental guest RAM snapshot feature. It's worth
>>>>> thinking about these cases to make sure this feature would be worth
>>>>> it in real use cases.
>>>>
>>>> But those probably wouldn't upset an NVDimm?
>>>
>>> If the guest dirties all RAM then the incremental snapshot feature
>>> degrades to a full snapshot. I'm asking if there are common
>>> operations where that happens.
>>>
>>> I seem to remember Windows guests zero all pages on cold boot. Maybe
>>> that's not the case anymore.
>>>
>>> Worth checking before embarking on this feature because it could be a
>>> waste of effort if it turns out real-world guests dirty all memory in
>>> common cases.
>>
>> Right, but I'm hoping that there's some magic somewhere where an
>> NVDimm doesn't get zero'd because of a cold boot, since that would
>> seem to make it volatile.
>
> This feature isn't specific to NVDIMM though. It would be equally
> useful for regular RAM.
>
> Stefan

Thanks for all your advice. I have already done a lot of investigation
and written some code. My considerations are as follows:

1. As a first step, I will use the is_active() function to make it work
just for NVDIMM kind memory regions and just for snapshot saving, not
for live migration. I understand that this could work for all kinds of
memory, but it may lower the guest's performance if we always enable
dirty log tracking for memory. For live migration, all the data needs to
be copied anyway, so it does not seem to benefit from this approach.

2. Saving and loading are relatively easy to do, while deleting a
snapshot point needs a lot of work. The current framework just deletes
all the data of one snapshot point in one shot. I want to add some
reference counting to the L1/L2 tables of QCOW2 when a cluster's data is
depended on by another snapshot point, so we can keep the data when
deleting.

I will also check whether a cold boot zeroes all pages later.

Thanks,
Junyan
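The refcounting idea in point 2 can be pictured with a small sketch. This is an invented Python model (`ClusterStore` is not the qcow2 on-disk format): snapshots share unchanged clusters, and deleting a snapshot frees only the clusters whose reference count drops to zero.

```python
# Hypothetical sketch of refcounted cluster sharing between internal
# snapshots -- not the actual qcow2 L1/L2/refcount layout.

class ClusterStore:
    def __init__(self):
        self.refcount = {}                  # cluster id -> references

    def snapshot(self, base, changed):
        """New snapshot = base's cluster table with `changed` replaced."""
        table = dict(base or {})            # guest offset -> cluster id
        table.update(changed)
        for c in table.values():
            self.refcount[c] = self.refcount.get(c, 0) + 1
        return table

    def delete(self, table):
        """Drop one reference per cluster; free those reaching zero."""
        freed = []
        for c in table.values():
            self.refcount[c] -= 1
            if self.refcount[c] == 0:
                del self.refcount[c]
                freed.append(c)
        return freed

store = ClusterStore()
s1 = store.snapshot(None, {0: "A0", 1: "B0"})   # full snapshot
s2 = store.snapshot(s1,   {1: "B1"})            # incremental: only B1 is new
freed = store.delete(s1)                        # A0 survives, s2 still uses it
```

Deleting s1 only frees cluster B0; A0 is kept alive by s2's reference, which is exactly the "keep the data when deleting" behavior described above.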
Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot
On Wed, Jan 10, 2018 at 8:15 PM, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>> On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert wrote:
>>>> Certain guest operations like rebooting or zeroing memory will
>>>> defeat the incremental guest RAM snapshot feature. It's worth
>>>> thinking about these cases to make sure this feature would be worth
>>>> it in real use cases.
>>>
>>> But those probably wouldn't upset an NVDimm?
>>
>> If the guest dirties all RAM then the incremental snapshot feature
>> degrades to a full snapshot. I'm asking if there are common
>> operations where that happens.
>>
>> I seem to remember Windows guests zero all pages on cold boot. Maybe
>> that's not the case anymore.
>>
>> Worth checking before embarking on this feature because it could be a
>> waste of effort if it turns out real-world guests dirty all memory in
>> common cases.
>
> Right, but I'm hoping that there's some magic somewhere where an NVDimm
> doesn't get zero'd because of a cold boot since that would seem to make
> it volatile.

This feature isn't specific to NVDIMM though. It would be equally useful
for regular RAM.

Stefan
Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot
* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert wrote:
>>> Certain guest operations like rebooting or zeroing memory will defeat
>>> the incremental guest RAM snapshot feature. It's worth thinking about
>>> these cases to make sure this feature would be worth it in real use
>>> cases.
>>
>> But those probably wouldn't upset an NVDimm?
>
> If the guest dirties all RAM then the incremental snapshot feature
> degrades to a full snapshot. I'm asking if there are common operations
> where that happens.
>
> I seem to remember Windows guests zero all pages on cold boot. Maybe
> that's not the case anymore.
>
> Worth checking before embarking on this feature because it could be a
> waste of effort if it turns out real-world guests dirty all memory in
> common cases.

Right, but I'm hoping that there's some magic somewhere where an NVDimm
doesn't get zero'd because of a cold boot since that would seem to make
it volatile.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot
On Tue, Jan 9, 2018 at 7:55 PM, Dr. David Alan Gilbert wrote:
>> Certain guest operations like rebooting or zeroing memory will defeat
>> the incremental guest RAM snapshot feature. It's worth thinking about
>> these cases to make sure this feature would be worth it in real use
>> cases.
>
> But those probably wouldn't upset an NVDimm?

If the guest dirties all RAM then the incremental snapshot feature
degrades to a full snapshot. I'm asking if there are common operations
where that happens.

I seem to remember Windows guests zero all pages on cold boot. Maybe
that's not the case anymore.

Worth checking before embarking on this feature because it could be a
waste of effort if it turns out real-world guests dirty all memory in
common cases.
Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot
* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Mon, Dec 25, 2017 at 07:54:00AM +0000, He, Junyan wrote:
>> I am now focusing on snapshot optimization for Intel NVDIMM kind
>> memory. Different from normal memory, an NVDIMM may be 128G, 256G or
>> even more for just one guest, and its speed is slower than normal
>> memory. So sometimes it may take several minutes to complete just one
>> snapshot saving. Even with compression enabled, the snapshot point may
>> consume more than 30G of disk space. We decided to add an incremental
>> kind of snapshot saving to resolve this: just store the difference
>> between snapshot points to save time and disk space. But the current
>> snapshot/savevm framework does not seem to support this. We need to
>> add snapshot dependencies and extra operations when we LOAD and DELETE
>> a snapshot point. Is it possible to modify the savevm framework and
>> add some incremental snapshot support to the QCOW2 format?
>
> It sounds like you'd like to save incremental guest RAM snapshots based
> on the contents of earlier snapshots. QEMU has no way of doing this at
> the moment.
>
> In order to do this QEMU needs to keep dirty memory logging enabled at
> all times so it knows which pages have been written since the last
> snapshot. This will affect guest performance.

Keeping the dirty logging going is something that COLO does, to send
incremental snapshots to the secondary. As you say, it does slow things
down and you have to be able to keep the state between the migration
runs.

> Certain guest operations like rebooting or zeroing memory will defeat
> the incremental guest RAM snapshot feature. It's worth thinking about
> these cases to make sure this feature would be worth it in real use
> cases.

But those probably wouldn't upset an NVDimm?

> What you're proposing isn't specific to NVDIMM. Regular RAM use cases
> also benefit from incremental snapshots because less disk space is
> required for the savevm data.
>
> Some questions about your use case:
>
> 1. Does the guest have -device nvdimm?
> 2. Is the qcow2 file on NVDIMM?
>
> Stefan

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Some question about savem/qcow2 incremental snapshot
On Mon, Dec 25, 2017 at 07:54:00AM +0000, He, Junyan wrote:
> I am now focusing on snapshot optimization for Intel NVDIMM kind
> memory. Different from normal memory, an NVDIMM may be 128G, 256G or
> even more for just one guest, and its speed is slower than normal
> memory. So sometimes it may take several minutes to complete just one
> snapshot saving. Even with compression enabled, the snapshot point may
> consume more than 30G of disk space. We decided to add an incremental
> kind of snapshot saving to resolve this: just store the difference
> between snapshot points to save time and disk space. But the current
> snapshot/savevm framework does not seem to support this. We need to add
> snapshot dependencies and extra operations when we LOAD and DELETE a
> snapshot point. Is it possible to modify the savevm framework and add
> some incremental snapshot support to the QCOW2 format?

It sounds like you'd like to save incremental guest RAM snapshots based
on the contents of earlier snapshots. QEMU has no way of doing this at
the moment.

In order to do this QEMU needs to keep dirty memory logging enabled at
all times so it knows which pages have been written since the last
snapshot. This will affect guest performance.

Certain guest operations like rebooting or zeroing memory will defeat
the incremental guest RAM snapshot feature. It's worth thinking about
these cases to make sure this feature would be worth it in real use
cases.

What you're proposing isn't specific to NVDIMM. Regular RAM use cases
also benefit from incremental snapshots because less disk space is
required for the savevm data.

Some questions about your use case:

1. Does the guest have -device nvdimm?
2. Is the qcow2 file on NVDIMM?

Stefan
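The trade-off Stefan describes can be shown with a small sketch. This is an invented Python model, not QEMU's dirty-log implementation: a dirty bitmap records writes between snapshots, so each snapshot stores only the delta; a guest that touches every page (reboot, zeroing) makes the "incremental" snapshot exactly as large as a full one.

```python
# Illustrative model of dirty-log-based incremental RAM snapshots.
# `GuestRAM` and its page-level granularity are invented for the sketch.

class GuestRAM:
    def __init__(self, npages):
        self.pages = [0] * npages
        self.dirty = set(range(npages))   # everything dirty before snap 1

    def write(self, page, value):
        self.pages[page] = value
        self.dirty.add(page)              # dirty logging records the store

    def snapshot(self):
        """Store only pages written since the previous snapshot."""
        delta = {p: self.pages[p] for p in self.dirty}
        self.dirty.clear()                # next snapshot is relative to this
        return delta

ram = GuestRAM(1024)
full = ram.snapshot()                     # first snapshot: all 1024 pages
ram.write(7, 42)
incr = ram.snapshot()                     # tiny: only the touched page
for p in range(1024):                     # guest zeroes all of memory...
    ram.write(p, 0)
degraded = ram.snapshot()                 # ...incremental degrades to full
```

This is why the question about Windows zeroing pages on cold boot matters: if common workloads dirty all RAM between snapshots, the incremental machinery pays its performance cost without saving any space.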
Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.
On 08/01/2018 at 20:56, Mark Cave-Ayland wrote:
> Thanks for the patch! I'm afraid I don't really have any experience
> with LEON as my focus is sun4m/sun4u, however I'm happy to take patches
> Acked/Reviewed by Fabien as the current LEON maintainer

I am waiting for Fabien's feedback after my experiment with tsim.

> if they don't cause any regressions in my own tests.

Of course.

JC

> ATB,
>
> Mark.
Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.
On 02/01/18 11:13, Jean-Christophe DUBOIS wrote:
> Hi Mark, Artyom,
>
> I am wondering if the IRQMP code in hw/intc/grlib_irqmp.c is correct
> when it comes to acknowledging interrupts.
>
> With the actual code an interrupt can be lowered/acked only by an "ack"
> from the processor which means that the trap handler related to this
> external interrupt needs to be run for the ack to happen.
>
> In particular this means that the interrupt cannot be acked only by
> software. Even if the software clears the "pending" interrupts (by
> writing to the CLEAR_OFFSET register before the interrupt handler is
> run) this does not clear the interrupt to the processor (which is kept
> asserted until the handler is run and the interrupt acked by the
> processor). Do you know if this is indeed the intended behavior (I
> understand that for most operating systems the interrupt handler will
> be run at last and this does not make a difference)?
>
> I would expect that clearing an interrupt through software (by writing
> to the CLEAR_OFFSET register) would have the same effect as the
> processor acknowledgment (and could avoid running the interrupt handler
> if things have already been taken care of by software).
>
> Unfortunately the documentation I got (on the web) on the IRQMP is not
> very clear on the topic.
>
> Anyway you can find below the patch I'd like to provide for IRQMP.
>
> Thanks

Thanks for the patch! I'm afraid I don't really have any experience with
LEON as my focus is sun4m/sun4u, however I'm happy to take patches
Acked/Reviewed by Fabien as the current LEON maintainer if they don't
cause any regressions in my own tests.

ATB,

Mark.
Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.
Hi,

So after trying my code on tsim, I can confirm that the software is
indeed able to clear/ack the interrupt without requiring the ack from
the processor.

Things are a bit strange with tsim, as the simulator doesn't seem to
respect time delays when the processor is in sleep/idle mode and jumps
straight to the next timer expiration/interrupt (incrementing the
required clock cycles in the process)... And as the evaluation version
of tsim stops the program after 2^32 clock cycles, the program could not
run for more than 86 seconds (with the LEON clock @ 50 MHz). But still,
the interrupts are acknowledged by software only, without requiring the
trap handler to run or the processor to ack the interrupt in a hardware
way.

So I believe that my proposed modification is correct.

On a related note, tsim is using different interrupts (than QEMU) for
the timer and so on. So what is the reference LEON3 platform that QEMU
is emulating with leon3_generic?

JC

On 03/01/2018 at 10:23, j...@tribudubois.net wrote:
> [snip: full quote of the earlier exchange with Fabien, which appears as
> its own message below]
Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.
On 2018-01-02 at 19:58, Fabien Chouteau wrote:
> Hello Jean-Christophe,
>
> I'm the original author of this patch and I add in copy my colleague
> Frederic.
>
> On 02/01/2018 12:13, Jean-Christophe DUBOIS wrote:
>> I am wondering if the IRQMP code in hw/intc/grlib_irqmp.c is correct
>> when it comes to acknowledging interrupts.
>> [...]
>
> I don't remember all the details of this CPU on top of my head, I
> worked on this years ago. If you have access to a real board the best
> would be to compare the behavior of the CPU on it.

Unfortunately I don't have a real board (yet).

> There's also a cycle accurate simulator of Leon3, you can download an
> evaluation version here:
> http://www.gaisler.com/index.php/downloads/simulators

OK, I will try the tsim simulator from Gaisler as a reference.

>> Anyway you can find below the patch I'd like to provide for IRQMP.
>
> Can you explain the reason for this change? Why can't you use the
> current interrupt handling?

I am working on a cooperative multitasking kernel (with no preemption),
so the kernel is not handling interrupt-related traps (actually the
kernel is not handling the interrupt controller). All interrupts are
masked at all times for all applications and the kernel, so no interrupt
trap handler is ever going to trigger (except for IRQ 15, which is not
maskable). When the tasks have nothing to do, the kernel goes to sleep
using ASR19 on LEON. The system is then awakened by the next interrupt,
and the kernel will schedule the task handling the interrupt controller.

On LEON, I can go to sleep until the first interrupt. Once the first
interrupt has been raised, the LEON will never be able to get to
sleep/idle again through ASR19 (it exits immediately), even if the
interrupt controller handling task clears the interrupt (by writing to
the CLEAR_OFFSET register). And this is because, in the actual QEMU
implementation, the interrupt can only be acknowledged in the interrupt
controller by the CPU through the triggering of the related trap
handler.

So I am wondering if this is indeed the expected behavior in real
hardware/life (interrupts can only be acked by the processor and not by
software).

Note: On a related subject, I am wondering if the system (put in
idle/sleep through ASR19) would wake up on an interrupt if the
interrupts are all masked through PSR (PIL) (it does wake up in QEMU for
now). I will also test this on tsim before trying it on real hardware
someday.

Regards,
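The behavior being argued for can be sketched with a tiny model. This is an invented Python class (`TinyIRQMP`, not hw/intc/grlib_irqmp.c, and not the real register layout): a write to the clear register recomputes the pending state and drops the level-triggered line to the CPU, exactly as a processor acknowledge would.

```python
# Invented toy interrupt controller illustrating software-only ack:
# clearing a pending bit deasserts the CPU line without any trap handler.

class TinyIRQMP:
    def __init__(self):
        self.pending = 0
        self.mask = 0
        self.cpu_line = False            # level-triggered output to the CPU

    def _update(self):
        self.cpu_line = bool(self.pending & self.mask)

    def raise_irq(self, n):
        self.pending |= 1 << n
        self._update()

    def cpu_ack(self, n):                # hardware ack via the trap handler
        self.pending &= ~(1 << n)
        self._update()

    def write_clear(self, value):        # software ack via a clear register
        self.pending &= ~value
        self._update()                   # the debated part: drop the line too

irq = TinyIRQMP()
irq.mask = 1 << 5
irq.raise_irq(5)
asserted = irq.cpu_line                  # line is up: the CPU would wake
irq.write_clear(1 << 5)                  # cleared purely by software
```

With this behavior, a kernel that masks all traps can still sleep via ASR19 again after servicing a device from task context, because the line to the CPU is no longer stuck asserted.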
Re: [Qemu-devel] [SPARC] question on LEON IRQMP interrupt controller.
Hello Jean-Christophe,

I'm the original author of this patch and I add in copy my colleague
Frederic.

On 02/01/2018 12:13, Jean-Christophe DUBOIS wrote:
> I am wondering if the IRQMP code in hw/intc/grlib_irqmp.c is correct
> when it comes to acknowledging interrupts.
>
> With the actual code an interrupt can be lowered/acked only by an "ack"
> from the processor which means that the trap handler related to this
> external interrupt needs to be run for the ack to happen.
>
> In particular this means that the interrupt cannot be acked only by
> software. Even if the software clears the "pending" interrupts (by
> writing to the CLEAR_OFFSET register before the interrupt handler is
> run) this does not clear the interrupt to the processor (which is kept
> asserted until the handler is run and the interrupt acked by the
> processor). Do you know if this is indeed the intended behavior (I
> understand that for most operating systems the interrupt handler will
> be run at last and this does not make a difference)?
>
> I would expect that clearing an interrupt through software (by writing
> to the CLEAR_OFFSET register) would have the same effect as the
> processor acknowledgment (and could avoid running the interrupt handler
> if things have already been taken care of by software).
>
> Unfortunately the documentation I got (on the web) on the IRQMP is not
> very clear on the topic.

I don't remember all the details of this CPU on top of my head, I worked
on this years ago. If you have access to a real board the best would be
to compare the behavior of the CPU on it.

There's also a cycle accurate simulator of Leon3, you can download an
evaluation version here:

http://www.gaisler.com/index.php/downloads/simulators

> Anyway you can find below the patch I'd like to provide for IRQMP.

Can you explain the reason for this change? Why can't you use the
current interrupt handling?

Regards,
Re: [Qemu-devel] coroutine question, for NBD debugging
On 03/11/2017 21:03, Eric Blake wrote:
> In include/qemu/coroutine.h, we have:
>
> /**
>  * Yield the coroutine for a given duration
>  *
>  * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
>  * resumed when using aio_poll().
>  */
> void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
>
> but there is no co_sleep_ns() anywhere. What should the documentation
> actually state?

The old doc comment for co_sleep_ns was:

/**
 * Yield the coroutine for a given duration
 *
 * Note this function uses timers and hence only works when a main loop is in
 * use. See main-loop.h and do not use from qemu-tool programs.
 */

So probably: this function uses timers and hence needs to know the event
loop (#AioContext) to place the timer on. In any case, co_aio_sleep_ns()
does not affect the #AioContext where the current coroutine is running,
as the coroutine will restart on the same #AioContext that it is running
on.

Thanks,

Paolo

> Meanwhile, I'm trying to figure out if there is an easy way to hack
> qemu-nbd to send two structured replies interleaved (as in A.1, B.1,
> A.2, B.2) to ensure the NBD client coroutines properly handle
> interleaved responses, but adding a plain sleep() between A.1 and A.2
> does not work, and I'm not sure whether co_aio_sleep_ns() would work
> for the hack instead.
>
> The hack I'm currently playing with splits reads and structured errors
> into two parts, attempting to use sleeps to force the server to send
> replies out of order:
>
> diff --git c/nbd/server.c w/nbd/server.c
> index bcf0cdb47c..4b642127bb 100644
> --- c/nbd/server.c
> +++ w/nbd/server.c
> @@ -1285,13 +1285,24 @@ static int coroutine_fn
> nbd_co_send_structured_read(NBDClient *client,
>          {.iov_base = &chunk, .iov_len = sizeof(chunk)},
>          {.iov_base = data, .iov_len = size}
>      };
> +    int ret;
>
>      trace_nbd_co_send_structured_read(handle, offset, data, size);
> -    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_OFFSET_DATA,
> +    set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_OFFSET_DATA,
>                   handle, sizeof(chunk) - sizeof(chunk.h) + size);
>      stq_be_p(&chunk.offset, offset);
>
> -    return nbd_co_send_iov(client, iov, 2, errp);
> +    ret = nbd_co_send_iov(client, iov, 2, errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    sleep(2);
> +
> +    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE,
> +                 handle, 0);
> +    iov[0].iov_len = sizeof(chunk.h);
> +    return nbd_co_send_iov(client, iov, 1, errp);
>  }
>
>  static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
> @@ -1306,16 +1317,27 @@ static int coroutine_fn
> nbd_co_send_structured_error(NBDClient *client,
>          {.iov_base = &chunk, .iov_len = sizeof(chunk)},
>          {.iov_base = (char *)msg, .iov_len = msg ? strlen(msg) : 0},
>      };
> +    int ret;
>
>      assert(nbd_err);
>      trace_nbd_co_send_structured_error(handle, nbd_err,
>                                         nbd_err_lookup(nbd_err),
>                                         msg ? msg : "");
> -    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_ERROR, handle,
> +    set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_ERROR, handle,
>                   sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
>      stl_be_p(&chunk.error, nbd_err);
>      stw_be_p(&chunk.message_length, iov[1].iov_len);
>
> -    return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
> +    ret = nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    sleep(1);
> +
> +    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE,
> +                 handle, 0);
> +    iov[0].iov_len = sizeof(chunk.h);
> +    return nbd_co_send_iov(client, iov, 1, errp);
>  }
>
>  /* nbd_co_receive_request
>
> coupled with a client that does:
>
> $ ./qemu-io -f raw nbd://localhost:10809/foo --trace='nbd_*' -c
> 'aio_write -P 3 1 1' -c 'aio_read -P 1 1 1'
>
> but the trace shows that the client does not receive B.1 until after it
> has blocked waiting for A.2, which is not what I was hoping to see.
Re: [Qemu-devel] coroutine question, for NBD debugging
On 11/03/2017 03:03 PM, Eric Blake wrote:
> In include/qemu/coroutine.h, we have:
>
> /**
>  * Yield the coroutine for a given duration
>  *
>  * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
>  * resumed when using aio_poll().
>  */
> void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
>
> but there is no co_sleep_ns() anywhere. What should the documentation
> actually state?

This part still needs fixing; but partially answering myself:

> Meanwhile, I'm trying to figure out if there is an easy way to hack
> qemu-nbd to send two structured replies interleaved (as in A.1, B.1,
> A.2, B.2) to ensure the NBD client coroutines properly handle
> interleaved responses, but adding a plain sleep() between A.1 and A.2
> does not work, and I'm not sure whether co_aio_sleep_ns() would work
> for the hack instead.
>
> The hack I'm currently playing with splits reads and structured errors
> into two parts, attempting to use sleeps to force the server to send
> replies out of order:
>
> +    sleep(2);

Using this instead of sleep():

+    co_aio_sleep_ns(client->exp->ctx, QEMU_CLOCK_REALTIME, 20);

> coupled with a client that does:
>
> $ ./qemu-io -f raw nbd://localhost:10809/foo --trace='nbd_*' -c
> 'aio_write -P 3 1 1' -c 'aio_read -P 1 1 1'

produced the trace I was looking for:

2837@1509745520.815687:nbd_send_request Sending request to server: {
.from = 1, .len = 1, .handle = 93924431318768, .flags = 0x1, .type = 1
(write) }
2837@1509745520.815764:nbd_send_request Sending request to server: {
.from = 1, .len = 1, .handle = 93924431318769, .flags = 0x0, .type = 0
(read) }
2837@1509745520.815913:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x0, type = 32769, handle = 93924431318768,
length = 25 }
2837@1509745520.856401:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x0, type = 1, handle = 93924431318769, length = 9 }
2837@1509745521.817279:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x1, type = 0, handle = 93924431318768, length = 0 }
aio_write failed: Operation not permitted
2837@1509745522.817474:nbd_receive_structured_reply_chunk Got structured
reply chunk: { flags = 0x1, type = 0, handle = 93924431318769, length = 0 }
read 1/1 bytes at offset 1
1 bytes, 1 ops; 0:00:02.00 (0.499560 bytes/sec and 0.4996 ops/sec)

which means I'm fairly confident that our client implementation is doing
the right thing with regards to complex out-of-order/interleaved
structured replies! Always a nice thing to demonstrate.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On Fri, Oct 20, 2017 at 15:04:57 +0200, David Hildenbrand wrote: > On 20.10.2017 14:50, Jiri Denemark wrote: > > The thing is libvirt calls query-cpu-model-expansion to check what the > > host CPU is. This 'host-model' CPU is replaced with the probed CPU model > > when a domain starts. The problem described by Marc is that the probed > > CPU model cannot be used universally with all machine types. So starting > > a domain on such host with machine type s390-virtio-ccw-2.10 works, but > > a domain with machine type s390-virtio-ccw-2.9 fails to start with the > > same probed CPU model. > > > > My assumption would be that the CPU model is copied into the XML when > the domain first starts. This is what the documentation describes. The CPU model is copied into the XML each time the domain starts. > So when upgrading QEMU, the CPU model in the XML is still the same > (z13), even though something different is now reported in the host info > after upgrading QEMU (z14). > > In this case it would continue to work. > > The problem is that the CPU model copied into the XML doesn't > remain there, right? It is suddenly replaced with a z14 host model. Even preserving the actual CPU model in the XML from the first start wouldn't solve the issue. You can always create a new domain with the s390-virtio-ccw-2.9 machine type even on the upgraded host. Jirka
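The copy-on-start behaviour discussed in this thread can be sketched as a before/after of the domain XML. This is illustrative only: the attribute spelling follows the libvirt schema, but the z13-base expansion and the feature shown are made-up example values, not output captured from a real host.

```xml
<!-- In the persistent definition: the CPU element stays symbolic -->
<cpu mode='host-model'/>

<!-- What the domain effectively runs with: the host model probed via
     query-cpu-model-expansion, copied in just before the domain starts.
     Model name and feature are example values only. -->
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>z13-base</model>
  <feature policy='require' name='gs'/>
</cpu>
```

On the next start after a QEMU or hardware upgrade the probe can yield a different model (say z14-base), which is the "suddenly replaced with a z14 host model" behaviour described above.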
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
> > I intend to put some brain-power in this too. Probably next week. > > My general impression is that in places I have a different understanding > of how things should work compared to David. Especially when it comes > to this concept of persistent copying, and also an end-user-digestible > definition of what host-model does. (I think this copying of capabilities > from some XML which is subject to convoluted caching is a bit too > hard to digest for an end user not involved in the development of qemu > and libvirt). When reading the doc I was assuming that it would be a persistent copy. But it would only fix part of the issue. In the end, it specifies quite clearly what is copied. And if that is not runnable with the selected machine, bad luck. Therefore ... > > I had a conversation with Boris a couple of hours ago, and it seems, things > are pretty convoluted. > > If I understand the train of thought here (David) it can be summarized like > this: > For a certain QEMU binary no aspect of the cpu-models may depend on the > machine type. In particular, compat properties and compat handling are > not allowed to alter cpu-models (neither the available cpu models nor the > capabilities of each of these). ... I always recommended avoiding such compatibility settings in the machine. But before we had CPU models it was really all we had. Now we should let go of it. We might be able to fix this one (gs) and take care of it in the future, but ... > > This we would have to make a part of the external interface. That way > one could be sure that the 'cpu capabilities' are machine-type independent > (that is, the same for all the machine types). ... the problem is once nasty stuff like zPCI comes into play. If we ever have (other?) machine-dependent stuff that toggles CPU features, we can only try to limit the damage. > > Or did I get this completely wrong? (Based on the answer branches, I think). Not at all. > > Halil > -- Thanks, David
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 10/20/2017 04:12 PM, Christian Borntraeger wrote: > > > On 10/20/2017 04:06 PM, David Hildenbrand wrote: >> On 20.10.2017 16:02, Christian Borntraeger wrote: >>> >>> >>> On 10/20/2017 03:51 PM, David Hildenbrand wrote: >>> [...] > The problem goes much further. > A fresh guest with > > > hvm > > > > does not start. No migration from an older system is necessary. > Yes, as stated in the documentation "copying host CPU definition from capabilities XML" this can not work. And it works just as documented. Not saying this is a nice thing :) I think we should try to fix gs_allowed (if possible) and avoid something like that in the future. This would avoid other complexity involved when suddenly having X host models. >>> >>> Maybe this one is really a proper fix. It will allow the guest to start >>> and on migration the cpu model will complain if the target cannot provide >>> gs. >>> Similar things can happen if - for example - the host kernel lacks some >>> features. >> >> Right, just what I had in mind. 
>> >>> >>> >>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c >>> index 77169bb..97a08fa 100644 >>> --- a/hw/s390x/s390-virtio-ccw.c >>> +++ b/hw/s390x/s390-virtio-ccw.c >>> @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, >>> void *data) >>> s390mc->ri_allowed = true; >>> s390mc->cpu_model_allowed = true; >>> s390mc->css_migration_enabled = true; >>> -s390mc->gs_allowed = true; >>> mc->init = ccw_init; >>> mc->reset = s390_machine_reset; >>> mc->hot_add_cpu = s390_hot_add_cpu; >>> @@ -509,12 +508,6 @@ bool cpu_model_allowed(void) >>> return get_machine_class()->cpu_model_allowed; >>> } >>> >>> -bool gs_allowed(void) >>> -{ >>> -/* for "none" machine this results in true */ >>> -return get_machine_class()->gs_allowed; >>> -} >>> - >>> static char *machine_get_loadparm(Object *obj, Error **errp) >>> { >>> S390CcwMachineState *ms = S390_CCW_MACHINE(obj); >>> @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass >>> *mc) >>> { >>> S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc); >>> >>> -s390mc->gs_allowed = false; >>> ccw_machine_2_10_class_options(mc); >>> SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9); >>> s390mc->css_migration_enabled = false; >>> diff --git a/include/hw/s390x/s390-virtio-ccw.h >>> b/include/hw/s390x/s390-virtio-ccw.h >>> index a9a90c2..1de53b0 100644 >>> --- a/include/hw/s390x/s390-virtio-ccw.h >>> +++ b/include/hw/s390x/s390-virtio-ccw.h >>> @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass { >>> bool ri_allowed; >>> bool cpu_model_allowed; >>> bool css_migration_enabled; >>> -bool gs_allowed; >>> } S390CcwMachineClass; >>> >>> /* runtime-instrumentation allowed by the machine */ >>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c >>> index a0d5052..3f13fc2 100644 >>> --- a/target/s390x/kvm.c >>> +++ b/target/s390x/kvm.c >>> @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) >>> cap_ri = 1; >>> } >>> } >>> -if (gs_allowed()) { >>> +if (cpu_model_allowed()) { 
>>> if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) { >>> cap_gs = 1; >>> } >>> >> >> And the last hunk makes sure we keep same handling for machines without >> CPU model support and therefore no way to mask support. For all recent >> machines, we expect CPU model checks to be in place. >> >> Will have to think about this a bit more. Will you send this as a proper >> patch? > > After thinking about it :-) > I intend to put some brain-power in this too. Probably next week. My general impression is that in places I have a different understanding of how things should work compared to David. Especially when it comes to this concept of persistent copying, and also an end-user-digestible definition of what host-model does. (I think this copying of capabilities from some XML which is subject to convoluted caching is a bit too hard to digest for an end user not involved in the development of qemu and libvirt). I had a conversation with Boris a couple of hours ago, and it seems, things are pretty convoluted. If I understand the train of thought here (David) it can be summarized like this: For a certain QEMU binary no aspect of the cpu-models may depend on the machine type. In particular, compat properties and compat handling are not allowed to alter cpu-models (neither the available cpu models nor the capabilities of each of these). This we would have to make a part of the external interface. That way one could be sure that the 'cpu capabilities' are machine-type independent (that is, the same for all the machine types). Or did I get this completely wrong? (Based on the answer branches, I think). Halil
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 10/20/2017 04:06 PM, David Hildenbrand wrote: > On 20.10.2017 16:02, Christian Borntraeger wrote: >> >> >> On 10/20/2017 03:51 PM, David Hildenbrand wrote: >> [...] The problem goes much further. A fresh guest with hvm does not start. No migration from an older system is necessary. >>> >>> Yes, as stated in the documentation "copying host CPU definition from >>> capabilities XML" this can not work. And it works just as documented. >>> Not saying this is a nice thing :) >>> >>> I think we should try to fix gs_allowed (if possible) and avoid >>> something like that in the future. This would avoid other complexity >>> involved when suddenly having X host models. >> >> Maybe this one is really a proper fix. It will allow the guest to start >> and on migration the cpu model will complain if the target cannot provide gs. >> Similar things can happen if - for example - the host kernel lacks some >> features. > > Right, just what I had in mind. > >> >> >> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c >> index 77169bb..97a08fa 100644 >> --- a/hw/s390x/s390-virtio-ccw.c >> +++ b/hw/s390x/s390-virtio-ccw.c >> @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void >> *data) >> s390mc->ri_allowed = true; >> s390mc->cpu_model_allowed = true; >> s390mc->css_migration_enabled = true; >> -s390mc->gs_allowed = true; >> mc->init = ccw_init; >> mc->reset = s390_machine_reset; >> mc->hot_add_cpu = s390_hot_add_cpu; >> @@ -509,12 +508,6 @@ bool cpu_model_allowed(void) >> return get_machine_class()->cpu_model_allowed; >> } >> >> -bool gs_allowed(void) >> -{ >> -/* for "none" machine this results in true */ >> -return get_machine_class()->gs_allowed; >> -} >> - >> static char *machine_get_loadparm(Object *obj, Error **errp) >> { >> S390CcwMachineState *ms = S390_CCW_MACHINE(obj); >> @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass >> *mc) >> { >> S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc); >> >> 
-s390mc->gs_allowed = false; >> ccw_machine_2_10_class_options(mc); >> SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9); >> s390mc->css_migration_enabled = false; >> diff --git a/include/hw/s390x/s390-virtio-ccw.h >> b/include/hw/s390x/s390-virtio-ccw.h >> index a9a90c2..1de53b0 100644 >> --- a/include/hw/s390x/s390-virtio-ccw.h >> +++ b/include/hw/s390x/s390-virtio-ccw.h >> @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass { >> bool ri_allowed; >> bool cpu_model_allowed; >> bool css_migration_enabled; >> -bool gs_allowed; >> } S390CcwMachineClass; >> >> /* runtime-instrumentation allowed by the machine */ >> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c >> index a0d5052..3f13fc2 100644 >> --- a/target/s390x/kvm.c >> +++ b/target/s390x/kvm.c >> @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) >> cap_ri = 1; >> } >> } >> -if (gs_allowed()) { >> +if (cpu_model_allowed()) { >> if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) { >> cap_gs = 1; >> } >> > > And the last hunk makes sure we keep same handling for machines without > CPU model support and therefore no way to mask support. For all recent > machines, we expect CPU model checks to be in place. > > Will have to think about this a bit more. Will you send this as a proper > patch? After thinking about it :-)
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 20.10.2017 16:02, Christian Borntraeger wrote: > > > On 10/20/2017 03:51 PM, David Hildenbrand wrote: > [...] >>> The problem goes much further. >>> A fresh guest with >>> >>> >>> hvm >>> >>> >>> >>> does not start. No migration from an older system is necessary. >>> >> >> Yes, as stated in the documentation "copying host CPU definition from >> capabilities XML" this can not work. And it works just as documented. >> Not saying this is a nice thing :) >> >> I think we should try to fix gs_allowed (if possible) and avoid >> something like that in the future. This would avoid other complexity >> involved when suddenly having X host models. > > Maybe this one is really a proper fix. It will allow the guest to start > and on migration the cpu model will complain if the target cannot provide gs. > Similar things can happen if - for example - the host kernel lacks some > features. Right, just what I had in mind. > > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > index 77169bb..97a08fa 100644 > --- a/hw/s390x/s390-virtio-ccw.c > +++ b/hw/s390x/s390-virtio-ccw.c > @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void > *data) > s390mc->ri_allowed = true; > s390mc->cpu_model_allowed = true; > s390mc->css_migration_enabled = true; > -s390mc->gs_allowed = true; > mc->init = ccw_init; > mc->reset = s390_machine_reset; > mc->hot_add_cpu = s390_hot_add_cpu; > @@ -509,12 +508,6 @@ bool cpu_model_allowed(void) > return get_machine_class()->cpu_model_allowed; > } > > -bool gs_allowed(void) > -{ > -/* for "none" machine this results in true */ > -return get_machine_class()->gs_allowed; > -} > - > static char *machine_get_loadparm(Object *obj, Error **errp) > { > S390CcwMachineState *ms = S390_CCW_MACHINE(obj); > @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass > *mc) > { > S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc); > > -s390mc->gs_allowed = false; > ccw_machine_2_10_class_options(mc); > 
SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9); > s390mc->css_migration_enabled = false; > diff --git a/include/hw/s390x/s390-virtio-ccw.h > b/include/hw/s390x/s390-virtio-ccw.h > index a9a90c2..1de53b0 100644 > --- a/include/hw/s390x/s390-virtio-ccw.h > +++ b/include/hw/s390x/s390-virtio-ccw.h > @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass { > bool ri_allowed; > bool cpu_model_allowed; > bool css_migration_enabled; > -bool gs_allowed; > } S390CcwMachineClass; > > /* runtime-instrumentation allowed by the machine */ > diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c > index a0d5052..3f13fc2 100644 > --- a/target/s390x/kvm.c > +++ b/target/s390x/kvm.c > @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > cap_ri = 1; > } > } > -if (gs_allowed()) { > +if (cpu_model_allowed()) { > if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) { > cap_gs = 1; > } > And the last hunk makes sure we keep same handling for machines without CPU model support and therefore no way to mask support. For all recent machines, we expect CPU model checks to be in place. Will have to think about this a bit more. Will you send this as a proper patch? -- Thanks, David
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 10/20/2017 03:51 PM, David Hildenbrand wrote: [...] >> The problem goes much further. >> A fresh guest with >> >> >> hvm >> >> >> >> does not start. No migration from an older system is necessary. >> > > Yes, as stated in the documentation "copying host CPU definition from > capabilities XML" this can not work. And it works just as documented. > Not saying this is a nice thing :) > > I think we should try to fix gs_allowed (if possible) and avoid > something like that in the future. This would avoid other complexity > involved when suddenly having X host models. Maybe this one is really a proper fix. It will allow the guest to start and on migration the cpu model will complain if the target cannot provide gs. Similar things can happen if - for example - the host kernel lacks some features. diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 77169bb..97a08fa 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void *data) s390mc->ri_allowed = true; s390mc->cpu_model_allowed = true; s390mc->css_migration_enabled = true; -s390mc->gs_allowed = true; mc->init = ccw_init; mc->reset = s390_machine_reset; mc->hot_add_cpu = s390_hot_add_cpu; @@ -509,12 +508,6 @@ bool cpu_model_allowed(void) return get_machine_class()->cpu_model_allowed; } -bool gs_allowed(void) -{ -/* for "none" machine this results in true */ -return get_machine_class()->gs_allowed; -} - static char *machine_get_loadparm(Object *obj, Error **errp) { S390CcwMachineState *ms = S390_CCW_MACHINE(obj); @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass *mc) { S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc); -s390mc->gs_allowed = false; ccw_machine_2_10_class_options(mc); SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9); s390mc->css_migration_enabled = false; diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h index a9a90c2..1de53b0 100644 --- 
a/include/hw/s390x/s390-virtio-ccw.h +++ b/include/hw/s390x/s390-virtio-ccw.h @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass { bool ri_allowed; bool cpu_model_allowed; bool css_migration_enabled; -bool gs_allowed; } S390CcwMachineClass; /* runtime-instrumentation allowed by the machine */ diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c index a0d5052..3f13fc2 100644 --- a/target/s390x/kvm.c +++ b/target/s390x/kvm.c @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) cap_ri = 1; } } -if (gs_allowed()) { +if (cpu_model_allowed()) { if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) { cap_gs = 1; }
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 20.10.2017 15:49, Christian Borntraeger wrote: > > > On 10/20/2017 03:43 PM, David Hildenbrand wrote: >> On 20.10.2017 15:36, Christian Borntraeger wrote: >>> >>> >>> On 10/20/2017 03:16 PM, David Hildenbrand wrote: > Hi all, > > we recently encountered the problem that the 'host-model' [1] has to be > related to the machine type of a domain. We have following problem: > >Let's assume we've a z13 system with a QEMU 2.9 and we define a >domain using the default s390-virtio-ccw machine together with the >host-model CPU mode [1]. The definition will have the machine >expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode >in the domain definition. In a next step we upgrade to QEMU 2.10 >(first version to recognize z14). Everything is still fine, even >though the machine runs in 2.9 compatibility mode. Finally we >upgrade to a z14. As a consequence it is not possible to start the >domain anymore as the machine type doesn't support our CPU host >model (which is expanded at start time of the domain). Actually, what is the cause of that problem? I assume it is the gs feature (gs_allowed)? We should really avoid such things (..._allowed) for CPU model features in the future and glue all new such stuff to cpumodel_allowed. >>> >>> Yes, starting a guest with >>> >>> hvm >>> >>> >>> >>> results in >>> >>> qemu-system-s390x: Some features requested in the CPU model are not >>> available in the configuration: gs >>> >>> Tying it to cpumodel_allowed would not help, migration-wise though. >>> libvirt would still transform >>> >>> >>> hvm >>> >>> >> >> My point was that the host model would have to be copied and _remain_ >> there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9. >> >> So really replacing by the model z13-base >> This would at least fix this issue. Just like s390-ccw-virtio gets >> replaced and remains that way. >> >> But this might for sure have other problems. > > The problem goes much further. 
> A fresh guest with > > > hvm > > > > does not start. No migration from an older system is necessary. > Yes, as stated in the documentation "copying host CPU definition from capabilities XML" this can not work. And it works just as documented. Not saying this is a nice thing :) I think we should try to fix gs_allowed (if possible) and avoid something like that in the future. This would avoid other complexity involved when suddenly having X host models. -- Thanks, David
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 10/20/2017 03:43 PM, David Hildenbrand wrote: > On 20.10.2017 15:36, Christian Borntraeger wrote: >> >> >> On 10/20/2017 03:16 PM, David Hildenbrand wrote: >>> Hi all, we recently encountered the problem that the 'host-model' [1] has to be related to the machine type of a domain. We have following problem: Let's assume we've a z13 system with a QEMU 2.9 and we define a domain using the default s390-virtio-ccw machine together with the host-model CPU mode [1]. The definition will have the machine expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode in the domain definition. In a next step we upgrade to QEMU 2.10 (first version to recognize z14). Everything is still fine, even though the machine runs in 2.9 compatibility mode. Finally we upgrade to a z14. As a consequence it is not possible to start the domain anymore as the machine type doesn't support our CPU host model (which is expanded at start time of the domain). >>> >>> Actually, what is the cause of that problem? I assume it is the gs >>> feature (gs_allowed)? >>> >>> We should really avoid such things (..._allowed) for CPU model features >>> in the future and glue all new such stuff to cpumodel_allowed. >> >> Yes, starting a guest with >> >> hvm >> >> >> >> results in >> >> qemu-system-s390x: Some features requested in the CPU model are not >> available in the configuration: gs >> >> Tying it to cpumodel_allowed would not help, migration-wise though. >> libvirt would still transform >> >> >> hvm >> >> > > My point was that the host model would have to be copied and _remain_ > there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9. > > So really replacing by the model z13-base > This would at least fix this issue. Just like s390-ccw-virtio gets > replaced and remains that way. > > But this might for sure have other problems. The problem goes much further. A fresh guest with hvm does not start. No migration from an older system is necessary.
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 20.10.2017 15:36, Christian Borntraeger wrote: > > > On 10/20/2017 03:16 PM, David Hildenbrand wrote: >> >>> Hi all, >>> >>> we recently encountered the problem that the 'host-model' [1] has to be >>> related to the machine type of a domain. We have following problem: >>> >>>Let's assume we've a z13 system with a QEMU 2.9 and we define a >>>domain using the default s390-virtio-ccw machine together with the >>>host-model CPU mode [1]. The definition will have the machine >>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode >>>in the domain definition. In a next step we upgrade to QEMU 2.10 >>>(first version to recognize z14). Everything is still fine, even >>>though the machine runs in 2.9 compatibility mode. Finally we >>>upgrade to a z14. As a consequence it is not possible to start the >>>domain anymore as the machine type doesn't support our CPU host >>>model (which is expanded at start time of the domain). >> >> Actually, what is the cause of that problem? I assume it is the gs >> feature (gs_allowed)? >> >> We should really avoid such things (..._allowed) for CPU model features >> in the future and glue all new such stuff to cpumodel_allowed. > > Yes, starting a guest with > > hvm > > > > results in > > qemu-system-s390x: Some features requested in the CPU model are not available > in the configuration: gs > > Tying it to cpumodel_allowed would not help, migration-wise though. > libvirt would still transform > > > hvm > > My point was that the host model would have to be copied and _remain_ there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9. So really replacing by the model z13-base This would at least fix this issue. Just like s390-ccw-virtio gets replaced and remains that way. But this might for sure have other problems. 
> > > into > -cpu > z14-base,aen=on,cmmnt=on,aefsi=on,mepoch=on,msa8=on,msa7=on,msa6=on,msa5=on,msa4=on, > \ > msa3=on,msa2=on,msa1=on,sthyi=on,edat=on,ri=on,edat2=on,vx=on,ipter=on,vxeh=on,vxpd=on, > \ > esop=on,iep=on,cte=on,ais=on,gs=on,zpci=on,sea_esop2=on,te=on,cmm=on > ^ > because cpu model is certainly there. Now the guest would start but migration > would > later fail because what we create now would never have been possible with 2.9. Migration is a totally different story, as tooling has to make sure to find a CPU model that is compatible over all host models. So cpumodel_allowed would indeed work. (at least in my theory ;) ) > > If libvirt could get information from QEMU depending on the machine version, > this would > make it work. But of course this has other issues. I am not sure if that is the right thing to do. The documentation states clearly that the host model is copied. If that is not runnable, fix the setup. > > Christian > -- Thanks, David
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 10/20/2017 03:16 PM, David Hildenbrand wrote: > >> Hi all, >> >> we recently encountered the problem that the 'host-model' [1] has to be >> related to the machine type of a domain. We have following problem: >> >>Let's assume we've a z13 system with a QEMU 2.9 and we define a >>domain using the default s390-virtio-ccw machine together with the >>host-model CPU mode [1]. The definition will have the machine >>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode >>in the domain definition. In a next step we upgrade to QEMU 2.10 >>(first version to recognize z14). Everything is still fine, even >>though the machine runs in 2.9 compatibility mode. Finally we >>upgrade to a z14. As a consequence it is not possible to start the >>domain anymore as the machine type doesn't support our CPU host >>model (which is expanded at start time of the domain). > > Actually, what is the cause of that problem? I assume it is the gs > feature (gs_allowed)? > > We should really avoid such things (..._allowed) for CPU model features > in the future and glue all new such stuff to cpumodel_allowed. Yes, starting a guest with hvm results in qemu-system-s390x: Some features requested in the CPU model are not available in the configuration: gs Tying it to cpumodel_allowed would not help, migration-wise though. libvirt would still transform hvm into -cpu z14-base,aen=on,cmmnt=on,aefsi=on,mepoch=on,msa8=on,msa7=on,msa6=on,msa5=on,msa4=on, \ msa3=on,msa2=on,msa1=on,sthyi=on,edat=on,ri=on,edat2=on,vx=on,ipter=on,vxeh=on,vxpd=on, \ esop=on,iep=on,cte=on,ais=on,gs=on,zpci=on,sea_esop2=on,te=on,cmm=on ^ because cpu model is certainly there. Now the guest would start but migration would later fail because what we create now would never have been possible with 2.9. If libvirt could get information from QEMU depending on the machine version, this would make it work. But of course this has other issues. Christian
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
> Hi all, > > we recently encountered the problem that the 'host-model' [1] has to be > related to the machine type of a domain. We have following problem: > >Let's assume we've a z13 system with a QEMU 2.9 and we define a >domain using the default s390-virtio-ccw machine together with the >host-model CPU mode [1]. The definition will have the machine >expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode >in the domain definition. In a next step we upgrade to QEMU 2.10 >(first version to recognize z14). Everything is still fine, even >though the machine runs in 2.9 compatibility mode. Finally we >upgrade to a z14. As a consequence it is not possible to start the >domain anymore as the machine type doesn't support our CPU host >model (which is expanded at start time of the domain). Actually, what is the cause of that problem? I assume it is the gs feature (gs_allowed)? We should really avoid such things (..._allowed) for CPU model features in the future and glue all new such stuff to cpumodel_allowed. -- Thanks, David
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 20.10.2017 14:50, Jiri Denemark wrote: > On Fri, Oct 20, 2017 at 13:37:42 +0200, David Hildenbrand wrote: >> On 20.10.2017 13:09, Marc Hartmayer wrote: >>> we recently encountered the problem that the 'host-model' [1] has to be >>> related to the machine type of a domain. We have following problem: >>> >>>Let's assume we've a z13 system with a QEMU 2.9 and we define a >>>domain using the default s390-virtio-ccw machine together with the >>>host-model CPU mode [1]. The definition will have the machine >>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode >>>in the domain definition. In a next step we upgrade to QEMU 2.10 >>>(first version to recognize z14). Everything is still fine, even >>>though the machine runs in 2.9 compatibility mode. Finally we >>>upgrade to a z14. As a consequence it is not possible to start the >>>domain anymore as the machine type doesn't support our CPU host >>>model (which is expanded at start time of the domain). >> >> Yes, the host model may vary depending on QEMU version and to some >> degree also on compatibility machines (although I always try to push >> people to not introduce any new such stuff). >> >> Quoting for the libvirt documentation: https://libvirt.org/formatdomain.html >> >> "host-model >> The host-model mode is essentially a shortcut to copying host CPU >> definition from capabilities XML into domain XML. Since the CPU >> definition is copied just before starting a domain, exactly the same XML >> can be used on different hosts while still providing the best guest CPU >> each host supports." >> >> You're telling me that that copying does not happen, which seems to be >> the problematic part about this in my opinion. >> >> So I am not sure if probing anything else (as you noted below) is the >> correct thing to do. Copy it and you have a migration_safe and even >> static version of the _current_ host CPU. > > The thing is libvirt calls query-cpu-model-expansion to check what the > host CPU is. 
This 'host-model' CPU is replaced with the probed CPU model > when a domain starts. The problem described by Marc is that the probed > CPU model cannot be used universally with all machine types. So starting > a domain on such host with machine type s390-virtio-ccw-2.10 works, but > a domain with machine type s390-virtio-ccw-2.9 fails to start with the > same probed CPU model. > My assumption would be that the CPU model is copied into the XML when the domain first starts. This is what the documentation describes. So when upgrading QEMU, the CPU model in the XML is still the same (z13), even though something different is now reported in the host info after upgrading QEMU (z14). In this case it would continue to work. The problem is that the CPU model copied into the XML doesn't remain there, right? It is suddenly replaced with a z14 host model. > Jirka > -- Thanks, David
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On Fri, Oct 20, 2017 at 13:37:42 +0200, David Hildenbrand wrote: > On 20.10.2017 13:09, Marc Hartmayer wrote: > > we recently encountered the problem that the 'host-model' [1] has to be > > related to the machine type of a domain. We have following problem: > > > >Let's assume we've a z13 system with a QEMU 2.9 and we define a > >domain using the default s390-virtio-ccw machine together with the > >host-model CPU mode [1]. The definition will have the machine > >expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode > >in the domain definition. In a next step we upgrade to QEMU 2.10 > >(first version to recognize z14). Everything is still fine, even > >though the machine runs in 2.9 compatibility mode. Finally we > >upgrade to a z14. As a consequence it is not possible to start the > >domain anymore as the machine type doesn't support our CPU host > >model (which is expanded at start time of the domain). > > Yes, the host model may vary depending on QEMU version and to some > degree also on compatibility machines (although I always try to push > people to not introduce any new such stuff). > > Quoting for the libvirt documentation: https://libvirt.org/formatdomain.html > > "host-model > The host-model mode is essentially a shortcut to copying host CPU > definition from capabilities XML into domain XML. Since the CPU > definition is copied just before starting a domain, exactly the same XML > can be used on different hosts while still providing the best guest CPU > each host supports." > > You're telling me that that copying does not happen, which seems to be > the problematic part about this in my opinion. > > So I am not sure if probing anything else (as you noted below) is the > correct thing to do. Copy it and you have a migration_safe and even > static version of the _current_ host CPU. The thing is libvirt calls query-cpu-model-expansion to check what the host CPU is. This 'host-model' CPU is replaced with the probed CPU model when a domain starts. 
The problem described by Marc is that the probed CPU model cannot be used universally with all machine types. So starting a domain on such host with machine type s390-virtio-ccw-2.10 works, but a domain with machine type s390-virtio-ccw-2.9 fails to start with the same probed CPU model. Jirka
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On Fri, Oct 20, 2017 at 13:09:26 +0200, Marc Hartmayer wrote: > we recently encountered the problem that the 'host-model' [1] has to be > related to the machine type of a domain. We have following problem: > >Let's assume we've a z13 system with a QEMU 2.9 and we define a >domain using the default s390-virtio-ccw machine together with the >host-model CPU mode [1]. The definition will have the machine >expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode >in the domain definition. In a next step we upgrade to QEMU 2.10 >(first version to recognize z14). Everything is still fine, even >though the machine runs in 2.9 compatibility mode. Finally we >upgrade to a z14. As a consequence it is not possible to start the >domain anymore as the machine type doesn't support our CPU host >model (which is expanded at start time of the domain). > > For determining the actual host-model the QMP command > 'query-cpu-model-expansion' is used. This is done once per QEMU binary > and the result of it is cached by libvirt. The problem with that is > that libvirt queries with the newest machine type of the QEMU binary > for the host CPU model. No, libvirt probes QEMU with -machine none. > We could now either probe the host CPU model for each QEMU binary + > machine type combination and for this we've to start a new QEMU > process each time. This is not really a viable solution. Jirka
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On 20.10.2017 13:09, Marc Hartmayer wrote: > On Thu, Oct 12, 2017 at 02:07 PM +0200, Jiri Denemark> wrote: >> On Mon, Oct 09, 2017 at 10:16:48 +0200, Marc Hartmayer wrote: >>> On Thu, Oct 05, 2017 at 02:11 PM +0200, Jiri Denemark >>> wrote: But it's going to be a bit complicated because we ask QEMU what the host CPU is and the interface we used would need to be enhanced to give us different results for all supported machine types so that we can select the right one when a domain is started. >>> >>> So how do we deal with this? >> >> I think we need to start discussing this on qemu-devel list. If I >> remember correctly, David Hildenbrand is the original author of the >> query-cpu-model-expansion QMP command. >> >> Please, Cc me so that the thread is more visible to me. >> >> Thanks, >> >> Jirka >> > > Hi all, > > we recently encountered the problem that the 'host-model' [1] has to be > related to the machine type of a domain. We have following problem: > >Let's assume we've a z13 system with a QEMU 2.9 and we define a >domain using the default s390-virtio-ccw machine together with the >host-model CPU mode [1]. The definition will have the machine >expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode >in the domain definition. In a next step we upgrade to QEMU 2.10 >(first version to recognize z14). Everything is still fine, even >though the machine runs in 2.9 compatibility mode. Finally we >upgrade to a z14. As a consequence it is not possible to start the >domain anymore as the machine type doesn't support our CPU host >model (which is expanded at start time of the domain). Yes, the host model may vary depending on QEMU version and to some degree also on compatibility machines (although I always try to push people to not introduce any new such stuff). 
Quoting for the libvirt documentation: https://libvirt.org/formatdomain.html "host-model The host-model mode is essentially a shortcut to copying host CPU definition from capabilities XML into domain XML. Since the CPU definition is copied just before starting a domain, exactly the same XML can be used on different hosts while still providing the best guest CPU each host supports." You're telling me that that copying does not happen, which seems to be the problematic part about this in my opinion. So I am not sure if probing anything else (as you noted below) is the correct thing to do. Copy it and you have a migration_safe and even static version of the _current_ host CPU. > > For determining the actual host-model the QMP command > 'query-cpu-model-expansion' is used. This is done once per QEMU binary > and the result of it is cached by libvirt. The problem with that is > that libvirt queries with the newest machine type of the QEMU binary > for the host CPU model. But now we realized that we have to consider > the machine type of each domain for the determination of the host CPU > model of a domain. > > We could now either probe the host CPU model for each QEMU binary + > machine type combination and for this we've to start a new QEMU > process each time. Or we can add a QMP command/parameter that allows > us to determine the host CPU model(s) with respect to the passed > machine type and/or all supported machine types. > > The QMP command query-cpu-model-expansion is currently described in > qapi-schema.json as follows: > > > ## > # @query-cpu-model-expansion: > # > # Expands a given CPU model (or a combination of CPU model + additional > options) > # to different granularities, allowing tooling to get an understanding what a > # specific CPU model looks like in QEMU under a certain configuration. > # > # This interface can be used to query the "host" CPU model. 
> # > # The data returned by this command may be affected by: > # > # * QEMU version: CPU models may look different depending on the QEMU version. > # (Except for CPU models reported as "static" in query-cpu-definitions.) > # * machine-type: CPU model may look different depending on the machine-type. > # (Except for CPU models reported as "static" in query-cpu-definitions.) > # * machine options (including accelerator): in some architectures, CPU models > # may look different depending on machine and accelerator options. (Except for > # CPU models reported as "static" in query-cpu-definitions.) > # * "-cpu" arguments and global properties: arguments to the -cpu option and > # global properties may affect expansion of CPU models. Using > # query-cpu-model-expansion while using these is not advised. > # > # Some architectures may not support all expansion types. s390x supports > # "full" and "static". > # > # Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models is > # not supported, if the model cannot be expanded, if the model contains > # an unknown CPU definition name, unknown properties or properties > # with a wrong type. Also returns an error if an expansion type is > # not supported.
Re: [Qemu-devel] [libvirt] Question about the host-model CPU mode
On Thu, Oct 12, 2017 at 02:07 PM +0200, Jiri Denemark wrote: > On Mon, Oct 09, 2017 at 10:16:48 +0200, Marc Hartmayer wrote: >> On Thu, Oct 05, 2017 at 02:11 PM +0200, Jiri Denemark >> wrote: >> > But it's going to be a bit complicated because we ask QEMU what the >> > host CPU is and the interface we used would need to be enhanced to >> > give us different results for all supported machine types so that we >> > can select the right one when a domain is started. >> >> So how do we deal with this? > > I think we need to start discussing this on qemu-devel list. If I > remember correctly, David Hildenbrand is the original author of the > query-cpu-model-expansion QMP command. > > Please, Cc me so that the thread is more visible to me. > > Thanks, > > Jirka > Hi all, we recently encountered the problem that the 'host-model' [1] has to be related to the machine type of a domain. We have following problem: Let's assume we've a z13 system with a QEMU 2.9 and we define a domain using the default s390-virtio-ccw machine together with the host-model CPU mode [1]. The definition will have the machine expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode in the domain definition. In a next step we upgrade to QEMU 2.10 (first version to recognize z14). Everything is still fine, even though the machine runs in 2.9 compatibility mode. Finally we upgrade to a z14. As a consequence it is not possible to start the domain anymore as the machine type doesn't support our CPU host model (which is expanded at start time of the domain). For determining the actual host-model the QMP command 'query-cpu-model-expansion' is used. This is done once per QEMU binary and the result of it is cached by libvirt. The problem with that is that libvirt queries with the newest machine type of the QEMU binary for the host CPU model. But now we realized that we have to consider the machine type of each domain for the determination of the host CPU model of a domain.
We could now either probe the host CPU model for each QEMU binary + machine type combination and for this we've to start a new QEMU process each time. Or we can add a QMP command/parameter that allows us to determine the host CPU model(s) with respect to the passed machine type and/or all supported machine types. The QMP command query-cpu-model-expansion is currently described in qapi-schema.json as follows: ## # @query-cpu-model-expansion: # # Expands a given CPU model (or a combination of CPU model + additional options) # to different granularities, allowing tooling to get an understanding what a # specific CPU model looks like in QEMU under a certain configuration. # # This interface can be used to query the "host" CPU model. # # The data returned by this command may be affected by: # # * QEMU version: CPU models may look different depending on the QEMU version. # (Except for CPU models reported as "static" in query-cpu-definitions.) # * machine-type: CPU model may look different depending on the machine-type. # (Except for CPU models reported as "static" in query-cpu-definitions.) # * machine options (including accelerator): in some architectures, CPU models # may look different depending on machine and accelerator options. (Except for # CPU models reported as "static" in query-cpu-definitions.) # * "-cpu" arguments and global properties: arguments to the -cpu option and # global properties may affect expansion of CPU models. Using # query-cpu-model-expansion while using these is not advised. # # Some architectures may not support all expansion types. s390x supports # "full" and "static". # # Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models is # not supported, if the model cannot be expanded, if the model contains # an unknown CPU definition name, unknown properties or properties # with a wrong type. Also returns an error if an expansion type is # not supported. 
# # Since: 2.8.0 ## { 'command': 'query-cpu-model-expansion', 'data': { 'type': 'CpuModelExpansionType', 'model': 'CpuModelInfo' }, 'returns': 'CpuModelExpansionInfo' } What do you think? Thank you. Marc [1] https://libvirt.org/formatdomain.html#elementsCPU -- Beste Grüße / Kind regards Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
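The probing discussed in this thread is a plain QMP exchange. Below is a minimal sketch in Python of issuing query-cpu-model-expansion against a QEMU instance listening on a QMP UNIX socket; the helper names and the socket path are invented for illustration, not part of the thread.

```python
import json
import socket

def expansion_cmd(expansion_type, model_name, props=None):
    """Build the QMP request for query-cpu-model-expansion."""
    model = {"name": model_name}
    if props:
        model["props"] = props
    return {
        "execute": "query-cpu-model-expansion",
        "arguments": {"type": expansion_type, "model": model},
    }

def query_host_model(qmp_socket_path, expansion_type="full"):
    """Connect to a QMP socket, negotiate capabilities, and expand 'host'."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(qmp_socket_path)
        f = s.makefile("rw")
        json.loads(f.readline())          # consume the QMP greeting banner
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        json.loads(f.readline())          # consume the empty return
        f.write(json.dumps(expansion_cmd(expansion_type, "host")) + "\n")
        f.flush()
        return json.loads(f.readline())

if __name__ == "__main__":
    # e.g. a QEMU started with: -qmp unix:/tmp/qmp.sock,server,nowait
    print(json.dumps(expansion_cmd("static", "host")))
```

Note this still only expands "host" for whatever machine the running QEMU was started with, which is exactly the per-machine-type limitation Marc and Jirka are discussing.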
Re: [Qemu-devel] HDMI Question
On Wed, Sep 27, 2017 at 5:31 AM, Ni, Xingrong wrote: > Hello, > > I am using qemu and hit a question and would like to ask here, can we > simulate a HDMI device in Qemu, if so, how to do it? Hello, I don't see any reason why you can't. We (Xilinx) have a display port device in QEMU that we use. Basically how it works is you model the device's registers and use QEMU's display backend (usually over SDL or VNC) to display the image. Here is a pretty recent example of a display port device in QEMU: https://github.com/qemu/qemu/blob/master/hw/display/xlnx_dp.c That said, display devices are pretty involved and this will not be a trivial task. Thanks, Alistair > > > Thanks & Best Wishes! > Rocky
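The register-modelling half Alistair describes can be sketched in miniature: a display device is mostly a bank of MMIO registers with read/write side effects, which the display backend then renders from. The register names, offsets, and reset values below are invented for illustration, not taken from xlnx_dp.c.

```python
# Toy register bank: a guest driver reads/writes offsets, and the model
# decides the side effects (read-only IDs, enable bits, and so on).
class DisplayRegs:
    CORE_ID = 0x000   # read-only identification register (hypothetical)
    ENABLE  = 0x004   # writable control register (hypothetical)

    def __init__(self):
        self.regs = {self.CORE_ID: 0x0101, self.ENABLE: 0}

    def read(self, offset):
        """MMIO read handler: unknown offsets read as zero."""
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        """MMIO write handler: writes to read-only registers are ignored."""
        if offset == self.CORE_ID:
            return
        self.regs[offset] = value
```

In QEMU proper these two handlers would be the `.read`/`.write` callbacks of a `MemoryRegionOps`, and a separate scanout routine would push the framebuffer to the SDL/VNC display backend.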
Re: [Qemu-devel] question:a question about throttle and hot-unplug
On Thu 14 Sep 2017 11:41:40 AM CEST, WangJie (Captain) wrote: > the patch you committed: > https://github.com/qemu/qemu/commit/7ca7f0f6db1fedd28d490795d778cf23979a2aa7#diff-ea36ba0f79150cc299732696a069caba > > removed blk_io_limits_disable from blk_remove_bs > > Then, if a disk configured with QoS is hot-unplugged from the VM, the backend > of the disk remains in the throttle group. > > So when I hot-unplug and hot-plug a disk, and use the same throttle > group name, it will lead to a qemu crash. > > and Eric committed a patch as follows that fixed the bug on qemu-kvm 2.9.0-rc4: > https://github.com/qemu/qemu/commit/1606e4cf8a976513ecac70ad6642a7ec45744cf5#diff-7cb66df56045598b75a219eebc27efb6 > > Is what I said correct? That sounds correct to me. Here's the original discussion for reference: https://lists.gnu.org/archive/html/qemu-block/2017-04/msg00019.html Berto
Re: [Qemu-devel] slirp: question about using SMB on windows host with linux guest
Michael- OK, I was confused until reading your email but I have now been able to get it to work. Unfortunately, I find that it is *very* unreliable when trying to mount the share; it only works sometimes. See attached screenshot. Do you have any suggestions on how to fix this problem? I am using qemu 2.9.0. Best Regards / Mit freundlichen Grüßen / 敬具, Mark. -Original Message- From: Michael Tokarev [mailto:m...@tls.msk.ru] Sent: Sunday, May 14, 2017 02:47 To: FONNEMANN Mark <mark.fonnem...@hexagon.com>; qemu-devel@nongnu.org Subject: Re: [Qemu-devel] slirp: question about using SMB on windows host with linux guest 07.05.2017 00:30, FONNEMANN Mark wrote: > Hello- > > I am trying to share files between Windows host and Linux guest using > slirp’s built-in SMB server. There's no such thing as a "slirp built-in SMB server". Qemu does not have an SMB server built in; it uses samba to provide smb functionality by spawning smbd. So you have to have smbd on windows in order to use this functionality. However, on windows, it's easy to use the native CIFS server: just share the directory you want to share and connect to your windows machine from within the guest. /mjt
Re: [Qemu-devel] slirp: question about using SMB on windows host with linux guest
07.05.2017 00:30, FONNEMANN Mark wrote: > Hello- > > I am trying to share files between Windows host and Linux guest using > slirp’s built-in SMB server. There's no such thing as a "slirp built-in SMB server". Qemu does not have an SMB server built in; it uses samba to provide smb functionality by spawning smbd. So you have to have smbd on windows in order to use this functionality. However, on windows, it's easy to use the native CIFS server: just share the directory you want to share and connect to your windows machine from within the guest. /mjt
Re: [Qemu-devel] slirp: question about using SMB on windows host with linux guest
>I am trying to share files between Windows host and Linux guest using slirp’s >built-in SMB server. >I start QEMU 2.9.0 using the following: > >C:\Program Files\qemu>qemu-system-i386.exe -readconfig >\Users\mfonnemann\qemu.cfg -net nic -net user,smb=\Users\mfonnemann\smb > >I then try to mount the SMB share first with 10.0.2.4: > >root@qemu:~# mount -t cifs //10.0.2.4/Users/mfonnemann/smb tmp -o >user=mfonnemann >Password: > CIFS VFS: Error connecting to IPv4 socket. Aborting > operation > CIFS VFS: cifs_mount failed w/return code = -113 >mount error 113 = No route to host >mount: mount.cifs failed with status -1 > >And then with 10.0.2.2: > >root@qemu:~# mount -t cifs //10.0.2.2/Users/mfonnemann/smb tmp -o >user=mfonnemann >Password: >CIFS VFS: Unknown RFC 1002 frame > > CIFS VFS: Send error in SessSetup = -13 > CIFS VFS: cifs_mount failed w/return code = -13 >mount error 13 = Permission denied >mount: mount.cifs failed with status -1 > >Regarding RFC 1002 frame, I see the following when using “dmesg”: > >CIFS VFS: Unknown RFC 1002 frame > Received Data: : dump of 36 bytes of data at 0xf71400c0 > > 008d 57770064 6e006900 6f006400 . . . . d . w W . i . n . d . o > 73007700 31002000 20003000 72005000 . w . s . . 1 . 0 . . P . r > 20006f00 . o . I just retried both again using Windows 7 Professional and Windows 10 Professional. I also tried QEMU 2.8. Not sure what else to try.
Re: [Qemu-devel] quick question about live VM migration
On Mon, Apr 10, 2017 at 01:07:14PM -0700, Vinod Chegu wrote: > Hello, > > A very basic question... > > In a vanilla KVM env. (i.e. non open stack etc) when a VM is live migrated > to another host does the MAC address of the vNICs and the DHCP IP address > for these interfaces remain the same ? *All* aspects of the virtual machine hardware *must* remain the same. If you have any differences in configuration of QEMU on the target host you are likely to either prevent live migration working, or break the guest OS after migration is finished. This includes MAC address, PCI addresses, device versions, etc, etc > Thought the answer was a yes as the goal is to minimize disruption to the > apps etc. in the VM, but wanted to double check (couldn't verify as I don't > have a ready test env handy at the moment). > > BTW, please assume that the MAC address for the vNIC interfaces were not > explicitly specified at the time of VM creation. It is the responsibility of whoever/whatever launches QEMU on the target host to ensure the hardware config is identical. The only safe way todo this is ensure you specify every single aspect of guest hardware when launching QEMU, including the MAC address. FWIW, if you are using libvirt to manage QEMU, then libvirt goes to great lengths to ensure this happens (which is why QEMU command lines generated by libvirt are so huge :-) Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o-http://search.cpan.org/~danberr/ :|
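Daniel's advice to pin every aspect of the hardware can be sketched for the MAC address case: generate the arguments once and pass the identical values on both migration hosts. The helper below is a hypothetical illustration; the device model, netdev id, and MAC are assumptions, not from the thread.

```python
def nic_args(mac, netdev_id="net0"):
    """Build the -netdev/-device QEMU arguments with an explicit MAC,
    so the exact same values can be used on source and target hosts."""
    return [
        "-netdev", "user,id=%s" % netdev_id,
        "-device", "virtio-net-pci,netdev=%s,mac=%s" % (netdev_id, mac),
    ]

# The launcher on the migration target must reuse these verbatim;
# letting QEMU pick a default MAC on either side breaks the guest's
# view of its hardware after migration.
ARGS = nic_args("52:54:00:12:34:56")
```

This is exactly the bookkeeping libvirt does for you: it records the MAC (and PCI addresses, device versions, etc.) in the domain XML and regenerates an identical command line on the target.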
Re: [Qemu-devel] [libvirt] Question regarding Snapshots
On 02/27/2017 07:17 AM, Kashyap Chamarthy wrote: > On Mon, Feb 27, 2017 at 01:33:56PM +0100, Kashyap Chamarthy wrote: > > [...] > Can you make a snapshot, then go back to the base and ignore that snapshot like a closed branch, make another snapshot, and then go back and forth from each to each? Yes. Or graphically, you can have:

         -- A ... B ... C
        /
 base <
        \
         -- D ... E ... F

and toggle between any of the named snapshots at will. As Kashyap and others pointed out, libvirt support for reverting to a particular external snapshot is not fully present, so there are some manual steps involved, but the task is still doable. > >> For the long answer to the complications involved in reverting to >> external snapshots, refer this (long read): >> >> https://wiki.libvirt.org/page/I_created_an_external_snapshot,_but_libvirt_will_not_let_me_delete_or_revert_to_it > > [...] > -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [libvirt] Question regarding Snapshots
On Mon, Feb 27, 2017 at 01:33:56PM +0100, Kashyap Chamarthy wrote: [...] > > > Can you make a snapshot, then go back to the base and ignore that snapshot > > > like a closed branch, make another snapshot, and then go back and forth > > > from each to each? > > If you edit the '--disk' element as above and point to the right > 'snapshot', you should be able to switch between several of them. [...] > And conveniently, you can tell libvirt to not track the metadata: > > $ virsh snapshot-create-as --domain vm1 guest-state1 \ > --diskspec vda,file=/export/overlay1.qcow2 \ > --disk-only --atomic --no-metadata > > This way, libvirt will not track 'overlay1.qcow2'. But if you do need > it, as you know, just update the Err, unfinished sentence there. Meant to write: just update the 'source file' attribute in the guest XML, pointing to the said disk. > For the long answer to the complications involved in reverting to > external snapshots, refer this (long read): > > https://wiki.libvirt.org/page/I_created_an_external_snapshot,_but_libvirt_will_not_let_me_delete_or_revert_to_it [...] -- /kashyap
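The external-snapshot branching Eric draws can be modelled as a backing-file graph: each qcow2 overlay points at the image it was taken from, and "reverting" is just choosing which overlay the domain's disk source points at before starting it. The file names below are invented for illustration.

```python
# overlay -> its backing file; two branches hang off the same base,
# mirroring the base < (A..C, D..F) diagram from the thread.
backing = {
    "A.qcow2": "base.qcow2", "B.qcow2": "A.qcow2", "C.qcow2": "B.qcow2",
    "D.qcow2": "base.qcow2", "E.qcow2": "D.qcow2", "F.qcow2": "E.qcow2",
}

def chain(image):
    """Return the full backing chain from an overlay down to the base,
    i.e. what qemu-img info --backing-chain would report."""
    out = [image]
    while out[-1] in backing:
        out.append(backing[out[-1]])
    return out
```

Toggling between snapshots is then the manual step Kashyap describes: edit the domain's disk `source file` to point at C.qcow2, F.qcow2, or base.qcow2, and every choice resolves to the same base through its own chain.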
Re: [Qemu-devel] A question about PCI device address spaces
On Mon, Dec 26, 2016 at 01:01:34PM +0200, Marcel Apfelbaum wrote: > On 12/22/2016 11:42 AM, Peter Xu wrote: > > Hello, > > > > Hi Peter, > > > Since this is a general topic, I picked it out from the VT-d > > discussion and put it here, just want to be more clear of it. > > > > The issue is, whether we have exposed too much address spaces for > > emulated PCI devices? > > > > Now for each PCI device, we are having PCIDevice::bus_master_as for > > the device visible address space, which derived from > > pci_device_iommu_address_space(): > > > > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) > > { > > PCIBus *bus = PCI_BUS(dev->bus); > > PCIBus *iommu_bus = bus; > > > > while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) { > > iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus); > > } > > if (iommu_bus && iommu_bus->iommu_fn) { > > return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, > > dev->devfn); > > } > > return &address_space_memory; > > } > > > > By default (for no-iommu case), it's pointed to system memory space, > > which includes MMIO, and looks wrong - PCI device should not be able to > > write to MMIO regions. > > > > Why? As far as I know a PCI device can start a read/write transaction > to virtually any address, it doesn't matter if it 'lands' in RAM or a MMIO > region mapped by other device. But I might be wrong, need to read the spec > again... So as I noted in another mail, my earlier comment which led Peter to say that was misleading. In particular I was talking about *non PCI* MMIO devices, which barely exist on x86 (and even there the statement won't necessarily be true). > The PCI transaction will eventually reach the Root Complex/PCI host bridge > where an IOMMU or some other hw entity can sanitize/translate, but is out of > the scope of the device itself. Right, but we're not talking about the device, or purely within PCI address space.
We're explicitly talking about what addresses the RC/host bridge will translate between PCI space and CPU address space. I'm betting that even on x86, it won't be the whole 64-bit address space (otherwise how would the host bridge know whether another PCI device might be listening on that address). > The Root Complex will 'translate' the transaction into a memory read/write > in the behalf of the device and pass it to the memory controller. > If the transaction target is another device, I am not sure if the > Root Complex will re-route by itself or pass it to the Memory Controller. It will either re-route itself, or simply drop it, possibly depending on configuration. I'm sure the MC won't be bouncing transactions back to PCI space. Note that for vanilla PCI the question is moot - the cycle will be broadcast on the bus segment and something will pick it up - either a device or the host bridge. If multiple things try to respond to the same addresses, things will go badly wrong. > > As an example, if we dump a PCI device address space into detail on > > x86_64 system, we can see (this is address space for a virtio-net-pci > > device on an Q35 machine with 6G memory): > > > > -0009 (prio 0, RW): pc.ram > > 000a-000a (prio 1, RW): vga.vram > > 000b-000b (prio 1, RW): vga-lowmem > > 000c-000c9fff (prio 0, RW): pc.ram > > 000ca000-000ccfff (prio 0, RW): pc.ram > > 000cd000-000ebfff (prio 0, RW): pc.ram > > 000ec000-000e (prio 0, RW): pc.ram > > 000f-000f (prio 0, RW): pc.ram > > 0010-7fff (prio 0, RW): pc.ram > > b000-bfff (prio 0, RW): pcie-mmcfg-mmio > > fd00-fdff (prio 1, RW): vga.vram > > fe00-fe000fff (prio 0, RW): virtio-pci-common > > fe001000-fe001fff (prio 0, RW): virtio-pci-isr > > fe002000-fe002fff (prio 0, RW): virtio-pci-device > > fe003000-fe003fff (prio 0, RW): virtio-pci-notify > > febd0400-febd041f (prio 0, RW): vga ioports remapped > > febd0500-febd0515 (prio 0, RW): bochs dispi interface > > febd0600-febd0607 (prio 0, RW): qemu extended regs > > 
febd1000-febd102f (prio 0, RW): msix-table > > febd1800-febd1807 (prio 0, RW): msix-pba > > febd2000-febd2fff (prio 1, RW): ahci > > fec0-fec00fff (prio 0, RW): kvm-ioapic > > fed0-fed003ff (prio 0, RW): hpet > > fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio > > fee0-feef (prio 4096, RW): kvm-apic-msi > > fffc- (prio 0, R-): pc.bios > > 0001-0001 (prio 0, RW): pc.ram > > > >
Re: [Qemu-devel] A question about PCI device address spaces
On 12/22/2016 11:42 AM, Peter Xu wrote: Hello, Hi Peter, Since this is a general topic, I picked it out from the VT-d discussion and put it here, just want to be more clear of it. The issue is, whether we have exposed too much address spaces for emulated PCI devices? Now for each PCI device, we are having PCIDevice::bus_master_as for the device visible address space, which derived from pci_device_iommu_address_space(): AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) { PCIBus *bus = PCI_BUS(dev->bus); PCIBus *iommu_bus = bus; while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) { iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus); } if (iommu_bus && iommu_bus->iommu_fn) { return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn); } return &address_space_memory; } By default (for no-iommu case), it's pointed to system memory space, which includes MMIO, and looks wrong - PCI device should not be able to write to MMIO regions. Why? As far as I know a PCI device can start a read/write transaction to virtually any address, it doesn't matter if it 'lands' in RAM or a MMIO region mapped by other device. But I might be wrong, need to read the spec again... The PCI transaction will eventually reach the Root Complex/PCI host bridge where an IOMMU or some other hw entity can sanitize/translate, but is out of the scope of the device itself. The Root Complex will 'translate' the transaction into a memory read/write on behalf of the device and pass it to the memory controller. If the transaction target is another device, I am not sure if the Root Complex will re-route by itself or pass it to the Memory Controller.
As an example, if we dump a PCI device address space into detail on x86_64 system, we can see (this is address space for a virtio-net-pci device on an Q35 machine with 6G memory): -0009 (prio 0, RW): pc.ram 000a-000a (prio 1, RW): vga.vram 000b-000b (prio 1, RW): vga-lowmem 000c-000c9fff (prio 0, RW): pc.ram 000ca000-000ccfff (prio 0, RW): pc.ram 000cd000-000ebfff (prio 0, RW): pc.ram 000ec000-000e (prio 0, RW): pc.ram 000f-000f (prio 0, RW): pc.ram 0010-7fff (prio 0, RW): pc.ram b000-bfff (prio 0, RW): pcie-mmcfg-mmio fd00-fdff (prio 1, RW): vga.vram fe00-fe000fff (prio 0, RW): virtio-pci-common fe001000-fe001fff (prio 0, RW): virtio-pci-isr fe002000-fe002fff (prio 0, RW): virtio-pci-device fe003000-fe003fff (prio 0, RW): virtio-pci-notify febd0400-febd041f (prio 0, RW): vga ioports remapped febd0500-febd0515 (prio 0, RW): bochs dispi interface febd0600-febd0607 (prio 0, RW): qemu extended regs febd1000-febd102f (prio 0, RW): msix-table febd1800-febd1807 (prio 0, RW): msix-pba febd2000-febd2fff (prio 1, RW): ahci fec0-fec00fff (prio 0, RW): kvm-ioapic fed0-fed003ff (prio 0, RW): hpet fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio fee0-feef (prio 4096, RW): kvm-apic-msi fffc- (prio 0, R-): pc.bios 0001-0001 (prio 0, RW): pc.ram So here are the "pc.ram" regions the only ones that we should expose to PCI devices? (it should contain all of them, including the low-mem ones and the >=4g one) As I previously said, it does not have to be RAM only, but let's wait also for Michael's opinion. And, should this rule work for all platforms? The PCI rules should be generic for all platforms, but I don't know the other platforms. Thanks, Marcel Or say, would it be a problem if I directly change address_space_memory in pci_device_iommu_address_space() into something else, which only contains RAMs? 
(of course this won't affect any platform that has IOMMU, aka, customized PCIBus::iommu_fn function) (btw, I'd appreciate if anyone has quick answer on why we have lots of continuous "pc.ram" in low 2g range - from can_merge() I guess they seem to have different dirty_log_mask, romd_mode, etc., but I still would like to know why they are having these difference. Anyway, this is totally an "optional question" just to satisfy my own curiosity :) Thanks, -- peterx
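Peter's question (whether only the pc.ram regions should be exposed to PCI devices) can be made concrete by filtering a flatview dump down to its RAM entries. A small sketch; the sample lines reuse the truncated addresses exactly as they appear in the dump quoted above.

```python
import re

# A few lines copied verbatim from the flatview dump in this thread.
DUMP = """\
0010-7fff (prio 0, RW): pc.ram
b000-bfff (prio 0, RW): pcie-mmcfg-mmio
fe00-fe000fff (prio 0, RW): virtio-pci-common
0001-0001 (prio 0, RW): pc.ram
"""

def ram_regions(dump):
    """Keep only the pc.ram entries of a flatview dump -- the regions the
    thread suggests a no-IOMMU PCI device's DMA address space could be
    restricted to."""
    pat = re.compile(r"^\s*(\S+)-(\S+) \(prio \d+, [RW-]+\): (.+)$")
    out = []
    for line in dump.splitlines():
        m = pat.match(line)
        if m and m.group(3) == "pc.ram":
            out.append((m.group(1), m.group(2)))
    return out
```

An alternative address space built this way would contain the low-memory RAM pieces plus the >=4g region, and none of the MMIO regions (virtio BARs, ioapic, hpet, and so on).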
Re: [Qemu-devel] A question about PCI device address spaces
On Fri, Dec 23, 2016 at 11:21:53AM +, Peter Maydell wrote: > On 22 December 2016 at 09:42, Peter Xu wrote: > > Hello, > > > > Since this is a general topic, I picked it out from the VT-d > > discussion and put it here, just want to be more clear of it. > > > > The issue is, whether we have exposed too much address spaces for > > emulated PCI devices? > > > > Now for each PCI device, we are having PCIDevice::bus_master_as for > > the device visible address space, which derived from > > pci_device_iommu_address_space(): > > > > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) > > { > > PCIBus *bus = PCI_BUS(dev->bus); > > PCIBus *iommu_bus = bus; > > > > while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) { > > iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus); > > } > > if (iommu_bus && iommu_bus->iommu_fn) { > > return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, > > dev->devfn); > > } > > return &address_space_memory; > > } > > > > By default (for no-iommu case), it's pointed to system memory space, > > which includes MMIO, and looks wrong - PCI device should not be able to > > write to MMIO regions. > > This is just legacy, I think, ie a combination of "this used to > be system memory space so let's not break things" and "PC works > mostly like this". It should be possible for the PCI host bridge > emulation to set things up so that the device's visible address > space is whatever it feels like. The PCI APIs we have for doing > this have "iommu" in the name but they work just as well even > if the host bridge doesn't actually have an iommu and is just > setting up a fixed or slightly configurable mapping. > I think it just hasn't been implemented because for guests which > aren't misbehaving it doesn't make any difference. Hmm, yes I see ppc e500 is using that for setting up its own address space, possibly x86 can leverage it too when we really need this.
For now, I see no strong reason for this enhancement, so let me keep it as it is, and wait until we have both a strong reason and a PCI guru. Thank you for your answer! (to Paolo/David as well) -- peterx
Re: [Qemu-devel] A question about PCI device address spaces
On 22 December 2016 at 09:42, Peter Xu wrote: > Hello, > > Since this is a general topic, I picked it out from the VT-d > discussion and put it here, just want to be more clear of it. > > The issue is, whether we have exposed too much address spaces for > emulated PCI devices? > > Now for each PCI device, we are having PCIDevice::bus_master_as for > the device visible address space, which derived from > pci_device_iommu_address_space(): > > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) > { > PCIBus *bus = PCI_BUS(dev->bus); > PCIBus *iommu_bus = bus; > > while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) { > iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus); > } > if (iommu_bus && iommu_bus->iommu_fn) { > return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn); > } > return &address_space_memory; > } > > By default (for no-iommu case), it's pointed to system memory space, > which includes MMIO, and looks wrong - PCI device should not be able to > write to MMIO regions. This is just legacy, I think, ie a combination of "this used to be system memory space so let's not break things" and "PC works mostly like this". It should be possible for the PCI host bridge emulation to set things up so that the device's visible address space is whatever it feels like. The PCI APIs we have for doing this have "iommu" in the name but they work just as well even if the host bridge doesn't actually have an iommu and is just setting up a fixed or slightly configurable mapping. I think it just hasn't been implemented because for guests which aren't misbehaving it doesn't make any difference. thanks -- PMM
Re: [Qemu-devel] A question about PCI device address spaces
On Thu, Dec 22, 2016 at 05:42:40PM +0800, Peter Xu wrote: > Hello, > > Since this is a general topic, I picked it out from the VT-d > discussion and put it here, just want to be more clear of it. > > The issue is, whether we have exposed too much address spaces for > emulated PCI devices? > > Now for each PCI device, we are having PCIDevice::bus_master_as for > the device visible address space, which derived from > pci_device_iommu_address_space(): > > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) > { > PCIBus *bus = PCI_BUS(dev->bus); > PCIBus *iommu_bus = bus; > > while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) { > iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus); > } > if (iommu_bus && iommu_bus->iommu_fn) { > return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn); > } > return &address_space_memory; > } > > By default (for no-iommu case), it's pointed to system memory space, > which includes MMIO, and looks wrong - PCI device should not be able to > write to MMIO regions. Sorry, I've realized my earlier comments were a bit misleading. I'm pretty sure the inbound (==DMA) window(s) will be less than the full 64-bit address space. However, that doesn't necessarily mean it won't cover *any* MMIO. Plus, of course, any MMIO that's provided by PCI (or legacy ISA) devices - and on the PC platform, that's nearly everything - will also be visible in PCI space, since it doesn't need to go through the inbound window for that at all. Strictly speaking PCI-provided MMIO may not appear at the same address in PCI space as it does in the system memory space, but for PC they will be. By platform convention the outbound windows are also identity mappings. Part of the reason I was misleading was that I was thinking of non-PC platforms, which often have more "native" MMIO devices on the CPU side of the PCI host bridge.
> As an example, if we dump a PCI device address space into detail on > x86_64 system, we can see (this is address space for a virtio-net-pci > device on an Q35 machine with 6G memory): > > -0009 (prio 0, RW): pc.ram > 000a-000a (prio 1, RW): vga.vram > 000b-000b (prio 1, RW): vga-lowmem > 000c-000c9fff (prio 0, RW): pc.ram > 000ca000-000ccfff (prio 0, RW): pc.ram > 000cd000-000ebfff (prio 0, RW): pc.ram > 000ec000-000e (prio 0, RW): pc.ram > 000f-000f (prio 0, RW): pc.ram > 0010-7fff (prio 0, RW): pc.ram > b000-bfff (prio 0, RW): pcie-mmcfg-mmio > fd00-fdff (prio 1, RW): vga.vram > fe00-fe000fff (prio 0, RW): virtio-pci-common > fe001000-fe001fff (prio 0, RW): virtio-pci-isr > fe002000-fe002fff (prio 0, RW): virtio-pci-device > fe003000-fe003fff (prio 0, RW): virtio-pci-notify > febd0400-febd041f (prio 0, RW): vga ioports remapped > febd0500-febd0515 (prio 0, RW): bochs dispi interface > febd0600-febd0607 (prio 0, RW): qemu extended regs > febd1000-febd102f (prio 0, RW): msix-table > febd1800-febd1807 (prio 0, RW): msix-pba > febd2000-febd2fff (prio 1, RW): ahci > fec0-fec00fff (prio 0, RW): kvm-ioapic > fed0-fed003ff (prio 0, RW): hpet > fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio > fee0-feef (prio 4096, RW): kvm-apic-msi > fffc- (prio 0, R-): pc.bios > 0001-0001 (prio 0, RW): pc.ram > > So here are the "pc.ram" regions the only ones that we should expose > to PCI devices? (it should contain all of them, including the low-mem > ones and the >=4g one) > > And, should this rule work for all platforms? Or say, would it be a > problem if I directly change address_space_memory in > pci_device_iommu_address_space() into something else, which only > contains RAMs? 
(of course this won't affect any platform that has > IOMMU, aka, customized PCIBus::iommu_fn function) No, the arrangement of both inbound and outbound windows is certainly platform dependent (strictly speaking, dependent on the model and configuration of the host bridge, but that tends to be tied strongly to the platform). I think address_space_memory is the closest approximation we're going to get that works for multiple platforms - having both inbound and outbound windows identity mapped is pretty common, I believe, even if they don't strictly speaking cover the whole address space. > (btw, I'd appreciate if anyone has quick answer on why we have lots of > continuous "pc.ram" in low 2g range - from
Re: [Qemu-devel] A question about PCI device address spaces
On 22/12/2016 10:42, Peter Xu wrote: > Hello, > > Since this is a general topic, I picked it out from the VT-d > discussion and put it here, just want to be more clear of it. > > The issue is, whether we have exposed too much address spaces for > emulated PCI devices? > > Now for each PCI device, we are having PCIDevice::bus_master_as for > the device visible address space, which derived from > pci_device_iommu_address_space(): > > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) > { > PCIBus *bus = PCI_BUS(dev->bus); > PCIBus *iommu_bus = bus; > > while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) { > iommu_bus = PCI_BUS(iommu_bus->parent_dev->bus); > } > if (iommu_bus && iommu_bus->iommu_fn) { > return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn); > } > return &address_space_memory; > } > > By default (for no-iommu case), it's pointed to system memory space, > which includes MMIO, and looks wrong - PCI device should not be able to > write to MMIO regions. Hmm, I think that unless you describe that with ACS, PCI devices should be able to write to MMIO regions. Possibly not _all_ of them, I don't know (maybe they cannot write to MMCONFIG?) but after all they can write to the MSI region, and that is an MMIO region. If the IOMMU translate callback included the MemTxAttrs, ACS Source Validation could probably be implemented with an IOMMU region on the root complex. Most of the other ACS features either do not apply, or they are already implicit in the way that all PCI devices' address spaces are based on address_space_memory.
Paolo > As an example, if we dump a PCI device address space into detail on > x86_64 system, we can see (this is address space for a virtio-net-pci > device on an Q35 machine with 6G memory): > > -0009 (prio 0, RW): pc.ram > 000a-000a (prio 1, RW): vga.vram > 000b-000b (prio 1, RW): vga-lowmem > 000c-000c9fff (prio 0, RW): pc.ram > 000ca000-000ccfff (prio 0, RW): pc.ram > 000cd000-000ebfff (prio 0, RW): pc.ram > 000ec000-000e (prio 0, RW): pc.ram > 000f-000f (prio 0, RW): pc.ram > 0010-7fff (prio 0, RW): pc.ram > b000-bfff (prio 0, RW): pcie-mmcfg-mmio > fd00-fdff (prio 1, RW): vga.vram > fe00-fe000fff (prio 0, RW): virtio-pci-common > fe001000-fe001fff (prio 0, RW): virtio-pci-isr > fe002000-fe002fff (prio 0, RW): virtio-pci-device > fe003000-fe003fff (prio 0, RW): virtio-pci-notify > febd0400-febd041f (prio 0, RW): vga ioports remapped > febd0500-febd0515 (prio 0, RW): bochs dispi interface > febd0600-febd0607 (prio 0, RW): qemu extended regs > febd1000-febd102f (prio 0, RW): msix-table > febd1800-febd1807 (prio 0, RW): msix-pba > febd2000-febd2fff (prio 1, RW): ahci > fec0-fec00fff (prio 0, RW): kvm-ioapic > fed0-fed003ff (prio 0, RW): hpet > fed1c000-fed1 (prio 1, RW): lpc-rcrb-mmio > fee0-feef (prio 4096, RW): kvm-apic-msi > fffc- (prio 0, R-): pc.bios > 0001-0001 (prio 0, RW): pc.ram > > So here are the "pc.ram" regions the only ones that we should expose > to PCI devices? (it should contain all of them, including the low-mem > ones and the >=4g one) > > And, should this rule work for all platforms? Or say, would it be a > problem if I directly change address_space_memory in > pci_device_iommu_address_space() into something else, which only > contains RAMs? 
(of course this won't affect any platform that has > IOMMU, aka, customized PCIBus::iommu_fn function) > > (btw, I'd appreciate if anyone has quick answer on why we have lots of > continuous "pc.ram" in low 2g range - from can_merge() I guess they > seem to have different dirty_log_mask, romd_mode, etc., but I still > would like to know why they are having these difference. Anyway, this > is totally an "optional question" just to satisfy my own curiosity :) > > Thanks, > > -- peterx >
Re: [Qemu-devel] virtIO question
On Mon, Nov 14, 2016 at 08:36:39PM +0800, zhun...@gmail.com wrote: > I have a question about qemu.is it a bug in qemu version 1.2? > in qemu version 1.2 ,it set avail event by the code : > if (vq->vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) { > vring_avail_event(vq, vring_avail_idx(vq)); > } > and in version 2.7 the code is > if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) { > vring_set_avail_event(vq, vq->last_avail_idx); > } > > a big difference of this is the value.vring_avail_idx(vq)is the latest value > of VRingAvail.idx,and vq->last_avail_idx is not, I think it really different > with the two different values,and I think the later is right,is it?? qemu.git/master has both vring_set_avail_event(vq, vring_avail_idx(vq)) and vring_set_avail_event(vq, vq->last_avail_idx) so I'm not sure what you are referring to. Please use git-blame(1) and git-log(1) to investigate changes to the code yourself. Stefan
Re: [Qemu-devel] virtIO question
On Sat, Nov 12, 2016 at 04:43:21PM +0800, zhun...@gmail.com wrote: > Thanks,the expression is not the key problem,I just write it wrong,the key > problem is that what I get from the code is everytime dirver add a sg ,it > will call virtqueue_kick,such as network driver,in start_xmit function ,it > called xmit_skb generate a sg list and add it to the queue,then called > virtqueue_kick ,why it handle like this??can you explain it to me??thank you > very much!!! I can see there *are* cases in Linux 4.9-rc1 virtio_net.c:start_xmit() where virtqueue_kick() is skipped so that multiple tx packets can be added to the virtqueue in a single kick. static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) { ... bool kick = !skb->xmit_more; ... if (kick || netif_xmit_stopped(txq)) virtqueue_kick(sq->vq); ... } Also keep in mind that the virtio driver APIs in Linux are used by multiple device drivers (virtio_net.ko, virtio_blk.ko, etc). virtio_net.ko does not use all features offered by the API. Stefan
Re: [Qemu-devel] virtIO question
Hello teacher, I may have found where the driver adds multiple buffers to the virtqueue before kicking the QEMU side: it is when the driver uses NAPI to poll the device for buffers, on the receive queue. But on the transmit queue, every time the driver adds a buffer to the virtqueue, the kick function is called! Why? Does QEMU handle buffers faster than the driver can add them? Thank you very much! zhun...@gmail.com From: Stefan Hajnoczi Date: 2016-11-11 20:03 To: zhun...@gmail.com CC: qemu Subject: Re: Re: [Qemu-devel] virtIO question On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote: > From this point of view ,I think it make sense well, thank you very much! > but I have another question about notify mechanism between virtIO driver and > qemu. > according the source code of Linux and qemu, > when driver add a sg buffer to send queue named sq, > sq->vq->vring.avail->idx++ > vq->num_added++ > and then use virtqueue_kick_prepare to make sure if need notify qemu. > it (new_idx-event_idx)<(new_idx-old_idx) This expression is wrong. The specification and Linux code both say: (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx) Both the (u16) and the -1 matter. Maybe that's why you are confused by this? > if it is true,then notify other side. > However,every time driver add a sg,then virtqueue_kick_prepare is called,and > vq->num_added is reseted to 0,so in fact ,I think vq->num_added is always 0 > or 1。 A driver may add multiple buffers to the virtqueue by calling virtqueue_add_sgs() or similar functions multiple times before kicking. Therefore vq->num_added > 1 is possible. > as to qemu side,every time when pop a elem from virtqueue,it set > VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the > arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this > mechanism does not make sense You are basically asking "how does event_idx work?".
The specification says: "The driver can ask the device to delay interrupts until an entry with an index specified by the “used_event” field is written in the used ring (equivalently, until the idx field in the used ring will reach the value used_event + 1)." and: "The device can ask the driver to delay notifications until an entry with an index specified by the “avail_event” field is written in the available ring (equivalently, until the idx field in the used ring will reach the value avail_event + 1)." Whenever the device or driver wants to notify, it first checks if the index update crossed the event index set by the other side.
Re: [Qemu-devel] virtIO question
Thanks,the expression is not the key problem,I just write it wrong,the key problem is that what I get from the code is everytime dirver add a sg ,it will call virtqueue_kick,such as network driver,in start_xmit function ,it called xmit_skb generate a sg list and add it to the queue,then called virtqueue_kick ,why it handle like this??can you explain it to me??thank you very much!!! zhun...@gmail.com From: Stefan Hajnoczi Date: 2016-11-11 20:03 To: zhun...@gmail.com CC: qemu Subject: Re: Re: [Qemu-devel] virtIO question On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote: > From this point of view ,I think it make sense well, thank you very much! > but I have another question about notify mechanism between virtIO driver and > qemu. > according the source code of Linux and qemu, > when driver add a sg buffer to send queue named sq, > sq->vq->vring.avail->idx++ > vq->num_added++ > and then use virtqueue_kick_prepare to make sure if need notify qemu. > it (new_idx-event_idx)<(new_idx-old_idx) This expression is wrong. The specification and Linux code both say: (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx) Both the (u16) and the -1 matter. Maybe that's why you are confused by this? > if it is true,then notify other side. > However,every time driver add a sg,then virtqueue_kick_prepare is called,and > vq->num_added is reseted to 0,so in fact ,I think vq->num_added is always 0 > or 1。 A driver may add multiple buffers to the virtqueue by calling virtqueue_add_sgs() or similar functions multiple times before kicking. Therefore vq->num_added > 1 is possible. > as to qemu side,every time when pop a elem from virtqueue,it set > VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the > arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this > mechanism does not make sense You are basically asking "how does event_idx work?". 
The specification says: "The driver can ask the device to delay interrupts until an entry with an index specified by the “used_event” field is written in the used ring (equivalently, until the idx field in the used ring will reach the value used_event + 1)." and: "The device can ask the driver to delay notifications until an entry with an index specified by the “avail_event” field is written in the available ring (equivalently, until the idx field in the used ring will reach the value avail_event + 1)." Whenever the device or driver wants to notify, it first checks if the index update crossed the event index set by the other side.
Re: [Qemu-devel] virtIO question
Thanks, but I really cannot find where the driver adds multiple buffers before calling the kick function; this is the key problem. Sent from my MX5 -- Original Message -- From: Stefan Hajnoczi <stefa...@gmail.com> Date: Fri, Nov 11, 20:03 To: zhun...@gmail.com Cc: qemu <qemu-devel@nongnu.org> Subject: Re: Re: [Qemu-devel] virtIO question On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote: > From this point of view ,I think it make sense well, thank you very much! > but I have another question about notify mechanism between virtIO driver and qemu. > according the source code of Linux and qemu, > when driver add a sg buffer to send queue named sq, > sq->vq->vring.avail->idx++ > vq->num_added++ > and then use virtqueue_kick_prepare to make sure if need notify qemu. > it (new_idx-event_idx)<(new_idx-old_idx) This expression is wrong. The specification and Linux code both say: (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx) Both the (u16) and the -1 matter. Maybe that's why you are confused by this? > if it is true,then notify other side. > However,every time driver add a sg,then virtqueue_kick_prepare is called,and vq->num_added is reseted to 0,so in fact ,I think vq->num_added is always 0 or 1。 A driver may add multiple buffers to the virtqueue by calling virtqueue_add_sgs() or similar functions multiple times before kicking. Therefore vq->num_added > 1 is possible. > as to qemu side,every time when pop a elem from virtqueue,it set VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this mechanism does not make sense You are basically asking "how does event_idx work?". The specification says: "The driver can ask the device to delay interrupts until an entry with an index specified by the "used_event" field is written in the used ring (equivalently, until the idx field in the used ring will reach the value used_event + 1)."
and: "The device can ask the driver to delay notifications until an entry with an index specified by the "avail_event" field is written in the available ring (equivalently, until the idx field in the used ring will reach the value avail_event + 1)." Whenever the device or driver wants to notify, it first checks if the index update crossed the event index set by the other side.
Re: [Qemu-devel] virtIO question
On Thu, Nov 10, 2016 at 08:16:38PM +0800, zhun...@gmail.com wrote: > From this point of view ,I think it make sense well, thank you very much! > but I have another question about notify mechanism between virtIO driver and > qemu. > according the source code of Linux and qemu, > when driver add a sg buffer to send queue named sq, > sq->vq->vring.avail->idx++ > vq->num_added++ > and then use virtqueue_kick_prepare to make sure if need notify qemu. > it (new_idx-event_idx)<(new_idx-old_idx) This expression is wrong. The specification and Linux code both say: (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx) Both the (u16) and the -1 matter. Maybe that's why you are confused by this? > if it is true,then notify other side. > However,every time driver add a sg,then virtqueue_kick_prepare is called,and > vq->num_added is reseted to 0,so in fact ,I think vq->num_added is always 0 > or 1。 A driver may add multiple buffers to the virtqueue by calling virtqueue_add_sgs() or similar functions multiple times before kicking. Therefore vq->num_added > 1 is possible. > as to qemu side,every time when pop a elem from virtqueue,it set > VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the > arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this > mechanism does not make sense You are basically asking "how does event_idx work?". The specification says: "The driver can ask the device to delay interrupts until an entry with an index specified by the “used_event” field is written in the used ring (equivalently, until the idx field in the used ring will reach the value used_event + 1)." and: "The device can ask the driver to delay notifications until an entry with an index specified by the “avail_event” field is written in the available ring (equivalently, until the idx field in the used ring will reach the value avail_event + 1)." 
Whenever the device or driver wants to notify, it first checks if the index update crossed the event index set by the other side.
Re: [Qemu-devel] virtIO question
From this point of view ,I think it make sense well, thank you very much! but I have another question about notify mechanism between virtIO driver and qemu. according the source code of Linux and qemu, when driver add a sg buffer to send queue named sq, sq->vq->vring.avail->idx++ vq->num_added++ and then use virtqueue_kick_prepare to make sure if need notify qemu. it (new_idx-event_idx)<(new_idx-old_idx) if it is true,then notify other side. However,every time driver add a sg,then virtqueue_kick_prepare is called,and vq->num_added is reseted to 0,so in fact ,I think vq->num_added is always 0 or 1。 as to qemu side,every time when pop a elem from virtqueue,it set VRingUsed.ring[vring.num] to the lastest VRingAvail.idx, this according the arithmetic ((new_idx-event_idx)<(new_idx-old_idx)),it seems that this mechanism does not make sense I do not know if I describe it clearly.or can you give me an example to prove how it make sense!! thanks a lot! zhun...@gmail.com From: Stefan Hajnoczi Date: 2016-11-10 18:32 To: zhun...@gmail.com CC: jkhasdev; qemu Subject: Re: [Qemu-devel] virtIO question On Wed, Nov 09, 2016 at 06:58:16PM +0800, zhun...@gmail.com wrote: > I want to ask a another question,why a virt_queue in virtio include in_sgs > and out_sgs,for example,send_queue of virtIO net driver have in_sgs and > out_sgs,when transmit data,It add buffer to out_sgs of send_queue,but how it > to use in_sgs?? You can think of every virtqueue buffer as having two scatter-gather lists: 1. out_sgs are driver->device buffers (e.g. tx packet payload) 2. in_sgs are device->driver buffers (e.g. rx packet payload) Look at the virtio-net ctrl virtqueue (see spec and virtio_net_handle_ctrl() for details). Each buffer has: 1. struct virtio_net_ctrl_hdr (out_sgs) 2. request-specific fields (out_sgs) 3. virtio_net_ctrl_ack status byte (in_sgs) The device parses the request and performs the operation. Then it fills in the result (success or error code) in the status byte. 
Processing ctrl virtqueue buffers therefore requires both guest memory reads (out_sgs) and writes (in_sgs). Most of the other virtio devices also use bi-directional buffers. This may not be obvious if you only consider the virtio-net tx virtqueue, for example, where buffers use out_sgs only. Hope this makes sense. If not, look at the specification again and think about how virtio-net ctrl request processing works.
Re: [Qemu-devel] virtIO question
On Wed, Nov 09, 2016 at 06:58:16PM +0800, zhun...@gmail.com wrote: > I want to ask a another question,why a virt_queue in virtio include in_sgs > and out_sgs,for example,send_queue of virtIO net driver have in_sgs and > out_sgs,when transmit data,It add buffer to out_sgs of send_queue,but how it > to use in_sgs?? You can think of every virtqueue buffer as having two scatter-gather lists: 1. out_sgs are driver->device buffers (e.g. tx packet payload) 2. in_sgs are device->driver buffers (e.g. rx packet payload) Look at the virtio-net ctrl virtqueue (see spec and virtio_net_handle_ctrl() for details). Each buffer has: 1. struct virtio_net_ctrl_hdr (out_sgs) 2. request-specific fields (out_sgs) 3. virtio_net_ctrl_ack status byte (in_sgs) The device parses the request and performs the operation. Then it fills in the result (success or error code) in the status byte. Processing ctrl virtqueue buffers therefore requires both guest memory reads (out_sgs) and writes (in_sgs). Most of the other virtio devices also use bi-directional buffers. This may not be obvious if you only consider the virtio-net tx virtqueue, for example, where buffers use out_sgs only. Hope this makes sense. If not, look at the specification again and think about how virtio-net ctrl request processing works.
Re: [Qemu-devel] virtIO question
I want to ask a another question,why a virt_queue in virtio include in_sgs and out_sgs,for example,send_queue of virtIO net driver have in_sgs and out_sgs,when transmit data,It add buffer to out_sgs of send_queue,but how it to use in_sgs?? zhun...@gmail.com From: jitendra kumar khasdev Date: 2016-11-05 23:41 To: Peter Maydell CC: zhun...@gmail.com; qemu-devel Subject: Re: [Qemu-devel] virtIO question Have you looked at the virtio specification? No. This describes the overall structure and communication mechanism, which QEMU and Linux each only implement one half of: http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html Thanks Peter. This doc looks me interesting. --- Jitendra
Re: [Qemu-devel] virtIO question
Thanks a lot! I will read the document carefully! Sent from my MX5 -- Original Message -- From: jitendra kumar khasdev Date: Sat, Nov 5, 23:11 To: Peter Maydell Cc: zhun...@gmail.com, qemu-devel Subject: Re: [Qemu-devel] virtIO question Have you looked at the virtio specification? No. This describes the overall structure and communication mechanism, which QEMU and Linux each only implement one half of: [1]http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html Thanks Peter. This doc looks interesting to me. --- Jitendra References 1. http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html
Re: [Qemu-devel] virtIO question
> > Have you looked at the virtio specification? No. > This describes > the overall structure and communication mechanism, which > QEMU and Linux each only implement one half of: > http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html Thanks Peter. This doc looks me interesting. --- Jitendra
Re: [Qemu-devel] virtIO question
On 5 November 2016 at 14:46, jitendra kumar khasdev wrote: > here is what I find difficulty in understanding the code, > > 1. Qemu virtio datastructures are not so clear. > 2. I didn't find any sort high level design with respective qemu for virtIO. > 3. How qemu virtIO and linux virtIO does shared memoru communication. > > I will deeply thankful, if somebody help me to kick start this because I am > really willing to work on virtIO. I am hoping somebody in community will > help to give me some direction to achive the goal. Have you looked at the virtio specification? This describes the overall structure and communication mechanism, which QEMU and Linux each only implement one half of: http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html Hopefully it will provide the higher level overview that you are looking for. Many of the data structures and their purposes are also directly required and described by the spec. thanks -- PMM
Re: [Qemu-devel] virtIO question
Hi All, I am also exploring the virtio implementation in qemu. I try to understand virtio in linux (guest) and qemu, but I am not able to understand the I/O stack between linux and qemu. Here is what I find difficult to understand in the code: 1. Qemu virtio data structures are not so clear. 2. I didn't find any sort of high-level design for virtIO with respect to qemu. 3. How qemu virtIO and linux virtIO do shared memory communication. I will be deeply thankful if somebody helps me kick-start this, because I am really willing to work on virtIO. I am hoping somebody in the community will give me some direction to achieve the goal. Jitendra On Sat, Nov 5, 2016 at 4:20 PM, zhun...@gmail.com wrote: > who can explain the means of idx in VRingUsed and VRingAvail structure > about virtIO?? > thanks! > > > > zhun...@gmail.com >
Re: [Qemu-devel] A question about virtual machine communication with Guest through serial device
On Mon, Oct 31, 2016 at 06:41:23AM +, Liu Pedroa wrote: > Hi Everyone. > > > I have a sample question. As the subset description. > > > Guest OS : Ubuntu OS > > QEMU : linux kernel 4.8 > > > so i want the virtual client OS can communication( can send and receive > messages through serial device) with Guest OS. what should i do? Add a chardev to your QEMU command-line (try -serial unix:/tmp/test.sock,server,nowait). Read the qemu(1) man page for details on chardevs and all the options. Inside the guest you need to access the serial port (e.g. /dev/ttyS0 on Linux but make sure no login tty is running on that port). On the host you can access the UNIX domain socket given on your QEMU command-line. That would be /tmp/test.sock in this example. Also consider using virtio-serial instead of the traditional ISA serial device. It allows you to add/remove named (e.g. org.myproject.agent.0) ports at run-time. Search online to find example syntax. Stefan
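Putting Stefan's suggestion together, here is a sketch of the command lines involved. The image name, socket path, and the use of socat as the host-side client are placeholders/assumptions, not from the thread:

```shell
# Host: expose the guest's first serial port on a UNIX socket
# (image name and socket path are placeholders).
qemu-system-x86_64 -m 1G -drive file=guest.img,format=qcow2 \
    -serial unix:/tmp/test.sock,server,nowait

# Host, second terminal: talk to the guest through the socket
# (socat is one convenient client; any UNIX-socket client works).
socat - UNIX-CONNECT:/tmp/test.sock

# Guest (Linux): the same port is /dev/ttyS0 - make sure no login
# tty is attached to it, then read/write it directly.
echo hello > /dev/ttyS0
cat /dev/ttyS0
```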
Re: [Qemu-devel] A question about this commit "char: convert from GIOChannel to QIOChannel"
- Original Message - > From: "wangyunjian"> To: "Paolo Bonzini" > Cc: berra...@redhat.com, qemu-devel@nongnu.org, "caihe" > Sent: Monday, October 17, 2016 4:17:45 PM > Subject: RE: A question about this commit "char: convert from GIOChannel to > QIOChannel" > > In function tcp_chr_close, the s->sioc needs call object_unref() to be > released ? > > About the below code: > > @@ -3205,10 +3205,13 @@ static void tcp_chr_close(CharDriverState *chr) > s->listen_tag = 0; > } > if (s->listen_ioc) { > object_unref(OBJECT(s->listen_ioc)); > } > +if (s->sioc) { > +object_unref(OBJECT(s->sioc)); > +} > if (s->read_msgfds_num) { > for (i = 0; i < s->read_msgfds_num; i++) { > close(s->read_msgfds[i]); > } > g_free(s->read_msgfds); This is done in tcp_chr_free_connection. Maybe your tree does not have commit 5b498459b441f4639fd4c39c35345637bfc6c16c? Paolo > Thanks > > Yunjian > > -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Monday, October 17, 2016 9:20 PM > To: wangyunjian > Cc: berra...@redhat.com; qemu-devel@nongnu.org; caihe > Subject: Re: A question about this commit "char: convert from GIOChannel to > QIOChannel" > > > > - Original Message - > > From: "wangyunjian" > > To: berra...@redhat.com, pbonz...@redhat.com, qemu-devel@nongnu.org > > Cc: "caihe" > > Sent: Monday, October 17, 2016 3:02:32 PM > > Subject: A question about this commit "char: convert from GIOChannel to > > QIOChannel" > > > > Commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 “char: convert from > > GIOChannel to QIOChannel”, the old version will call closesocket when > > tcp_chr_close and udp_chr_close are called. But the new version will > > not call closesocket. > > > > This can bring to a socket leak? > > Hi, closesocket is called in io/channel-socket.c > (qio_channel_socket_finalize). > > Thanks, > > Paolo >
Re: [Qemu-devel] A question about this commit "char: convert from GIOChannel to QIOChannel"
In function tcp_chr_close, the s->sioc needs call object_unref() to be released ? About the below code: @@ -3205,10 +3205,13 @@ static void tcp_chr_close(CharDriverState *chr) s->listen_tag = 0; } if (s->listen_ioc) { object_unref(OBJECT(s->listen_ioc)); } +if (s->sioc) { +object_unref(OBJECT(s->sioc)); +} if (s->read_msgfds_num) { for (i = 0; i < s->read_msgfds_num; i++) { close(s->read_msgfds[i]); } g_free(s->read_msgfds); Thanks Yunjian -Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] Sent: Monday, October 17, 2016 9:20 PM To: wangyunjian Cc: berra...@redhat.com; qemu-devel@nongnu.org; caihe Subject: Re: A question about this commit "char: convert from GIOChannel to QIOChannel" - Original Message - > From: "wangyunjian"> To: berra...@redhat.com, pbonz...@redhat.com, qemu-devel@nongnu.org > Cc: "caihe" > Sent: Monday, October 17, 2016 3:02:32 PM > Subject: A question about this commit "char: convert from GIOChannel to > QIOChannel" > > Commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 “char: convert from > GIOChannel to QIOChannel”, the old version will call closesocket when > tcp_chr_close and udp_chr_close are called. But the new version will > not call closesocket. > > This can bring to a socket leak? Hi, closesocket is called in io/channel-socket.c (qio_channel_socket_finalize). Thanks, Paolo
Re: [Qemu-devel] A question about this commit "char: convert from GIOChannel to QIOChannel"
- Original Message - > From: "wangyunjian"> To: berra...@redhat.com, pbonz...@redhat.com, qemu-devel@nongnu.org > Cc: "caihe" > Sent: Monday, October 17, 2016 3:02:32 PM > Subject: A question about this commit "char: convert from GIOChannel to > QIOChannel" > > Commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 “char: convert from > GIOChannel to QIOChannel”, > the old version will call closesocket when tcp_chr_close and udp_chr_close > are called. But the new version will not call closesocket. > > This can bring to a socket leak? Hi, closesocket is called in io/channel-socket.c (qio_channel_socket_finalize). Thanks, Paolo
Re: [Qemu-devel] A question about postcopy safety
On Mon, Sep 05, 2016 at 02:52:14PM +0100, Dr. David Alan Gilbert wrote: > * liut...@yahoo.com (liut...@yahoo.com) wrote: > > Hi David, > > Hi Liutao, > > > I'm studying the process of postcopy migration, and I found that the memory > > pages migrated from source to destination are not encrypted. Does this make > > the VM vulnerable if its memory has been tampered with during postcopy > > migration? > > > > I think precopy has less risk because the source's memory is always > > altering. If one page is tampered with during network transfer, with source > > still running, then a later version of that page may keep updating. So it > > would be quite difficult to track all different page versions, and tamper > > with the final version of one page. > > > > But when it comes to postcopy, the situation is riskier because one > > specific page is only transferred once. It's easy to capture all > > transferring memory pages, tamper and resend. > > I don't think there's much difference between precopy and postcopy for > security; > the only secure way to do migration is over an encrypted transport and that > solves > it for both precopy and postcopy. Agreed, there's no real-world difference in the security of pre & post copy. If you care about security there's no avoiding the need to use an encrypted transport. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] A question about postcopy safety
* liut...@yahoo.com (liut...@yahoo.com) wrote: > Hi David, Hi Liutao, > I'm studying the process of postcopy migration, and I found that the memory > pages migrated from source to destination are not encrypted. Does this make > the VM vulnerable if its memory has been tampered with during postcopy > migration? > > I think precopy has less risk because the source's memory is always altering. > If one page is tampered with during network transfer, with source still > running, then a later version of that page may keep updating. So it would be > quite difficult to track all different page versions, and tamper with the > final version of one page. > > But when it comes to postcopy, the situation is riskier because one specific > page is only transferred once. It's easy to capture all transferring memory > pages, tamper and resend. I don't think there's much difference between precopy and postcopy for security; the only secure way to do migration is over an encrypted transport and that solves it for both precopy and postcopy. I don't think it would be that hard for a malicious person to track the pages in precopy; and indeed what they could do is wait until an interesting page comes along (say one with a hash or the data they're interested in) and then insert a new version of that page later with their own nasty version on - postcopy wouldn't allow that second version. The challenge is to get a nice fast high speed encryption layer, and for post-copy it should have low added latency. > > When the memory has been tampered with, the safety of the VM will be compromised. > > Any ideas? thank you! Liutao Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] A question about postcopy safety
On Mon, Aug 29, 2016 at 12:51:20PM +, liut...@yahoo.com wrote: > Hi David, I'm studying the process of postcopy migration, and I found > that the memory pages migrated from source to destination are not > encrypted. Does this make the VM vulnerable if its memory has been > tampered with during postcopy migration? If you haven't already, you might want to take a look at this post, which discusses the security details during live migration with post-copy. https://www.berrange.com/posts/2016/08/16/improving-qemu-security-part-7-tls-support-for-migration/ It also has an example of setting the 'tls-creds' field with the 'migrate-set-parameters' QMP command to use TLS, before triggering the 'migrate' QMP command. > I think precopy has less risk because the source's memory is always > altering. If one page is tampered with during network transfer, with > source still running, then a later version of that page may keep > updating. So it would be quite difficult to track all different page > versions, and tamper with the final version of one page. > > But when it comes to postcopy, the situation is riskier because one > specific page is only transferred once. It's easy to capture all > transferring memory pages, tamper and resend. > > When the memory has been tampered with, the safety of the VM will be > compromised. > > Any ideas? thank you! Liutao -- /kashyap
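For completeness, the QMP sequence described in that post looks roughly like the following. This is a sketch only: the credentials object id (`tls0`) and the destination URI are hypothetical, and a tls-creds-x509 object with that id is assumed to have been created beforehand (e.g. via object-add) on both source and destination.

```json
{ "execute": "migrate-set-parameters",
  "arguments": { "tls-creds": "tls0" } }
{ "execute": "migrate",
  "arguments": { "uri": "tcp:dest.example.com:4444" } }
```

With tls-creds set, the migration stream is carried over TLS, which addresses the tampering concern for both precopy and postcopy.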
Re: [Qemu-devel] a question
On 26 August 2016 at 10:31, Michael Rolnik wrote: > I want to partially implement the AT STK500 board in order to run FreeRTOS AVR > / ATMegaAVR demo. > if you look into ATmega32 documentation you will see that, for example, > Timer/Counter1 registers are held together at memory addresses [0x46 .. > 0xff], but interrupt masks and enable bits are at some other place, actually > 0x58 .. 0x59. (0x58..0x59 is a subset of 0x46..0xff -- I think from the docs that you meant 0x46..0x4f.) I think I would treat this set of timers as a single device that implements all the timers, and which implements also the timer interrupt mask/enable registers. Where there are multiple disjoint register sets you can do this by having the device provide multiple MemoryRegions, which the SoC container device then maps into the memory space at the right places. If you plan to support multiple CPUs which reuse some of the individual timer modules but not all of them, you can structure the code to allow convenient reuse, but that is an internal detail of the overall "timers" device. > create non memory mapped devices. Each device will export the following > functions > > avr__init > avr__read_ > avr__write_ > > create "big" memory mapped device, which will cover the whole IO space and > it will call _read_/_write_ functions of the devices when there is an access > to specific address in the IO memory space. You can do all this with the memory region API. Your various devices can provide multiple memory regions (implementing the bits of memory space that they care about), and then a higher level SoC container device can map those into the right places in a container memory region which it then exports to the board level. You don't need to invent a new API for having devices provide registers. thanks -- PMM
Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
On Thu, Aug 25, 2016 at 01:21:47PM +, Gaohaifeng (A) wrote: > On Tue, Aug 23, 2016 at 08:57:44AM +, Gaohaifeng (A) wrote: > > Hi Daniel & Paolo, > > > > > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > > > > > the below code segment: > > > > > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, > > > void *opaque) > > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, > > > +void *opaque) > > > { > > > CharDriverState *chr = opaque; > > > TCPCharDriver *s = chr->opaque; > > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, > > > GIOCondition cond, void *opaque) > > > if (len > s->max_size) > > > len = s->max_size; > > > size = tcp_chr_recv(chr, (void *)buf, len); > > > -if (size == 0 || > > > -(size < 0 && > > > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > > > +if (size == 0 || size == -1) { > > > /* connection closed */ > > > tcp_chr_disconnect(chr); > > > } else if (size > 0) { > > > > > > The old version will not call tcp_chr_disconnect when error is not > > > EAGAIN. > > > The new version will call tcp_chr_disconnect when size==-1. From the > > > tcp_chr_recv function we see EAGAIN will return -1, so EAGAIN will call > > > tcp_chr_disconnect. > > > > > > We met an issue when a guest VM uses DPDK (1.6.0) l2fwd, it may exit > > > since link status is not up. The link is down because > > > tcp_chr_disconnect will set it. > > > > > > Can you explain why EAGAIN here changes the behavior? > > > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an > > event loop watch so that tcp_chr_read is called when there is incoming data > > to be read. IOW, when tcp_chr_read is invoked, I would always expect that > > at least 1 byte is available, and so EAGAIN should not occur. > > > > So I'm puzzled why you would get EAGAIN at all, unless there is some kind > > of race with other code also consuming bytes from the same socket.
> > It could easily reproduce this when repeating 'rmmod virtio_net;modprobe > virtio_net'. > The call stack: > #0 tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867 > #1 0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, > opaque=0x7f09d5b2d540) at qemu-char.c:2917 > #2 0x7f09d46364fa in qio_channel_fd_source_dispatch > (source=0x7f093e603b30, callback=0x7f09d43b1fcb , > user_data=0x7f09d5b2d540) at io/channel-watch.c:84 > #3 0x7f09d2ea in g_main_context_dispatch () from > /lib64/libglib-2.0.so.0 > #4 0x7f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213 > #5 0x7f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at > main-loop.c:258 > #6 0x7f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506 > #7 0x7f09d43bbdd2 in main_loop () at vl.c:2119 > #8 0x7f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, > envp=0x7ffe7ad66b70) at vl.c:4896 > (gdb) f 1 > #1 0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, > opaque=0x7f09d5b2d540) at qemu-char.c:2917 > 2917 tcp_chr_disconnect(chr); > (gdb) p errno > $1 = 11 > > EAGAIN = 11 I think what is happening is a race condition - the main event loop thread sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is triggering vhost_user_read() in a different thread which consumes the incoming I/O before tcp_chr_read can handle it. IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read to avoid the disconnects. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
On Tue, Aug 23, 2016 at 08:57:44AM +, Gaohaifeng (A) wrote: > Hi Daniel & Paolo, > > > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > > > the below code segment: > > > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, > > void *opaque) > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, > > +void *opaque) > > { > > CharDriverState *chr = opaque; > > TCPCharDriver *s = chr->opaque; > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, > > GIOCondition cond, void *opaque) > > if (len > s->max_size) > > len = s->max_size; > > size = tcp_chr_recv(chr, (void *)buf, len); > > -if (size == 0 || > > -(size < 0 && > > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > > +if (size == 0 || size == -1) { > > /* connection closed */ > > tcp_chr_disconnect(chr); > > } else if (size > 0) { > > > > The old version will not call tcp_chr_disconnect when error is not > > EAGAIN. > > The new version will call tcp_chr_disconnect when size==-1. From the > > tcp_chr_recv function we see EAGAIN will return -1, so EAGAIN will call > > tcp_chr_disconnect. > > > > We met an issue when a guest VM uses DPDK (1.6.0) l2fwd, it may exit > > since link status is not up. The link is down because > > tcp_chr_disconnect will set it. > > > > Can you explain why EAGAIN here changes the behavior? > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event > loop watch so that tcp_chr_read is called when there is incoming data to be > read. IOW, when tcp_chr_read is invoked, I would always expect that at least > 1 byte is available, and so EAGAIN should not occur. > > So I'm puzzled why you would get EAGAIN at all, unless there is some kind of > race with other code also consuming bytes from the same socket. I could easily reproduce this when repeating 'rmmod virtio_net;modprobe virtio_net'.
The call stack: #0 tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867 #1 0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 #2 0x7f09d46364fa in qio_channel_fd_source_dispatch (source=0x7f093e603b30, callback=0x7f09d43b1fcb , user_data=0x7f09d5b2d540) at io/channel-watch.c:84 #3 0x7f09d2ea in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #4 0x7f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213 #5 0x7f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at main-loop.c:258 #6 0x7f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506 #7 0x7f09d43bbdd2 in main_loop () at vl.c:2119 #8 0x7f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, envp=0x7ffe7ad66b70) at vl.c:4896 (gdb) f 1 #1 0x7f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 2917 tcp_chr_disconnect(chr); (gdb) p errno $1 = 11 EAGAIN = 11 > > Regards, > Daniel
Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
On Tue, Aug 23, 2016 at 08:57:44AM +, Gaohaifeng (A) wrote: > Hi Daniel & Paolo, > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > the below code segment: > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, > void *opaque) > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void > *opaque) > { > CharDriverState *chr = opaque; > TCPCharDriver *s = chr->opaque; > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, > GIOCondition cond, void *opaque) > if (len > s->max_size) > len = s->max_size; > size = tcp_chr_recv(chr, (void *)buf, len); > -if (size == 0 || > -(size < 0 && > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > +if (size == 0 || size == -1) { > /* connection closed */ > tcp_chr_disconnect(chr); > } else if (size > 0) { > > The old version will not call tcp_chr_disconnect when error is not > EAGAIN. > The new version will call tcp_chr_disconnect when size==-1. From the > tcp_chr_recv function we see EAGAIN will return -1, so EAGAIN will call > tcp_chr_disconnect. > > We met an issue when a guest VM uses DPDK (1.6.0) l2fwd, it may exit since > link status is not up. The link is down because tcp_chr_disconnect will > set it. > > Can you explain why EAGAIN here changes the behavior? tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read. IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occur. So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] A question about tb_next_offset[2]
On 26 July 2016 at 19:49, Kartik Ramkrishnan wrote: > Hello all, > > I am running an i386 binary in user mode using qemu. > > In the code, I am looking for the next location that the simulated > program counter jumps to when a TranslationBlock completes execution. This > address should be the guest address, not the address on the host. The place we go to is whatever the PC in the CPUState says at the point when we finish execution of the TB. > I found a variable called tb_next_offset[2] in 'struct TranslationBlock' in > exec-all.h, which says (offset of original jump target). It sounds like > the jump offset can be added to the target code's last address to obtain > the new PC value. This field was renamed earlier this year, so it's called jmp_reset_offset. It's part of an optimisation where rather than finishing the TB, and then going out to the top level loop and looking up the next TB from the guest PC, we can patch the first TB to directly jump to the second TB. jmp_reset_offset is the offset within the generated code of the branch instruction which we need to patch in order to create this direct link. It has nothing to do with the guest address. thanks -- PMM
Re: [Qemu-devel] A question about 9894dc0cdcc397ee5b26370bc53da6d360a363c2
Hi, I got it, thanks for your quick reply :) Regards, -Gonglei > -Original Message- > From: Daniel P. Berrange [mailto:berra...@redhat.com] > Sent: Monday, July 18, 2016 5:07 PM > To: Gonglei (Arei) > Cc: Paolo Bonzini; Lilijun (Jerry); qemu-devel@nongnu.org > Subject: Re: A question about 9894dc0cdcc397ee5b26370bc53da6d360a363c2 > > On Mon, Jul 18, 2016 at 08:41:32AM +, Gonglei (Arei) wrote: > > Hi Daniel & Paolo, > > > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", > > about the below code segment: > > > > static bool qemu_chr_open_socket_fd(CharDriverState *chr, Error **errp) > > { > > TCPCharDriver *s = chr->opaque; > > -int fd; > > +QIOChannelSocket *sioc = qio_channel_socket_new(); > > > > if (s->is_listen) { > > -fd = socket_listen(s->addr, errp); > > +if (qio_channel_socket_listen_sync(sioc, s->addr, errp) < 0) { > > +goto fail; > > +} > > +qemu_chr_finish_socket_connection(chr, sioc); > > } else if (s->reconnect_time) { > > -fd = socket_connect(s->addr, errp, qemu_chr_socket_connected, > chr); > > -return fd >= 0; > > +qio_channel_socket_connect_async(sioc, s->addr, > > + > qemu_chr_socket_connected, > > + chr, NULL); > > } else { > > -fd = socket_connect(s->addr, errp, NULL, NULL); > > -} > > -if (fd < 0) { > > -return false; > > +if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) { > > +goto fail; > > +} > > +qemu_chr_finish_socket_connection(chr, sioc); > > } > > > > -qemu_chr_finish_socket_connection(chr, fd); > > return true; > > + > > + fail: > > +object_unref(OBJECT(sioc)); > > +return false; > > } > > > > Why did you change socket_connect() to qio_channel_socket_connect_async > > but not qio_channel_socket_connect_sync when s->reconnect_time is true? > Thanks. > > I can't get the reason from the commit message. > > When the socket_connect() function is called with a non-null value for > the NonBlockingConnectHandler parameter, it'll put the socket into > non-blocking mode before doing the connection.
Therefore the socket_connect() > method is no longer guaranteed to actually connect to the socket - it > may simply start the connection and leave an event handler to finish > it. This avoids blocking the caller waiting for the connection to > finish which may take non-negligible time with TCP sockets. > > The qio_channel_socket_connect_sync() method will always block until > completion of the connect, so we use qio_channel_socket_connect_async > which always does the connection attempt in the background. > > The old code which may or may not finish the connect() in the background > was a recipe for bugs because it forced callers to have 2 completely > separate codepaths for the same API call and chances are only one of > those code paths would be tested by the developers. With the new code > we're guaranteed to always complete in the background, so the callers > only have 1 codepath to worry about and so they'll be forced to ensure > that codepath works. > > > PS: We encountered a problem that when config vhost-user-net reconnection > function, > > the virtio_net_get_features() didn't get a correct value after this change, > and it was > > fixed when we change the async to sync. > > It seems that you were mostly lucky with the old code in that presumably > socket_connect() would complete the connection fully. There was never any > guarantee of this though - it was perfectly feasible that socket_connect() > would finish the connect in the background. So IMHO the vhost user code > was already buggy if it doesn't cope with the connect finishing in the > background. > > > Regards, > Daniel > -- > |: http://berrange.com -o- > http://www.flickr.com/photos/dberrange/ :| > |: http://libvirt.org -o- > http://virt-manager.org :| > |: http://autobuild.org -o- > http://search.cpan.org/~danberr/ :| > |: http://entangle-photo.org -o- > http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] A question about 9894dc0cdcc397ee5b26370bc53da6d360a363c2
On Mon, Jul 18, 2016 at 08:41:32AM +, Gonglei (Arei) wrote: > Hi Daniel & Paolo, > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", > about the below code segment: > > static bool qemu_chr_open_socket_fd(CharDriverState *chr, Error **errp) > { > TCPCharDriver *s = chr->opaque; > -int fd; > +QIOChannelSocket *sioc = qio_channel_socket_new(); > > if (s->is_listen) { > -fd = socket_listen(s->addr, errp); > +if (qio_channel_socket_listen_sync(sioc, s->addr, errp) < 0) { > +goto fail; > +} > +qemu_chr_finish_socket_connection(chr, sioc); > } else if (s->reconnect_time) { > -fd = socket_connect(s->addr, errp, qemu_chr_socket_connected, chr); > -return fd >= 0; > +qio_channel_socket_connect_async(sioc, s->addr, > + qemu_chr_socket_connected, > + chr, NULL); > } else { > -fd = socket_connect(s->addr, errp, NULL, NULL); > -} > -if (fd < 0) { > -return false; > +if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) { > +goto fail; > +} > +qemu_chr_finish_socket_connection(chr, sioc); > } > > -qemu_chr_finish_socket_connection(chr, fd); > return true; > + > + fail: > +object_unref(OBJECT(sioc)); > +return false; > } > > Why did you change socket_connect() to qio_channel_socket_connect_async > but not qio_channel_socket_connect_sync when s->reconnect_time is true? > Thanks. > I can't get the reason from the commit message. When the socket_connect() function is called with a non-null value for the NonBlockingConnectHandler parameter, it'll put the socket into non-blocking mode before doing the connection. Therefore the socket_connect() method is no longer guaranteed to actually connect to the socket - it may simply start the connection and leave an event handler to finish it. This avoids blocking the caller waiting for the connection to finish which may take non-negligible time with TCP sockets.
The qio_channel_socket_connect_sync() method will always block until completion of the connect, so we use qio_channel_socket_connect_async which always does the connection attempt in the background. The old code which may or may not finish the connect() in the background was a recipe for bugs because it forced callers to have 2 completely separate codepaths for the same API call and chances are only one of those code paths would be tested by the developers. With the new code we're guaranteed to always complete in the background, so the callers only have 1 codepath to worry about and so they'll be forced to ensure that codepath works. > PS: We encountered a problem that when config vhost-user-net reconnection > function, > the virtio_net_get_features() didn't get a correct value after this change, > and it was > fixed when we change the async to sync. It seems that you were mostly lucky with the old code in that presumably socket_connect() would complete the connection fully. There was never any guarantee of this though - it was perfectly feasible that socket_connect() would finish the connect in the background. So IMHO the vhost user code was already buggy if it doesn't cope with the connect finishing in the background. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [Nbd] question on ioctl NBD_SET_FLAGS
On 20 Apr 2016, at 17:18, Wouter Verhelst wrote: > Hi Eric, > > On Wed, Apr 20, 2016 at 09:42:02AM -0600, Eric Blake wrote: > [...] >> But in 3.9, the overlap bug was still present, and the set of global >> flags had grown to include NBD_FLAG_NO_ZEROS (commit 038e066), which >> overlaps with NBD_FLAG_READ_ONLY. Ouch. This means that a client >> talking to a server that advertises NO_ZEROES means that the client will >> mistakenly tell the kernel to treat EVERY export as read-only, even if >> the client doesn't respond with NBD_FLAG_C_NO_ZEROES. >> >> 3.10 fixed things; negotiate() now uses uint16_t *flags (instead of u32 >> *), and no longer tries to merge global flags with transmission flags; >> only the transmission flags are ever passed to the kernel via >> NBD_SET_FLAGS. Maybe it's good that there was only 2 weeks between 3.9 >> and 3.10, so hopefully few distros are trying to ship that broken version. > > Well, yeah, since 3.10 was an "oops" release when 3.9 exposed that bug > (which indeed had existed for a while) and which was reported quite > quickly on the list. Released versions of nbd which have the bug exist > though, and trying to have a 3.8 (or below) client talk to a 3.9 (or > above) server has the same issue. > > I decided that there was no way in which I could fix it, and that "the > export is readonly" is bad but not a "critical data loss" kind of bug, > so releasing 3.10 was pretty much the only sane thing I could do (other > than delaying NO_ZEROES, which might have worked too). For what it's worth, Ubuntu 14.04 (still under long-term support) ships with nbd-client 3.7 and I've just had a bug report filed against gonbdserver where all exports go readonly. https://github.com/abligh/gonbdserver/issues/4 I've added an option to disable NBD_FLAG_NO_ZEROS on nbd-server but that's pretty horrible. And I think this will affect all pre-3.10 clients talking to 3.9 and later servers.
I'm going to find a minimal patch to nbd-client and offer that to Ubuntu / Debian. This message is here in part so I have something to point them at on the mailing list :-) -- Alex Bligh
Re: [Qemu-devel] [Nbd] question on ioctl NBD_SET_FLAGS
Hi Eric, On Wed, Apr 20, 2016 at 09:42:02AM -0600, Eric Blake wrote: [...] > But in 3.9, the overlap bug was still present, and the set of global > flags had grown to include NBD_FLAG_NO_ZEROS (commit 038e066), which > overlaps with NBD_FLAG_READ_ONLY. Ouch. This means that a client > talking to a server that advertises NO_ZEROES means that the client will > mistakenly tell the kernel to treat EVERY export as read-only, even if > the client doesn't respond with NBD_FLAG_C_NO_ZEROES. > > 3.10 fixed things; negotiate() now uses uint16_t *flags (instead of u32 > *), and no longer tries to merge global flags with transmission flags; > only the transmission flags are ever passed to the kernel via > NBD_SET_FLAGS. Maybe it's good that there was only 2 weeks between 3.9 > and 3.10, so hopefully few distros are trying to ship that broken version. Well, yeah, since 3.10 was an "oops" release when 3.9 exposed that bug (which indeed had existed for a while) and which was reported quite quickly on the list. Released versions of nbd which have the bug exist though, and trying to have a 3.8 (or below) client talk to a 3.9 (or above) server has the same issue. I decided that there was no way in which I could fix it, and that "the export is readonly" is bad but not a "critical data loss" kind of bug, so releasing 3.10 was pretty much the only sane thing I could do (other than delaying NO_ZEROES, which might have worked too). [...] > But do we need to document in the kernel code that existing clients > mistakenly pass too many bits to the NBD_SET_FLAGS ioctl, so that if we > ever reach the future point where we need more than 16 transmission > flags, AND where we have more than 2 global flags defined, existing qemu > 2.5 clients don't confuse the kernel when calling NBD_SET_FLAGS?
Or do > we think that it is unlikely enough to worry about, where by the time > there are more than 16 transmission flags, users are likely to already > be using new-enough qemu that doesn't send global flags to the kernel? I'm going to assume the latter is the case, yeah, but we could skip bits 16 and 17 (the only two handshake flags currently defined) and use the upper 14 bits before using the lower 2 if it does turn out to be a problem. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] TCG question reg. tcg_cpu_exec()
On 14/03/2016 16:26, Pranith Kumar wrote: > I am trying to understand what scenarios can result in no TBs > executing for that CPU. My understanding is that there is a pending > operation which needs to be handled before we can execute TBs from > this CPU(I/O?). For example the CPU could be halted and won't execute until it receives an interrupt. Paolo
Re: [Qemu-devel] Quick question on NVME on qemu
On Sat, 10 Oct 2015 00:39:43 -0700 sahlot arvind wrote: > Hello, > > Does qemu emulate a proper NVMe controller? Can someone please tell me how As far as I'm aware, QEMU doesn't emulate any NVMe controller yet. > can I use it to emulate a proper NVMe controller? Is there some way to > enable logging PCI accesses to NVMe controller so that I can see what > accesses my driver or OS is doing? > > Thanks in advance! > Sahlot
Re: [Qemu-devel] Quick question on NVME on qemu
On 12.10.2015 at 10:05, Igor Mammedov wrote: > On Sat, 10 Oct 2015 00:39:43 -0700 > sahlot arvind wrote: > > > Hello, > > > > Does qemu emulate a proper NVMe controller? Can someone please tell me how > As far as I'm aware, QEMU doesn't emulate any NVMe controller yet. It does, though I'm not sure how complete it is. You can use it with -drive file=...,if=none,id=some_id -device nvme,drive=some_id Kevin > > can I use it to emulate a proper NVMe controller? Is there some way to > enable logging PCI accesses to NVMe controller so that I can see what > accesses my driver or OS is doing? > > Thanks in advance! > > Sahlot
Re: [Qemu-devel] Quick question on NVME on qemu
On Mon, 12 Oct 2015, Kevin Wolf wrote: On 12.10.2015 at 10:05, Igor Mammedov wrote: On Sat, 10 Oct 2015 00:39:43 -0700 sahlot arvind wrote: Hello, Does qemu emulate a proper NVMe controller? Can someone please tell me how As far as I'm aware, QEMU doesn't emulate any NVMe controller yet. It does, though I'm not sure how complete it is. You can use it with -drive file=...,if=none,id=some_id -device nvme,drive=some_id Yep, that's right. It is a basic 1.0 compliant (mostly) controller. There are updates for up to 1.2 compliance that are in a public tree, but not merged to mainline. http://git.infradead.org/users/kbusch/qemu-nvme.git can I use it to emulate a proper NVMe controller? Is there some way to enable logging PCI accesses to NVMe controller so that I can see what accesses my driver or OS is doing? It's not a problem to add more NVMe features. There's currently no such logging mechanism, though. It shouldn't be too difficult to add if you want to take a stab at it.
Re: [Qemu-devel] virtio: Question on VRING_DESC_F_WRITE flag
On Wed, May 13, 2015 at 07:16:13PM +, Subhashini Venkataraman wrote: I have a question on VRING_DESC_F_WRITE flag. As per the specifications, if this flag is set, then the corresponding buffer pointed by the descriptor is Write only, otherwise Read Only. I am trying to understand why there is no READ_AND_WRITE option, i.e. a buffer could be Read by device emulation and written back into. All virtio devices are designed so that descriptors are read-only or write-only. A typical request looks like this: [header (RO)][data (RO)][status (WO)] or [header (RO)][data (WO)][status (WO)] Not sure what the original requirement for this design was, but the read-only/write-only approach means that buffers could be sent over a transport (like a pipe or TCP socket). If a buffer was read-write then it would involve sending back a modified copy, which is unnecessary. Stefan
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On Wed, 2015-03-25 at 09:18 +0800, Chen, Tiejun wrote:
> Actually my problem is that I need to add a new parameter, 'flag', like this: xc_assign_device(xxx,xxx,flag). So if I don't refine xc.c, the tools can't be compiled successfully. Or maybe you're suggesting I could isolate this file while building the tools, right?

I answered this in my original reply: it is acceptable for the new parameter to not be plumbed up to the Python bindings until someone comes along with a requirement to use it from Python. IOW you can just pass whatever the nop value is for the new argument.

The nop value is whatever value should be passed to retain the current behaviour.

Ian.
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On 2015/3/25 18:26, Ian Campbell wrote:
> On Wed, 2015-03-25 at 09:18 +0800, Chen, Tiejun wrote:
> > Actually my problem is that I need to add a new parameter, 'flag', like this: xc_assign_device(xxx,xxx,flag). So if I don't refine xc.c, the tools can't be compiled successfully. Or maybe you're suggesting I could isolate this file while building the tools, right?
> I answered this in my original reply: it is acceptable for the new parameter to not be plumbed up to the Python bindings until someone comes along with a requirement to use it from Python. IOW you can just pass whatever the nop value is for the new argument.

Yes, I knew this; I just got a little confused again once we started discussing whether xc.c is compiled... I will try to do this.

> The nop value is whatever value should be passed to retain the current behaviour.

Sure.

Thanks
Tiejun
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On Tue, 2015-03-24 at 18:15 +0800, Chen, Tiejun wrote:
> On 2015/3/24 17:51, Ian Campbell wrote:
> > On Tue, 2015-03-24 at 16:47 +0800, Chen, Tiejun wrote:
> > > All guys,
> > Thanks for your reply.
> > They are registered with the Python runtime, so they are called from Python code. The first member of the struct is the pythonic function
> Sorry, I don't understand this. So you mean that instead of xl, this is called by a third-party user with Python?

Yes, tools/python/xen is the Python bindings for various C libraries supported by Xen. NB, the libxl ones are broken and not even compiled right now, you can ignore them.

> Otherwise, we usually perform `xl create xxx' to create a VM. So I think this goes through a flow like this:
>
> xl_cmdtable.c:main_create()
>  |
>  + create_domain()
>     |
>     + libxl_domain_create_new()
>        |
>        + do_domain_create()
>
> Right?

Yes, xl is written in C, not Python, so tools/python doesn't enter the picture.

> > (PyCFunction)pyxc_domain_create,
> So I don't see where 'pyxc_domain_create' is called. Or am I missing something...

Chances are that there are no in-tree users of this code any more; xend would have used it at one time with something like:

import xen.lowlevel.xc
xc = xen.lowlevel.xc.xc()
xc.domain_create()

etc.

> In my specific case, I'm trying to introduce a new flag for each device while assigning devices. So this means I have to add a parameter, 'flag', to
>
> int xc_assign_device(
>     xc_interface *xch,
>     uint32_t domid,
>     uint32_t machine_sbdf)
>
> which is then extended to
>
> int xc_assign_device(
>     xc_interface *xch,
>     uint32_t domid,
>     uint32_t machine_sbdf,
>     uint32_t flag)
>
> After this change, obviously I should cover all callers of xc_assign_device(), and this fallout reaches these two files. For example, pyxc_assign_device() is involved. Currently it has two parameters, 'dom' and 'pci_str', and as I understand it 'pci_str' should represent all PCI devices in SBDF format, right?

It appears so, yes.

> But I don't know exactly what rule should be followed to encode this sort of flag into 'pci_str', or any reasonable idea to achieve my goal?

If it is non-trivial to fix them, IMHO it is acceptable for the new parameter to not be plumbed up to the Python bindings until someone comes along with a requirement to use it from Python. IOW you can just pass whatever the nop value is for the new argument.

> Should I extend this 'pci_str' like Seg,bus,device,function:flag? But I'm not sure whether I'd be breaking the existing usage since, like I said, I don't know what scenarios are using these methods.

Like I said in the paragraph above, if it is complicated then it is fine to ignore this new parameter from Python. I don't know what the semantics of the flag are; if it is per-SBDF then I suppose if you really wanted to expose this here you would need to invent some syntax for doing so.

Ian.
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On 2015/3/24 18:20, Ian Campbell wrote:
> On Tue, 2015-03-24 at 18:15 +0800, Chen, Tiejun wrote:
> > Sorry, I don't understand this. So you mean that instead of xl, this is called by a third-party user with Python?
> Yes, tools/python/xen is the Python bindings for various C libraries supported by Xen.

Thanks for your explanation.

> NB, the libxl ones are broken and not even compiled right now, you can ignore them.

Looks like this is still compiled now.

> > Otherwise, we usually perform `xl create xxx' to create a VM. So I think this goes through a flow like this:
> >
> > xl_cmdtable.c:main_create()
> >  |
> >  + create_domain()
> >     |
> >     + libxl_domain_create_new()
> >        |
> >        + do_domain_create()
> >
> > Right?
> Yes, xl is written in C, not Python, so tools/python doesn't enter the picture.

Yeah.

> > So I don't see where 'pyxc_domain_create' is called. Or am I missing something...
> Chances are that there are no in-tree users of this code any more; xend would have used it at one time with something like:
>
> import xen.lowlevel.xc
> xc = xen.lowlevel.xc.xc()
> xc.domain_create()
>
> etc.
> > In my specific case, I'm trying to introduce a new flag for each device while assigning devices. So this means I have to add a parameter, 'flag', to
> >
> > int xc_assign_device(
> >     xc_interface *xch,
> >     uint32_t domid,
> >     uint32_t machine_sbdf)
> >
> > which is then extended to
> >
> > int xc_assign_device(
> >     xc_interface *xch,
> >     uint32_t domid,
> >     uint32_t machine_sbdf,
> >     uint32_t flag)
> >
> > After this change, obviously I should cover all callers of xc_assign_device(), and this fallout reaches these two files. For example, pyxc_assign_device() is involved. Currently it has two parameters, 'dom' and 'pci_str', and as I understand it 'pci_str' should represent all PCI devices in SBDF format, right?
> It appears so, yes.
> > But I don't know exactly what rule should be followed to encode this sort of flag into 'pci_str', or any reasonable idea to achieve my goal?
> If it is non-trivial to fix them, IMHO it is acceptable for the new parameter to not be plumbed up to the Python bindings until someone comes along with a requirement to use it from Python. IOW you can just pass whatever the nop value is for the new argument.
> > Should I extend this 'pci_str' like Seg,bus,device,function:flag? But I'm not sure whether I'd be breaking the existing usage since, like I said, I don't know what scenarios are using these methods.
> Like I said in the paragraph above, if it is complicated then it is fine to ignore this new parameter from Python.
> I don't know what the semantics of the flag are; if it is per-SBDF then I

Yes, this should be a flag specific to an SBDF. You know, I'm working to fix RMRR completely. Based on some discussion about that design (I assume you may have read that thread previously :)), we now probably need to pass a flag to introduce our policy.

> suppose if you really wanted to expose this here you would need to invent some syntax for doing so.

Definitely. When I finish this I will send it to you for technical review. Again, I really appreciate your clarification.

Thanks
Tiejun
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On Tue, 2015-03-24 at 18:31 +0800, Chen, Tiejun wrote:
> > NB, the libxl ones are broken and not even compiled right now, you can ignore them.
> Looks like this is still compiled now.

xc is, xl is not, I am sure of that.

> > I don't know what the semantics of the flag are; if it is per-SBDF then I
> Yes, this should be a flag specific to an SBDF. You know, I'm working to fix RMRR completely. Based on some discussion about that design (I assume you may have read that thread previously :)), we now probably need to pass a flag to introduce our policy.

Unless you have a concrete requirement to expose RMRR via the Python bindings to libxc (i.e. you know somebody is using them), then I think you should not bother. Making RMRR work via the (C) interface to libxl used by xl and libvirt is sufficient for a new in-tree feature.

Ian.

> > suppose if you really wanted to expose this here you would need to invent some syntax for doing so.
> Definitely. When I finish this I will send it to you for technical review. Again, I really appreciate your clarification.
>
> Thanks
> Tiejun
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On Tue, 2015-03-24 at 16:47 +0800, Chen, Tiejun wrote:
> All guys,
>
> Sorry to bother you. I have a question about two files, tools/python/xen/lowlevel/xc/xc.c and tools/python/xen/lowlevel/xl/xl.c. Who is the caller of the methods in pyxc_methods[] and pyxl_methods[]?

They are registered with the Python runtime, so they are called from Python code. The first member of the struct is the pythonic function name, e.g. from xc.c:

{ domain_create,
  (PyCFunction)pyxc_domain_create,
  METH_VARARGS | METH_KEYWORDS, "\n"
  "Create a new domain.\n"
  " dom [int, 0]: Domain identifier to use (allocated if zero).\n"
  "Returns: [int] new domain identifier; -1 on error.\n" },

defines a method called domain_create in the xen.lowlevel.xc namespace.

> And how should we call these approaches?

I'm not sure what you are asking here.

> In my specific case, I'm trying to introduce a new flag for each device while assigning devices. So this means I have to add a parameter, 'flag', to
>
> int xc_assign_device(
>     xc_interface *xch,
>     uint32_t domid,
>     uint32_t machine_sbdf)
>
> which is then extended to
>
> int xc_assign_device(
>     xc_interface *xch,
>     uint32_t domid,
>     uint32_t machine_sbdf,
>     uint32_t flag)
>
> After this change, obviously I should cover all callers of xc_assign_device(), and this fallout reaches these two files. For example, pyxc_assign_device() is involved. Currently it has two parameters, 'dom' and 'pci_str', and as I understand it 'pci_str' should represent all PCI devices in SBDF format, right?

It appears so, yes.

> But I don't know exactly what rule should be followed to encode this sort of flag into 'pci_str', or any reasonable idea to achieve my goal?

If it is non-trivial to fix them, IMHO it is acceptable for the new parameter to not be plumbed up to the Python bindings until someone comes along with a requirement to use it from Python. IOW you can just pass whatever the nop value is for the new argument.

Ian.
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On 2015/3/24 17:51, Ian Campbell wrote:
> On Tue, 2015-03-24 at 16:47 +0800, Chen, Tiejun wrote:
> > All guys,

Thanks for your reply.

> > Sorry to bother you. I have a question about two files, tools/python/xen/lowlevel/xc/xc.c and tools/python/xen/lowlevel/xl/xl.c. Who is the caller of the methods in pyxc_methods[] and pyxl_methods[]?
> They are registered with the Python runtime, so they are called from Python code. The first member of the struct is the pythonic function

Sorry, I don't understand this. So you mean that instead of xl, this is called by a third-party user with Python?

> name, e.g. from xc.c:
>
> { domain_create,

Otherwise, we usually perform `xl create xxx' to create a VM. So I think this goes through a flow like this:

xl_cmdtable.c:main_create()
 |
 + create_domain()
    |
    + libxl_domain_create_new()
       |
       + do_domain_create()

Right?

> (PyCFunction)pyxc_domain_create,

So I don't see where 'pyxc_domain_create' is called. Or am I missing something...

> METH_VARARGS | METH_KEYWORDS, "\n"
> "Create a new domain.\n"
> " dom [int, 0]: Domain identifier to use (allocated if zero).\n"
> "Returns: [int] new domain identifier; -1 on error.\n" },
>
> defines a method called domain_create in the xen.lowlevel.xc namespace.
> > And how should we call these approaches?
> I'm not sure what you are asking here.

If you can give a real case that calls this, things couldn't be better :)

> > In my specific case, I'm trying to introduce a new flag for each device while assigning devices. So this means I have to add a parameter, 'flag', to
> >
> > int xc_assign_device(
> >     xc_interface *xch,
> >     uint32_t domid,
> >     uint32_t machine_sbdf)
> >
> > which is then extended to
> >
> > int xc_assign_device(
> >     xc_interface *xch,
> >     uint32_t domid,
> >     uint32_t machine_sbdf,
> >     uint32_t flag)
> >
> > After this change, obviously I should cover all callers of xc_assign_device(), and this fallout reaches these two files. For example, pyxc_assign_device() is involved. Currently it has two parameters, 'dom' and 'pci_str', and as I understand it 'pci_str' should represent all PCI devices in SBDF format, right?
> It appears so, yes.
> > But I don't know exactly what rule should be followed to encode this sort of flag into 'pci_str', or any reasonable idea to achieve my goal?
> If it is non-trivial to fix them, IMHO it is acceptable for the new parameter to not be plumbed up to the Python bindings until someone comes along with a requirement to use it from Python. IOW you can just pass whatever the nop value is for the new argument.

Should I extend this 'pci_str' like Seg,bus,device,function:flag? But I'm not sure whether I'd be breaking the existing usage since, like I said, I don't know what scenarios are using these methods.

Thanks
Tiejun
Re: [Qemu-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On 2015/3/24 18:40, Ian Campbell wrote:
> On Tue, 2015-03-24 at 18:31 +0800, Chen, Tiejun wrote:
> > > NB, the libxl ones are broken and not even compiled right now, you can ignore them.
> > Looks like this is still compiled now.
> xc is, xl is not, I am sure of that.

Indeed, you're right :)

> > > I don't know what the semantics of the flag are; if it is per-SBDF then I
> > Yes, this should be a flag specific to an SBDF. You know, I'm working to fix RMRR completely. Based on some discussion about that design (I assume you may have read that thread previously :)), we now probably need to pass a flag to introduce our policy.
> Unless you have a concrete requirement to expose RMRR via the Python bindings to libxc (i.e. you know somebody is using them), then I think you should not bother.

Actually my problem is that I need to add a new parameter, 'flag', like this: xc_assign_device(xxx,xxx,flag). So if I don't refine xc.c, the tools can't be compiled successfully. Or maybe you're suggesting I could isolate this file while building the tools, right?

> Making RMRR work via the (C) interface to libxl used by xl and libvirt is sufficient for a new in-tree feature.

Yeah.

Thanks
Tiejun

> Ian.
>
> > > suppose if you really wanted to expose this here you would need to invent some syntax for doing so.
> > Definitely. When I finish this I will send it to you for technical review. Again, I really appreciate your clarification.
> >
> > Thanks
> > Tiejun
Re: [Qemu-devel] Dummy question for setting up a serial connection between host and guest
Christopher Covington c...@codeaurora.org writes:
> Hi Alex,
>
> On 03/10/2015 09:12 PM, Alex Sun wrote:
> > I downloaded QEMU 2.2.0 and built qemu-system-arm from there. I loaded a versatile kernel 2.6.32.5 and my own file system:
> >
> > #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb -option-rom efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile -append root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data -nographic -no-reboot
> >
> > I see the following messages while the kernel is booting:
> >
> > [3.873429] dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a AMBA/PL011
> > [3.884631] console [ttyAMA0] enabled
> > [3.889783] dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a AMBA/PL011
> > [3.890646] dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a AMBA/PL011
> >
> > so I think the serial ports ttyAMA[0-2] are enabled. Then I try to do serial redirection by appending *-serial stdio -serial pty* to the command line above. However, I get an error message:
> >
> > QEMU 2.2.0 monitor - type 'help' for more information
> > (qemu) qemu-system-arm: -serial stdio: cannot use stdio by multiple character devices
> >
> > It's weird to me. I don't think there is anyone other than qemu itself holding stdio.
>
> I think the conflict is that when you specify -nographic, the QEMU monitor uses stdio. I find it necessary to add -monitor none to all of my command lines.

You can connect multiple users to a character device in multiplexing mode. Quoting the fine manual:

    A character device may be used in multiplexing mode by multiple front-ends. The key sequence of @key{Control-a} and @key{c} will rotate the input focus between attached front-ends. Specify @option{mux=on} to enable this mode.

I use -nodefault. One stone, many annoying birds.
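The mux advice can be combined with Alex's original command line roughly like this (a sketch, reusing the image and kernel names from the thread; `mon:stdio` multiplexes the monitor and the first serial port onto stdio, so the two no longer fight over it and Ctrl-a c switches between them):

```
qemu-system-arm -M versatilepb \
    -kernel vmlinuz-2.6.32-5-versatile -initrd newinitrd \
    -append "root=/dev/sda1 console=ttyAMA0" \
    -hda new.qcow2 -drive file=fat:rw:./data \
    -nographic -no-reboot \
    -serial mon:stdio -serial pty
```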
Re: [Qemu-devel] Dummy question for setting up a serial connection between host and guest
Hi Alex,

On 03/10/2015 09:12 PM, Alex Sun wrote:
> I downloaded QEMU 2.2.0 and built qemu-system-arm from there. I loaded a versatile kernel 2.6.32.5 and my own file system:
>
> #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb -option-rom efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile -append root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data -nographic -no-reboot
>
> I see the following messages while the kernel is booting:
>
> [3.873429] dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a AMBA/PL011
> [3.884631] console [ttyAMA0] enabled
> [3.889783] dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a AMBA/PL011
> [3.890646] dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a AMBA/PL011
>
> so I think the serial ports ttyAMA[0-2] are enabled. Then I try to do serial redirection by appending *-serial stdio -serial pty* to the command line above. However, I get an error message:
>
> QEMU 2.2.0 monitor - type 'help' for more information
> (qemu) qemu-system-arm: -serial stdio: cannot use stdio by multiple character devices
>
> It's weird to me. I don't think there is anyone other than qemu itself holding stdio.

I think the conflict is that when you specify -nographic, the QEMU monitor uses stdio. I find it necessary to add -monitor none to all of my command lines.

Regards,
Chris

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [Qemu-devel] Dummy question for setting up a serial connection between host and guest
Thank you Chris! Your suggestion really works!!! Awesome!!!

I find that this way, if I type ^C in stdio I kill the qemu process, but I think that's fine for me ;-).

Did you manage to set up serial communication between the host (which runs QEMU) and the guest? I tried to use -serial pty but realized that it is only one-way traffic: as I experimented, I can send traffic from guest to host but I can't do that from host to guest. If that's not going to work, I guess I have to use a socket for that communication.

Thanks!
Alex

On Thu, Mar 12, 2015 at 9:33 AM, Christopher Covington c...@codeaurora.org wrote:
> Hi Alex,
>
> On 03/10/2015 09:12 PM, Alex Sun wrote:
> > I downloaded QEMU 2.2.0 and built qemu-system-arm from there. I loaded a versatile kernel 2.6.32.5 and my own file system:
> >
> > #qemu-system-arm -pidfile /tmp/qemu_0_pids/0.pid -M versatilepb -option-rom efi-rtl8139.rom -initrd newinitrd -kernel vmlinuz-2.6.32-5-versatile -append root=/dev/sda1 console=ttyAMA0 -hda new.qcow2 -drive file=fat:rw:./data -nographic -no-reboot
> >
> > I see the following messages while the kernel is booting:
> >
> > [3.873429] dev:f1: ttyAMA0 at MMIO 0x101f1000 (irq = 12) is a AMBA/PL011
> > [3.884631] console [ttyAMA0] enabled
> > [3.889783] dev:f2: ttyAMA1 at MMIO 0x101f2000 (irq = 13) is a AMBA/PL011
> > [3.890646] dev:f3: ttyAMA2 at MMIO 0x101f3000 (irq = 14) is a AMBA/PL011
> >
> > so I think the serial ports ttyAMA[0-2] are enabled. Then I try to do serial redirection by appending *-serial stdio -serial pty* to the command line above. However, I get an error message:
> >
> > QEMU 2.2.0 monitor - type 'help' for more information
> > (qemu) qemu-system-arm: -serial stdio: cannot use stdio by multiple character devices
> >
> > It's weird to me. I don't think there is anyone other than qemu itself holding stdio.
>
> I think the conflict is that when you specify -nographic, the QEMU monitor uses stdio. I find it necessary to add -monitor none to all of my command lines.
>
> Regards,
> Chris
Re: [Qemu-devel] a question for control queue
On 01/09/2015 02:41 PM, Ouyang, Changchun wrote:
> Hi all,
>
> I have a question about the control queue in qemu. When qemu has configured the control queue, and the guest has also negotiated the control queue successfully with qemu, will qemu let vhost know that the guest is trying to use the control queue to send some commands?

Currently not. Vhost is only in charge of the data path, so the control virtqueue is still handled by qemu. So the filtering does not even work if vhost is used. The plan is to let management (libvirt) be notified when the guest wants to do filtering, and then libvirt can configure the filter of host devices.

> Or could vhost also set up the control queue to communicate directly with the control queue in the guest?

Technically, we can.

> How to do that?

Just do what we did for the rx virtqueue and tx virtqueue. But I see several issues:

- For security reasons, qemu is usually run as a non-privileged process. This means the vhost kernel thread does not have the privilege to configure filters in the host.
- The vhost kernel thread knows little about the backend (which could be tun/macvtap or even a packet socket).

For a vhost-user implementation it may make sense, but I'm not sure.

> Hope anyone can shed some light on this question. Thanks in advance!
>
> Thanks
> Changchun
Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi
On 2015-1-8 18:03, Vadim Rozenfeld wrote:
> On Thu, 2015-01-08 at 17:27 +0800, Wangting (Kathy) wrote:
> > On 2015-1-8 17:01, Vadim Rozenfeld wrote:
> > > On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
> > > > Hi Vadim,
> > > >
> > > > In order to enable MSI for vioscsi in virtio-win-0.1-74, I tried to merge the patch for Bug 1077566 into it, but I find that MSI still doesn't work, because the value of IRQ is not negative in the device manager. I tried to trace the I/O path with ENABLE_COM_DEBUG, and I find that pMsixCapOffset->MessageControl.MSIXEnable is zero when StorPortGetBusData returns in VioScsiFindAdapter, so MSI can't be enabled. I have tested that MSI is successfully enabled in virtio-win-0.1-81, but I still want to find out why the patch doesn't work in virtio-win-0.1-74. Are there some important points I missed?
> > > Did you enable MSISupported in the inf file?
> > > Cheers, Vadim.
> > Where is the inf file? Do you mean qemupciserial.inf?
> vioscsi.inx in the source directory. After stamping, it will be copied into vioscsi.inf as part of the build process.

Sorry, I forgot to update vioscsi.inf after compiling. The patch is OK. Thanks.

Ting

> > > > Thank you very much.
> > > > Best wishes,
> > > > Ting Wang
Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi
On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
> Hi Vadim,
>
> In order to enable MSI for vioscsi in virtio-win-0.1-74, I tried to merge the patch for Bug 1077566 into it, but I find that MSI still doesn't work, because the value of IRQ is not negative in the device manager. I tried to trace the I/O path with ENABLE_COM_DEBUG, and I find that pMsixCapOffset->MessageControl.MSIXEnable is zero when StorPortGetBusData returns in VioScsiFindAdapter, so MSI can't be enabled. I have tested that MSI is successfully enabled in virtio-win-0.1-81, but I still want to find out why the patch doesn't work in virtio-win-0.1-74. Are there some important points I missed?

Did you enable MSISupported in the inf file?

Cheers,
Vadim.

> Thank you very much.
> Best wishes,
> Ting Wang
Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi
On 2015-1-8 17:01, Vadim Rozenfeld wrote:
> On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
> > Hi Vadim,
> >
> > In order to enable MSI for vioscsi in virtio-win-0.1-74, I tried to merge the patch for Bug 1077566 into it, but I find that MSI still doesn't work, because the value of IRQ is not negative in the device manager. I tried to trace the I/O path with ENABLE_COM_DEBUG, and I find that pMsixCapOffset->MessageControl.MSIXEnable is zero when StorPortGetBusData returns in VioScsiFindAdapter, so MSI can't be enabled. I have tested that MSI is successfully enabled in virtio-win-0.1-81, but I still want to find out why the patch doesn't work in virtio-win-0.1-74. Are there some important points I missed?
> Did you enable MSISupported in the inf file?
> Cheers, Vadim.

Where is the inf file? Do you mean qemupciserial.inf?

Thank you very much.
Best wishes,
Ting Wang
Re: [Qemu-devel] Fwd: [question] About MSI for vioscsi
On Thu, 2015-01-08 at 17:27 +0800, Wangting (Kathy) wrote:
> On 2015-1-8 17:01, Vadim Rozenfeld wrote:
> > On Thu, 2015-01-08 at 16:40 +0800, Wangting (Kathy) wrote:
> > > In order to enable MSI for vioscsi in virtio-win-0.1-74, I tried to merge the patch for Bug 1077566 into it, but I find that MSI still doesn't work, because the value of IRQ is not negative in the device manager. I tried to trace the I/O path with ENABLE_COM_DEBUG, and I find that pMsixCapOffset->MessageControl.MSIXEnable is zero when StorPortGetBusData returns in VioScsiFindAdapter, so MSI can't be enabled. I have tested that MSI is successfully enabled in virtio-win-0.1-81, but I still want to find out why the patch doesn't work in virtio-win-0.1-74. Are there some important points I missed?
> > Did you enable MSISupported in the inf file?
> > Cheers, Vadim.
> Where is the inf file? Do you mean qemupciserial.inf?

vioscsi.inx in the source directory. After stamping, it will be copied into vioscsi.inf as part of the build process.

> Thank you very much.
> Best wishes,
> Ting Wang
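For reference, the MSISupported setting Vadim mentions lives in an AddReg section of the INF. The fragment below follows the pattern used by Windows storage drivers; the section name is an assumption (check the actual vioscsi.inx), but the MSISupported registry value is the key part:

```
[pnpsafe_pci_addreg_msix]
HKR, "Interrupt Management", 0x00000010
HKR, "Interrupt Management\MessageSignaledInterruptProperties", 0x00000010
HKR, "Interrupt Management\MessageSignaledInterruptProperties", MSISupported, 0x00010001, 1
HKR, "Interrupt Management\MessageSignaledInterruptProperties", MessageNumberLimit, 0x00010001, 2
```

Without MSISupported = 1 the OS never enables MSI/MSI-X for the device, which matches the symptom of MSIXEnable reading back as zero.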
Re: [Qemu-devel] checkpatch.pl question
On 07/06/2014 18:27, Peter Maydell wrote:
> On 7 June 2014 17:00, Stefan Weil s...@weilnetz.de wrote:
> > Am 07.06.2014 16:58, schrieb Peter Maydell:
> > > It's clearly something to do with it getting confused by the type name, because you can suppress the warning by just changing it so it has an _t suffix, for instance. In particular notice that the set of allowed type names constructed by the build_types() subroutine is extremely limited: it looks to me as if the script makes assumptions based on kernel style (where 'struct foo' is preferred over a typedef and plain 'foo') that allow it to assume that if it's not one of a very few allowed formats then it's not a type name.
> > Yes, but that's only part of the story. checkpatch.pl contains some special handling for the standard C data types and also knows some Linux data type patterns. It also handles the above case correctly in most circumstances because there is special code for * used in function argument lists.
> Interesting. There are obviously multiple situations where it fails in this way for different reasons. This is the case I saw yesterday:
>
> @@ -125,19 +125,20 @@ static TCGRegSet tcg_target_available_regs[2];
>  static TCGRegSet tcg_target_call_clobber_regs;
>
>  #if TCG_TARGET_INSN_UNIT_SIZE == 1
> -static inline void tcg_out8(TCGContext *s, uint8_t v)
> +static __attribute__((unused)) inline void tcg_out8(TCGContext *s, uint8_t v)
>  {
>      *s->code_ptr++ = v;
>  }
>
> No macros or previous context for it to get confused by. Saying TCGContext_t instead silences the warning.

Given QEMU's coding style, any camelcase identifier (or, equivalently, any identifier containing both an uppercase and a lowercase letter) could be considered a type.

Paolo
Re: [Qemu-devel] checkpatch.pl question
On 6 June 2014 08:17, Markus Armbruster arm...@redhat.com wrote:
> --debug values=1 produces
>
> 188 . extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> 188 EEVVVNTTVVCCVVVCC
>
> which suggests it recognizes the declaration just fine. Copying a few possible victims^Wexperts.

It's clearly something to do with it getting confused by the type name, because you can suppress the warning by just changing it so it has an _t suffix, for instance. In particular notice that the set of allowed type names constructed by the build_types() subroutine is extremely limited: it looks to me as if the script makes assumptions based on kernel style (where 'struct foo' is preferred over a typedef and plain 'foo') that allow it to assume that if it's not one of a very few allowed formats then it's not a type name.

I find this script extremely hard to understand, though.

thanks
-- PMM
Re: [Qemu-devel] checkpatch.pl question
On 07.06.2014 16:58, Peter Maydell wrote:
> On 6 June 2014 08:17, Markus Armbruster arm...@redhat.com wrote:
> > --debug values=1 produces
> >
> > 188 . extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> > 188 EEVVVNTTVVCCVVVCC
> >
> > which suggests it recognizes the declaration just fine. Copying a few possible victims^Wexperts.
> It's clearly something to do with it getting confused by the type name, because you can suppress the warning by just changing it so it has an _t suffix, for instance. In particular notice that the set of allowed type names constructed by the build_types() subroutine is extremely limited: it looks to me as if the script makes assumptions based on kernel style (where 'struct foo' is preferred over a typedef and plain 'foo') that allow it to assume that if it's not one of a very few allowed formats then it's not a type name.
>
> I find this script extremely hard to understand, though.
>
> thanks
> -- PMM

Yes, but that's only part of the story. checkpatch.pl contains some special handling for the standard C data types and also knows some Linux data type patterns. It also handles the above case correctly in most circumstances because there is special code for * used in function argument lists.

Here checkpatch.pl gets confused by a totally different line of the patch:

type_init(register_vfio_pci_dev_type)

Simply adding a semicolon at the end of that line would help, but is of course not correct (note that there is already some QEMU code which adds such a semicolon and which should be fixed). It's generally very difficult or even impossible for code analysers with a limited view to analyse macro calls correctly. A human reviewer has the same problem.

checkpatch.pl could be improved to handle the patch correctly: the type_init line and the line which raises the error message belong to different files. Obviously some global state variables are not reset when checkpatch.pl detects patch code belonging to a new file. I'm still searching for which ones.

A similar problem might also occur when more than one patch is checked: checkpatch.pl also does not reset all relevant variables when switching from one patch file to the next one. The original Linux version of checkpatch.pl shows the same bug.

Regards
Stefan

PS. I just sent a first patch for checkpatch.pl which is totally unrelated to the above problem, but which I noticed while investigating it.