Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP
Ping :) Any ideas on this? On 7/17/24 10:57 AM, Daniil Tatianin wrote: On 5/29/24 6:27 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 16:34, Markus Armbruster wrote: Daniil Tatianin writes: On 5/29/24 4:39 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:43, Daniil Tatianin wrote: On 5/29/24 3:36 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:03, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Acked-by: Philippe Mathieu-Daudé Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. Changes since v3: - Fix checkpatch complaints about usage of C99 comments --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+) diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. In previous version you said: > This isn't really related to migration though. Serverless is based > on constantly stopping and resuming the VM on e.g. every HTTP > request to an endpoint. Which made some sense. Maybe mention HTTP? And point to that use case (possibly with QMP commands) in the commit description? Hmm, maybe it would be helpful for people who don't know what serverless means. How about: This is useful after a long stop-const pause, which is common for serverless-type workloads, e.g. stopping/resuming the VM on every HTTP request to an endpoint, which might involve a long pause in between the requests, causing time drift in the guest. Please help me understand your workflow. Your management layer call @stop and @cont QMP commands, is that right? Yes, that is correct. @cont will emit a @RESUME event. If we could listen to QAPI events from C code, we could have the mc146818rtc device automatically sync on VM resume, and no need for this async command. Perhaps? I'm not sure how that would be implemented, but let's see what Markus has to say. You can't listen on an event in QEMU itself. You can only hook into the place that generates the event. Apparently "qemu/notify.h" could be use for QAPI events (currently only used by migration). Big change, to be discussed later. The RESUME event is sent from vm_prepare_start() in system/cpus.c. Good spot, it is where we call synchronize_pre_resume() for vCPUs, which is exactly what Daniil wants for RTC devices. I'd rather we call here rtc_synchronize_pre_resume(), which would mostly be qmp_rtc_inject_irq_broadcast() content, without using QMP at all. But for back-compat we need some CLI option "sync-rtc-on-resume" default to false. Preferably a mc146818rtc property to KISS. That would solve Daniil problem and make Markus/myself happier. So I started looking into this, and I'm a bit unsure about what we want this API to look like. What I mean is there isn't a generic RTC abstraction in QEMU, likewise there isn't an "RTC" global variable you can easily use to hook up some sort of API or ops-like functions like cpu_accel does. One simple solution I'm seeing is making an mc146818-specific API like mc146818rtc_synchronize_pre_resume(), and call that directly wrapped inside an ifdef CONFIG_MC146818RTC inside system/cpus.c. We can then check the sync-on-resume property inside of that helper and optionally just return from it if it's not set. Any objections on this approach? Is there a better way to do this? Thanks! Paolo, any objection? Regards, Phil.
Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP
On 5/29/24 6:27 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 16:34, Markus Armbruster wrote: Daniil Tatianin writes: On 5/29/24 4:39 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:43, Daniil Tatianin wrote: On 5/29/24 3:36 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:03, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Acked-by: Philippe Mathieu-Daudé Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. Changes since v3: - Fix checkpatch complaints about usage of C99 comments --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+) diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. In previous version you said: > This isn't really related to migration though. Serverless is based > on constantly stopping and resuming the VM on e.g. every HTTP > request to an endpoint. Which made some sense. Maybe mention HTTP? And point to that use case (possibly with QMP commands) in the commit description? Hmm, maybe it would be helpful for people who don't know what serverless means. How about: This is useful after a long stop-const pause, which is common for serverless-type workloads, e.g. stopping/resuming the VM on every HTTP request to an endpoint, which might involve a long pause in between the requests, causing time drift in the guest. Please help me understand your workflow. Your management layer call @stop and @cont QMP commands, is that right? Yes, that is correct. @cont will emit a @RESUME event. If we could listen to QAPI events from C code, we could have the mc146818rtc device automatically sync on VM resume, and no need for this async command. Perhaps? I'm not sure how that would be implemented, but let's see what Markus has to say. You can't listen on an event in QEMU itself. You can only hook into the place that generates the event. Apparently "qemu/notify.h" could be use for QAPI events (currently only used by migration). Big change, to be discussed later. The RESUME event is sent from vm_prepare_start() in system/cpus.c. Good spot, it is where we call synchronize_pre_resume() for vCPUs, which is exactly what Daniil wants for RTC devices. I'd rather we call here rtc_synchronize_pre_resume(), which would mostly be qmp_rtc_inject_irq_broadcast() content, without using QMP at all. But for back-compat we need some CLI option "sync-rtc-on-resume" default to false. Preferably a mc146818rtc property to KISS. That would solve Daniil problem and make Markus/myself happier. So I started looking into this, and I'm a bit unsure about what we want this API to look like. What I mean is there isn't a generic RTC abstraction in QEMU, likewise there isn't an "RTC" global variable you can easily use to hook up some sort of API or ops-like functions like cpu_accel does. One simple solution I'm seeing is making an mc146818-specific API like mc146818rtc_synchronize_pre_resume(), and call that directly wrapped inside an ifdef CONFIG_MC146818RTC inside system/cpus.c. We can then check the sync-on-resume property inside of that helper and optionally just return from it if it's not set. Any objections on this approach? Is there a better way to do this? Thanks! Paolo, any objection? Regards, Phil.
Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP
On 5/29/24 6:27 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 16:34, Markus Armbruster wrote: Daniil Tatianin writes: On 5/29/24 4:39 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:43, Daniil Tatianin wrote: On 5/29/24 3:36 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:03, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Acked-by: Philippe Mathieu-Daudé Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. Changes since v3: - Fix checkpatch complaints about usage of C99 comments --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+) diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. In previous version you said: > This isn't really related to migration though. Serverless is based > on constantly stopping and resuming the VM on e.g. every HTTP > request to an endpoint. Which made some sense. Maybe mention HTTP? And point to that use case (possibly with QMP commands) in the commit description? Hmm, maybe it would be helpful for people who don't know what serverless means. How about: This is useful after a long stop-const pause, which is common for serverless-type workloads, e.g. stopping/resuming the VM on every HTTP request to an endpoint, which might involve a long pause in between the requests, causing time drift in the guest. Please help me understand your workflow. Your management layer call @stop and @cont QMP commands, is that right? Yes, that is correct. @cont will emit a @RESUME event. If we could listen to QAPI events from C code, we could have the mc146818rtc device automatically sync on VM resume, and no need for this async command. Perhaps? I'm not sure how that would be implemented, but let's see what Markus has to say. You can't listen on an event in QEMU itself. You can only hook into the place that generates the event. Apparently "qemu/notify.h" could be use for QAPI events (currently only used by migration). Big change, to be discussed later. The RESUME event is sent from vm_prepare_start() in system/cpus.c. Good spot, it is where we call synchronize_pre_resume() for vCPUs, which is exactly what Daniil wants for RTC devices. I'd rather we call here rtc_synchronize_pre_resume(), which would mostly be qmp_rtc_inject_irq_broadcast() content, without using QMP at all. But for back-compat we need some CLI option "sync-rtc-on-resume" default to false. Preferably a mc146818rtc property to KISS. That would solve Daniil problem and make Markus/myself happier. Paolo, any objection? Hey there! Since Paolo never replied I'm going to take that as a no then. Is everyone else okay with this idea? If there are no objections I'm going to try and implement this. Thanks! Regards, Phil.
Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP
On 5/29/24 4:39 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:43, Daniil Tatianin wrote: On 5/29/24 3:36 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:03, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Acked-by: Philippe Mathieu-Daudé Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. Changes since v3: - Fix checkpatch complaints about usage of C99 comments --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+) diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. In previous version you said: > This isn't really related to migration though. Serverless is based > on constantly stopping and resuming the VM on e.g. every HTTP > request to an endpoint. Which made some sense. Maybe mention HTTP? And point to that use case (possibly with QMP commands) in the commit description? Hmm, maybe it would be helpful for people who don't know what serverless means. How about: This is useful after a long stop-const pause, which is common for serverless-type workloads, e.g. stopping/resuming the VM on every HTTP request to an endpoint, which might involve a long pause in between the requests, causing time drift in the guest. Please help me understand your workflow. Your management layer call @stop and @cont QMP commands, is that right? Yes, that is correct. @cont will emit a @RESUME event. If we could listen to QAPI events from C code, we could have the mc146818rtc device automatically sync on VM resume, and no need for this async command. Perhaps? I'm not sure how that would be implemented, but let's see what Markus has to say. I'll let our QAPI expert enlighten me on this :)
Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP
On 5/29/24 3:36 PM, Philippe Mathieu-Daudé wrote: On 29/5/24 14:03, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Acked-by: Philippe Mathieu-Daudé Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. Changes since v3: - Fix checkpatch complaints about usage of C99 comments --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index 3379f92748..96ecd43036 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp) +{ + MC146818RtcState *s; + + QLIST_FOREACH(s, _devices, link) { + /* Update-ended interrupt enable */ + s->cmos_data[RTC_REG_B] |= REG_B_UIE; + + /* Interrupt request flag | update interrupt flag */ + s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; + + qemu_irq_raise(s->irq); + } +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..e9dd0f9c72 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq_broadcast(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. In previous version you said: > This isn't really related to migration though. Serverless is based > on constantly stopping and resuming the VM on e.g. every HTTP > request to an endpoint. Which made some sense. Maybe mention HTTP? And point to that use case (possibly with QMP commands) in the commit description? Hmm, maybe it would be helpful for people who don't know what serverless means. How about: This is useful after a long stop-const pause, which is common for serverless-type workloads, e.g. stopping/resuming the VM on every HTTP request to an endpoint, which might involve a long pause in between the requests, causing time drift in the guest. Make that "workloads". "For all existing RTCs" is a lie. It's really just all mc146818s. The command works as documented only as long as the VM has no other RTCs. @rtc-mc146818-force-sync-broadcast sounds clearer & safer; IIUC the command goal, mentioning IRQ injection is irrelevant (implementation detail). I like this if Markus is okay with that. If we go with this, would it make sense to drop the "Bug" clause? (I'm trying to not spread the problems we already have with @rtc-reset-reinjection). +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq-broadcast" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq-broadcast', + 'if': 'TARGET_I386' } The conditional kind-of-sort-of ensures "VM has no other RTCs": TARGET_I386 compiles only this file in hw/rtc/, and therefore can't have other RTCs (unless they're hiding in some other
Re: [PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP
Thanks for the review Markus! I will fix the wording and add a "Bug:" clause for the next revision. On 5/29/24 3:03 PM, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Acked-by: Philippe Mathieu-Daudé Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. Changes since v3: - Fix checkpatch complaints about usage of C99 comments --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 19 +++ 3 files changed, 40 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index 3379f92748..96ecd43036 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp) +{ +MC146818RtcState *s; + +QLIST_FOREACH(s, _devices, link) { +/* Update-ended interrupt enable */ +s->cmos_data[RTC_REG_B] |= REG_B_UIE; + +/* Interrupt request flag | update interrupt flag */ +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; + +qemu_irq_raise(s->irq); +} +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..e9dd0f9c72 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq_broadcast(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. Make that "workloads". "For all existing RTCs" is a lie. It's really just all mc146818s. The command works as documented only as long as the VM has no other RTCs. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq-broadcast" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq-broadcast', + 'if': 'TARGET_I386' } The conditional kind-of-sort-of ensures "VM has no other RTCs": TARGET_I386 compiles only this file in hw/rtc/, and therefore can't have other RTCs (unless they're hiding in some other directory). Brittle. When we move to single binary, we will compile in other RTCs. How can we ensure "VM has no nother RTCs" then? What if one of these other RTCs can be added with -device or device_add? When this falls apart because the VM does have other RTCs, it can only do so silently: the command can't tell us for which RTCs it actually injected an interrupt. Documentation making promises the implementation doesn't actually deliver can only end in tears. The only reason I'm not rejecting this patch out of hand is the existing and similarly broken rtc-reset-reinjection. I'm willing to reluctantly accept it with honest documentation. Perhaps: "Bug: RTCs other than mc146818rtc are silently ignored." Much, much better would be an interface that's actually usable with multiple RTCs. We'd have to talk how interrupt injection could be used with such a machine. Anything less will likely need to be replaced later on. + ## # @SevState: #
[PATCH v4] mc146818rtc: add a way to generate RTC interrupts via QMP
This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Acked-by: Philippe Mathieu-Daudé Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. Changes since v3: - Fix checkpatch complaints about usage of C99 comments --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 19 +++ 3 files changed, 40 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index 3379f92748..96ecd43036 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp) +{ +MC146818RtcState *s; + +QLIST_FOREACH(s, _devices, link) { +/* Update-ended interrupt enable */ +s->cmos_data[RTC_REG_B] |= REG_B_UIE; + +/* Interrupt request flag | update interrupt flag */ +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; + +qemu_irq_raise(s->irq); +} +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..e9dd0f9c72 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq_broadcast(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq-broadcast" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq-broadcast', + 'if': 'TARGET_I386' } + ## # @SevState: # -- 2.34.1
Re: [PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP
On 5/27/24 8:01 PM, Philippe Mathieu-Daudé wrote: Hi Daniil, On 21/5/24 10:08, Daniil Tatianin wrote: Could you please take a look at this revision? I think I've taken everyone's feedback into account. Sorry for the delay, I missed your patch since you didn't Cc me (Markus asked me to look at this). Oops, my bad for forgetting to Cc you! Thanks for addressing the previous requests. Thank you! On 5/14/24 9:57 AM, Daniil Tatianin wrote: ping :) 06.05.2024, 11:34, "Daniil Tatianin" : This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index 3379f92748..2b3754f5c6 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp) +{ + MC146818RtcState *s; + + QLIST_FOREACH(s, _devices, link) { + // Update-ended interrupt enable This doesn't pass the checkpatch script because it isn't QEMU coding style: https://www.qemu.org/docs/master/devel/submitting-a-patch.html#use-the-qemu-coding-style Looks like // comments are not allowed, will fix. + s->cmos_data[RTC_REG_B] |= REG_B_UIE; + + // Interrupt request flag | update interrupt flag + s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; + + qemu_irq_raise(s->irq); + } +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..e9dd0f9c72 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq_broadcast(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } Your new command doesn't make my life harder than the current 'rtc-reset-reinjection' command, so if this is useful to you, I'm OK to: Acked-by: Philippe Mathieu-Daudé (assuming the comment style is fixed). I'll see later how to deal with that with heterogeneous emulation. Regards, Phil. Thank you! +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq-broadcast" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq-broadcast', + 'if': 'TARGET_I386' } + ## # @SevState: # -- 2.34.1
Re: [PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP
Could you please take a look at this revision? I think I've taken everyone's feedback into account. Thank you! On 5/14/24 9:57 AM, Daniil Tatianin wrote: ping :) 06.05.2024, 11:34, "Daniil Tatianin" : This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index 3379f92748..2b3754f5c6 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp) +{ + MC146818RtcState *s; + + QLIST_FOREACH(s, _devices, link) { + // Update-ended interrupt enable + s->cmos_data[RTC_REG_B] |= REG_B_UIE; + + // Interrupt request flag | update interrupt flag + s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; + + qemu_irq_raise(s->irq); + } +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..e9dd0f9c72 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq_broadcast(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq-broadcast" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq-broadcast', + 'if': 'TARGET_I386' } + ## # @SevState: # -- 2.34.1
Re: [PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP
ping :) 06.05.2024, 11:34, "Daniil Tatianin" :This can be used to force-synchronize the time in guest after a longstop-cont pause, which can be useful for serverless-type workload.Also add a comment to highlight the fact that this (and one other QMPcommand) only works for the MC146818 RTC controller.Signed-off-by: Daniil Tatianin <d-tatia...@yandex-team.ru>---Changes since v0:- Rename to rtc-inject-irq to match other similar API- Add a comment to highlight that this only works for the I386 RTCChanges since v1:- Added a description below the QMP command to explain how it can be used and what it does.Changes since v2:- Add a 'broadcast' suffix.- Change the comments to explain the flags we're setting.- Change the command description to fix styling & explain that it's a broadcast command.--- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 19 +++ 3 files changed, 40 insertions(+)diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.cindex 3379f92748..2b3754f5c6 100644--- a/hw/rtc/mc146818rtc.c+++ b/hw/rtc/mc146818rtc.c@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/*+ * NOTE:+ * The two QMP functions below are _only_ implemented for the MC146818.+ * All other RTC devices ignore this.+ */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s;@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp)+{+ MC146818RtcState *s;++ QLIST_FOREACH(s, _devices, link) {+ // Update-ended interrupt enable+ s->cmos_data[RTC_REG_B] |= REG_B_UIE;++ // Interrupt request flag | update interrupt flag+ s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;++ qemu_irq_raise(s->irq);+ }+}+ static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered();diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.hindex 97cec0b3e8..e9dd0f9c72 100644--- a/include/hw/rtc/mc146818rtc.h+++ b/include/hw/rtc/mc146818rtc.h@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp);+void qmp_rtc_inject_irq_broadcast(Error **errp); #endif /* HW_RTC_MC146818RTC_H */diff --git a/qapi/misc-target.json b/qapi/misc-target.jsonindex 4e0a6492a9..7d388a3753 100644--- a/qapi/misc-target.json+++ b/qapi/misc-target.json@@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +##+# @rtc-inject-irq-broadcast:+#+# Inject an RTC interrupt for all existing RTCs on the system.+# The interrupt forces the guest to synchronize the time with RTC.+# This is useful after a long stop-cont pause, which is common for+# serverless-type workload.+#+# Since: 9.1+#+# Example:+#+# -> { "execute": "rtc-inject-irq-broadcast" }+# <- { "return": {} }+#+##+{ 'command': 'rtc-inject-irq-broadcast',+ 'if': 'TARGET_I386' }+ ## # @SevState: #--2.34.1
[PATCH v3] mc146818rtc: add a way to generate RTC interrupts via QMP
This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. Changes since v2: - Add a 'broadcast' suffix. - Change the comments to explain the flags we're setting. - Change the command description to fix styling & explain that it's a broadcast command. --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 19 +++ 3 files changed, 40 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index 3379f92748..2b3754f5c6 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq_broadcast(Error **errp) +{ +MC146818RtcState *s; + +QLIST_FOREACH(s, _devices, link) { +// Update-ended interrupt enable +s->cmos_data[RTC_REG_B] |= REG_B_UIE; + +// Interrupt request flag | update interrupt flag +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; + +qemu_irq_raise(s->irq); +} +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..e9dd0f9c72 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq_broadcast(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..7d388a3753 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,25 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq-broadcast: +# +# Inject an RTC interrupt for all existing RTCs on the system. +# The interrupt forces the guest to synchronize the time with RTC. +# This is useful after a long stop-cont pause, which is common for +# serverless-type workload. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq-broadcast" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq-broadcast', + 'if': 'TARGET_I386' } + ## # @SevState: # -- 2.34.1
Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP
On 4/29/24 4:39 PM, Philippe Mathieu-Daudé wrote: (+Peter who has more experience on such design). On 29/4/24 13:32, Markus Armbruster wrote: Philippe Mathieu-Daudé writes: Hi Daniil, Markus, On 26/4/24 10:39, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. What is a "serverless-type workload"? That's when your VM instance only runs on demand for a few seconds before going to sleep again until the next user request comes in. Signed-off-by: Daniil Tatianin --- hw/rtc/mc146818rtc.c | 15 +++ include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 16 3 files changed, 32 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index f4c1869232..6980a78d5f 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_notify(Error **errp) +{ + MC146818RtcState *s; + + /* + * See: + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt What part of this document explains why this change is required? I probably missed it. Explaining it here briefly would be more useful. Sure, will do! + */ + QLIST_FOREACH(s, _devices, link) { + s->cmos_data[RTC_REG_B] |= REG_B_UIE; // Update-ended interrupt enable + s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; // interrupt request flag // update interrupt flag + qemu_irq_raise(s->irq); + } +} + Note for later: qmp_rtc_notify() works on all realized mc146818rtc devices. Other kinds of RTC devices are silently ignored. Just like qmp_rtc_reset_reinjection(). IMO to avoid any future ambiguity (in heterogeneous machines), this command must take a QOM device path (or a list of) and only notify those. Let's compare: • With QOM path: · You need to know the machine's RTC device(s). Unfortunately, this is bothersome, as the QOM path is not stable. But we'll need more of that with dynamic machines... For Q35, it's generally "/machine/unattached/device[N]/rtc", but N varies with configuration (TCG N=2, KVM N=3 for me), and it might vary with machine type version. That's because the machine code creates ICH9-LPC without a proper name. We do that a lot. I hate it. Likewise for i440FX with PIIX3 instead of ICH9-LPC. For isapc, it's /machine/unattached/device[3]. I suspect the 3 isn't reliable there, either. microvm doesn't seem to have an RTC by default. · If the device so named doesn't support IRQ inject, the command should fail. Yes, why the management app would want to run this command if there are not RTC on the machine? · Could be generalized to non-RTC devices when that's useful. • Broadcast: · You don't need to know the machine's RTC device(s). · If there are multiple RTC devices that support IRQ inject, we inject for each of them. There is no way to select specific RTCs. · If there is no RTC device that supports IRQ inject, the command does nothing silently. I don't like silent failures. It could be made to fail instead. If it wasn't for the unstable QOM path problem, I'd advise against the broadcast interface. Thoughts? Something bugs me in this patch but I couldn't figure out what I am missing. The issue is when migrated VM is restored. I don't get why the behavior depends on an external decision (via external management transport). Don't we have post_load() hooks for such tuning? This device implements it in rtc_post_load(). This isn't really related to migration though. Serverless is based on constantly stopping and resuming the VM on e.g. every HTTP request to an endpoint. As far as Markus' comment goes, I propose we go the broadcast route, and just add the -broadcast suffix to the current command name. If everyone is okay with that, I will submit a v3. Thank you! Regards, Phil. PD: BTW tomorrow community call could be a good opportunity to discuss this.
Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP
On 4/29/24 12:40 PM, Philippe Mathieu-Daudé wrote: On 29/4/24 11:34, Daniil Tatianin wrote: On 4/29/24 11:51 AM, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json | 16 3 files changed, 37 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index f4c1869232..8501b55cbd 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq(Error **errp) +{ + MC146818RtcState *s; + + /* + * See: + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt + */ + QLIST_FOREACH(s, _devices, link) { + s->cmos_data[RTC_REG_B] |= REG_B_UIE; + s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; + qemu_irq_raise(s->irq); + } +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..6cd9761d80 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..d84a5d07a2 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,22 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq: +# +# Inject an RTC interrupt. Your cover letter explains what this could be good for. Would it make sense to explain it here, too? Sure, sounds useful. I'll add a description in the next version. Please also see my comments on the previous patch: https://lore.kernel.org/qemu-devel/11c78645-e87b-4a43-8191-a73540c36...@linaro.org/ I think this makes sense, but there's already a similar command, which doesn't do it. Should that one be changed as well then? Or do we only change the interface for this one?
[PATCH v2] mc146818rtc: add a way to generate RTC interrupts via QMP
This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC Changes since v1: - Added a description below the QMP command to explain how it can be used and what it does. --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 18 ++ 3 files changed, 39 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index f4c1869232..8501b55cbd 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq(Error **errp) +{ +MC146818RtcState *s; + +/* + * See: + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt + */ +QLIST_FOREACH(s, _devices, link) { +s->cmos_data[RTC_REG_B] |= REG_B_UIE; +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; +qemu_irq_raise(s->irq); +} +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..6cd9761d80 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..0f2479f8f4 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,24 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq: +# +# Inject an RTC interrupt. This command forces the guest to synchornize +# the time with RTC. This is useful after a long stop-cont pause, which +# is common for serverless-type workload. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq', + 'if': 'TARGET_I386' } + ## # @SevState: # -- 2.34.1
Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP
On 4/29/24 11:51 AM, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 16 3 files changed, 37 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index f4c1869232..8501b55cbd 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq(Error **errp) +{ +MC146818RtcState *s; + +/* + * See: + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt + */ +QLIST_FOREACH(s, _devices, link) { +s->cmos_data[RTC_REG_B] |= REG_B_UIE; +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; +qemu_irq_raise(s->irq); +} +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..6cd9761d80 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..d84a5d07a2 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,22 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq: +# +# Inject an RTC interrupt. Your cover letter explains what this could be good for. Would it make sense to explain it here, too? Sure, sounds useful. I'll add a description in the next version. Thanks +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq', + 'if': 'TARGET_I386' } + ## # @SevState: #
[PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP
This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Also add a comment to highlight the fact that this (and one other QMP command) only works for the MC146818 RTC controller. Signed-off-by: Daniil Tatianin --- Changes since v0: - Rename to rtc-inject-irq to match other similar API - Add a comment to highlight that this only works for the I386 RTC --- hw/rtc/mc146818rtc.c | 20 include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 16 3 files changed, 37 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index f4c1869232..8501b55cbd 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s) static QLIST_HEAD(, MC146818RtcState) rtc_devices = QLIST_HEAD_INITIALIZER(rtc_devices); +/* + * NOTE: + * The two QMP functions below are _only_ implemented for the MC146818. + * All other RTC devices ignore this. + */ void qmp_rtc_reset_reinjection(Error **errp) { MC146818RtcState *s; @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_inject_irq(Error **errp) +{ +MC146818RtcState *s; + +/* + * See: + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt + */ +QLIST_FOREACH(s, _devices, link) { +s->cmos_data[RTC_REG_B] |= REG_B_UIE; +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; +qemu_irq_raise(s->irq); +} +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..6cd9761d80 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_inject_irq(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..d84a5d07a2 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,22 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-inject-irq: +# +# Inject an RTC interrupt. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-inject-irq" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-inject-irq', + 'if': 'TARGET_I386' } + ## # @SevState: # -- 2.34.1
Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP
On 4/26/24 11:39 AM, Markus Armbruster wrote: Daniil Tatianin writes: This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Signed-off-by: Daniil Tatianin --- hw/rtc/mc146818rtc.c | 15 +++ include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 16 3 files changed, 32 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index f4c1869232..6980a78d5f 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_notify(Error **errp) +{ +MC146818RtcState *s; + +/* + * See: + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt + */ +QLIST_FOREACH(s, _devices, link) { +s->cmos_data[RTC_REG_B] |= REG_B_UIE; +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; +qemu_irq_raise(s->irq); +} +} + Note for later: qmp_rtc_notify() works on all realized mc146818rtc devices. Other kinds of RTC devices are silently ignored. Just like qmp_rtc_reset_reinjection(). static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..5229dffbbd 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_notify(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..20457b0acc 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,22 @@ ## # @rtc-reset-reinjection: # # This command will reset the RTC interrupt reinjection backlog. Can # be used if another mechanism to synchronize guest time is in effect, # for example QEMU guest agent's guest-set-time command. # # Since: 2.1 # # Example: # # -> { "execute": "rtc-reset-reinjection" } # <- { "return": {} } ## { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-notify: +# +# Generate an RTC interrupt. Our QMP command to generate NMIs is called inject-nmi. Call this one inject-rtc-irq for consistency? rtc-inject-irq? This makes sense, I'll rename in the next version. Thanks. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-notify" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-notify', + 'if': 'TARGET_I386' } + As noted above, both commands silently ignore RTCs other than mc146818rtc. They're only available with TARGET_I386. As long as all machines available with TARGET_I386 can only ever contain mc146818rtc RTCs, ignoring other RTCs is a non-problem in practice. Feels a bit fragile to me. Thoughts? Feels a bit fragile indeed, but this code has been there since 2.1, and I guess no one really found this to be a problem. ## # @SevState: #
[PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP
This can be used to force-synchronize the time in guest after a long stop-cont pause, which can be useful for serverless-type workload. Signed-off-by: Daniil Tatianin --- hw/rtc/mc146818rtc.c | 15 +++ include/hw/rtc/mc146818rtc.h | 1 + qapi/misc-target.json| 16 3 files changed, 32 insertions(+) diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c index f4c1869232..6980a78d5f 100644 --- a/hw/rtc/mc146818rtc.c +++ b/hw/rtc/mc146818rtc.c @@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp) } } +void qmp_rtc_notify(Error **errp) +{ +MC146818RtcState *s; + +/* + * See: + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt + */ +QLIST_FOREACH(s, _devices, link) { +s->cmos_data[RTC_REG_B] |= REG_B_UIE; +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF; +qemu_irq_raise(s->irq); +} +} + static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s) { kvm_reset_irq_delivered(); diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h index 97cec0b3e8..5229dffbbd 100644 --- a/include/hw/rtc/mc146818rtc.h +++ b/include/hw/rtc/mc146818rtc.h @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int base_year, void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val); int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr); void qmp_rtc_reset_reinjection(Error **errp); +void qmp_rtc_notify(Error **errp); #endif /* HW_RTC_MC146818RTC_H */ diff --git a/qapi/misc-target.json b/qapi/misc-target.json index 4e0a6492a9..20457b0acc 100644 --- a/qapi/misc-target.json +++ b/qapi/misc-target.json @@ -19,6 +19,22 @@ { 'command': 'rtc-reset-reinjection', 'if': 'TARGET_I386' } +## +# @rtc-notify: +# +# Generate an RTC interrupt. +# +# Since: 9.1 +# +# Example: +# +# -> { "execute": "rtc-notify" } +# <- { "return": {} } +# +## +{ 'command': 'rtc-notify', + 'if': 'TARGET_I386' } + ## # @SevState: # -- 2.34.1
Re: [PATCH v2 1/3] i386/a-b-bootblock: factor test memory addresses out into constants
Ping! :) 19.09.2023, 13:23, "Daniil Tatianin" :So that we have less magic numbers to deal with. This also allows us toreuse these in the following commits.Signed-off-by: Daniil Tatianin <d-tatia...@yandex-team.ru>Reviewed-by: Peter Xu <pet...@redhat.com>--- tests/migration/i386/a-b-bootblock.S | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-)diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.Sindex 3d464c7568..036216e4a7 100644--- a/tests/migration/i386/a-b-bootblock.S+++ b/tests/migration/i386/a-b-bootblock.S@@ -34,6 +34,10 @@ start: # at 0x7c00 ? mov $16,%eax mov %eax,%ds +# Start from 1MB+.set TEST_MEM_START, (1024*1024)+.set TEST_MEM_END, (100*1024*1024)+ mov $65,%ax mov $0x3f8,%dx outb %al,%dx@@ -41,12 +45,11 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl mainloop:- # Start from 1MB- mov $(1024*1024),%eax+ mov $TEST_MEM_START,%eax innerloop: incb (%eax) add $4096,%eax- cmp $(100*1024*1024),%eax+ cmp $TEST_MEM_END,%eax jl innerloop inc %bl--2.34.1
Re: [PATCH 0/2] i386/a-b-bootblock: zero the first byte of each page on start
27.09.2023, 00:02, "Peter Xu" :On Thu, Sep 07, 2023 at 10:29:42PM +0300, Daniil Tatianin wrote: This series fixes an issue where the outcome of the migration qtest relies on the initial memory contents all being the same across the first 100MiB of RAM, which is a very fragile invariant. We fix this by making sure we zero the first byte of every testable page in range beforehand.What's the difference between this one and:https://lore.kernel.org/r/20230907193051.1609310-1-d-tatia...@yandex-team.ru?There's a v2 version of this series that adds a fix for a similar issue in s390x as well, please look at that one instead.Thanks, --Peter Xu
Re: [PATCH 2/2] i386/a-b-bootblock: zero the first byte of each page on start
26.09.2023, 23:41, "Vladimir Sementsov-Ogievskiy" :On 07.09.23 22:29, Daniil Tatianin wrote: The migration qtest all the way up to this point used to work by sheer luck relying on the contents of all pages from 1MiB to 100MiB to contain the same one value in the first byte initially. This easily breaks if we reduce the amount of RAM for the test instances from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the HighMemory allocator since commit dc88f9b72df ("malloc: use large ZoneHigh when there is enough memory"). This would result in the following errors: 12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 2.74s killed by signal 6 SIGABRT stderr: Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb current = 9e hit_edge = 1 Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb current = 89 hit_edge = 1 Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb current = 23 hit_edge = 1 Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb current = 31 hit_edge = 1 Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb current = 70 hit_edge = 1 Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb current = ff hit_edge = 1 Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb current = 54 hit_edge = 1 Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb current = 64 hit_edge = 1 Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb current = 1d hit_edge = 1 Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb current = 1a hit_edge = 1 and in another 26 pages** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) Fix this by always zeroing the first byte of each page in the range so that we get consistent results no matter the initial contents. Fixes: ea0c6d62391 ("test: Postcopy") Signed-off-by: Daniil Tatianin <d-tatia...@yandex-team.ru> --- tests/migration/i386/a-b-bootblock.S | 9 + tests/migration/i386/a-b-bootblock.h | 16 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 036216e4a7..6bbd60 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -44,6 +44,15 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl + +pre_zero: + mov $TEST_MEM_START,%eax +do_zero: + movb $0, (%eax) + add $4096,%eax + cmp $TEST_MEM_END,%eax + jl do_zero + mainloop: mov $TEST_MEM_START,%eax innerloop: diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index b7b0fce2ee..5b523917ce 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,18 +4,18 @@* the header and the assembler differences in your patch submission.*/ unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10, - 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, - 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, - 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, - 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, - 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, + 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, + 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, + 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, + 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00, + 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,I understand the idea of patch, but don't follow why and how this bo
Re: [PATCH 0/2] i386/a-b-bootblock: zero the first byte of each page on start
27.09.2023, 00:02, "Peter Xu" :On Thu, Sep 07, 2023 at 10:29:42PM +0300, Daniil Tatianin wrote: This series fixes an issue where the outcome of the migration qtest relies on the initial memory contents all being the same across the first 100MiB of RAM, which is a very fragile invariant. We fix this by making sure we zero the first byte of every testable page in range beforehand.What's the difference between this one and:https://lore.kernel.org/r/20230907193051.1609310-1-d-tatia...@yandex-team.ru?I think i initially forgot something and had to resend, but not sure. Anyway, disregard that one. --Peter Xu
Re: [PATCH v2 0/3] migration-qtest: zero the first byte of each page on start
ping :)
[PATCH v2 0/3] migration-qtest: zero the first byte of each page on start
This series fixes an issue where the outcome of the migration qtest relies on the initial memory contents all being the same across the first 100MiB of RAM, which is a very fragile invariant. We fix this by making sure we zero the first byte of every testable page in range beforehand. Changes since v1: - Add a fix for the s390x test binary as well as suggested by Peter Xu Daniil Tatianin (3): i386/a-b-bootblock: factor test memory addresses out into constants i386/a-b-bootblock: zero the first byte of each page on start s390x/a-b-bios: zero the first byte of each page on start tests/migration/i386/a-b-bootblock.S | 18 +- tests/migration/i386/a-b-bootblock.h | 16 +- tests/migration/s390x/a-b-bios.c | 8 + tests/migration/s390x/a-b-bios.h | 380 ++- 4 files changed, 234 insertions(+), 188 deletions(-) -- 2.34.1
[PATCH v2 2/3] i386/a-b-bootblock: zero the first byte of each page on start
The migration qtest all the way up to this point used to work by sheer luck relying on the contents of all pages from 1MiB to 100MiB to contain the same one value in the first byte initially. This easily breaks if we reduce the amount of RAM for the test instances from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the HighMemory allocator since commit dc88f9b72df ("malloc: use large ZoneHigh when there is enough memory"). This would result in the following errors: 12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 2.74s killed by signal 6 SIGABRT stderr: Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb current = 9e hit_edge = 1 Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb current = 89 hit_edge = 1 Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb current = 23 hit_edge = 1 Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb current = 31 hit_edge = 1 Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb current = 70 hit_edge = 1 Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb current = ff hit_edge = 1 Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb current = 54 hit_edge = 1 Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb current = 64 hit_edge = 1 Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb current = 1d hit_edge = 1 Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb current = 1a hit_edge = 1 and in another 26 pages** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) Fix this by always zeroing the first byte of each page in the range so that we get consistent results no matter the initial contents. Fixes: ea0c6d62391 ("test: Postcopy") Signed-off-by: Daniil Tatianin Reviewed-by: Peter Xu --- tests/migration/i386/a-b-bootblock.S | 9 + tests/migration/i386/a-b-bootblock.h | 16 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 036216e4a7..6bbd60 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -44,6 +44,15 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl + +pre_zero: +mov $TEST_MEM_START,%eax +do_zero: +movb $0, (%eax) +add $4096,%eax +cmp $TEST_MEM_END,%eax +jl do_zero + mainloop: mov $TEST_MEM_START,%eax innerloop: diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index b7b0fce2ee..5b523917ce 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,18 +4,18 @@ * the header and the assembler differences in your patch submission. */ unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10, - 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, - 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, - 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, - 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, - 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, + 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, + 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, + 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, + 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00, + 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, -- 2.34.1
[PATCH v2 3/3] s390x/a-b-bios: zero the first byte of each page on start
Same as with the x86 verison of this test, we relied on the contents of all pages in RAM to be the same across the entire test range, which is very fragile. Zero the first byte of each page before running the increment loop to fix this. Fixes: 5571dc824b ("tests/migration: Enable the migration test on s390x, too") Signed-off-by: Daniil Tatianin --- tests/migration/s390x/a-b-bios.c | 8 + tests/migration/s390x/a-b-bios.h | 380 +-- 2 files changed, 211 insertions(+), 177 deletions(-) diff --git a/tests/migration/s390x/a-b-bios.c b/tests/migration/s390x/a-b-bios.c index a0327cd153..ff99a3ef57 100644 --- a/tests/migration/s390x/a-b-bios.c +++ b/tests/migration/s390x/a-b-bios.c @@ -27,6 +27,14 @@ void main(void) sclp_setup(); sclp_print("A"); +/* + * Make sure all of the pages have consistent contents before incrementing + * the first byte below. + */ +for (addr = START_ADDRESS; addr < END_ADDRESS; addr += 4096) { +*(volatile char *)addr = 0; +} + while (1) { for (addr = START_ADDRESS; addr < END_ADDRESS; addr += 4096) { *(volatile char *)addr += 1; /* Change pages */ diff --git a/tests/migration/s390x/a-b-bios.h b/tests/migration/s390x/a-b-bios.h index e722dc7c40..96103dadbb 100644 --- a/tests/migration/s390x/a-b-bios.h +++ b/tests/migration/s390x/a-b-bios.h @@ -6,10 +6,10 @@ unsigned char s390x_elf[] = { 0x7f, 0x45, 0x4c, 0x46, 0x02, 0x02, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x16, 0x00, 0x00, 0x00, 0x01, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x78, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x80, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xa8, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x09, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x38, 0x00, 0x07, 0x00, 0x40, - 0x00, 0x0c, 0x00, 0x0b, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x04, + 0x00, 0x0d, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x88, 0x00, 0x00, 0x00, 0x00, @@ -21,140 +21,154 @@ unsigned char s390x_elf[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x0c, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x0c, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xac, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xac, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x06, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x10, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x17, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe8, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x98, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xb8, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x17, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0xb8, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x38, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x01, 0x18, 0x48, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x07, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0xd0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0, + 0x00, 0x00, 0x07, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0xb8, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0xb8, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x01, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x64, 0x74, 0xe5, 0x51, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x64, 0x74, 0xe5, 0x52, 0x00, 0x00, 0x00, 0x04, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x10, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x17, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x10, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0xd0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xb8, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x17, 0xb8, 0x00, 0x
[PATCH v2 1/3] i386/a-b-bootblock: factor test memory addresses out into constants
So that we have less magic numbers to deal with. This also allows us to reuse these in the following commits. Signed-off-by: Daniil Tatianin Reviewed-by: Peter Xu --- tests/migration/i386/a-b-bootblock.S | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 3d464c7568..036216e4a7 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -34,6 +34,10 @@ start: # at 0x7c00 ? mov $16,%eax mov %eax,%ds +# Start from 1MB +.set TEST_MEM_START, (1024*1024) +.set TEST_MEM_END, (100*1024*1024) + mov $65,%ax mov $0x3f8,%dx outb %al,%dx @@ -41,12 +45,11 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl mainloop: -# Start from 1MB -mov $(1024*1024),%eax +mov $TEST_MEM_START,%eax innerloop: incb (%eax) add $4096,%eax -cmp $(100*1024*1024),%eax +cmp $TEST_MEM_END,%eax jl innerloop inc %bl -- 2.34.1
Re: [PATCH v1 0/2] i386/a-b-bootblock: zero the first byte of each page on start
ping 07.09.2023, 22:31, "Daniil Tatianin" :This series fixes an issue where the outcome of the migration qtestrelies on the initial memory contents all being the same across thefirst 100MiB of RAM, which is a very fragile invariant.We fix this by making sure we zero the first byte of every testable pagein range beforehand.Daniil Tatianin (2): i386/a-b-bootblock: factor test memory addresses out into constants i386/a-b-bootblock: zero the first byte of each page on start tests/migration/i386/a-b-bootblock.S | 18 +++--- tests/migration/i386/a-b-bootblock.h | 16 2 files changed, 23 insertions(+), 11 deletions(-) --2.34.1
[PATCH v1 0/2] i386/a-b-bootblock: zero the first byte of each page on start
This series fixes an issue where the outcome of the migration qtest relies on the initial memory contents all being the same across the first 100MiB of RAM, which is a very fragile invariant. We fix this by making sure we zero the first byte of every testable page in range beforehand. Daniil Tatianin (2): i386/a-b-bootblock: factor test memory addresses out into constants i386/a-b-bootblock: zero the first byte of each page on start tests/migration/i386/a-b-bootblock.S | 18 +++--- tests/migration/i386/a-b-bootblock.h | 16 2 files changed, 23 insertions(+), 11 deletions(-) -- 2.34.1
[PATCH v1 2/2] i386/a-b-bootblock: zero the first byte of each page on start
The migration qtest all the way up to this point used to work by sheer luck relying on the contents of all pages from 1MiB to 100MiB to contain the same one value in the first byte initially. This easily breaks if we reduce the amount of RAM for the test instances from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the HighMemory allocator since commit dc88f9b72df ("malloc: use large ZoneHigh when there is enough memory"). This would result in the following errors: 12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 2.74s killed by signal 6 SIGABRT stderr: Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb current = 9e hit_edge = 1 Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb current = 89 hit_edge = 1 Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb current = 23 hit_edge = 1 Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb current = 31 hit_edge = 1 Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb current = 70 hit_edge = 1 Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb current = ff hit_edge = 1 Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb current = 54 hit_edge = 1 Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb current = 64 hit_edge = 1 Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb current = 1d hit_edge = 1 Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb current = 1a hit_edge = 1 and in another 26 pages** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) Fix this by always zeroing the first byte of each page in the range so that we get consistent results no matter the initial contents. Fixes: ea0c6d62391 ("test: Postcopy") Signed-off-by: Daniil Tatianin --- tests/migration/i386/a-b-bootblock.S | 9 + tests/migration/i386/a-b-bootblock.h | 16 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 036216e4a7..6bbd60 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -44,6 +44,15 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl + +pre_zero: +mov $TEST_MEM_START,%eax +do_zero: +movb $0, (%eax) +add $4096,%eax +cmp $TEST_MEM_END,%eax +jl do_zero + mainloop: mov $TEST_MEM_START,%eax innerloop: diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index b7b0fce2ee..5b523917ce 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,18 +4,18 @@ * the header and the assembler differences in your patch submission. */ unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10, - 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, - 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, - 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, - 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, - 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, + 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, + 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, + 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, + 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00, + 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, -- 2.34.1
[PATCH v1 1/2] i386/a-b-bootblock: factor test memory addresses out into constants
So that we have less magic numbers to deal with. This also allows us to reuse these in the following commits. Signed-off-by: Daniil Tatianin --- tests/migration/i386/a-b-bootblock.S | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 3d464c7568..036216e4a7 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -34,6 +34,10 @@ start: # at 0x7c00 ? mov $16,%eax mov %eax,%ds +# Start from 1MB +.set TEST_MEM_START, (1024*1024) +.set TEST_MEM_END, (100*1024*1024) + mov $65,%ax mov $0x3f8,%dx outb %al,%dx @@ -41,12 +45,11 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl mainloop: -# Start from 1MB -mov $(1024*1024),%eax +mov $TEST_MEM_START,%eax innerloop: incb (%eax) add $4096,%eax -cmp $(100*1024*1024),%eax +cmp $TEST_MEM_END,%eax jl innerloop inc %bl -- 2.34.1
[PATCH 2/2] i386/a-b-bootblock: zero the first byte of each page on start
The migration qtest all the way up to this point used to work by sheer luck relying on the contents of all pages from 1MiB to 100MiB to contain the same one value in the first byte initially. This easily breaks if we reduce the amount of RAM for the test instances from 150MiB to e.g 110MiB since that makes SeaBIOS dirty some of the pages starting at about 0x5dd2000 (~93 MiB) as it reuses those for the HighMemory allocator since commit dc88f9b72df ("malloc: use large ZoneHigh when there is enough memory"). This would result in the following errors: 12/60 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 2.74s killed by signal 6 SIGABRT stderr: Memory content inconsistency at 5dd2000 first_byte = cc last_byte = cb current = 9e hit_edge = 1 Memory content inconsistency at 5dd3000 first_byte = cc last_byte = cb current = 89 hit_edge = 1 Memory content inconsistency at 5dd4000 first_byte = cc last_byte = cb current = 23 hit_edge = 1 Memory content inconsistency at 5dd5000 first_byte = cc last_byte = cb current = 31 hit_edge = 1 Memory content inconsistency at 5dd6000 first_byte = cc last_byte = cb current = 70 hit_edge = 1 Memory content inconsistency at 5dd7000 first_byte = cc last_byte = cb current = ff hit_edge = 1 Memory content inconsistency at 5dd8000 first_byte = cc last_byte = cb current = 54 hit_edge = 1 Memory content inconsistency at 5dd9000 first_byte = cc last_byte = cb current = 64 hit_edge = 1 Memory content inconsistency at 5dda000 first_byte = cc last_byte = cb current = 1d hit_edge = 1 Memory content inconsistency at 5ddb000 first_byte = cc last_byte = cb current = 1a hit_edge = 1 and in another 26 pages** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) Fix this by always zeroing the first byte of each page in the range so that we get consistent results no matter the initial contents. Fixes: ea0c6d62391 ("test: Postcopy") Signed-off-by: Daniil Tatianin --- tests/migration/i386/a-b-bootblock.S | 9 + tests/migration/i386/a-b-bootblock.h | 16 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 036216e4a7..6bbd60 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -44,6 +44,15 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl + +pre_zero: +mov $TEST_MEM_START,%eax +do_zero: +movb $0, (%eax) +add $4096,%eax +cmp $TEST_MEM_END,%eax +jl do_zero + mainloop: mov $TEST_MEM_START,%eax innerloop: diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h index b7b0fce2ee..5b523917ce 100644 --- a/tests/migration/i386/a-b-bootblock.h +++ b/tests/migration/i386/a-b-bootblock.h @@ -4,18 +4,18 @@ * the header and the assembler differences in your patch submission. */ unsigned char x86_bootsect[] = { - 0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, + 0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00, 0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02, 0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10, - 0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, - 0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, - 0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, - 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, - 0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, - 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0xc6, 0x00, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, + 0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05, + 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe, + 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba, + 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00, + 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, -- 2.34.1
[PATCH 1/2] i386/a-b-bootblock: factor test memory addresses out into constants
So that we have less magic numbers to deal with. This also allows us to reuse these in the following commits. Signed-off-by: Daniil Tatianin --- tests/migration/i386/a-b-bootblock.S | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S index 3d464c7568..036216e4a7 100644 --- a/tests/migration/i386/a-b-bootblock.S +++ b/tests/migration/i386/a-b-bootblock.S @@ -34,6 +34,10 @@ start: # at 0x7c00 ? mov $16,%eax mov %eax,%ds +# Start from 1MB +.set TEST_MEM_START, (1024*1024) +.set TEST_MEM_END, (100*1024*1024) + mov $65,%ax mov $0x3f8,%dx outb %al,%dx @@ -41,12 +45,11 @@ start: # at 0x7c00 ? # bl keeps a counter so we limit the output speed mov $0, %bl mainloop: -# Start from 1MB -mov $(1024*1024),%eax +mov $TEST_MEM_START,%eax innerloop: incb (%eax) add $4096,%eax -cmp $(100*1024*1024),%eax +cmp $TEST_MEM_END,%eax jl innerloop inc %bl -- 2.34.1
[PATCH 0/2] i386/a-b-bootblock: zero the first byte of each page on start
This series fixes an issue where the outcome of the migration qtest relies on the initial memory contents all being the same across the first 100MiB of RAM, which is a very fragile invariant. We fix this by making sure we zero the first byte of every testable page in range beforehand. Daniil Tatianin (2): i386/a-b-bootblock: factor test memory addresses out into constants i386/a-b-bootblock: zero the first byte of each page on start tests/migration/i386/a-b-bootblock.S | 18 +++--- tests/migration/i386/a-b-bootblock.h | 16 2 files changed, 23 insertions(+), 11 deletions(-) -- 2.34.1
Re: [PATCH] replication: compile out some staff when replication is not configured
Just a few minor nits On 4/11/23 5:51 PM, Vladimir Sementsov-Ogievskiy wrote: Don't compile-in replication-related files when replication is disabled in config. Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi all! I'm unsure, should it be actually separate --disable-colo / --enable-colo options or it's really used only together with replication staff.. So, I decided to start with simpler variant. You probably meant 'stuff' and not 'staff' in the commit message and here as well? block/meson.build | 2 +- migration/meson.build | 6 -- net/meson.build | 8 qapi/migration.json | 6 -- stubs/colo.c | 46 +++ stubs/meson.build | 1 + 6 files changed, 60 insertions(+), 9 deletions(-) create mode 100644 stubs/colo.c diff --git a/block/meson.build b/block/meson.build index 382bec0e7d..b9a72e219b 100644 --- a/block/meson.build +++ b/block/meson.build @@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c') block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit]) block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c')) block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c')) -if not get_option('replication').disabled() +if get_option('replication').allowed() block_ss.add(files('replication.c')) endif block_ss.add(when: libaio, if_true: files('linux-aio.c')) diff --git a/migration/meson.build b/migration/meson.build index 0d1bb9f96e..8180eaea7b 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -13,8 +13,6 @@ softmmu_ss.add(files( 'block-dirty-bitmap.c', 'channel.c', 'channel-block.c', - 'colo-failover.c', - 'colo.c', 'exec.c', 'fd.c', 'global_state.c', @@ -29,6 +27,10 @@ softmmu_ss.add(files( 'threadinfo.c', ), gnutls) +if get_option('replication').allowed() + softmmu_ss.add(files('colo.c', 'colo-failover.c')) +endif + softmmu_ss.add(when: rdma, if_true: files('rdma.c')) if get_option('live_block_migration').allowed() softmmu_ss.add(files('block.c')) diff --git a/net/meson.build b/net/meson.build index 87afca3e93..634ab71cc6 100644 --- a/net/meson.build +++ b/net/meson.build @@ -1,13 +1,9 @@ softmmu_ss.add(files( 'announce.c', 'checksum.c', - 'colo-compare.c', - 'colo.c', 'dump.c', 'eth.c', 'filter-buffer.c', - 'filter-mirror.c', - 'filter-rewriter.c', 'filter.c', 'hub.c', 'net-hmp-cmds.c', @@ -19,6 +15,10 @@ softmmu_ss.add(files( 'util.c', )) +if get_option('replication').allowed() + softmmu_ss.add(files('colo-compare.c', 'colo.c', 'filter-rewriter.c', 'filter-mirror.c')) +endif + softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c')) if have_l2tpv3 diff --git a/qapi/migration.json b/qapi/migration.json index c84fa10e86..5b81e09369 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1685,7 +1685,8 @@ ## { 'struct': 'COLOStatus', 'data': { 'mode': 'COLOMode', 'last-mode': 'COLOMode', -'reason': 'COLOExitReason' } } +'reason': 'COLOExitReason' }, + 'if': 'CONFIG_REPLICATION' } ## # @query-colo-status: @@ -1702,7 +1703,8 @@ # Since: 3.1 ## { 'command': 'query-colo-status', - 'returns': 'COLOStatus' } + 'returns': 'COLOStatus', + 'if': 'CONFIG_REPLICATION' } ## # @migrate-recover: diff --git a/stubs/colo.c b/stubs/colo.c new file mode 100644 index 00..5a02540baa --- /dev/null +++ b/stubs/colo.c @@ -0,0 +1,46 @@ +#include "qemu/osdep.h" +#include "qemu/notify.h" +#include "net/colo-compare.h" +#include "migration/colo.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-migration.h" + +void colo_compare_cleanup(void) +{ +abort(); +} + +void colo_shutdown(void) +{ +abort(); +} + +void *colo_process_incoming_thread(void *opaque) +{ +abort(); +} + +void colo_checkpoint_notify(void *opaque) +{ +abort(); +} + +void migrate_start_colo_process(MigrationState *s) +{ +abort(); +} + +bool migration_in_colo_state(void) +{ +return false; +} + +bool migration_incoming_in_colo_state(void) +{ +return false; +} + +void qmp_x_colo_lost_heartbeat(Error **errp) +{ +error_setg(errp, "COLO support is not built in"); Maybe 'built-in' with a dash for consistency with usb-dev-stub? +} diff --git a/stubs/meson.build b/stubs/meson.build index b2b5956d97..8412cad15f 100644 --- a/stubs/meson.build +++ b/stubs/meson.build @@ -45,6 +45,7 @@ stub_ss.add(files('target-get-monitor-def.c')) stub_ss.add(files('target-monitor-defs.c')) stub_ss.add(files('trace-control.c')) stub_ss.add(files('uuid.c')) +stub_ss.add(files('colo.c')) stub_ss.add(files('vmstate.c')) stub_ss.add(files('vm-stop.c')) stub_ss.add(files('win32-kbd-hook.c'))
Re: [PATCH 0/2] virtio-scsi: stop using aio_disable_external() during unplug
On 3/23/23 9:56 PM, Stefan Hajnoczi wrote: The aio_disable_external() API is a solution for stopping I/O during critical sections. The newer BlockDevOps->drained_begin/end/poll() callbacks offer a cleaner solution that supports the upcoming multi-queue block layer. This series removes aio_disable_external() from the virtio-scsi emulation code. Patch 1 is a fix for something I noticed when reading the code. Patch 2 replaces aio_disable_external() with functionality for safe hot unplug that's mostly already present in the SCSI emulation code. Stefan Hajnoczi (2): virtio-scsi: avoid race between unplug and transport event virtio-scsi: stop using aio_disable_external() during unplug hw/scsi/scsi-bus.c| 3 ++- hw/scsi/scsi-disk.c | 1 + hw/scsi/virtio-scsi.c | 21 + 3 files changed, 12 insertions(+), 13 deletions(-) For both patches: Reviewed-by: Daniil Tatianin
Re: [PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout
On 1/23/23 4:47 PM, Daniel P. Berrangé wrote: On Mon, Jan 23, 2023 at 04:30:03PM +0300, Daniil Tatianin wrote: On 1/23/23 11:57 AM, David Hildenbrand wrote: On 20.01.23 14:47, Daniil Tatianin wrote: This series introduces new qemu_prealloc_mem_with_timeout() api, which allows limiting the maximum amount of time to be spent on memory preallocation. It also adds prealloc statistics collection that is exposed via an optional timeout handler. This new api is then utilized by hostmem for guest RAM preallocation controlled via new object properties called 'prealloc-timeout' and 'prealloc-timeout-fatal'. This is useful for limiting VM startup time on systems with unpredictable page allocation delays due to memory fragmentation or the backing storage. The timeout can be configured to either simply emit a warning and continue VM startup without having preallocated the entire guest RAM or just abort startup entirely if that is not acceptable for a specific use case. The major use case for preallocation is memory resources that cannot be overcommitted (hugetlb, file blocks, ...), to avoid running out of such resources later, while the guest is already running, and crashing it. Wouldn't you say that preallocating memory for the sake of speeding up guest kernel startup & runtime is a valid use case of prealloc? This way we can avoid expensive (for a multitude of reasons) page faults that will otherwise slow down the guest significantly at runtime and affect the user experience. Allocating only a fraction "because it takes too long" looks quite useless in that (main use-case) context. We shouldn't encourage QEMU users to play with fire in such a way. IOW, there should be no way around "prealloc-timeout-fatal". Either preallocation succeeded and the guest can run, or it failed, and the guest can't run. Here we basically accept the fact that e.g with fragmented memory the kernel might take a while in a page fault handler especially for hugetlb because of page compaction that has to run for every fault. This way we can prefault at least some number of pages and let the guest fault the rest on demand later on during runtime even if it's slow and would cause a noticeable lag. Rather than treat this as a problem that needs a timeout, can we restate it as situations need synchronous vs asynchronous preallocation ? For the case where we need synchronous prealloc, current QEMU deals with that. If it doesn't work quickly enough, mgmt can just kill QEMU already today. For the case where you would like some prealloc, but don't mind if it runs without full prealloc, then why not just treat it as an entirely asynchronous task ? Instead of calling qemu_prealloc_mem and waiting for it to complete, just spawn a thread to run qemu_prealloc_mem, so it doesn't block QEMU startup. This will have minimal maint burden on the existing code, and will avoid need for mgmt apps to think about what timeout value to give, which is good because timeouts are hard to get right. Most of the time that async background prealloc will still finish before the guest even gets out of the firmware phase, but if it takes longer it is no big deal. You don't need to quit the prealloc job early, you just need it to not delay the guest OS boot IIUC. This impl could be done with the 'prealloc' property turning from a boolean on/off, to a enum on/async/off, where 'on' == sync prealloc. Or add a separate 'prealloc-async' bool property I like this idea, but I'm not sure how we would go about writing to live guest memory. Is that something that can be done safely without racing with the guest? With regards, Daniel
Re: [PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout
On 1/23/23 11:57 AM, David Hildenbrand wrote: On 20.01.23 14:47, Daniil Tatianin wrote: This series introduces new qemu_prealloc_mem_with_timeout() api, which allows limiting the maximum amount of time to be spent on memory preallocation. It also adds prealloc statistics collection that is exposed via an optional timeout handler. This new api is then utilized by hostmem for guest RAM preallocation controlled via new object properties called 'prealloc-timeout' and 'prealloc-timeout-fatal'. This is useful for limiting VM startup time on systems with unpredictable page allocation delays due to memory fragmentation or the backing storage. The timeout can be configured to either simply emit a warning and continue VM startup without having preallocated the entire guest RAM or just abort startup entirely if that is not acceptable for a specific use case. The major use case for preallocation is memory resources that cannot be overcommitted (hugetlb, file blocks, ...), to avoid running out of such resources later, while the guest is already running, and crashing it. Wouldn't you say that preallocating memory for the sake of speeding up guest kernel startup & runtime is a valid use case of prealloc? This way we can avoid expensive (for a multitude of reasons) page faults that will otherwise slow down the guest significantly at runtime and affect the user experience. Allocating only a fraction "because it takes too long" looks quite useless in that (main use-case) context. We shouldn't encourage QEMU users to play with fire in such a way. IOW, there should be no way around "prealloc-timeout-fatal". Either preallocation succeeded and the guest can run, or it failed, and the guest can't run. Here we basically accept the fact that e.g with fragmented memory the kernel might take a while in a page fault handler especially for hugetlb because of page compaction that has to run for every fault. This way we can prefault at least some number of pages and let the guest fault the rest on demand later on during runtime even if it's slow and would cause a noticeable lag. ... but then, management tools can simply start QEMU with "-S", start an own timer, and zap QEMU if it didn't manage to come up in time, and simply start a new QEMU instance without preallocation enabled. The "good" thing about that approach is that it will also cover any implicit memory preallocation, like using mlock() or VFIO, that don't run in ordinary per-hostmem preallocation context. If setting QEMU up takes to long, you might want to try on a different hypervisor in your cluster instead. This approach definitely works too but again it assumes that we always want 'prealloc-timeout-fatal' to be on, which is, for the most part only the case for working around issues that might be caused by overcommit. I don't immediately see why we want to make our preallcoation+hostmem implementation in QEMU more complicated for such a use case.
[PATCH 3/4] backends/hostmem: add an ability to specify prealloc timeout
Use the new qemu_prealloc_mem_with_timeout api so that we can limit the maximum amount of time to be spent preallocating guest RAM. We also emit a warning from the timeout handler detailing the current prealloc progress and letting the user know that it was exceeded. The timeout is set to zero (no timeout) by default, and can be configured via the new 'prealloc-timeout' property. Signed-off-by: Daniil Tatianin --- backends/hostmem.c | 48 ++-- include/sysemu/hostmem.h | 2 ++ qapi/qom.json| 4 3 files changed, 52 insertions(+), 2 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 842bfa9eb7..be9af7515e 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -34,6 +34,19 @@ QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND); QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE); #endif +static void +host_memory_on_prealloc_timeout(void *opaque, +const PreallocStats *stats) +{ +HostMemoryBackend *backend = opaque; + +backend->prealloc_did_timeout = true; +warn_report("HostMemory preallocation timeout %"PRIu64"s exceeded, " +"allocated %zu/%zu (%zu byte) pages (%d threads)", +(uint64_t)stats->seconds_elapsed, stats->allocated_pages, +stats->total_pages, stats->page_size, stats->threads); +} + char * host_memory_backend_get_name(HostMemoryBackend *backend) { @@ -223,8 +236,26 @@ static bool do_prealloc_mr(HostMemoryBackend *backend, Error **errp) void *ptr = memory_region_get_ram_ptr(>mr); uint64_t sz = memory_region_size(>mr); -qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, - backend->prealloc_context, _err); +if (backend->prealloc_timeout) { +PreallocTimeout timeout = { +.seconds = (time_t)backend->prealloc_timeout, +.user = backend, +.on_timeout = host_memory_on_prealloc_timeout, +}; + +qemu_prealloc_mem_with_timeout(fd, ptr, sz, backend->prealloc_threads, + backend->prealloc_context, , + _err); +if (local_err && backend->prealloc_did_timeout) { +error_free(local_err); +local_err = NULL; +} +} else { +qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, + backend->prealloc_context, _err); +} + + if (local_err) { error_propagate(errp, local_err); return false; @@ -277,6 +308,13 @@ static void host_memory_backend_set_prealloc_threads(Object *obj, Visitor *v, backend->prealloc_threads = value; } +static void host_memory_backend_get_set_prealloc_timeout(Object *obj, +Visitor *v, const char *name, void *opaque, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(obj); +visit_type_uint32(v, name, >prealloc_timeout, errp); +} + static void host_memory_backend_init(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -516,6 +554,12 @@ host_memory_backend_class_init(ObjectClass *oc, void *data) object_property_allow_set_link, OBJ_PROP_LINK_STRONG); object_class_property_set_description(oc, "prealloc-context", "Context to use for creating CPU threads for preallocation"); +object_class_property_add(oc, "prealloc-timeout", "int", +host_memory_backend_get_set_prealloc_timeout, +host_memory_backend_get_set_prealloc_timeout, +NULL, NULL); +object_class_property_set_description(oc, "prealloc-timeout", +"Maximum memory preallocation timeout in seconds"); object_class_property_add(oc, "size", "int", host_memory_backend_get_size, host_memory_backend_set_size, diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h index 39326f1d4f..21910f3b45 100644 --- a/include/sysemu/hostmem.h +++ b/include/sysemu/hostmem.h @@ -66,7 +66,9 @@ struct HostMemoryBackend { uint64_t size; bool merge, dump, use_canonical_path; bool prealloc, is_mapped, share, reserve; +bool prealloc_did_timeout; uint32_t prealloc_threads; +uint32_t prealloc_timeout; ThreadContext *prealloc_context; DECLARE_BITMAP(host_nodes, MAX_NODES + 1); HostMemPolicy policy; diff --git a/qapi/qom.json b/qapi/qom.json index 30e76653ad..9149c064b8 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -581,6 +581,9 @@ # @prealloc-context: thread context to use for creation of preallocation threads #(default: none) (since 7.2) # +# @prealloc-timeout: Maximum memory preallocation timeout in seconds +#(default: 0) (since 7.3) +# # @share: if false, the memory is private to QEMU; if true
[PATCH 2/4] backends/hostmem: move memory region preallocation logic into a helper
...so that we don't have to duplicate it in multiple places throughout the file. Signed-off-by: Daniil Tatianin --- backends/hostmem.c | 38 -- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 747e7838c0..842bfa9eb7 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -216,10 +216,26 @@ static bool host_memory_backend_get_prealloc(Object *obj, Error **errp) return backend->prealloc; } +static bool do_prealloc_mr(HostMemoryBackend *backend, Error **errp) +{ +Error *local_err = NULL; +int fd = memory_region_get_fd(>mr); +void *ptr = memory_region_get_ram_ptr(>mr); +uint64_t sz = memory_region_size(>mr); + +qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, + backend->prealloc_context, _err); +if (local_err) { +error_propagate(errp, local_err); +return false; +} + +return true; +} + static void host_memory_backend_set_prealloc(Object *obj, bool value, Error **errp) { -Error *local_err = NULL; HostMemoryBackend *backend = MEMORY_BACKEND(obj); if (!backend->reserve && value) { @@ -233,17 +249,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value, } if (value && !backend->prealloc) { -int fd = memory_region_get_fd(>mr); -void *ptr = memory_region_get_ram_ptr(>mr); -uint64_t sz = memory_region_size(>mr); - -qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, - backend->prealloc_context, _err); -if (local_err) { -error_propagate(errp, local_err); -return; -} -backend->prealloc = true; +backend->prealloc = do_prealloc_mr(backend, errp); } } @@ -399,12 +405,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) * specified NUMA policy in place. */ if (backend->prealloc) { -qemu_prealloc_mem(memory_region_get_fd(>mr), ptr, sz, - backend->prealloc_threads, - backend->prealloc_context, _err); -if (local_err) { -goto out; -} +do_prealloc_mr(backend, errp); +return; } } out: -- 2.25.1
[PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout
This series introduces new qemu_prealloc_mem_with_timeout() api, which allows limiting the maximum amount of time to be spent on memory preallocation. It also adds prealloc statistics collection that is exposed via an optional timeout handler. This new api is then utilized by hostmem for guest RAM preallocation controlled via new object properties called 'prealloc-timeout' and 'prealloc-timeout-fatal'. This is useful for limiting VM startup time on systems with unpredictable page allocation delays due to memory fragmentation or the backing storage. The timeout can be configured to either simply emit a warning and continue VM startup without having preallocated the entire guest RAM or just abort startup entirely if that is not acceptable for a specific use case. Daniil Tatianin (4): oslib: introduce new qemu_prealloc_mem_with_timeout() api backends/hostmem: move memory region preallocation logic into a helper backends/hostmem: add an ability to specify prealloc timeout backends/hostmem: add an ability to make prealloc timeout fatal backends/hostmem.c | 112 +++--- include/qemu/osdep.h | 19 +++ include/sysemu/hostmem.h | 3 ++ qapi/qom.json| 8 +++ util/oslib-posix.c | 114 +++ util/oslib-win32.c | 9 6 files changed, 238 insertions(+), 27 deletions(-) -- 2.25.1
[PATCH 4/4] backends/hostmem: add an ability to make prealloc timeout fatal
This is controlled via the new 'prealloc-timeout-fatal' property and can be useful for cases when we cannot afford to not preallocate all guest pages while being time constrained. Signed-off-by: Daniil Tatianin --- backends/hostmem.c | 38 ++ include/sysemu/hostmem.h | 1 + qapi/qom.json| 4 3 files changed, 39 insertions(+), 4 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index be9af7515e..0808dc6951 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -39,12 +39,21 @@ host_memory_on_prealloc_timeout(void *opaque, const PreallocStats *stats) { HostMemoryBackend *backend = opaque; +const char *msg = "HostMemory preallocation timeout %"PRIu64"s exceeded, " + "allocated %zu/%zu (%zu byte) pages (%d threads)"; + +if (backend->prealloc_timeout_fatal) { +error_report(msg, (uint64_t)stats->seconds_elapsed, + stats->allocated_pages, stats->total_pages, + stats->page_size, stats->threads); +exit(1); + +} backend->prealloc_did_timeout = true; -warn_report("HostMemory preallocation timeout %"PRIu64"s exceeded, " -"allocated %zu/%zu (%zu byte) pages (%d threads)", -(uint64_t)stats->seconds_elapsed, stats->allocated_pages, -stats->total_pages, stats->page_size, stats->threads); +warn_report(msg, (uint64_t)stats->seconds_elapsed, +stats->allocated_pages, stats->total_pages, +stats->page_size, stats->threads); } char * @@ -315,6 +324,22 @@ static void host_memory_backend_get_set_prealloc_timeout(Object *obj, visit_type_uint32(v, name, >prealloc_timeout, errp); } +static bool host_memory_backend_get_prealloc_timeout_fatal( +Object *obj, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(obj); + +return backend->prealloc_timeout_fatal; +} + +static void host_memory_backend_set_prealloc_timeout_fatal( +Object *obj, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(obj); + +backend->prealloc_timeout_fatal = value; +} + static void host_memory_backend_init(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -560,6 +585,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data) NULL, NULL); object_class_property_set_description(oc, "prealloc-timeout", "Maximum memory preallocation timeout in seconds"); +object_class_property_add_bool(oc, "prealloc-timeout-fatal", +host_memory_backend_get_prealloc_timeout_fatal, +host_memory_backend_set_prealloc_timeout_fatal); +object_class_property_set_description(oc, "prealloc-timeout-fatal", +"Consider preallocation timeout a fatal error"); object_class_property_add(oc, "size", "int", host_memory_backend_get_size, host_memory_backend_set_size, diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h index 21910f3b45..b501b5eff2 100644 --- a/include/sysemu/hostmem.h +++ b/include/sysemu/hostmem.h @@ -67,6 +67,7 @@ struct HostMemoryBackend { bool merge, dump, use_canonical_path; bool prealloc, is_mapped, share, reserve; bool prealloc_did_timeout; +bool prealloc_timeout_fatal; uint32_t prealloc_threads; uint32_t prealloc_timeout; ThreadContext *prealloc_context; diff --git a/qapi/qom.json b/qapi/qom.json index 9149c064b8..70644d714b 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -584,6 +584,9 @@ # @prealloc-timeout: Maximum memory preallocation timeout in seconds #(default: 0) (since 7.3) # +# @prealloc-timeout-fatal: Consider preallocation timeout a fatal error +# (default: false) (since 7.3) +# # @share: if false, the memory is private to QEMU; if true, it is shared # (default: false) # @@ -616,6 +619,7 @@ '*prealloc-threads': 'uint32', '*prealloc-context': 'str', '*prealloc-timeout': 'uint32', +'*prealloc-timeout-fatal': 'bool', '*share': 'bool', '*reserve': 'bool', 'size': 'size', -- 2.25.1
[PATCH 1/4] oslib: introduce new qemu_prealloc_mem_with_timeout() api
This helper allows limiting the maximum amount time to be spent preallocating a block of memory, which is important on systems that might have unpredictable page allocation delays because of possible fragmentation or other reasons specific to the backend. It also exposes a way to register a callback that is invoked in case the specified timeout is exceeded. The callback is provided with a PreallocStats structure that includes a bunch of statistics about the progress including total & allocated number of pages, as well as page size and number of allocation threads. The win32 implementation is currently a stub that just calls into the old qemu_prealloc_mem api. Signed-off-by: Daniil Tatianin --- include/qemu/osdep.h | 19 util/oslib-posix.c | 114 +++ util/oslib-win32.c | 9 3 files changed, 133 insertions(+), 9 deletions(-) diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index bd23a08595..21757e5144 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -595,6 +595,25 @@ typedef struct ThreadContext ThreadContext; void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads, ThreadContext *tc, Error **errp); +typedef struct PreallocStats { +size_t page_size; +size_t total_pages; +size_t allocated_pages; +int threads; +time_t seconds_elapsed; +} PreallocStats; + +typedef struct PreallocTimeout { +time_t seconds; +void *user; +void (*on_timeout)(void *user, const PreallocStats *stats); +} PreallocTimeout; + +void qemu_prealloc_mem_with_timeout(int fd, char *area, size_t sz, +int max_threads, ThreadContext *tc, +const PreallocTimeout *timeout, +Error **errp); + /** * qemu_get_pid_name: * @pid: pid of a process diff --git a/util/oslib-posix.c b/util/oslib-posix.c index 59a891b6a8..570fca601f 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -74,6 +74,7 @@ typedef struct MemsetContext { bool any_thread_failed; struct MemsetThread *threads; int num_threads; +PreallocStats stats; } MemsetContext; struct MemsetThread { @@ -83,6 +84,7 @@ struct MemsetThread { QemuThread pgthread; sigjmp_buf env; MemsetContext *context; +size_t touched_pages; }; typedef struct MemsetThread MemsetThread; @@ -373,6 +375,7 @@ static void *do_touch_pages(void *arg) */ *(volatile char *)addr = *addr; addr += hpagesize; +qatomic_inc(_args->touched_pages); } } pthread_sigmask(SIG_SETMASK, , NULL); @@ -396,6 +399,11 @@ static void *do_madv_populate_write_pages(void *arg) if (size && qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE)) { ret = -errno; } + +if (!ret) { +qatomic_set(_args->touched_pages, memset_args->numpages); +} + return (void *)(uintptr_t)ret; } @@ -418,8 +426,68 @@ static inline int get_memset_num_threads(size_t hpagesize, size_t numpages, return ret; } +static int do_join_memset_threads_with_timeout(MemsetContext *context, + time_t seconds) +{ +struct timespec ts; +int i = 0; + +if (clock_gettime(CLOCK_REALTIME, ) < 0) { +return i; +} +ts.tv_sec += seconds; + +for (; i < context->num_threads; ++i) { +if (pthread_timedjoin_np(context->threads[i].pgthread.thread, + NULL, )) { +break; +} +} + +return i; +} + +static void memset_stats_count_pages(MemsetContext *context) +{ +int i; + +for (i = 0; i < context->num_threads; ++i) { +size_t pages = qatomic_load_acquire( +>threads[i].touched_pages); +context->stats.allocated_pages += pages; +} +} + +static int timed_join_memset_threads(MemsetContext *context, + const PreallocTimeout *timeout) +{ +int i, off; +PreallocStats *stats = >stats; +off = do_join_memset_threads_with_timeout(context, timeout->seconds); + +if (off != context->num_threads && timeout->on_timeout) { +memset_stats_count_pages(context); + +/* + * Guard against possible races if preallocation finishes right + * after the timeout is exceeded. + */ +if (stats->allocated_pages < stats->total_pages) { +stats->seconds_elapsed = timeout->seconds; +timeout->on_timeout(timeout->user, stats); +} +} + +for (i = off; i < context->num_threads; ++i) { +pthread_cancel(context->threads[i].pgthread.thread); +} + +return off; +} + static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
Re: [PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features
Ping :) On 9/6/22 10:31 AM, Daniil Tatianin wrote: This patch set attempts to align vhost-user-blk with virtio-blk in terms of backward compatibility and flexibility. It also improves the virtio core by introducing new common code that can be used by a virtio device to calculate its config space size. In particular it adds the following things: - Common virtio code for deducing the required device config size based on provided host features. - Ability to disable modern virtio-blk features like discard/write-zeroes for vhost-user-blk. - Dynamic configuration space resizing based on enabled features, by reusing the common code introduced in the earlier commits. - Cleans up the VHostUserBlk structure by reusing parent fields. Changes since v1 (mostly addresses Stefan's feedback): - Introduce VirtIOConfigSizeParams & virtio_get_config_size - Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c only hold the virtio-blk config size parameters. - Reuse parent fields in vhost-user-blk instead of introducing new ones. Changes since v2: - Squash the first four commits into one - Set .min_size for virtio-net as well - Move maintainer/meson user-blk bits to the last commit Daniil Tatianin (5): virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size virtio-blk: move config size params to virtio-blk-common vhost-user-blk: make it possible to disable write-zeroes/discard vhost-user-blk: make 'config_wce' part of 'host_features' vhost-user-blk: dynamically resize config space based on features MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/vhost-user-blk.c | 29 +++- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 28 +++ hw/net/virtio-net.c | 9 +-- hw/virtio/virtio.c| 10 --- include/hw/virtio/vhost-user-blk.h| 1 - include/hw/virtio/virtio-blk-common.h | 20 ++ include/hw/virtio/virtio.h| 10 +-- 10 files changed, 105 insertions(+), 49 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h
Re: [PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features
On 10/11/22 10:20 AM, Daniil Tatianin wrote: Ping :) Oops, didn't see the pull request. Disregard this. On 9/6/22 10:31 AM, Daniil Tatianin wrote: This patch set attempts to align vhost-user-blk with virtio-blk in terms of backward compatibility and flexibility. It also improves the virtio core by introducing new common code that can be used by a virtio device to calculate its config space size. In particular it adds the following things: - Common virtio code for deducing the required device config size based on provided host features. - Ability to disable modern virtio-blk features like discard/write-zeroes for vhost-user-blk. - Dynamic configuration space resizing based on enabled features, by reusing the common code introduced in the earlier commits. - Cleans up the VHostUserBlk structure by reusing parent fields. Changes since v1 (mostly addresses Stefan's feedback): - Introduce VirtIOConfigSizeParams & virtio_get_config_size - Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c only hold the virtio-blk config size parameters. - Reuse parent fields in vhost-user-blk instead of introducing new ones. Changes since v2: - Squash the first four commits into one - Set .min_size for virtio-net as well - Move maintainer/meson user-blk bits to the last commit Daniil Tatianin (5): virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size virtio-blk: move config size params to virtio-blk-common vhost-user-blk: make it possible to disable write-zeroes/discard vhost-user-blk: make 'config_wce' part of 'host_features' vhost-user-blk: dynamically resize config space based on features MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/vhost-user-blk.c | 29 +++- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 28 +++ hw/net/virtio-net.c | 9 +-- hw/virtio/virtio.c | 10 --- include/hw/virtio/vhost-user-blk.h | 1 - include/hw/virtio/virtio-blk-common.h | 20 ++ include/hw/virtio/virtio.h | 10 +-- 10 files changed, 105 insertions(+), 49 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h
Re: [PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features
Thanks for reviewing! Could you send a Pull request? Or do we need an ack from someone else? On 9/7/22 7:02 AM, Raphael Norwitz wrote: Thanks for the changes. For the whole series: Reviewed-by: Raphael Norwitz On Tue, Sep 06, 2022 at 10:31:06AM +0300, Daniil Tatianin wrote: This patch set attempts to align vhost-user-blk with virtio-blk in terms of backward compatibility and flexibility. It also improves the virtio core by introducing new common code that can be used by a virtio device to calculate its config space size. In particular it adds the following things: - Common virtio code for deducing the required device config size based on provided host features. - Ability to disable modern virtio-blk features like discard/write-zeroes for vhost-user-blk. - Dynamic configuration space resizing based on enabled features, by reusing the common code introduced in the earlier commits. - Cleans up the VHostUserBlk structure by reusing parent fields. Changes since v1 (mostly addresses Stefan's feedback): - Introduce VirtIOConfigSizeParams & virtio_get_config_size - Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c only hold the virtio-blk config size parameters. - Reuse parent fields in vhost-user-blk instead of introducing new ones. Changes since v2: - Squash the first four commits into one - Set .min_size for virtio-net as well - Move maintainer/meson user-blk bits to the last commit Daniil Tatianin (5): virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size virtio-blk: move config size params to virtio-blk-common vhost-user-blk: make it possible to disable write-zeroes/discard vhost-user-blk: make 'config_wce' part of 'host_features' vhost-user-blk: dynamically resize config space based on features MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/vhost-user-blk.c | 29 +++- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 28 +++ hw/net/virtio-net.c | 9 +-- hw/virtio/virtio.c| 10 --- include/hw/virtio/vhost-user-blk.h| 1 - include/hw/virtio/virtio-blk-common.h | 20 ++ include/hw/virtio/virtio.h| 10 +-- 10 files changed, 105 insertions(+), 49 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h -- 2.25.1
[PATCH v3 0/5] vhost-user-blk: dynamically resize config space based on features
This patch set attempts to align vhost-user-blk with virtio-blk in terms of backward compatibility and flexibility. It also improves the virtio core by introducing new common code that can be used by a virtio device to calculate its config space size. In particular it adds the following things: - Common virtio code for deducing the required device config size based on provided host features. - Ability to disable modern virtio-blk features like discard/write-zeroes for vhost-user-blk. - Dynamic configuration space resizing based on enabled features, by reusing the common code introduced in the earlier commits. - Cleans up the VHostUserBlk structure by reusing parent fields. Changes since v1 (mostly addresses Stefan's feedback): - Introduce VirtIOConfigSizeParams & virtio_get_config_size - Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c only hold the virtio-blk config size parameters. - Reuse parent fields in vhost-user-blk instead of introducing new ones. Changes since v2: - Squash the first four commits into one - Set .min_size for virtio-net as well - Move maintainer/meson user-blk bits to the last commit Daniil Tatianin (5): virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size virtio-blk: move config size params to virtio-blk-common vhost-user-blk: make it possible to disable write-zeroes/discard vhost-user-blk: make 'config_wce' part of 'host_features' vhost-user-blk: dynamically resize config space based on features MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/vhost-user-blk.c | 29 +++- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 28 +++ hw/net/virtio-net.c | 9 +-- hw/virtio/virtio.c| 10 --- include/hw/virtio/vhost-user-blk.h| 1 - include/hw/virtio/virtio-blk-common.h | 20 ++ include/hw/virtio/virtio.h| 10 +-- 10 files changed, 105 insertions(+), 49 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h -- 2.25.1
[PATCH v3 3/5] vhost-user-blk: make it possible to disable write-zeroes/discard
It is useful to have the ability to disable these features for compatibility with older VMs that don't have these implemented. Signed-off-by: Daniil Tatianin Reviewed-by: Raphael Norwitz --- hw/block/vhost-user-blk.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9117222456..4c9727e08c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -259,8 +259,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, virtio_add_feature(, VIRTIO_BLK_F_BLK_SIZE); virtio_add_feature(, VIRTIO_BLK_F_FLUSH); virtio_add_feature(, VIRTIO_BLK_F_RO); -virtio_add_feature(, VIRTIO_BLK_F_DISCARD); -virtio_add_feature(, VIRTIO_BLK_F_WRITE_ZEROES); if (s->config_wce) { virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE); @@ -592,6 +590,10 @@ static Property vhost_user_blk_properties[] = { VHOST_USER_BLK_AUTO_NUM_QUEUES), DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128), DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true), +DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features, + VIRTIO_BLK_F_DISCARD, true), +DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features, + VIRTIO_BLK_F_WRITE_ZEROES, true), DEFINE_PROP_END_OF_LIST(), }; -- 2.25.1
[PATCH v3 4/5] vhost-user-blk: make 'config_wce' part of 'host_features'
No reason to have this be a separate field. This also makes it more akin to what the virtio-blk device does. Signed-off-by: Daniil Tatianin Reviewed-by: Raphael Norwitz --- hw/block/vhost-user-blk.c | 6 ++ include/hw/virtio/vhost-user-blk.h | 1 - 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 4c9727e08c..0d916edefa 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -260,9 +260,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, virtio_add_feature(, VIRTIO_BLK_F_FLUSH); virtio_add_feature(, VIRTIO_BLK_F_RO); -if (s->config_wce) { -virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE); -} if (s->num_queues > 1) { virtio_add_feature(, VIRTIO_BLK_F_MQ); } @@ -589,7 +586,8 @@ static Property vhost_user_blk_properties[] = { DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues, VHOST_USER_BLK_AUTO_NUM_QUEUES), DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128), -DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true), +DEFINE_PROP_BIT64("config-wce", VHostUserBlk, parent_obj.host_features, + VIRTIO_BLK_F_CONFIG_WCE, true), DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features, VIRTIO_BLK_F_DISCARD, true), DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features, diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h index 7c91f15040..ea085ee1ed 100644 --- a/include/hw/virtio/vhost-user-blk.h +++ b/include/hw/virtio/vhost-user-blk.h @@ -34,7 +34,6 @@ struct VHostUserBlk { struct virtio_blk_config blkcfg; uint16_t num_queues; uint32_t queue_size; -uint32_t config_wce; struct vhost_dev dev; struct vhost_inflight *inflight; VhostUserState vhost_user; -- 2.25.1
[PATCH v3 2/5] virtio-blk: move config size params to virtio-blk-common
This way we can reuse it for other virtio-blk devices, e.g vhost-user-blk, which currently does not control its config space size dynamically. Signed-off-by: Daniil Tatianin Reviewed-by: Raphael Norwitz --- MAINTAINERS | 2 ++ hw/block/meson.build | 2 +- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 24 ++--- include/hw/virtio/virtio-blk-common.h | 20 ++ 5 files changed, 64 insertions(+), 23 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h diff --git a/MAINTAINERS b/MAINTAINERS index 5ce4227ff6..2858577c5b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2030,8 +2030,10 @@ virtio-blk M: Stefan Hajnoczi L: qemu-bl...@nongnu.org S: Supported +F: hw/block/virtio-blk-common.c F: hw/block/virtio-blk.c F: hw/block/dataplane/* +F: include/hw/virtio/virtio-blk-common.h F: tests/qtest/virtio-blk-test.c T: git https://github.com/stefanha/qemu.git block diff --git a/hw/block/meson.build b/hw/block/meson.build index 2389326112..8ee1f1f850 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c')) -specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) +specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 'virtio-blk-common.c')) specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) subdir('dataplane') diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c new file mode 100644 index 00..ac52d7c176 --- /dev/null +++ b/hw/block/virtio-blk-common.c @@ -0,0 +1,39 @@ +/* + * Virtio Block Device common helpers + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" + +#include "standard-headers/linux/virtio_blk.h" +#include "hw/virtio/virtio.h" +#include "hw/virtio/virtio-blk-common.h" + +/* Config size before the discard support (hide associated config fields) */ +#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \ + max_discard_sectors) + +/* + * Starting from the discard feature, we can use this array to properly + * set the config size depending on the features enabled. + */ +static const VirtIOFeature feature_sizes[] = { +{.flags = 1ULL << VIRTIO_BLK_F_DISCARD, + .end = endof(struct virtio_blk_config, discard_sector_alignment)}, +{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, + .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, +{} +}; + +const VirtIOConfigSizeParams virtio_blk_cfg_size_params = { +.min_size = VIRTIO_BLK_CFG_SIZE, +.max_size = sizeof(struct virtio_blk_config), +.feature_sizes = feature_sizes +}; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 10c47c2934..8131ec2dbc 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -32,29 +32,9 @@ #include "hw/virtio/virtio-bus.h" #include "migration/qemu-file-types.h" #include "hw/virtio/virtio-access.h" +#include "hw/virtio/virtio-blk-common.h" #include "qemu/coroutine.h" -/* Config size before the discard support (hide associated config fields) */ -#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \ - max_discard_sectors) -/* - * Starting from the discard feature, we can use this array to properly - * set the config size depending on the features enabled. - */ -static const VirtIOFeature feature_sizes[] = { -{.flags = 1ULL << VIRTIO_BLK_F_DISCARD, - .end = endof(struct virtio_blk_config, discard_sector_alignment)}, -{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, - .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, -{} -}; - -static const VirtIOConfigSizeParams cfg_size_params = { -.min_size = VIRTIO_BLK_CFG_SIZE, -.max_size = sizeof(struct virtio_blk_config), -.feature_sizes = feature_sizes -}; - static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq, VirtIOBlockReq *req) { @@ -1202,7 +1182,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) return; } -s->config_size = virtio_get_config_size(_size_params, +s->config_size = virtio_get_config_size(_blk_cfg_size_params, s->host_features); virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size); diff --git a/include/hw/virti
[PATCH v3 5/5] vhost-user-blk: dynamically resize config space based on features
Make vhost-user-blk backwards compatible when migrating from older VMs running with modern features turned off, the same way it was done for virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the features enabled") It's currently impossible to migrate from an older VM with vhost-user-blk (with disable-legacy=off) because of errors like this: qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 device: 1 cmask: ff wmask: 80 w1cmask:0 qemu-system-x86_64: Failed to load PCIDevice:config qemu-system-x86_64: Failed to load virtio-blk:virtio qemu-system-x86_64: error while loading state for instance 0x0 of device ':00:05.0:00.0:02.0/virtio-blk' qemu-system-x86_64: load of migration failed: Invalid argument This is caused by the newer (destination) VM requiring a bigger BAR0 alignment because it has to cover a bigger configuration space, which isn't actually needed since those additional config fields are not active (write-zeroes/discard). Signed-off-by: Daniil Tatianin Reviewed-by: Raphael Norwitz --- MAINTAINERS | 2 ++ hw/block/meson.build | 2 +- hw/block/vhost-user-blk.c | 17 ++--- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 2858577c5b..a7d3914735 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2273,11 +2273,13 @@ S: Maintained F: contrib/vhost-user-blk/ F: contrib/vhost-user-scsi/ F: hw/block/vhost-user-blk.c +F: hw/block/virtio-blk-common.c F: hw/scsi/vhost-user-scsi.c F: hw/virtio/vhost-user-blk-pci.c F: hw/virtio/vhost-user-scsi-pci.c F: include/hw/virtio/vhost-user-blk.h F: include/hw/virtio/vhost-user-scsi.h +F: include/hw/virtio/virtio-blk-common.h vhost-user-gpu M: Marc-André Lureau diff --git a/hw/block/meson.build b/hw/block/meson.build index 8ee1f1f850..1908abd45c 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -17,6 +17,6 @@ softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c')) specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 'virtio-blk-common.c')) -specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) +specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c', 'virtio-blk-common.c')) subdir('dataplane') diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 0d916edefa..8dd063eb96 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -23,6 +23,7 @@ #include "hw/qdev-core.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" +#include "hw/virtio/virtio-blk-common.h" #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-user-blk.h" #include "hw/virtio/virtio.h" @@ -63,7 +64,7 @@ static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config) /* Our num_queues overrides the device backend */ virtio_stw_p(vdev, >blkcfg.num_queues, s->num_queues); -memcpy(config, >blkcfg, sizeof(struct virtio_blk_config)); +memcpy(config, >blkcfg, vdev->config_len); } static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t *config) @@ -92,12 +93,12 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) { int ret; struct virtio_blk_config blkcfg; +VirtIODevice *vdev = dev->vdev; VHostUserBlk *s = VHOST_USER_BLK(dev->vdev); Error *local_err = NULL; ret = vhost_dev_get_config(dev, (uint8_t *), - sizeof(struct virtio_blk_config), - _err); + vdev->config_len, _err); if (ret < 0) { error_report_err(local_err); return ret; @@ -106,7 +107,7 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) /* valid for resize only */ if (blkcfg.capacity != s->blkcfg.capacity) { s->blkcfg.capacity = blkcfg.capacity; -memcpy(dev->vdev->config, >blkcfg, sizeof(struct virtio_blk_config)); +memcpy(dev->vdev->config, >blkcfg, vdev->config_len); virtio_notify_config(dev->vdev); } @@ -442,7 +443,7 @@ static int vhost_user_blk_realize_connect(VHostUserBlk *s, Error **errp) assert(s->connected); ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg, - sizeof(struct virtio_blk_config), errp); + s->parent_obj.config_len, errp); if (ret < 0) { qemu_chr_fe_disconnect(>chardev); vhost_dev_cleanup(>dev); @@ -457,6 +458,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp) ERRP_GUARD(); VirtIODevice *vdev = VIRTIO_DEVICE(dev); VHostUserBlk *s = VHOST_USER_BLK(vdev); +size_t co
[PATCH v3 1/5] virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
This is the first step towards moving all device config size calculation logic into the virtio core code. In particular, this adds a struct that contains all the necessary information for common virtio code to be able to calculate the final config size for a device. This is expected to be used with the new virtio_get_config_size helper, which calculates the final length based on the provided host features. This builds on top of already existing code like VirtIOFeature and virtio_feature_get_config_size(), but adds additional fields, as well as sanity checking so that device-specifc code doesn't have to duplicate it. An example usage would be: static const VirtIOFeature dev_features[] = { {.flags = 1ULL << FEATURE_1_BIT, .end = endof(struct virtio_dev_config, feature_1)}, {.flags = 1ULL << FEATURE_2_BIT, .end = endof(struct virtio_dev_config, feature_2)}, {} }; static const VirtIOConfigSizeParams dev_cfg_size_params = { .min_size = DEV_BASE_CONFIG_SIZE, .max_size = sizeof(struct virtio_dev_config), .feature_sizes = dev_features }; // code inside my_dev_device_realize() size_t config_size = virtio_get_config_size(_cfg_size_params, host_features); virtio_init(vdev, VIRTIO_ID_MYDEV, config_size); Currently every device is expected to write its own boilerplate from the example above in device_realize(), however, the next step of this transition is moving VirtIOConfigSizeParams into VirtioDeviceClass, so that it can be done automatically by the virtio initialization code. All of the users of virtio_feature_get_config_size have been converted to use virtio_get_config_size so it's no longer needed and is removed with this commit. Signed-off-by: Daniil Tatianin --- hw/block/virtio-blk.c | 16 +++- hw/net/virtio-net.c| 9 +++-- hw/virtio/virtio.c | 10 ++ include/hw/virtio/virtio.h | 10 -- 4 files changed, 28 insertions(+), 17 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index e9ba752f6b..10c47c2934 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -49,13 +49,11 @@ static const VirtIOFeature feature_sizes[] = { {} }; -static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t host_features) -{ -s->config_size = MAX(VIRTIO_BLK_CFG_SIZE, -virtio_feature_get_config_size(feature_sizes, host_features)); - -assert(s->config_size <= sizeof(struct virtio_blk_config)); -} +static const VirtIOConfigSizeParams cfg_size_params = { +.min_size = VIRTIO_BLK_CFG_SIZE, +.max_size = sizeof(struct virtio_blk_config), +.feature_sizes = feature_sizes +}; static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq, VirtIOBlockReq *req) @@ -1204,8 +1202,8 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) return; } -virtio_blk_set_config_size(s, s->host_features); - +s->config_size = virtio_get_config_size(_size_params, +s->host_features); virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size); s->blk = conf->conf.blk; diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index dd0d056fde..78b003a158 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -106,6 +106,12 @@ static const VirtIOFeature feature_sizes[] = { {} }; +static const VirtIOConfigSizeParams cfg_size_params = { +.min_size = endof(struct virtio_net_config, mac), +.max_size = sizeof(struct virtio_net_config), +.feature_sizes = feature_sizes +}; + static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc) { VirtIONet *n = qemu_get_nic_opaque(nc); @@ -3246,8 +3252,7 @@ static void virtio_net_set_config_size(VirtIONet *n, uint64_t host_features) { virtio_add_feature(_features, VIRTIO_NET_F_MAC); -n->config_size = virtio_feature_get_config_size(feature_sizes, -host_features); +n->config_size = virtio_get_config_size(_size_params, host_features); } void virtio_net_set_netclient_name(VirtIONet *n, const char *name, diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 5d607aeaa0..c0bf8dd6fd 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -2999,11 +2999,12 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val) return ret; } -size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes, - uint64_t host_features) +size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, + uint64_t host_features) { -size_t config_size = 0; -int i; +size_t config_size = params->min_size; +const VirtIOFeature *feature_sizes = params->feature_sizes; +size_t i
Re: [PATCH v2 3/8] virtio-net: utilize VirtIOConfigSizeParams & virtio_get_config_size
On 9/2/22 8:54 PM, Raphael Norwitz wrote: On Fri, Aug 26, 2022 at 05:32:43PM +0300, Daniil Tatianin wrote: Use the new common helper. As an added bonus this also makes use of config size sanity checking via the 'max_size' field. Signed-off-by: Daniil Tatianin --- hw/net/virtio-net.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index dd0d056fde..dfc8dd8562 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -106,6 +106,11 @@ static const VirtIOFeature feature_sizes[] = { {} }; +static const VirtIOConfigSizeParams cfg_size_params = { Can we have a zero length virtio-net config size? The both the v1.0 and v1.1 section 5.1.4 say “The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC is set)” so we should probably set min_size to offset_of status? It currently hardcodes VIRTIO_NET_F_MAC as always on, but I guess it doesn't hurt to be more explicit about it. Will add that in the next version. (resending because forgot to reply-all last time) Otherwise LGTM. +.max_size = sizeof(struct virtio_net_config), +.feature_sizes = feature_sizes +}; + static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc) { VirtIONet *n = qemu_get_nic_opaque(nc); @@ -3246,8 +3251,7 @@ static void virtio_net_set_config_size(VirtIONet *n, uint64_t host_features) { virtio_add_feature(_features, VIRTIO_NET_F_MAC); -n->config_size = virtio_feature_get_config_size(feature_sizes, -host_features); +n->config_size = virtio_get_config_size(_size_params, host_features); } void virtio_net_set_netclient_name(VirtIONet *n, const char *name, -- 2.25.1
Re: [PATCH v2 5/8] virtio-blk: move config size params to virtio-blk-common
On 9/2/22 8:57 PM, Raphael Norwitz wrote: The vhost-user-blk bits in meson.build and Maintainers should probably be in patch 8? You're totally right, thanks. Otherwise LGTM. On Fri, Aug 26, 2022 at 05:32:45PM +0300, Daniil Tatianin wrote: This way we can reuse it for other virtio-blk devices, e.g vhost-user-blk, which currently does not control its config space size dynamically. Signed-off-by: Daniil Tatianin --- MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 24 ++--- include/hw/virtio/virtio-blk-common.h | 20 ++ 5 files changed, 67 insertions(+), 24 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h i.e. move this. @@ -2271,11 +2273,13 @@ S: Maintained F: contrib/vhost-user-blk/ F: contrib/vhost-user-scsi/ F: hw/block/vhost-user-blk.c +F: hw/block/virtio-blk-common.c F: hw/scsi/vhost-user-scsi.c F: hw/virtio/vhost-user-blk-pci.c F: hw/virtio/vhost-user-scsi-pci.c F: include/hw/virtio/vhost-user-blk.h F: include/hw/virtio/vhost-user-scsi.h +F: include/hw/virtio/virtio-blk-common.h vhost-user-gpu M: Marc-André Lureau diff --git a/hw/block/meson.build b/hw/block/meson.build index 2389326112..1908abd45c 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c')) -specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) -specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) +specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 'virtio-blk-common.c')) And this +specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c', 'virtio-blk-common.c'))
Re: [PATCH v2 1/8] virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
On 9/2/22 8:52 PM, Raphael Norwitz wrote: I feel like it would be easier to review if the first 4 patches were squashed together, but that’s not a big deal. Yeah, I think that's fair although I initially thought that maybe that was a bit too big of a change to put in one single commit. I can squash the first four if that would be better. For this one: Reviewed-by: Raphael Norwitz On Fri, Aug 26, 2022 at 05:32:41PM +0300, Daniil Tatianin wrote: This is the first step towards moving all device config size calculation logic into the virtio core code. In particular, this adds a struct that contains all the necessary information for common virtio code to be able to calculate the final config size for a device. This is expected to be used with the new virtio_get_config_size helper, which calculates the final length based on the provided host features. This builds on top of already existing code like VirtIOFeature and virtio_feature_get_config_size(), but adds additional fields, as well as sanity checking so that device-specifc code doesn't have to duplicate it. An example usage would be: static const VirtIOFeature dev_features[] = { {.flags = 1ULL << FEATURE_1_BIT, .end = endof(struct virtio_dev_config, feature_1)}, {.flags = 1ULL << FEATURE_2_BIT, .end = endof(struct virtio_dev_config, feature_2)}, {} }; static const VirtIOConfigSizeParams dev_cfg_size_params = { .min_size = DEV_BASE_CONFIG_SIZE, .max_size = sizeof(struct virtio_dev_config), .feature_sizes = dev_features }; // code inside my_dev_device_realize() size_t config_size = virtio_get_config_size(_cfg_size_params, host_features); virtio_init(vdev, VIRTIO_ID_MYDEV, config_size); Currently every device is expected to write its own boilerplate from the example above in device_realize(), however, the next step of this transition is moving VirtIOConfigSizeParams into VirtioDeviceClass, so that it can be done automatically by the virtio initialization code. Signed-off-by: Daniil Tatianin --- hw/virtio/virtio.c | 17 + include/hw/virtio/virtio.h | 9 + 2 files changed, 26 insertions(+) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 5d607aeaa0..8518382025 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3014,6 +3014,23 @@ size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes, return config_size; } +size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, + uint64_t host_features) +{ +size_t config_size = params->min_size; +const VirtIOFeature *feature_sizes = params->feature_sizes; +size_t i; + +for (i = 0; feature_sizes[i].flags != 0; i++) { +if (host_features & feature_sizes[i].flags) { +config_size = MAX(feature_sizes[i].end, config_size); +} +} + +assert(config_size <= params->max_size); +return config_size; +} + int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id) { int i, ret; diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index db1c0ddf6b..1991c58d9b 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -44,6 +44,15 @@ typedef struct VirtIOFeature { size_t end; } VirtIOFeature; +typedef struct VirtIOConfigSizeParams { +size_t min_size; +size_t max_size; +const VirtIOFeature *feature_sizes; +} VirtIOConfigSizeParams; + +size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, + uint64_t host_features); + size_t virtio_feature_get_config_size(const VirtIOFeature *features, uint64_t host_features); -- 2.25.1
Re: [PATCH v2 0/8] vhost-user-blk: dynamically resize config space based on features
ping
[PATCH v2 3/8] virtio-net: utilize VirtIOConfigSizeParams & virtio_get_config_size
Use the new common helper. As an added bonus this also makes use of config size sanity checking via the 'max_size' field. Signed-off-by: Daniil Tatianin --- hw/net/virtio-net.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index dd0d056fde..dfc8dd8562 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -106,6 +106,11 @@ static const VirtIOFeature feature_sizes[] = { {} }; +static const VirtIOConfigSizeParams cfg_size_params = { +.max_size = sizeof(struct virtio_net_config), +.feature_sizes = feature_sizes +}; + static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc) { VirtIONet *n = qemu_get_nic_opaque(nc); @@ -3246,8 +3251,7 @@ static void virtio_net_set_config_size(VirtIONet *n, uint64_t host_features) { virtio_add_feature(_features, VIRTIO_NET_F_MAC); -n->config_size = virtio_feature_get_config_size(feature_sizes, -host_features); +n->config_size = virtio_get_config_size(_size_params, host_features); } void virtio_net_set_netclient_name(VirtIONet *n, const char *name, -- 2.25.1
[PATCH v2 4/8] virtio: remove the virtio_feature_get_config_size helper
This has no more users and is superseded by virtio_get_config_size. Signed-off-by: Daniil Tatianin --- hw/virtio/virtio.c | 15 --- include/hw/virtio/virtio.h | 3 --- 2 files changed, 18 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 8518382025..c0bf8dd6fd 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -2999,21 +2999,6 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val) return ret; } -size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes, - uint64_t host_features) -{ -size_t config_size = 0; -int i; - -for (i = 0; feature_sizes[i].flags != 0; i++) { -if (host_features & feature_sizes[i].flags) { -config_size = MAX(feature_sizes[i].end, config_size); -} -} - -return config_size; -} - size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, uint64_t host_features) { diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index 1991c58d9b..f3ff6710d5 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -53,9 +53,6 @@ typedef struct VirtIOConfigSizeParams { size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, uint64_t host_features); -size_t virtio_feature_get_config_size(const VirtIOFeature *features, - uint64_t host_features); - typedef struct VirtQueue VirtQueue; #define VIRTQUEUE_MAX_SIZE 1024 -- 2.25.1
[PATCH v2 2/8] virtio-blk: utilize VirtIOConfigSizeParams & virtio_get_config_size
Use the common helper instead of duplicating the same logic. Signed-off-by: Daniil Tatianin --- hw/block/virtio-blk.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index e9ba752f6b..10c47c2934 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -49,13 +49,11 @@ static const VirtIOFeature feature_sizes[] = { {} }; -static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t host_features) -{ -s->config_size = MAX(VIRTIO_BLK_CFG_SIZE, -virtio_feature_get_config_size(feature_sizes, host_features)); - -assert(s->config_size <= sizeof(struct virtio_blk_config)); -} +static const VirtIOConfigSizeParams cfg_size_params = { +.min_size = VIRTIO_BLK_CFG_SIZE, +.max_size = sizeof(struct virtio_blk_config), +.feature_sizes = feature_sizes +}; static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq, VirtIOBlockReq *req) @@ -1204,8 +1202,8 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) return; } -virtio_blk_set_config_size(s, s->host_features); - +s->config_size = virtio_get_config_size(_size_params, +s->host_features); virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size); s->blk = conf->conf.blk; -- 2.25.1
[PATCH v2 1/8] virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
This is the first step towards moving all device config size calculation logic into the virtio core code. In particular, this adds a struct that contains all the necessary information for common virtio code to be able to calculate the final config size for a device. This is expected to be used with the new virtio_get_config_size helper, which calculates the final length based on the provided host features. This builds on top of already existing code like VirtIOFeature and virtio_feature_get_config_size(), but adds additional fields, as well as sanity checking so that device-specifc code doesn't have to duplicate it. An example usage would be: static const VirtIOFeature dev_features[] = { {.flags = 1ULL << FEATURE_1_BIT, .end = endof(struct virtio_dev_config, feature_1)}, {.flags = 1ULL << FEATURE_2_BIT, .end = endof(struct virtio_dev_config, feature_2)}, {} }; static const VirtIOConfigSizeParams dev_cfg_size_params = { .min_size = DEV_BASE_CONFIG_SIZE, .max_size = sizeof(struct virtio_dev_config), .feature_sizes = dev_features }; // code inside my_dev_device_realize() size_t config_size = virtio_get_config_size(_cfg_size_params, host_features); virtio_init(vdev, VIRTIO_ID_MYDEV, config_size); Currently every device is expected to write its own boilerplate from the example above in device_realize(), however, the next step of this transition is moving VirtIOConfigSizeParams into VirtioDeviceClass, so that it can be done automatically by the virtio initialization code. Signed-off-by: Daniil Tatianin --- hw/virtio/virtio.c | 17 + include/hw/virtio/virtio.h | 9 + 2 files changed, 26 insertions(+) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 5d607aeaa0..8518382025 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3014,6 +3014,23 @@ size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes, return config_size; } +size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, + uint64_t host_features) +{ +size_t config_size = params->min_size; +const VirtIOFeature *feature_sizes = params->feature_sizes; +size_t i; + +for (i = 0; feature_sizes[i].flags != 0; i++) { +if (host_features & feature_sizes[i].flags) { +config_size = MAX(feature_sizes[i].end, config_size); +} +} + +assert(config_size <= params->max_size); +return config_size; +} + int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id) { int i, ret; diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index db1c0ddf6b..1991c58d9b 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -44,6 +44,15 @@ typedef struct VirtIOFeature { size_t end; } VirtIOFeature; +typedef struct VirtIOConfigSizeParams { +size_t min_size; +size_t max_size; +const VirtIOFeature *feature_sizes; +} VirtIOConfigSizeParams; + +size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, + uint64_t host_features); + size_t virtio_feature_get_config_size(const VirtIOFeature *features, uint64_t host_features); -- 2.25.1
[PATCH v2 6/8] vhost-user-blk: make it possible to disable write-zeroes/discard
It is useful to have the ability to disable these features for compatibility with older VMs that don't have these implemented. Signed-off-by: Daniil Tatianin --- hw/block/vhost-user-blk.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9117222456..4c9727e08c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -259,8 +259,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, virtio_add_feature(, VIRTIO_BLK_F_BLK_SIZE); virtio_add_feature(, VIRTIO_BLK_F_FLUSH); virtio_add_feature(, VIRTIO_BLK_F_RO); -virtio_add_feature(, VIRTIO_BLK_F_DISCARD); -virtio_add_feature(, VIRTIO_BLK_F_WRITE_ZEROES); if (s->config_wce) { virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE); @@ -592,6 +590,10 @@ static Property vhost_user_blk_properties[] = { VHOST_USER_BLK_AUTO_NUM_QUEUES), DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128), DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true), +DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features, + VIRTIO_BLK_F_DISCARD, true), +DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features, + VIRTIO_BLK_F_WRITE_ZEROES, true), DEFINE_PROP_END_OF_LIST(), }; -- 2.25.1
[PATCH v2 0/8] vhost-user-blk: dynamically resize config space based on features
This patch set attempts to align vhost-user-blk with virtio-blk in terms of backward compatibility and flexibility. It also improves the virtio core by introducing new common code that can be used by a virtio device to calculate its config space size. In particular it adds the following things: - Common virtio code for deducing the required device config size based on provided host features. - Ability to disable modern virtio-blk features like discard/write-zeroes for vhost-user-blk. - Dynamic configuration space resizing based on enabled features, by reusing the common code introduced in the earlier commits. - Cleans up the VHostUserBlk structure by reusing parent fields. Changes since v1 (mostly addresses Stefan's feedback): - Introduce VirtIOConfigSizeParams & virtio_get_config_size - Remove virtio_blk_set_config_size altogether, make virtio-blk-common.c only hold the virtio-blk config size parameters. - Reuse parent fields in vhost-user-blk instead of introducing new ones. Daniil Tatianin (8): virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size virtio-blk: utilize VirtIOConfigSizeParams & virtio_get_config_size virtio-net: utilize VirtIOConfigSizeParams & virtio_get_config_size virtio: remove the virtio_feature_get_config_size helper virtio-blk: move config size params to virtio-blk-common vhost-user-blk: make it possible to disable write-zeroes/discard vhost-user-blk: make 'config_wce' part of 'host_features' vhost-user-blk: dynamically resize config space based on features MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/vhost-user-blk.c | 29 +++- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 28 +++ hw/net/virtio-net.c | 8 -- hw/virtio/virtio.c| 10 --- include/hw/virtio/vhost-user-blk.h| 1 - include/hw/virtio/virtio-blk-common.h | 20 ++ include/hw/virtio/virtio.h| 10 +-- 10 files changed, 104 insertions(+), 49 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h -- 2.25.1
[PATCH v2 7/8] vhost-user-blk: make 'config_wce' part of 'host_features'
No reason to have this be a separate field. This also makes it more akin to what the virtio-blk device does. Signed-off-by: Daniil Tatianin --- hw/block/vhost-user-blk.c | 6 ++ include/hw/virtio/vhost-user-blk.h | 1 - 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 4c9727e08c..0d916edefa 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -260,9 +260,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, virtio_add_feature(, VIRTIO_BLK_F_FLUSH); virtio_add_feature(, VIRTIO_BLK_F_RO); -if (s->config_wce) { -virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE); -} if (s->num_queues > 1) { virtio_add_feature(, VIRTIO_BLK_F_MQ); } @@ -589,7 +586,8 @@ static Property vhost_user_blk_properties[] = { DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues, VHOST_USER_BLK_AUTO_NUM_QUEUES), DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128), -DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true), +DEFINE_PROP_BIT64("config-wce", VHostUserBlk, parent_obj.host_features, + VIRTIO_BLK_F_CONFIG_WCE, true), DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features, VIRTIO_BLK_F_DISCARD, true), DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features, diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h index 7c91f15040..ea085ee1ed 100644 --- a/include/hw/virtio/vhost-user-blk.h +++ b/include/hw/virtio/vhost-user-blk.h @@ -34,7 +34,6 @@ struct VHostUserBlk { struct virtio_blk_config blkcfg; uint16_t num_queues; uint32_t queue_size; -uint32_t config_wce; struct vhost_dev dev; struct vhost_inflight *inflight; VhostUserState vhost_user; -- 2.25.1
[PATCH v2 8/8] vhost-user-blk: dynamically resize config space based on features
Make vhost-user-blk backwards compatible when migrating from older VMs running with modern features turned off, the same way it was done for virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the features enabled") It's currently impossible to migrate from an older VM with vhost-user-blk (with disable-legacy=off) because of errors like this: qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 device: 1 cmask: ff wmask: 80 w1cmask:0 qemu-system-x86_64: Failed to load PCIDevice:config qemu-system-x86_64: Failed to load virtio-blk:virtio qemu-system-x86_64: error while loading state for instance 0x0 of device ':00:05.0:00.0:02.0/virtio-blk' qemu-system-x86_64: load of migration failed: Invalid argument This is caused by the newer (destination) VM requiring a bigger BAR0 alignment because it has to cover a bigger configuration space, which isn't actually needed since those additional config fields are not active (write-zeroes/discard). Signed-off-by: Daniil Tatianin --- hw/block/vhost-user-blk.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 0d916edefa..8dd063eb96 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -23,6 +23,7 @@ #include "hw/qdev-core.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" +#include "hw/virtio/virtio-blk-common.h" #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-user-blk.h" #include "hw/virtio/virtio.h" @@ -63,7 +64,7 @@ static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config) /* Our num_queues overrides the device backend */ virtio_stw_p(vdev, >blkcfg.num_queues, s->num_queues); -memcpy(config, >blkcfg, sizeof(struct virtio_blk_config)); +memcpy(config, >blkcfg, vdev->config_len); } static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t *config) @@ -92,12 +93,12 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) { int ret; struct virtio_blk_config blkcfg; +VirtIODevice *vdev = dev->vdev; VHostUserBlk *s = VHOST_USER_BLK(dev->vdev); Error *local_err = NULL; ret = vhost_dev_get_config(dev, (uint8_t *), - sizeof(struct virtio_blk_config), - _err); + vdev->config_len, _err); if (ret < 0) { error_report_err(local_err); return ret; @@ -106,7 +107,7 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) /* valid for resize only */ if (blkcfg.capacity != s->blkcfg.capacity) { s->blkcfg.capacity = blkcfg.capacity; -memcpy(dev->vdev->config, >blkcfg, sizeof(struct virtio_blk_config)); +memcpy(dev->vdev->config, >blkcfg, vdev->config_len); virtio_notify_config(dev->vdev); } @@ -442,7 +443,7 @@ static int vhost_user_blk_realize_connect(VHostUserBlk *s, Error **errp) assert(s->connected); ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg, - sizeof(struct virtio_blk_config), errp); + s->parent_obj.config_len, errp); if (ret < 0) { qemu_chr_fe_disconnect(>chardev); vhost_dev_cleanup(>dev); @@ -457,6 +458,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp) ERRP_GUARD(); VirtIODevice *vdev = VIRTIO_DEVICE(dev); VHostUserBlk *s = VHOST_USER_BLK(vdev); +size_t config_size; int retries; int i, ret; @@ -487,8 +489,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp) return; } -virtio_init(vdev, VIRTIO_ID_BLOCK, -sizeof(struct virtio_blk_config)); +config_size = virtio_get_config_size(_blk_cfg_size_params, + vdev->host_features); +virtio_init(vdev, VIRTIO_ID_BLOCK, config_size); s->virtqs = g_new(VirtQueue *, s->num_queues); for (i = 0; i < s->num_queues; i++) { -- 2.25.1
[PATCH v2 5/8] virtio-blk: move config size params to virtio-blk-common
This way we can reuse it for other virtio-blk devices, e.g vhost-user-blk, which currently does not control its config space size dynamically. Signed-off-by: Daniil Tatianin --- MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/virtio-blk-common.c | 39 +++ hw/block/virtio-blk.c | 24 ++--- include/hw/virtio/virtio-blk-common.h | 20 ++ 5 files changed, 67 insertions(+), 24 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h diff --git a/MAINTAINERS b/MAINTAINERS index 5ce4227ff6..a7d3914735 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2030,8 +2030,10 @@ virtio-blk M: Stefan Hajnoczi L: qemu-bl...@nongnu.org S: Supported +F: hw/block/virtio-blk-common.c F: hw/block/virtio-blk.c F: hw/block/dataplane/* +F: include/hw/virtio/virtio-blk-common.h F: tests/qtest/virtio-blk-test.c T: git https://github.com/stefanha/qemu.git block @@ -2271,11 +2273,13 @@ S: Maintained F: contrib/vhost-user-blk/ F: contrib/vhost-user-scsi/ F: hw/block/vhost-user-blk.c +F: hw/block/virtio-blk-common.c F: hw/scsi/vhost-user-scsi.c F: hw/virtio/vhost-user-blk-pci.c F: hw/virtio/vhost-user-scsi-pci.c F: include/hw/virtio/vhost-user-blk.h F: include/hw/virtio/vhost-user-scsi.h +F: include/hw/virtio/virtio-blk-common.h vhost-user-gpu M: Marc-André Lureau diff --git a/hw/block/meson.build b/hw/block/meson.build index 2389326112..1908abd45c 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c')) -specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) -specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) +specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 'virtio-blk-common.c')) +specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c', 'virtio-blk-common.c')) subdir('dataplane') diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c new file mode 100644 index 00..ac52d7c176 --- /dev/null +++ b/hw/block/virtio-blk-common.c @@ -0,0 +1,39 @@ +/* + * Virtio Block Device common helpers + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" + +#include "standard-headers/linux/virtio_blk.h" +#include "hw/virtio/virtio.h" +#include "hw/virtio/virtio-blk-common.h" + +/* Config size before the discard support (hide associated config fields) */ +#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \ + max_discard_sectors) + +/* + * Starting from the discard feature, we can use this array to properly + * set the config size depending on the features enabled. + */ +static const VirtIOFeature feature_sizes[] = { +{.flags = 1ULL << VIRTIO_BLK_F_DISCARD, + .end = endof(struct virtio_blk_config, discard_sector_alignment)}, +{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, + .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, +{} +}; + +const VirtIOConfigSizeParams virtio_blk_cfg_size_params = { +.min_size = VIRTIO_BLK_CFG_SIZE, +.max_size = sizeof(struct virtio_blk_config), +.feature_sizes = feature_sizes +}; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 10c47c2934..8131ec2dbc 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -32,29 +32,9 @@ #include "hw/virtio/virtio-bus.h" #include "migration/qemu-file-types.h" #include "hw/virtio/virtio-access.h" +#include "hw/virtio/virtio-blk-common.h" #include "qemu/coroutine.h" -/* Config size before the discard support (hide associated config fields) */ -#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \ - max_discard_sectors) -/* - * Starting from the discard feature, we can use this array to properly - * set the config size depending on the features enabled. - */ -static const VirtIOFeature feature_sizes[] = { -{.flags = 1ULL << VIRTIO_BLK_F_DISCARD, - .end = endof(struct virtio_blk_config, discard_sector_alignment)}, -{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, - .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, -{} -}; - -static const VirtIOConfigSizeParams cfg_size_params = { -.min_size = VIRTIO_BLK_CFG_SIZE, -.max_size = sizeof(struct virtio_blk_config), -.feature_sizes = feature_sizes -}; - static void virtio_blk_init_requ
Re: [PATCH v1 2/5] virtio-blk: move config space sizing code to virtio-blk-common
On 8/24/22 9:13 PM, Stefan Hajnoczi wrote: On Wed, Aug 24, 2022 at 12:18:34PM +0300, Daniil Tatianin wrote: +size_t virtio_blk_common_get_config_size(uint64_t host_features) +{ +size_t config_size = MAX(VIRTIO_BLK_CFG_SIZE, +virtio_feature_get_config_size(feature_sizes, host_features)); + +assert(config_size <= sizeof(struct virtio_blk_config)); +return config_size; +} This logic is common to all VIRTIO devices and I think it can be moved to virtio_feature_get_config_size(). Then virtio_blk_common_get_config_size() is no longer necessary and the generic virtio_feature_get_config_size() can be called directly. The only virtio-blk common part would be the virtio_feature_get_config_size() parameter struct that describes the minimum and maximum config space size, as well as how the feature bits affect the size: size = virtio_feature_get_config_size(virtio_blk_config_size_params, host_features) where virtio_blk_config_size_params is: const VirtIOConfigSizeParams virtio_blk_config_size_params = { .min_size = offsetof(struct virtio_blk_config, max_discard_sectors), .max_size = sizeof(struct virtio_blk_config), .features = { {.flags = 1ULL << VIRTIO_BLK_F_DISCARD, .end = endof(struct virtio_blk_config, discard_sector_alignment)}, ..., }, }; Then virtio-blk-common.h just needs to define: extern const VirtIOConfigSizeParams virtio_blk_config_size_params; Taking it one step further, maybe VirtioDeviceClass should include a const VirtIOConfigSizeParams *config_size_params field so vdev->config_size can be computed by common VIRTIO code and the devices only need to describe the parameters. I think that's a great idea! Do you think it should be done automatically in 'virtio_init' if this field is not NULL? One problem I see with that is that you would have to make all virtio devices use 'parent_obj.host_features' for feature properties, which is currently far from true, but then again this is very much opt-in. Another thing you could do is add a separate helper for that, which maybe defeats the purpose a little bit? Stefan
Re: [PATCH v1 3/5] vhost-user-blk: make it possible to disable write-zeroes/discard
On 8/24/22 9:00 PM, Stefan Hajnoczi wrote: On Wed, Aug 24, 2022 at 12:18:35PM +0300, Daniil Tatianin wrote: diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9117222456..e89164c358 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -251,6 +251,8 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, { VHostUserBlk *s = VHOST_USER_BLK(vdev); +features |= s->host_features; I think you can eliminate this if you use vdev->host_features in the qdev properties instead of adding a separate s->host_features field. That will simplify the code. Indeed, thanks for spotting that. I wonder why every virtio device implementation I've looked at has chosen to add their own host_features field (net/blk)?
[PATCH v1 1/5] virtio-blk: decouple config size determination code from VirtIOBlock
Make it more stand-alone so that we can reuse it for other virtio-blk devices that are not VirtIOBlock in the future commits. Signed-off-by: Daniil Tatianin --- hw/block/virtio-blk.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index e9ba752f6b..a4162dbbf2 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -49,12 +49,13 @@ static const VirtIOFeature feature_sizes[] = { {} }; -static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t host_features) +static size_t virtio_blk_common_get_config_size(uint64_t host_features) { -s->config_size = MAX(VIRTIO_BLK_CFG_SIZE, +size_t config_size = MAX(VIRTIO_BLK_CFG_SIZE, virtio_feature_get_config_size(feature_sizes, host_features)); -assert(s->config_size <= sizeof(struct virtio_blk_config)); +assert(config_size <= sizeof(struct virtio_blk_config)); +return config_size; } static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq, @@ -1204,7 +1205,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) return; } -virtio_blk_set_config_size(s, s->host_features); +s->config_size = virtio_blk_common_get_config_size(s->host_features); virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size); -- 2.25.1
[PATCH v1 2/5] virtio-blk: move config space sizing code to virtio-blk-common
This way we can reuse it for other virtio-blk devices, e.g vhost-user-blk, which currently does not control its config space size dynamically. Signed-off-by: Daniil Tatianin --- MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/virtio-blk-common.c | 42 +++ hw/block/virtio-blk.c | 24 +-- include/hw/virtio/virtio-blk-common.h | 21 ++ 5 files changed, 70 insertions(+), 25 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h diff --git a/MAINTAINERS b/MAINTAINERS index 5ce4227ff6..a7d3914735 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2030,8 +2030,10 @@ virtio-blk M: Stefan Hajnoczi L: qemu-bl...@nongnu.org S: Supported +F: hw/block/virtio-blk-common.c F: hw/block/virtio-blk.c F: hw/block/dataplane/* +F: include/hw/virtio/virtio-blk-common.h F: tests/qtest/virtio-blk-test.c T: git https://github.com/stefanha/qemu.git block @@ -2271,11 +2273,13 @@ S: Maintained F: contrib/vhost-user-blk/ F: contrib/vhost-user-scsi/ F: hw/block/vhost-user-blk.c +F: hw/block/virtio-blk-common.c F: hw/scsi/vhost-user-scsi.c F: hw/virtio/vhost-user-blk-pci.c F: hw/virtio/vhost-user-scsi-pci.c F: include/hw/virtio/vhost-user-blk.h F: include/hw/virtio/vhost-user-scsi.h +F: include/hw/virtio/virtio-blk-common.h vhost-user-gpu M: Marc-André Lureau diff --git a/hw/block/meson.build b/hw/block/meson.build index 2389326112..1908abd45c 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c')) -specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) -specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) +specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 'virtio-blk-common.c')) +specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c', 'virtio-blk-common.c')) subdir('dataplane') diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c new file mode 100644 index 00..ac54568eb6 --- /dev/null +++ b/hw/block/virtio-blk-common.c @@ -0,0 +1,42 @@ +/* + * Virtio Block Device common helpers + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" + +#include "standard-headers/linux/virtio_blk.h" +#include "hw/virtio/virtio.h" +#include "hw/virtio/virtio-blk-common.h" + +/* Config size before the discard support (hide associated config fields) */ +#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \ + max_discard_sectors) + +/* + * Starting from the discard feature, we can use this array to properly + * set the config size depending on the features enabled. + */ +static VirtIOFeature feature_sizes[] = { +{.flags = 1ULL << VIRTIO_BLK_F_DISCARD, + .end = endof(struct virtio_blk_config, discard_sector_alignment)}, +{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, + .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, +{} +}; + +size_t virtio_blk_common_get_config_size(uint64_t host_features) +{ +size_t config_size = MAX(VIRTIO_BLK_CFG_SIZE, +virtio_feature_get_config_size(feature_sizes, host_features)); + +assert(config_size <= sizeof(struct virtio_blk_config)); +return config_size; +} diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index a4162dbbf2..4ca6d0f211 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -32,31 +32,9 @@ #include "hw/virtio/virtio-bus.h" #include "migration/qemu-file-types.h" #include "hw/virtio/virtio-access.h" +#include "hw/virtio/virtio-blk-common.h" #include "qemu/coroutine.h" -/* Config size before the discard support (hide associated config fields) */ -#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \ - max_discard_sectors) -/* - * Starting from the discard feature, we can use this array to properly - * set the config size depending on the features enabled. - */ -static const VirtIOFeature feature_sizes[] = { -{.flags = 1ULL << VIRTIO_BLK_F_DISCARD, - .end = endof(struct virtio_blk_config, discard_sector_alignment)}, -{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, - .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)}, -{} -}; - -static size_t virtio_blk_common_get_config_size(uint64_t host_features) -{ -size_t config_size = MAX(VIRTIO_BLK_CFG_SIZ
[PATCH v1 0/5] vhost-user-blk: dynamically resize config space based on features
This patch set attempts to align vhost-user-blk with virtio-blk in terms of backward compatibility and flexibility. In particular it adds the following things: - Ability to disable modern features like discard/write-zeroes. - Dynamic configuration space resizing based on enabled features, by reusing the code, which was already present in virtio-blk. - Makes the VHostUserBlk structure a bit less clunky by using the 'host_features' field to represent enabled features, as opposed to using a separate field per feature. This was already done for virtio-blk a long time ago. Daniil Tatianin (5): virtio-blk: decouple config size determination code from VirtIOBlock virtio-blk: move config space sizing code to virtio-blk-common vhost-user-blk: make it possible to disable write-zeroes/discard vhost-user-blk: make 'config_wce' part of 'host_features' vhost-user-blk: dynamically resize config space based on features MAINTAINERS | 4 +++ hw/block/meson.build | 4 +-- hw/block/vhost-user-blk.c | 29 +- hw/block/virtio-blk-common.c | 42 +++ hw/block/virtio-blk.c | 25 ++-- include/hw/virtio/vhost-user-blk.h| 4 ++- include/hw/virtio/virtio-blk-common.h | 21 ++ 7 files changed, 90 insertions(+), 39 deletions(-) create mode 100644 hw/block/virtio-blk-common.c create mode 100644 include/hw/virtio/virtio-blk-common.h -- 2.25.1
[PATCH v1 5/5] vhost-user-blk: dynamically resize config space based on features
Make vhost-user-blk backwards compatible when migrating from older VMs running with modern features turned off, the same way it was done for virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the features enabled") It's currently impossible to migrate from an older VM with vhost-user-blk (with disable-legacy=off) because of errors like this: qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 device: 1 cmask: ff wmask: 80 w1cmask:0 qemu-system-x86_64: Failed to load PCIDevice:config qemu-system-x86_64: Failed to load virtio-blk:virtio qemu-system-x86_64: error while loading state for instance 0x0 of device ':00:05.0:00.0:02.0/virtio-blk' qemu-system-x86_64: load of migration failed: Invalid argument This is caused by the newer (destination) VM requiring a bigger BAR0 alignment because it has to cover a bigger configuration space, which isn't actually needed since those additional config fields are not active (write-zeroes/discard). Signed-off-by: Daniil Tatianin --- hw/block/vhost-user-blk.c | 15 --- include/hw/virtio/vhost-user-blk.h | 1 + 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 64f3457373..d18a7a2cd4 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -23,6 +23,7 @@ #include "hw/qdev-core.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" +#include "hw/virtio/virtio-blk-common.h" #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-user-blk.h" #include "hw/virtio/virtio.h" @@ -63,7 +64,7 @@ static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config) /* Our num_queues overrides the device backend */ virtio_stw_p(vdev, >blkcfg.num_queues, s->num_queues); -memcpy(config, >blkcfg, sizeof(struct virtio_blk_config)); +memcpy(config, >blkcfg, s->config_size); } static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t *config) @@ -96,8 +97,7 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) Error *local_err = NULL; ret = vhost_dev_get_config(dev, (uint8_t *), - sizeof(struct virtio_blk_config), - _err); + s->config_size, _err); if (ret < 0) { error_report_err(local_err); return ret; @@ -106,7 +106,7 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) /* valid for resize only */ if (blkcfg.capacity != s->blkcfg.capacity) { s->blkcfg.capacity = blkcfg.capacity; -memcpy(dev->vdev->config, >blkcfg, sizeof(struct virtio_blk_config)); +memcpy(dev->vdev->config, >blkcfg, s->config_size); virtio_notify_config(dev->vdev); } @@ -444,7 +444,7 @@ static int vhost_user_blk_realize_connect(VHostUserBlk *s, Error **errp) assert(s->connected); ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg, - sizeof(struct virtio_blk_config), errp); + s->config_size, errp); if (ret < 0) { qemu_chr_fe_disconnect(>chardev); vhost_dev_cleanup(>dev); @@ -489,8 +489,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp) return; } -virtio_init(vdev, VIRTIO_ID_BLOCK, -sizeof(struct virtio_blk_config)); +s->config_size = virtio_blk_common_get_config_size(s->host_features); + +virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size); s->virtqs = g_new(VirtQueue *, s->num_queues); for (i = 0; i < s->num_queues; i++) { diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h index 6252095c45..b7810360b9 100644 --- a/include/hw/virtio/vhost-user-blk.h +++ b/include/hw/virtio/vhost-user-blk.h @@ -52,6 +52,7 @@ struct VHostUserBlk { bool started_vu; uint64_t host_features; +size_t config_size; }; #endif -- 2.25.1
[PATCH v1 3/5] vhost-user-blk: make it possible to disable write-zeroes/discard
It is useful to have the ability to disable these features for compatibility with older VMs that don't have these implemented. Signed-off-by: Daniil Tatianin --- hw/block/vhost-user-blk.c | 8 ++-- include/hw/virtio/vhost-user-blk.h | 2 ++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9117222456..e89164c358 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -251,6 +251,8 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, { VHostUserBlk *s = VHOST_USER_BLK(vdev); +features |= s->host_features; + /* Turn on pre-defined features */ virtio_add_feature(, VIRTIO_BLK_F_SIZE_MAX); virtio_add_feature(, VIRTIO_BLK_F_SEG_MAX); @@ -259,8 +261,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, virtio_add_feature(, VIRTIO_BLK_F_BLK_SIZE); virtio_add_feature(, VIRTIO_BLK_F_FLUSH); virtio_add_feature(, VIRTIO_BLK_F_RO); -virtio_add_feature(, VIRTIO_BLK_F_DISCARD); -virtio_add_feature(, VIRTIO_BLK_F_WRITE_ZEROES); if (s->config_wce) { virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE); @@ -592,6 +592,10 @@ static Property vhost_user_blk_properties[] = { VHOST_USER_BLK_AUTO_NUM_QUEUES), DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128), DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true), +DEFINE_PROP_BIT64("discard", VHostUserBlk, host_features, + VIRTIO_BLK_F_DISCARD, true), +DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, host_features, + VIRTIO_BLK_F_WRITE_ZEROES, true), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h index 7c91f15040..20573dd586 100644 --- a/include/hw/virtio/vhost-user-blk.h +++ b/include/hw/virtio/vhost-user-blk.h @@ -51,6 +51,8 @@ struct VHostUserBlk { bool connected; /* vhost_user_blk_start/vhost_user_blk_stop */ bool started_vu; + +uint64_t host_features; }; #endif -- 2.25.1
[PATCH v1 4/5] vhost-user-blk: make 'config_wce' part of 'host_features'
No reason to have this be a separate field. This also makes it more akin to what the virtio-blk device does. Signed-off-by: Daniil Tatianin --- hw/block/vhost-user-blk.c | 6 ++ include/hw/virtio/vhost-user-blk.h | 1 - 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index e89164c358..64f3457373 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -262,9 +262,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev, virtio_add_feature(, VIRTIO_BLK_F_FLUSH); virtio_add_feature(, VIRTIO_BLK_F_RO); -if (s->config_wce) { -virtio_add_feature(, VIRTIO_BLK_F_CONFIG_WCE); -} if (s->num_queues > 1) { virtio_add_feature(, VIRTIO_BLK_F_MQ); } @@ -591,7 +588,8 @@ static Property vhost_user_blk_properties[] = { DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues, VHOST_USER_BLK_AUTO_NUM_QUEUES), DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128), -DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true), +DEFINE_PROP_BIT64("config-wce", VHostUserBlk, host_features, + VIRTIO_BLK_F_CONFIG_WCE, true), DEFINE_PROP_BIT64("discard", VHostUserBlk, host_features, VIRTIO_BLK_F_DISCARD, true), DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, host_features, diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h index 20573dd586..6252095c45 100644 --- a/include/hw/virtio/vhost-user-blk.h +++ b/include/hw/virtio/vhost-user-blk.h @@ -34,7 +34,6 @@ struct VHostUserBlk { struct virtio_blk_config blkcfg; uint16_t num_queues; uint32_t queue_size; -uint32_t config_wce; struct vhost_dev dev; struct vhost_inflight *inflight; VhostUserState vhost_user; -- 2.25.1
Re: [PATCH v1 2/2] osdep: support mempolicy for preallocation in os_mem_prealloc
I believe you're right. Looking at the implementation of shmem_alloc_page, it uses the inode policy, which is set viavma->set_policy (from the mbind() call in this case). set_mempolicy is both useless and redundant here, as thread'spolicy is only ever used in case vma->get_policy returns NULL (which it doesn't in our case).Sorry for the confusion.Thanks,Daniil 07.12.2021, 11:13, "David Hildenbrand" :On 07.12.21 08:06, Daniil Tatianin wrote: This is needed for cases where we want to make sure that a shared memory region gets allocated from a specific NUMA node. This is impossible to do with mbind(2) because it ignores the policy for memory mapped with MAP_SHARED. We work around this by calling set_mempolicy from prealloc threads instead.That's not quite true I think, how were you able to observe this? Do youhave a reproducer?While the man page says:"The specified policy will be ignored for any MAP_SHARED mappings inthe specified memory range. Rather the pages will be allocatedaccording to the memory policy of the thread that caused the page to beallocated. Again, this may not be the thread that called mbind()."What it really means is that as long as we access that memory via the*VMA* for which we called mbind(), which is the case when *not* usingfallocate() to preallocate memory, we end up using the correct policy.I did experiments a while ago with hugetlbfs shared memory and itproperly allocated from the right node. So I'd be curious how youtrigger this. --Thanks,David / dhildenb
[PATCH v1 1/2] hostmem: use a static size for maxnode, validate policy everywhere
Previously we would calculate the last set bit in the mask, and add 2 to that value to get the maxnode value. This is unnecessary since the mbind syscall allows the bitmap to be any (reasonable) size as long as all the unused bits are clear. This also adds policy validation in multiple places so that it's guaranteed to be valid when we call mbind. Signed-off-by: Daniil Tatianin --- backends/hostmem.c | 64 +++--- 1 file changed, 43 insertions(+), 21 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 4c05862ed5..392026efe6 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -38,6 +38,29 @@ host_memory_backend_get_name(HostMemoryBackend *backend) return object_get_canonical_path(OBJECT(backend)); } +static bool +validate_policy(HostMemPolicy policy, bool nodes_empty, Error **errp) +{ +/* + * check for invalid host-nodes and policies and give more verbose + * error messages than mbind(). + */ +if (!nodes_empty && policy == MPOL_DEFAULT) { +error_setg(errp, "host-nodes must be empty for policy default," + " or you should explicitly specify a policy other" + " than default"); +return false; +} + +if (nodes_empty && policy != MPOL_DEFAULT) { +error_setg(errp, "host-nodes must be set for policy %s", + HostMemPolicy_str(policy)); +return false; +} + +return true; +} + static void host_memory_backend_get_size(Object *obj, Visitor *v, const char *name, void *opaque, Error **errp) @@ -110,6 +133,7 @@ host_memory_backend_set_host_nodes(Object *obj, Visitor *v, const char *name, #ifdef CONFIG_NUMA HostMemoryBackend *backend = MEMORY_BACKEND(obj); uint16List *l, *host_nodes = NULL; +bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1); visit_type_uint16List(v, name, _nodes, errp); @@ -118,6 +142,13 @@ host_memory_backend_set_host_nodes(Object *obj, Visitor *v, const char *name, error_setg(errp, "Invalid host-nodes value: %d", l->value); goto out; } + +nodes_empty = false; +} + +if (host_memory_backend_mr_inited(backend) && +!validate_policy(backend->policy, nodes_empty, errp)) { +goto out; } for (l = host_nodes; l; l = l->next) { @@ -142,6 +173,13 @@ static void host_memory_backend_set_policy(Object *obj, int policy, Error **errp) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); +bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1); + +if (host_memory_backend_mr_inited(backend) && +!validate_policy(policy, nodes_empty, errp)) { +return; +} + backend->policy = policy; #ifndef CONFIG_NUMA @@ -347,24 +385,9 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP); } #ifdef CONFIG_NUMA -unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES); -/* lastbit == MAX_NODES means maxnode = 0 */ -unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1); -/* ensure policy won't be ignored in case memory is preallocated - * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so - * this doesn't catch hugepage case. */ unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE; - -/* check for invalid host-nodes and policies and give more verbose - * error messages than mbind(). */ -if (maxnode && backend->policy == MPOL_DEFAULT) { -error_setg(errp, "host-nodes must be empty for policy default," - " or you should explicitly specify a policy other" - " than default"); -return; -} else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) { -error_setg(errp, "host-nodes must be set for policy %s", - HostMemPolicy_str(backend->policy)); +bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1); +if (!validate_policy(backend->policy, nodes_empty, errp)) { return; } @@ -373,12 +396,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) * cuts off the last specified node. This means backend->host_nodes * must have MAX_NODES+1 bits available. */ -assert(sizeof(backend->host_nodes) >= +QEMU_BUILD_BUG_ON(sizeof(backend->host_nodes) < BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long)); -assert(maxnode <= MAX_NODES); -if (maxnode && -mbind(ptr, sz, backend->policy, backend->host_nodes, max
[PATCH v1 2/2] osdep: support mempolicy for preallocation in os_mem_prealloc
This is needed for cases where we want to make sure that a shared memory region gets allocated from a specific NUMA node. This is impossible to do with mbind(2) because it ignores the policy for memory mapped with MAP_SHARED. We work around this by calling set_mempolicy from prealloc threads instead. Signed-off-by: Daniil Tatianin --- backends/hostmem.c | 6 -- include/qemu/osdep.h | 3 ++- util/meson.build | 2 ++ util/oslib-posix.c | 29 ++--- util/oslib-win32.c | 3 ++- 5 files changed, 36 insertions(+), 7 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 392026efe6..0c508ed9df 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -269,7 +269,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value, void *ptr = memory_region_get_ram_ptr(>mr); uint64_t sz = memory_region_size(>mr); -os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, _err); +os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, backend->policy, +backend->host_nodes, MAX_NODES + 1, _err); if (local_err) { error_propagate(errp, local_err); return; @@ -415,7 +416,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) */ if (backend->prealloc) { os_mem_prealloc(memory_region_get_fd(>mr), ptr, sz, -backend->prealloc_threads, _err); +backend->prealloc_threads, backend->policy, +backend->host_nodes, MAX_NODES + 1, _err); if (local_err) { goto out; } diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index 60718fc342..abf88aeb0e 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -688,7 +688,8 @@ unsigned long qemu_getauxval(unsigned long type); void qemu_set_tty_echo(int fd, bool echo); void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus, - Error **errp); + int policy, unsigned long *node_bitmap, + unsigned long max_node, Error **errp); /** * qemu_get_pid_name: diff --git a/util/meson.build b/util/meson.build index 05b593055a..25f9fca379 100644 --- a/util/meson.build +++ b/util/meson.build @@ -87,3 +87,5 @@ if have_block if_false: files('filemonitor-stub.c')) util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c')) endif + +util_ss.add(when: 'CONFIG_NUMA', if_true: numa) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index e8bdb02e1d..bca25698c5 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -38,11 +38,13 @@ #include "qemu/sockets.h" #include "qemu/thread.h" #include +#include "qemu/bitmap.h" #include "qemu/cutils.h" #include "qemu/compiler.h" #ifdef CONFIG_LINUX #include +#include #endif #ifdef __FreeBSD__ @@ -79,6 +81,9 @@ struct MemsetThread { size_t hpagesize; QemuThread pgthread; sigjmp_buf env; +int policy; +unsigned long *node_bitmap; +unsigned long max_node; }; typedef struct MemsetThread MemsetThread; @@ -464,6 +469,18 @@ static void *do_touch_pages(void *arg) } qemu_mutex_unlock(_mutex); +#ifdef CONFIG_NUMA +if (memset_args->max_node && +!bitmap_empty(memset_args->node_bitmap, memset_args->max_node)) { +long ret = set_mempolicy(memset_args->policy, memset_args->node_bitmap, + memset_args->max_node); +if (ret < 0) { +memset_thread_failed = true; +return NULL; +} +} +#endif + /* unblock SIGBUS */ sigemptyset(); sigaddset(, SIGBUS); @@ -510,7 +527,8 @@ static inline int get_memset_num_threads(int smp_cpus) } static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, -int smp_cpus) +int smp_cpus, int policy, +unsigned long *node_bitmap, unsigned long max_node) { static gsize initialized = 0; size_t numpages_per_thread, leftover; @@ -533,6 +551,9 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, memset_thread[i].addr = addr; memset_thread[i].numpages = numpages_per_thread + (i < leftover); memset_thread[i].hpagesize = hpagesize; +memset_thread[i].policy = policy; +memset_thread[i].node_bitmap = node_bitmap; +memset_thread[i].max_node = max_node; qemu_thread_create(_thread[i].pgthread, "touch_pages", do_touch_pages, _thread[i], QEMU_THREAD_JOINABLE); @@ -554,7 +575,8 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, } voi
[PATCH v1] hw/smbios: verify header type for file before using it
Signed-off-by: Daniil Tatianin --- hw/smbios/smbios.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c index 7397e56737..c55f77368a 100644 --- a/hw/smbios/smbios.c +++ b/hw/smbios/smbios.c @@ -1163,6 +1163,12 @@ void smbios_entry_add(QemuOpts *opts, Error **errp) return; } +if (header->type > SMBIOS_MAX_TYPE) { +error_setg(errp, + "invalid header type %d!", header->type); +return; +} + if (test_bit(header->type, have_fields_bitmap)) { error_setg(errp, "can't load type %d struct, fields already specified!", -- 2.25.1
[PATCH v1 2/2] hw/scsi/vhost-scsi: don't double close vhostfd on error
vhost_dev_init calls vhost_dev_cleanup on error, which closes vhostfd, don't double close it. Signed-off-by: Daniil Tatianin --- hw/scsi/vhost-scsi.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index efb3e14d9e..778f43e4c1 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -222,6 +222,11 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) ret = vhost_dev_init(>dev, (void *)(uintptr_t)vhostfd, VHOST_BACKEND_TYPE_KERNEL, 0, errp); if (ret < 0) { +/* + * vhost_dev_init calls vhost_dev_cleanup on error, which closes + * vhostfd, don't double close it. + */ +vhostfd = -1; goto free_vqs; } @@ -242,7 +247,9 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) error_free(vsc->migration_blocker); virtio_scsi_common_unrealize(dev); close_fd: -close(vhostfd); +if (vhostfd >= 0) { +close(vhostfd); +} return; } -- 2.25.1
[PATCH v1 1/2] hw/scsi/vhost-scsi: don't leak vqs on error
vhost_dev_init calls vhost_dev_cleanup in case of an error during initialization, which zeroes out the entire vsc->dev as well as the vsc->dev.vqs pointer. This prevents us from properly freeing it in free_vqs. Keep a local copy of the pointer so we can free it later. Signed-off-by: Daniil Tatianin --- hw/scsi/vhost-scsi.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index 039caf2614..efb3e14d9e 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -170,6 +170,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) Error *err = NULL; int vhostfd = -1; int ret; +struct vhost_virtqueue *vqs = NULL; if (!vs->conf.wwpn) { error_setg(errp, "vhost-scsi: missing wwpn"); @@ -213,7 +214,8 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) } vsc->dev.nvqs = VHOST_SCSI_VQ_NUM_FIXED + vs->conf.num_queues; -vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs); +vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs); +vsc->dev.vqs = vqs; vsc->dev.vq_index = 0; vsc->dev.backend_features = 0; @@ -232,7 +234,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) return; free_vqs: -g_free(vsc->dev.vqs); +g_free(vqs); if (!vsc->migratable) { migrate_del_blocker(vsc->migration_blocker); } -- 2.25.1
[PATCH v1] virtio/vhost-vsock: don't double close vhostfd, remove redundant cleanup
In case of an error during initialization in vhost_dev_init, vhostfd is closed in vhost_dev_cleanup. Remove close from err_virtio as it's both redundant and causes a double close on vhostfd. Signed-off-by: Daniil Tatianin --- hw/virtio/vhost-vsock.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c index 478c0c9a87..433d42d897 100644 --- a/hw/virtio/vhost-vsock.c +++ b/hw/virtio/vhost-vsock.c @@ -171,6 +171,10 @@ static void vhost_vsock_device_realize(DeviceState *dev, Error **errp) ret = vhost_dev_init(>vhost_dev, (void *)(uintptr_t)vhostfd, VHOST_BACKEND_TYPE_KERNEL, 0, errp); if (ret < 0) { +/* + * vhostfd is closed by vhost_dev_cleanup, which is called + * by vhost_dev_init on initialization error. + */ goto err_virtio; } @@ -183,15 +187,10 @@ static void vhost_vsock_device_realize(DeviceState *dev, Error **errp) return; err_vhost_dev: -vhost_dev_cleanup(>vhost_dev); /* vhost_dev_cleanup() closes the vhostfd passed to vhost_dev_init() */ -vhostfd = -1; +vhost_dev_cleanup(>vhost_dev); err_virtio: vhost_vsock_common_unrealize(vdev); -if (vhostfd >= 0) { -close(vhostfd); -} -return; } static void vhost_vsock_device_unrealize(DeviceState *dev) -- 2.25.1
[PATCH v1] hw/i386/amd_iommu: clean up broken event logging
- Don't create evt buffer in every function where we want to log, instead make amdvi_log_event construct the buffer in-place using the arguments it was given. - Correctly place address & info in the event buffer. Previously both would get shifted to incorrect bit offsets leading to evt containing uninitialized event type and address. - Reduce buffer size from 32 to 16 bytes, which is the amount of bytes actually needed for event storage. - Remove amdvi_encode_event & amdvi_setevent_bits, as they are no longer needed. Signed-off-by: Daniil Tatianin --- hw/i386/amd_iommu.c | 62 ++--- 1 file changed, 14 insertions(+), 48 deletions(-) diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c index fd75cae024..44b995be91 100644 --- a/hw/i386/amd_iommu.c +++ b/hw/i386/amd_iommu.c @@ -164,8 +164,14 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s) } } -static void amdvi_log_event(AMDVIState *s, uint64_t *evt) +static void amdvi_log_event(AMDVIState *s, uint16_t devid, +hwaddr addr, uint16_t info) { +uint64_t evt[2] = { +devid | ((uint64_t)info << 48), +addr +}; + /* event logging not enabled */ if (!s->evtlog_enabled || amdvi_test_mask(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVT_OVF)) { @@ -190,27 +196,6 @@ static void amdvi_log_event(AMDVIState *s, uint64_t *evt) amdvi_generate_msi_interrupt(s); } -static void amdvi_setevent_bits(uint64_t *buffer, uint64_t value, int start, -int length) -{ -int index = start / 64, bitpos = start % 64; -uint64_t mask = MAKE_64BIT_MASK(start, length); -buffer[index] &= ~mask; -buffer[index] |= (value << bitpos) & mask; -} -/* - * AMDVi event structure - *0:15 -> DeviceID - *55:63 -> event type + miscellaneous info - *63:127 -> related address - */ -static void amdvi_encode_event(uint64_t *evt, uint16_t devid, uint64_t addr, - uint16_t info) -{ -amdvi_setevent_bits(evt, devid, 0, 16); -amdvi_setevent_bits(evt, info, 55, 8); -amdvi_setevent_bits(evt, addr, 63, 64); -} /* log an error encountered during a page walk * * @addr: virtual address in translation request @@ -218,11 +203,8 @@ static void amdvi_encode_event(uint64_t *evt, uint16_t devid, uint64_t addr, static void amdvi_page_fault(AMDVIState *s, uint16_t devid, hwaddr addr, uint16_t info) { -uint64_t evt[4]; - info |= AMDVI_EVENT_IOPF_I | AMDVI_EVENT_IOPF; -amdvi_encode_event(evt, devid, addr, info); -amdvi_log_event(s, evt); +amdvi_log_event(s, devid, addr, info); pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS, PCI_STATUS_SIG_TARGET_ABORT); } @@ -232,14 +214,10 @@ static void amdvi_page_fault(AMDVIState *s, uint16_t devid, * @info : error flags */ static void amdvi_log_devtab_error(AMDVIState *s, uint16_t devid, - hwaddr devtab, uint16_t info) + hwaddr addr, uint16_t info) { -uint64_t evt[4]; - info |= AMDVI_EVENT_DEV_TAB_HW_ERROR; - -amdvi_encode_event(evt, devid, devtab, info); -amdvi_log_event(s, evt); +amdvi_log_event(s, devid, addr, info); pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS, PCI_STATUS_SIG_TARGET_ABORT); } @@ -248,10 +226,7 @@ static void amdvi_log_devtab_error(AMDVIState *s, uint16_t devid, */ static void amdvi_log_command_error(AMDVIState *s, hwaddr addr) { -uint64_t evt[4], info = AMDVI_EVENT_COMMAND_HW_ERROR; - -amdvi_encode_event(evt, 0, addr, info); -amdvi_log_event(s, evt); +amdvi_log_event(s, 0, addr, AMDVI_EVENT_COMMAND_HW_ERROR); pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS, PCI_STATUS_SIG_TARGET_ABORT); } @@ -261,11 +236,8 @@ static void amdvi_log_command_error(AMDVIState *s, hwaddr addr) static void amdvi_log_illegalcom_error(AMDVIState *s, uint16_t info, hwaddr addr) { -uint64_t evt[4]; - info |= AMDVI_EVENT_ILLEGAL_COMMAND_ERROR; -amdvi_encode_event(evt, 0, addr, info); -amdvi_log_event(s, evt); +amdvi_log_event(s, 0, addr, info); } /* log an error accessing device table * @@ -276,11 +248,8 @@ static void amdvi_log_illegalcom_error(AMDVIState *s, uint16_t info, static void amdvi_log_illegaldevtab_error(AMDVIState *s, uint16_t devid, hwaddr addr, uint16_t info) { -uint64_t evt[4]; - info |= AMDVI_EVENT_ILLEGAL_DEVTAB_ENTRY; -amdvi_encode_event(evt, devid, addr, info); -amdvi_log_event(s, evt); +amdvi_log_event(s, devid, addr, info); } /* log an error accessing a PTE entry * @addr : address that couldn't be accessed @@ -288,11 +257,8 @@ static void amdvi_log_illegaldevtab_e
[PATCH v1] chardev/wctable: don't free the instance in wctablet_chr_finalize
Object is supposed to be freed by invoking obj->free, and not obj->instance_finalize. This would lead to use-after-free followed by double free in object_unref/object_finalize. Signed-off-by: Daniil Tatianin --- chardev/wctablet.c | 1 - 1 file changed, 1 deletion(-) diff --git a/chardev/wctablet.c b/chardev/wctablet.c index e9cb7ca710..fa3c9be04e 100644 --- a/chardev/wctablet.c +++ b/chardev/wctablet.c @@ -318,7 +318,6 @@ static void wctablet_chr_finalize(Object *obj) TabletChardev *tablet = WCTABLET_CHARDEV(obj); qemu_input_handler_unregister(tablet->hs); -g_free(tablet); } static void wctablet_chr_open(Chardev *chr, -- 2.25.1