Re: [PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout
Hello,

On 23.01.2023 19:14, Daniil Tatianin wrote:
> On 1/23/23 4:47 PM, Daniel P. Berrangé wrote:
>> On Mon, Jan 23, 2023 at 04:30:03PM +0300, Daniil Tatianin wrote:
>>> On 1/23/23 11:57 AM, David Hildenbrand wrote:
>>>> On 20.01.23 14:47, Daniil Tatianin wrote:
>>>>> This series introduces a new qemu_prealloc_mem_with_timeout() API,
>>>>> which allows limiting the maximum amount of time to be spent on
>>>>> memory preallocation. It also adds prealloc statistics collection
>>>>> that is exposed via an optional timeout handler.
>>>>>
>>>>> This new API is then utilized by hostmem for guest RAM
>>>>> preallocation, controlled via new object properties called
>>>>> 'prealloc-timeout' and 'prealloc-timeout-fatal'.
>>>>>
>>>>> This is useful for limiting VM startup time on systems with
>>>>> unpredictable page allocation delays due to memory fragmentation or
>>>>> the backing storage. The timeout can be configured to either simply
>>>>> emit a warning and continue VM startup without having preallocated
>>>>> the entire guest RAM, or to abort startup entirely if that is not
>>>>> acceptable for a specific use case.
>>>>
>>>> The major use case for preallocation is memory resources that cannot
>>>> be overcommitted (hugetlb, file blocks, ...), to avoid running out
>>>> of such resources later, while the guest is already running, and
>>>> crashing it.
>>>
>>> Wouldn't you say that preallocating memory for the sake of speeding
>>> up guest kernel startup & runtime is a valid use case of prealloc?
>>> This way we can avoid expensive (for a multitude of reasons) page
>>> faults that would otherwise slow down the guest significantly at
>>> runtime and affect the user experience.
>>>
>>>> Allocating only a fraction "because it takes too long" looks quite
>>>> useless in that (main use-case) context. We shouldn't encourage QEMU
>>>> users to play with fire in such a way. IOW, there should be no way
>>>> around "prealloc-timeout-fatal". Either preallocation succeeded and
>>>> the guest can run, or it failed, and the guest can't run.
>>> Here we basically accept the fact that e.g. with fragmented memory
>>> the kernel might take a while in a page fault handler, especially for
>>> hugetlb, because of the page compaction that has to run for every
>>> fault. This way we can prefault at least some number of pages and let
>>> the guest fault in the rest on demand later at runtime, even if that
>>> is slow and would cause noticeable lag.
>>
>> Rather than treat this as a problem that needs a timeout, can we
>> restate it as situations needing synchronous vs asynchronous
>> preallocation?
>>
>> For the case where we need synchronous prealloc, current QEMU deals
>> with that. If it doesn't work quickly enough, mgmt can just kill QEMU
>> already today.
>>
>> For the case where you would like some prealloc, but don't mind if it
>> runs without full prealloc, then why not just treat it as an entirely
>> asynchronous task? Instead of calling qemu_prealloc_mem and waiting
>> for it to complete, just spawn a thread to run qemu_prealloc_mem, so
>> it doesn't block QEMU startup. This will have minimal maint burden on
>> the existing code, and will avoid the need for mgmt apps to think
>> about what timeout value to give, which is good because timeouts are
>> hard to get right.
>>
>> Most of the time that async background prealloc will still finish
>> before the guest even gets out of the firmware phase, but if it takes
>> longer it is no big deal. You don't need to quit the prealloc job
>> early, you just need it to not delay the guest OS boot, IIUC.
>>
>> This impl could be done with the 'prealloc' property turning from a
>> boolean on/off into an enum on/async/off, where 'on' == sync prealloc.
>> Or add a separate 'prealloc-async' bool property.
>
> I like this idea, but I'm not sure how we would go about writing to
> live guest memory. Is that something that can be done safely without
> racing with the guest?

Don't forget that prealloc threads will need some CPUs to run, which
would likely result in increased steal time for the guest during
preallocation. Likely not a big deal, but something to keep in mind.

Best,
Valentine
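Daniel's suggestion — spawning a background thread that prefaults the region instead of blocking startup on a timeout — could be sketched roughly like this. This is a minimal standalone illustration, not QEMU's actual qemu_prealloc_mem machinery; the PreallocJob type and both function names are invented for the sketch:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical description of one async prealloc job: a mapped
 * region to prefault and the page size to step by. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t page_size;
} PreallocJob;

static void *prealloc_worker(void *opaque)
{
    PreallocJob *job = opaque;

    /* Writing one byte per page forces the kernel to back that page,
     * so the guest won't take a (potentially slow) fault on it later. */
    for (size_t off = 0; off < job->size; off += job->page_size) {
        job->base[off] = 0;
    }
    return NULL;
}

/* Kick off preallocation without blocking the caller; startup code
 * would continue while the worker touches pages in the background. */
static pthread_t start_async_prealloc(PreallocJob *job)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, prealloc_worker, job);
    assert(rc == 0);
    return tid;
}
```

Valentin's point about steal time applies directly here: the worker competes with vCPU threads for host CPUs, so a real implementation would likely want to bound the number of prealloc threads.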
Re: [PATCH v1] hw/i386/amd_iommu: clean up broken event logging
Hi,

On 17.11.2021 19:46, Daniil Tatianin wrote:
> - Don't create an evt buffer in every function where we want to log;
>   instead, make amdvi_log_event construct the buffer in place using
>   the arguments it was given.
> - Correctly place address & info in the event buffer. Previously both
>   would get shifted to incorrect bit offsets, leading to evt containing
>   an uninitialized event type and address.
> - Reduce the buffer size from 32 to 16 bytes, which is the number of
>   bytes actually needed for event storage.
> - Remove amdvi_encode_event & amdvi_setevent_bits, as they are no
>   longer needed.
>
> Signed-off-by: Daniil Tatianin
> ---
>  hw/i386/amd_iommu.c | 62 ++---
>  1 file changed, 14 insertions(+), 48 deletions(-)
>
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index fd75cae024..44b995be91 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -164,8 +164,14 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s)
>      }
>  }
>
> -static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
> +static void amdvi_log_event(AMDVIState *s, uint16_t devid,
> +                            hwaddr addr, uint16_t info)
>  {
> +    uint64_t evt[2] = {
> +        devid | ((uint64_t)info << 48),
> +        addr
> +    };
> +
>      /* event logging not enabled */
>      if (!s->evtlog_enabled || amdvi_test_mask(s, AMDVI_MMIO_STATUS,
>                                                AMDVI_MMIO_STATUS_EVT_OVF)) {
> @@ -190,27 +196,6 @@ static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
>      amdvi_generate_msi_interrupt(s);
>  }
>
> -static void amdvi_setevent_bits(uint64_t *buffer, uint64_t value, int start,
> -                                int length)
> -{
> -    int index = start / 64, bitpos = start % 64;
> -    uint64_t mask = MAKE_64BIT_MASK(start, length);
> -    buffer[index] &= ~mask;
> -    buffer[index] |= (value << bitpos) & mask;
> -}
> -
> -/*
> - * AMDVi event structure
> - *    0:15   -> DeviceID
> - *    55:63  -> event type + miscellaneous info
> - *    63:127 -> related address

Did you mean 64:127? Fields shouldn't overlap, and a 65-bit address
looks suspicious.

> - */
> -static void amdvi_encode_event(uint64_t *evt, uint16_t devid, uint64_t addr,
> -                               uint16_t info)
> -{
> -    amdvi_setevent_bits(evt, devid, 0, 16);
> -    amdvi_setevent_bits(evt, info, 55, 8);
> -    amdvi_setevent_bits(evt, addr, 63, 64);
> -}
>
>  /* log an error encountered during a page walk
>   *
>   * @addr: virtual address in translation request
> @@ -218,11 +203,8 @@ static void amdvi_encode_event(uint64_t *evt, uint16_t devid, uint64_t addr,
>  static void amdvi_page_fault(AMDVIState *s, uint16_t devid,
>                               hwaddr addr, uint16_t info)
>  {
> -    uint64_t evt[4];
> -
>      info |= AMDVI_EVENT_IOPF_I | AMDVI_EVENT_IOPF;
> -    amdvi_encode_event(evt, devid, addr, info);
> -    amdvi_log_event(s, evt);
> +    amdvi_log_event(s, devid, addr, info);
>      pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
>                                 PCI_STATUS_SIG_TARGET_ABORT);
>  }
> @@ -232,14 +214,10 @@ static void amdvi_page_fault(AMDVIState *s, uint16_t devid,
>   * @info : error flags
>   */
>  static void amdvi_log_devtab_error(AMDVIState *s, uint16_t devid,
> -                                   hwaddr devtab, uint16_t info)
> +                                   hwaddr addr, uint16_t info)
>  {
> -    uint64_t evt[4];
> -
>      info |= AMDVI_EVENT_DEV_TAB_HW_ERROR;
> -
> -    amdvi_encode_event(evt, devid, devtab, info);
> -    amdvi_log_event(s, evt);
> +    amdvi_log_event(s, devid, addr, info);
>      pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
>                                 PCI_STATUS_SIG_TARGET_ABORT);
>  }
> @@ -248,10 +226,7 @@ static void amdvi_log_devtab_error(AMDVIState *s, uint16_t devid,
>   */
>  static void amdvi_log_command_error(AMDVIState *s, hwaddr addr)
>  {
> -    uint64_t evt[4], info = AMDVI_EVENT_COMMAND_HW_ERROR;
> -
> -    amdvi_encode_event(evt, 0, addr, info);
> -    amdvi_log_event(s, evt);
> +    amdvi_log_event(s, 0, addr, AMDVI_EVENT_COMMAND_HW_ERROR);
>      pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
>                                 PCI_STATUS_SIG_TARGET_ABORT);
>  }
> @@ -261,11 +236,8 @@ static void amdvi_log_command_error(AMDVIState *s, hwaddr addr)
>  static void amdvi_log_illegalcom_error(AMDVIState *s, uint16_t info,
>                                         hwaddr addr)
>  {
> -    uint64_t evt[4];
> -
>      info |= AMDVI_EVENT_ILLEGAL_COMMAND_ERROR;
> -    amdvi_encode_event(evt, 0, addr, info);
> -    amdvi_log_event(s, evt);
> +    amdvi_log_event(s, 0, addr, info);
>  }
>
>  /* log an error accessing device table
>   *
> @@ -276,11 +248,8 @@ static void amdvi_log_illegalcom_error(AMDVIState *s, uint16_t info,
>  static void amdvi_log_illegaldevtab_error(AMDVIState *s, uint16_t devid,
>                                            hwaddr addr, uint16_t info)
>  {
> -    uint64_t evt[4];
> -
>      info |= AMDVI_EVENT_ILLEGAL_DEVTAB_ENTRY;
> -    amdvi_encode_event(evt, devid, addr, info);
> -    amdvi_log_event(s, evt);
> +    amdvi_log_event(s, devid, addr, info);
>  }
>
>  /* log an error
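The in-place construction the patch introduces packs the whole event into two 64-bit words: DeviceID in bits 0:15 and the event type/info field in bits 48:63 of the first word, with the related address in the second. A minimal standalone sketch of that layout (the encode/decode helper names here are invented for illustration; only the evt[0]/evt[1] construction mirrors the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-byte event: devid in bits 0:15 and info in bits 48:63 of
 * the first word, the related address in the second word. */
static void encode_event(uint64_t evt[2], uint16_t devid,
                         uint64_t addr, uint16_t info)
{
    evt[0] = devid | ((uint64_t)info << 48);
    evt[1] = addr;
}

/* Decode helpers, useful for checking the field placement. */
static uint16_t event_devid(const uint64_t evt[2])
{
    return (uint16_t)(evt[0] & 0xffff);
}

static uint16_t event_info(const uint64_t evt[2])
{
    return (uint16_t)(evt[0] >> 48);
}

static uint64_t event_addr(const uint64_t evt[2])
{
    return evt[1];
}
```

With non-overlapping fields, a round trip through encode and decode returns each value unchanged, which is exactly what the old amdvi_setevent_bits offsets (55 for info, 63 for the address) failed to guarantee.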
Re: [PATCH v1] hw/i386/amd_iommu: clean up broken event logging
On 18.11.2021 00:03, Roman Kagan wrote:
> On Wed, Nov 17, 2021 at 11:13:27PM +0500, Valentin Sinitsyn wrote:
>> On 17.11.2021 19:46, Daniil Tatianin wrote:
>>> -/*
>>> - * AMDVi event structure
>>> - *    0:15   -> DeviceID
>>> - *    55:63  -> event type + miscellaneous info
>>> - *    63:127 -> related address
>>
>> Did you mean 64:127? Fields shouldn't overlap, and a 65-bit address
>> looks suspicious.
>
> These lines are being removed by this patch. And yes, they were wrong
> WRT the spec.

Oops, I missed a minus. Sorry for that.

Valentin
Re: [BUG FIX][PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization
This is a series fixing a bug in vhost-user-blk. More specifically,
it's not just a bug but a crasher.

Valentine

On 01.04.2021 14:21, Denis Plotnikov wrote:
> Is there any chance for it to be considered for the next rc?
>
> Thanks!
> Denis
>
> On 29.03.2021 16:44, Denis Plotnikov wrote:
>> ping!
>>
>> On 25.03.2021 18:12, Denis Plotnikov wrote:
>>> v3:
>>>   * 0003: a new patch added fixing the problem on vm shutdown.
>>>     I stumbled on this bug after sending v2.
>>>   * 0001: grammar fixes (Raphael)
>>>   * 0002: commit message fixes (Raphael)
>>>
>>> v2:
>>>   * split the initial patch into two (Raphael)
>>>   * rename init to realized (Raphael)
>>>   * remove unrelated comment (Raphael)
>>>
>>> When the vhost-user-blk device loses the connection to the daemon
>>> during the initialization phase, it kills qemu because of an assert
>>> in the code. The series fixes the bug:
>>>
>>>   0001 is preparation for the fix
>>>   0002 fixes the bug; the patch description has the full motivation
>>>        for the series
>>>   0003 (added in v3) fixes a bug on vm shutdown
>>>
>>> Denis Plotnikov (3):
>>>   vhost-user-blk: use different event handlers on initialization
>>>   vhost-user-blk: perform immediate cleanup if disconnect on
>>>     initialization
>>>   vhost-user-blk: add immediate cleanup on shutdown
>>>
>>>  hw/block/vhost-user-blk.c | 79 ---
>>>  1 file changed, 48 insertions(+), 31 deletions(-)