Re: [PATCH v0 0/4] backends/hostmem: add an ability to specify prealloc timeout

2023-01-23 Thread Valentin Sinitsyn

Hello,

On 23.01.2023 19:14, Daniil Tatianin wrote:

On 1/23/23 4:47 PM, Daniel P. Berrangé wrote:

On Mon, Jan 23, 2023 at 04:30:03PM +0300, Daniil Tatianin wrote:

On 1/23/23 11:57 AM, David Hildenbrand wrote:

On 20.01.23 14:47, Daniil Tatianin wrote:

This series introduces new qemu_prealloc_mem_with_timeout() api,
which allows limiting the maximum amount of time to be spent on memory
preallocation. It also adds prealloc statistics collection that is
exposed via an optional timeout handler.

This new api is then utilized by hostmem for guest RAM preallocation
controlled via new object properties called 'prealloc-timeout' and
'prealloc-timeout-fatal'.

This is useful for limiting VM startup time on systems with
unpredictable page allocation delays due to memory fragmentation or 
the

backing storage. The timeout can be configured to either simply emit a
warning and continue VM startup without having preallocated the entire
guest RAM or just abort startup entirely if that is not acceptable for
a specific use case.


The major use case for preallocation is memory resources that cannot be
overcommitted (hugetlb, file blocks, ...), to avoid running out of such
resources later, while the guest is already running, and crashing it.


Wouldn't you say that preallocating memory for the sake of speeding 
up guest
kernel startup & runtime is a valid use case of prealloc? This way we 
can
avoid expensive (for a multitude of reasons) page faults that will 
otherwise
slow down the guest significantly at runtime and affect the user 
experience.



Allocating only a fraction "because it takes too long" looks quite
useless in that (main use-case) context. We shouldn't encourage QEMU
users to play with fire in such a way. IOW, there should be no way
around "prealloc-timeout-fatal". Either preallocation succeeded and the
guest can run, or it failed, and the guest can't run.


Here we basically accept the fact that e.g with fragmented memory the 
kernel
might take a while in a page fault handler especially for hugetlb 
because of

page compaction that has to run for every fault.

This way we can prefault at least some number of pages and let the guest
fault the rest on demand later on during runtime even if it's slow 
and would

cause a noticeable lag.


Rather than treat this as a problem that needs a timeout, can we
restate it as situations need synchronous vs asynchronous
preallocation ?

For the case where we need synchronous prealloc, current QEMU deals
with that. If it doesn't work quickly enough, mgmt can just kill
QEMU already today.

For the case where you would like some prealloc, but don't mind
if it runs without full prealloc, then why not just treat it as an
entirely asynchronous task ? Instead of calling qemu_prealloc_mem
and waiting for it to complete, just spawn a thread to run
qemu_prealloc_mem, so it doesn't block QEMU startup. This will
have minimal maint burden on the existing code, and will avoid
need for mgmt apps to think about what timeout value to give,
which is good because timeouts are hard to get right.

Most of the time that async background prealloc will still finish
before the guest even gets out of the firmware phase, but if it
takes longer it is no big deal. You don't need to quit the prealloc
job early, you just need it to not delay the guest OS boot IIUC.

This impl could be done with the 'prealloc' property turning from
a boolean on/off, to a enum  on/async/off, where 'on' == sync
prealloc. Or add a separate 'prealloc-async' bool property


I like this idea, but I'm not sure how we would go about writing to live 
guest memory. Is that something that can be done safely without racing 
with the guest?


Don't forget that prealloc threads will need some CPUs to run, which 
would likely result in increased steal time during preallocation for the 
guest.


Likely not a big deal, but something to keep in mind.

Best,
Valentine




Re: [PATCH v1] hw/i386/amd_iommu: clean up broken event logging

2021-11-17 Thread Valentin Sinitsyn

Hi,

On 17.11.2021 19:46, Daniil Tatianin wrote:

- Don't create evt buffer in every function where we want to log,
   instead make amdvi_log_event construct the buffer in-place using the
   arguments it was given.

- Correctly place address & info in the event buffer. Previously both
   would get shifted to incorrect bit offsets leading to evt containing
   uninitialized event type and address.

- Reduce buffer size from 32 to 16 bytes, which is the amount of bytes
   actually needed for event storage.

- Remove amdvi_encode_event & amdvi_setevent_bits, as they are no longer
   needed.

Signed-off-by: Daniil Tatianin 
---
  hw/i386/amd_iommu.c | 62 ++---
  1 file changed, 14 insertions(+), 48 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index fd75cae024..44b995be91 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -164,8 +164,14 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s)
  }
  }
  
-static void amdvi_log_event(AMDVIState *s, uint64_t *evt)

+static void amdvi_log_event(AMDVIState *s, uint16_t devid,
+hwaddr addr, uint16_t info)
  {
+uint64_t evt[2] = {
+devid | ((uint64_t)info << 48),
+addr
+};
+
  /* event logging not enabled */
  if (!s->evtlog_enabled || amdvi_test_mask(s, AMDVI_MMIO_STATUS,
  AMDVI_MMIO_STATUS_EVT_OVF)) {
@@ -190,27 +196,6 @@ static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
  amdvi_generate_msi_interrupt(s);
  }
  
-static void amdvi_setevent_bits(uint64_t *buffer, uint64_t value, int start,

-int length)
-{
-int index = start / 64, bitpos = start % 64;
-uint64_t mask = MAKE_64BIT_MASK(start, length);
-buffer[index] &= ~mask;
-buffer[index] |= (value << bitpos) & mask;
-}
-/*
- * AMDVi event structure
- *0:15   -> DeviceID
- *55:63  -> event type + miscellaneous info
- *63:127 -> related address
Did you mean 64:127? Fields shouldn't overlap, and 65-bit address looks 
suspicious.



- */
-static void amdvi_encode_event(uint64_t *evt, uint16_t devid, uint64_t addr,
-   uint16_t info)
-{
-amdvi_setevent_bits(evt, devid, 0, 16);
-amdvi_setevent_bits(evt, info, 55, 8);
-amdvi_setevent_bits(evt, addr, 63, 64);
-}
  /* log an error encountered during a page walk
   *
   * @addr: virtual address in translation request
@@ -218,11 +203,8 @@ static void amdvi_encode_event(uint64_t *evt, uint16_t 
devid, uint64_t addr,
  static void amdvi_page_fault(AMDVIState *s, uint16_t devid,
   hwaddr addr, uint16_t info)
  {
-uint64_t evt[4];
-
  info |= AMDVI_EVENT_IOPF_I | AMDVI_EVENT_IOPF;
-amdvi_encode_event(evt, devid, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, devid, addr, info);
  pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
  PCI_STATUS_SIG_TARGET_ABORT);
  }
@@ -232,14 +214,10 @@ static void amdvi_page_fault(AMDVIState *s, uint16_t 
devid,
   *  @info : error flags
   */
  static void amdvi_log_devtab_error(AMDVIState *s, uint16_t devid,
-   hwaddr devtab, uint16_t info)
+   hwaddr addr, uint16_t info)
  {
-uint64_t evt[4];
-
  info |= AMDVI_EVENT_DEV_TAB_HW_ERROR;
-
-amdvi_encode_event(evt, devid, devtab, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, devid, addr, info);
  pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
  PCI_STATUS_SIG_TARGET_ABORT);
  }
@@ -248,10 +226,7 @@ static void amdvi_log_devtab_error(AMDVIState *s, uint16_t 
devid,
   */
  static void amdvi_log_command_error(AMDVIState *s, hwaddr addr)
  {
-uint64_t evt[4], info = AMDVI_EVENT_COMMAND_HW_ERROR;
-
-amdvi_encode_event(evt, 0, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, 0, addr, AMDVI_EVENT_COMMAND_HW_ERROR);
  pci_word_test_and_set_mask(s->pci.dev.config + PCI_STATUS,
  PCI_STATUS_SIG_TARGET_ABORT);
  }
@@ -261,11 +236,8 @@ static void amdvi_log_command_error(AMDVIState *s, hwaddr 
addr)
  static void amdvi_log_illegalcom_error(AMDVIState *s, uint16_t info,
 hwaddr addr)
  {
-uint64_t evt[4];
-
  info |= AMDVI_EVENT_ILLEGAL_COMMAND_ERROR;
-amdvi_encode_event(evt, 0, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, 0, addr, info);
  }
  /* log an error accessing device table
   *
@@ -276,11 +248,8 @@ static void amdvi_log_illegalcom_error(AMDVIState *s, 
uint16_t info,
  static void amdvi_log_illegaldevtab_error(AMDVIState *s, uint16_t devid,
hwaddr addr, uint16_t info)
  {
-uint64_t evt[4];
-
  info |= AMDVI_EVENT_ILLEGAL_DEVTAB_ENTRY;
-amdvi_encode_event(evt, devid, addr, info);
-amdvi_log_event(s, evt);
+amdvi_log_event(s, devid, addr, info);
  }
  /* log an error 

Re: [PATCH v1] hw/i386/amd_iommu: clean up broken event logging

2021-11-17 Thread Valentin Sinitsyn

On 18.11.2021 00:03, Roman Kagan wrote:

On Wed, Nov 17, 2021 at 11:13:27PM +0500, Valentin Sinitsyn wrote:

On 17.11.2021 19:46, Daniil Tatianin wrote:

-/*
- * AMDVi event structure
- *0:15   -> DeviceID
- *55:63  -> event type + miscellaneous info
- *63:127 -> related address

Did you mean 64:127? Fields shouldn't overlap, and 65-bit address looks
suspicious.


These lines are being removed by this patch.  And yes, they were wrong
WRT the spec.

Oops I missed a minus. Sorry for that.

Valentin



Re: [BUG FIX][PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization

2021-04-01 Thread Valentin Sinitsyn

On 01.04.2021 14:21, Denis Plotnikov wrote:

This is a series fixing a bug in host-user-blk.

More specifically, it's not just a bug but crasher.

Valentine


Is there any chance for it to be considered for the next rc?

Thanks!

Denis

On 29.03.2021 16:44, Denis Plotnikov wrote:


ping!

On 25.03.2021 18:12, Denis Plotnikov wrote:

v3:
   * 0003: a new patch added fixing the problem on vm shutdown
 I stumbled on this bug after v2 sending.
   * 0001: gramma fixing (Raphael)
   * 0002: commit message fixing (Raphael)

v2:
   * split the initial patch into two (Raphael)
   * rename init to realized (Raphael)
   * remove unrelated comment (Raphael)

When the vhost-user-blk device lose the connection to the daemon during
the initialization phase it kills qemu because of the assert in the code.
The series fixes the bug.

0001 is preparation for the fix
0002 fixes the bug, patch description has the full motivation for the series
0003 (added in v3) fix bug on vm shutdown

Denis Plotnikov (3):
   vhost-user-blk: use different event handlers on initialization
   vhost-user-blk: perform immediate cleanup if disconnect on
 initialization
   vhost-user-blk: add immediate cleanup on shutdown

  hw/block/vhost-user-blk.c | 79 ---
  1 file changed, 48 insertions(+), 31 deletions(-)