Setting a high maxmem value seriously degrades the guest's boot
time, from 3 seconds for 1T up to more than 3 minutes for 8T.
All this time is spent during initial machine setup and CAS
(client architecture support negotiation), preventing use of the
QEMU monitor in the meantime.

Profiling reveals:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 85.48     24.08    24.08   566117     0.00     0.00  object_get_canonical_path_component
 13.67     27.93     3.85 57623944     0.00     0.00  strstart

-----------------------------------------------
                0.00    0.00       1/566117      host_memory_backend_get_name [270]
                1.41    0.22   33054/566117      drc_realize <cycle 1> [23]
               22.67    3.51  533062/566117      object_get_canonical_path <cycle 1> [3]
[2]     98.7   24.08    3.73  566117         object_get_canonical_path_component [2]
                3.73    0.00 55802324/57623944     strstart [19]
-----------------------------------------------
                                  12             object_property_set_link <cycle 1> [1267]
                               33074             device_set_realized <cycle 1> [138]
                              231378             object_get_child_property <cycle 1> [652]
[3]     93.0    0.01   26.18  264464         object_get_canonical_path <cycle 1> [3]
               22.67    3.51  533062/566117      object_get_canonical_path_component [2]
                              264464             object_get_root <cycle 1> [629]
-----------------------------------------------

This is because an 8T maxmem means QEMU can create up to 32768
LMB DRC objects, each tracking the hot-plug/unplug state of 256M
of contiguous RAM. These objects are all created during machine
init and live for the machine's lifetime. Their realize path
involves several calls to object_get_canonical_path_component(),
which itself traverses all properties of the parent node, so
realizing N DRCs is a quadratic operation. Worse, the full list
of DRCs is traversed 7 times during the boot process, e.g. to
populate the device tree, calling
object_get_canonical_path_component() on each DRC again and
adding yet more costly quadratic traversals.
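
For illustration, here is a minimal sketch in C of the pattern that
makes this quadratic (hypothetical names, not the actual QEMU code):
resolving a child's path component means scanning all of the parent's
child properties, so doing it once per DRC is O(N^2) overall.

  /* Hypothetical sketch: each lookup scans all N child properties. */
  typedef struct ChildProp {
      const char *name;
      const void *child;       /* opaque pointer to the child object */
  } ChildProp;

  static const char *get_path_component(const ChildProp *props,
                                        int nprops, const void *child)
  {
      for (int i = 0; i < nprops; i++) {  /* O(N) scan per lookup */
          if (props[i].child == child) {
              return props[i].name;
          }
      }
      return NULL;
  }

  /*
   * Realizing N DRCs calls this once per DRC against the same
   * parent: O(N^2) total. Each of the 7 boot-time traversals of
   * the DRC list repeats the pattern.
   */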

Modeling DR connectors as individual devices raises some
concerns, as already discussed a year ago in this thread:

https://patchew.org/QEMU/20191017205953.13122-1-chel...@linux.vnet.ibm.com/

First, having so many devices just to track the DRC states is
excessive and can cause scalability issues in various ways. The
quadratic traversal described above is one more instance of this.
Second, DR connectors are really PAPR internals that shouldn't be
exposed at all in the composition tree.

This series converts DR connectors to be simple unparented
objects tracked in a separate hash table, rather than
actual devices exposed in the QOM tree. This doesn't address
the overall scalability concern, but it does make traversal
of the DR connectors linear. The time penalty with an
8T maxmem is reduced to less than 1 second, and we get
a much shorter 'info qom-tree' output.
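
As a rough sketch of the data structure change (GLib-based, with
hypothetical names; the actual patches may differ in detail),
keeping DRCs in a hash table keyed by their DRC index gives O(1)
lookup and insertion, and O(N) traversal:

  #include <glib.h>

  static GHashTable *drc_table;  /* hypothetical; keyed by DRC index */

  static void drc_table_init(void)
  {
      drc_table = g_hash_table_new(g_direct_hash, g_direct_equal);
  }

  static void drc_table_insert(guint32 index, gpointer drc)
  {
      g_hash_table_insert(drc_table, GUINT_TO_POINTER(index), drc);
  }

  static gpointer drc_table_lookup(guint32 index)
  {
      return g_hash_table_lookup(drc_table, GUINT_TO_POINTER(index));
  }

  /* Linear traversal of all DRCs, e.g. to populate the device tree. */
  static void drc_table_foreach(GHFunc fn, gpointer opaque)
  {
      g_hash_table_foreach(drc_table, fn, opaque);
  }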

This is transparent to migration.

Greg Kurz (6):
  spapr: Call spapr_drc_reset() for all DRCs at CAS
  spapr: Fix reset of transient DR connectors
  spapr: Introduce spapr_drc_reset_all()
  spapr: Use spapr_drc_reset_all() at machine reset
  spapr: Add drc_ prefix to the DRC realize and unrealize functions
  spapr: Model DR connectors as simple objects

 include/hw/ppc/spapr_drc.h |  18 +++-
 hw/ppc/spapr.c             |  15 +--
 hw/ppc/spapr_drc.c         | 181 +++++++++++++++++--------------------
 hw/ppc/spapr_hcall.c       |  33 ++-----
 hw/ppc/spapr_pci.c         |   2 +-
 5 files changed, 106 insertions(+), 143 deletions(-)

-- 
2.26.2


