On Mon, 20 Oct 2025 10:51:06 -0300 Gustavo Romero <[email protected]> wrote:
Hi Gustavo,

Thanks for the review, and sorry for the delay. I am planning to go
through all of the reviews soon. A quick note on your comment on the
cover letter inline below.

> Hi Alireza,
>
> I've started to review your series and have some questions/comments on it,
> please see them inline in the patches.
>
>
> On 8/27/25 11:21, Alireza Sanaee wrote:
> > Specifying the cache layout in virtual machines is useful for
> > applications and operating systems to fetch accurate information about
> > the cache structure and make appropriate adjustments. Enforcing correct
>
> Since this series applies only to TCG and all cache management
> instructions are modeled as NOPs, I'm wondering what would be a use case
> for being able to specify the cache layout. I'm not saying I'm against
> it, especially because it's for the virt machine and we have sort of
> freedom to "customize" it in general.

We tested with TCG and KVM, and the only change related to TCG was the
increase of cache levels for cpu=max. This set does apply to KVM, because
not all cache information comes from registers: the sharing bits, in
particular the topology, must be described somehow, and that is what we
enable here.
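Just to illustrate what the sharing description buys us (this is not part
of the series, and assumes a Linux guest with the cacheinfo sysfs
interface): whether the topology arrives via the PPTT or the device tree,
the guest kernel exposes the sharing masks it derived from it, so the
layout from the cover letter below can be sanity-checked with something
like:

    # rough sketch: print level, type and sharing mask of each cache
    # seen by CPU 0 inside the guest
    for c in /sys/devices/system/cpu/cpu0/cache/index*; do
        printf 'L%s %s: shared_cpu_list=%s\n' \
            "$(cat "$c"/level)" "$(cat "$c"/type)" "$(cat "$c"/shared_cpu_list)"
    done

With the example layout and -smp sockets=2,clusters=1,cores=2,threads=2,
the L1 entries should list the two threads of the core, and the L2/L3
entries the four CPUs of the cluster/socket.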
Thanks,
Alireza

>
> Cheers,
> Gustavo
>
> > sharing information can lead to better optimizations. Patches that
> > allow for an interface to express caches landed in prior cycles. This
> > patchset uses that interface as a foundation. Thus, the device tree
> > and the ACPI PPTT table are populated based on user-provided
> > information and the CPU topology.
> >
> > Example:
> >
> >      +----------------+                        +----------------+
> >      |    Socket 0    |                        |    Socket 1    |
> >      |   (L3 Cache)   |                        |   (L3 Cache)   |
> >      +--------+-------+                        +--------+-------+
> >               |                                         |
> >      +--------+--------+                       +--------+--------+
> >      |    Cluster 0    |                       |    Cluster 0    |
> >      |    (L2 Cache)   |                       |    (L2 Cache)   |
> >      +--------+--------+                       +--------+--------+
> >               |                                         |
> > +--------+--------+ +--------+--------+ +--------+--------+ +--------+--------+
> > |     Core 0      | |     Core 1      | |     Core 0      | |     Core 1      |
> > |   (L1i, L1d)    | |   (L1i, L1d)    | |   (L1i, L1d)    | |   (L1i, L1d)    |
> > +--------+--------+ +--------+--------+ +--------+--------+ +--------+--------+
> >     |                   |                   |                   |
> > +--------+          +--------+          +--------+          +--------+
> > |Thread 0|          |Thread 1|          |Thread 1|          |Thread 0|
> > +--------+          +--------+          +--------+          +--------+
> > |Thread 1|          |Thread 0|          |Thread 0|          |Thread 1|
> > +--------+          +--------+          +--------+          +--------+
> >
> >
> > The following command represents the system relying on **ACPI PPTT
> > tables**:
> >
> > ./qemu-system-aarch64 \
> >   -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
> >   -cpu max \
> >   -m 2048 \
> >   -smp sockets=2,clusters=1,cores=2,threads=2 \
> >   -kernel ./Image.gz \
> >   -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
> >   -initrd rootfs.cpio.gz \
> >   -bios ./edk2-aarch64-code.fd \
> >   -nographic
> >
> > The following command represents the system relying on **the device
> > tree**:
> >
> > ./qemu-system-aarch64 \
> >   -machine virt,acpi=off,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
> >   -cpu max \
> >   -m 2048 \
> >   -smp sockets=2,clusters=1,cores=2,threads=2 \
> >   -kernel ./Image.gz \
> >   -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=off" \
> >   -initrd rootfs.cpio.gz \
> >   -nographic
> >
> > Failure cases:
> >
> > 1) There are scenarios where caches exist in the system's registers
> >    but are left unspecified by the user. In this case QEMU returns a
> >    failure.
> >
> > 2) SMT threads cannot share caches, which is not a very common
> >    configuration anyway. More discussion here [1].
> >
> > Currently only three levels of cache can be specified on the command
> > line. However, increasing that limit does not require significant
> > changes. Further, this patchset assumes unified l2 and l3 caches and
> > does not allow l(2/3)(i/d). The level terminology is
> > thread/core/cluster/socket. Hierarchy assumed in this patchset:
> > Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3.
> >
> > Possible future enhancements:
> > 1) Separate data and instruction caches at L2 and L3.
> > 2) Additional cache controls, e.g. the size of L3 may not want to
> >    just match the underlying system, because only some of the
> >    associated host CPUs may be bound to this VM.
> >
> > [1] https://lore.kernel.org/devicetree-spec/[email protected]/
> >
> > Change Log:
> > v15->v16:
> > * Rebase to e771ba98de25c9f43959f79fc7099cf7fbba44cc (Open 10.2
> >   development tree).
> > * v15:
> >   https://lore.kernel.org/qemu-devel/[email protected]/
> >
> > v14->v15:
> > * Introduced a separate patch for the loongarch64 build_pptt function.
> > * Made sure the loongarch64 tests pass.
> > * Downgraded ACPI PPTT to rev 2. Removed PPTT IDs as they were not
> >   necessary.
> > * Removed the dependency as it was merged in the recent cycle.
> >   -- [email protected]
> > * Fixed styling issues and removed irrelevant changes.
> > * Moved the cache headers to core/cpu.h to be used in both acpi and
> >   virt.
> > * v14:
> >   https://lore.kernel.org/qemu-devel/[email protected]/
> >   # Thanks to Jonathan and Zhao for their comments.
> >
> > v13->v14:
> > * Rebased on latest staging.
> > * Made some naming changes to machine-smp.c and added docs to the
> >   same file.
> >
> > v12->v13:
> > * Applied comments from Zhao.
> > * Introduced a new patch for machine-specific cache topology
> >   functions.
> > * Base: bc98ffdc7577e55ab8373c579c28fe24d600c40f.
> >
> > v11->v12:
> > * Patch #4 could not merge properly as the main file had diverged.
> >   Now it is fixed (hopefully).
> > * LoongArch build_pptt function updated.
> > * Rebased on 09be8a511a2e278b45729d7b065d30c68dd699d0.
> >
> > v10->v11:
> > * Fix some coding style issues.
> > * Rename some variables.
> >
> > v9->v10:
> > * PPTT rev down to 2.
> >
> > v8->v9:
> > * Rebase to 10.
> > * Fixed a bug in device tree generation for the scenario where caches
> >   are shared at the core level at levels higher than 1.
> >
> > v7->v8:
> > * Rebase: Merge tag 'pull-nbd-2024-08-26' of
> >   https://repo.or.cz/qemu/ericb into staging.
> > * I mis-included a file in patch #4 and removed it in this one.
> >
> > v6->v7:
> > * Intel stuff got pulled up, so rebase.
> > * Added some discussions on the device tree.
> >
> > v5->v6:
> > * Minor bug fix.
> > * Rebase based on the new Intel patchset:
> >   - https://lore.kernel.org/qemu-devel/[email protected]/
> >
> > v4->v5:
> > * Added Reviewed-by tags.
> > * Applied some comments.
> >
> > v3->v4:
> > * Device tree added.
> >
> > Alireza Sanaee (8):
> >   target/arm/tcg: increase cache level for cpu=max
> >   hw/core/machine: topology functions capabilities added
> >   hw/arm/virt: add cache hierarchy to device tree
> >   bios-tables-test: prepare to change ARM ACPI virt PPTT
> >   acpi: add caches to ACPI build_pptt table function
> >   hw/acpi: add cache hierarchy to pptt table
> >   tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology
> >   Update the ACPI tables based on new aml-build.c
> >
> >  hw/acpi/aml-build.c                        | 203 +++++++++-
> >  hw/arm/virt-acpi-build.c                   |   8 +-
> >  hw/arm/virt.c                              | 412 ++++++++++++++++++++-
> >  hw/core/machine-smp.c                      |  56 +++
> >  hw/loongarch/virt-acpi-build.c             |   4 +-
> >  include/hw/acpi/aml-build.h                |   4 +-
> >  include/hw/acpi/cpu.h                      |  10 +
> >  include/hw/arm/virt.h                      |   7 +-
> >  include/hw/boards.h                        |   5 +
> >  include/hw/core/cpu.h                      |  12 +
> >  target/arm/tcg/cpu64.c                     |  13 +
> >  tests/data/acpi/aarch64/virt/PPTT.topology | Bin 356 -> 516 bytes
> >  tests/qtest/bios-tables-test.c             |   4 +
> >  13 files changed, 723 insertions(+), 15 deletions(-)
> >
>
