On 5/13/21 8:20 PM, Alex Bennée wrote:
>
> Andrey Shinkevich <andrey.shinkev...@huawei.com> writes:
>
>> Dear colleagues,
>>
>> Thank you all very much for your responses. Let me reply with one message.
>>
>> I configured QEMU for an AArch64 guest:
>>
>>   $ ./configure --target-list=aarch64-softmmu
>>
>> When I start QEMU with GICv3 on an x86 host:
>>
>>   qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
>
> Hmm, are you sure you are running your built QEMU? For me the following
> works fine:
No doubt I run my built QEMU, because I am debugging it and stepping
through the run with gdb.

>
> ./aarch64-softmmu/qemu-system-aarch64 -machine
> virt-6.0,gic-version=3,accel=tcg -cpu max -serial mon:stdio -nic
> user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci
> -device scsi-hd,drive=hd0 -blockdev
> driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64
> -kernel
> ~/lsrc/linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append
> "console=ttyAMA0 root=/dev/sda2" -display none -m 8G,maxmem=8G -smp 12

Which source code are you using to build your QEMU? Would you please send
me the link if it is a source other than github.com/qemu/qemu? I pulled
up to the latest commit 3e9f48bcdabe57f8f and applied ONLY the series
"[PATCH v3 0/8] GICv3 LPI and ITS feature implementation". Did you do
the same?

I have NOT yet applied the series "[PATCH v2 0/7] accel/tcg: remove
implied BQL from cpu_handle_interrupt/exception path" because it is old
and applying it by hand takes more time (I will do it later). Could that
be the reason why my guest hangs on locks at start-up?

Andrey

>>
>> QEMU reports this error from hw/pci/msix.c:
>>
>>   error_setg(errp, "MSI-X is not supported by interrupt controller");
>>
>> Probably, the variable 'msi_nonbroken' would be initialized in
>> hw/intc/arm_gicv3_its_common.c:
>>
>>   gicv3_its_init_mmio(..)
>>
>> I guess that it works with KVM acceleration only rather than with TCG.
>>
>> The error persists after applying the series:
>> https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>> "GICv3 LPI and ITS feature implementation"
>> (special thanks for referring me to that)
>>
>> Please clarify, and advise how that error could be fixed. Should
>> MSI-X support be implemented on top of GICv3?
>>
>> When successful, I would like to test QEMU with the maximum number of
>> cores to get the best MTTCG performance.
>> Probably, we will get just some percentage of performance enhancement
>> with the BQL series applied, won't we? I will test it as well.
>>
>> Best regards,
>> Andrey Shinkevich
>>
>>
>> On 5/12/21 6:43 PM, Alex Bennée wrote:
>>>
>>> Andrey Shinkevich <andrey.shinkev...@huawei.com> writes:
>>>
>>>> Dear colleagues,
>>>>
>>>> I am looking for ways to accelerate MTTCG for an ARM guest on an
>>>> x86-64 host. The maximum number of CPUs for MTTCG with GICv2 is
>>>> limited to 8:
>>>>
>>>>   include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>>>>
>>>> Version 3 of the Generic Interrupt Controller (GICv3) is not
>>>> supported in QEMU for some reason unknown to me. It would allow us
>>>> to raise the CPU limit and improve MTTCG performance on a
>>>> multi-core hypervisor.
>>>
>>> It is supported, you just need to select it.
>>>
>>>> I have got an idea to implement the Interrupt Translation Service
>>>> (ITS) for use by MTTCG for the ARM architecture.
>>>
>>> There is some work to support ITS under TCG already posted:
>>>
>>>   Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
>>>   Date: Thu, 29 Apr 2021 19:41:53 -0400
>>>   Message-Id: <20210429234201.125565-1-shashi.mall...@linaro.org>
>>>
>>> please do review and test.
>>>
>>>> Do you find that idea useful and feasible?
>>>> If yes, how much time do you estimate such a project would take one
>>>> developer to complete?
>>>> If no, what are the reasons for not implementing GICv3 for MTTCG in
>>>> QEMU?
>>>
>>> As far as MTTCG performance is concerned, there is a degree of
>>> diminishing returns to be expected, as the synchronisation cost
>>> between threads will eventually outweigh the gains of additional
>>> threads.
>>>
>>> There are a number of parts that could improve this performance.
>>> The first would be picking up the BQL reduction series from your
>>> Futurewei colleagues who worked on the problem when they were Linaro
>>> assignees:
>>>
>>>   Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from
>>>            cpu_handle_interrupt/exception path
>>>   Date: Wed, 19 Aug 2020 14:28:49 -0400
>>>   Message-Id: <20200819182856.4893-1-robert.fo...@linaro.org>
>>>
>>> There was also a longer series moving towards per-CPU locks:
>>>
>>>   Subject: [PATCH v10 00/73] per-CPU locks
>>>   Date: Wed, 17 Jun 2020 17:01:18 -0400
>>>   Message-Id: <20200617210231.4393-1-robert.fo...@linaro.org>
>>>
>>> I believe the initial measurements showed that the BQL cost started
>>> to edge up with GIC interactions. We did discuss approaches for
>>> this, and I think one idea was to use non-BQL locking for the GIC.
>>> You would need to revert:
>>>
>>>   Subject: [PATCH-for-5.2] exec: Remove MemoryRegion::global_locking field
>>>   Date: Thu, 6 Aug 2020 17:07:26 +0200
>>>   Message-Id: <20200806150726.962-1-phi...@redhat.com>
>>>
>>> and then implement more finely tuned locking in the GIC emulation
>>> itself. However, I think the BQL and per-CPU lock series are
>>> lower-hanging fruit to tackle first.
>>>
>>>> Best regards,
>>>> Andrey Shinkevich