Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system
On 2024/2/19 下午3:16, Huacai Chen wrote: On Mon, Feb 19, 2024 at 12:18 PM maobibo wrote: On 2024/2/19 上午10:45, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao wrote: On LoongArch system, ipi hw uses iocsr registers, there is one iocsr register access on ipi sending, and two iocsr access on ipi receiving which is ipi interrupt handler. On VM mode all iocsr registers accessing will cause VM to trap into hypervisor. So with ipi hw notification once there will be three times of trap. This patch adds pv ipi support for VM, hypercall instruction is used to ipi sender, and hypervisor will inject SWI on the VM. During SWI interrupt handler, only estat CSR register is written to clear irq. Estat CSR register access will not trap into hypervisor. So with pv ipi supported, pv ipi sender will trap into hypervsor one time, pv ipi revicer will not trap, there is only one time of trap. Also this patch adds ipi multicast support, the method is similar with x86. With ipi multicast support, ipi notification can be sent to at most 128 vcpus at one time. It reduces trap times into hypervisor greatly. Signed-off-by: Bibo Mao --- arch/loongarch/include/asm/hardirq.h | 1 + arch/loongarch/include/asm/kvm_host.h | 1 + arch/loongarch/include/asm/kvm_para.h | 124 + arch/loongarch/include/asm/loongarch.h | 1 + arch/loongarch/kernel/irq.c| 2 +- arch/loongarch/kernel/paravirt.c | 113 ++ arch/loongarch/kernel/smp.c| 2 +- arch/loongarch/kvm/exit.c | 73 ++- arch/loongarch/kvm/vcpu.c | 1 + 9 files changed, 314 insertions(+), 4 deletions(-) diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..8a611843c1f0 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t messages cacheline_aligned_in_smp; Do we really need atomic_t? 
A plain "unsigned int" can reduce cost significantly.

For IPI, there are multiple senders and one receiver, the sender uses atomic_fetch_or(action, >messages) and the receiver uses atomic_xchg(>messages, 0) to clear message. There needs sync mechanism between senders and receiver, atomic is the most simple method.

At least from receiver side, the native IPI doesn't need atomic for read and clear:

static u32 ipi_read_clear(int cpu)
{
        u32 action;

        /* Load the ipi register to figure out what we're supposed to do */
        action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS);
        /* Clear the ipi register to clear the interrupt */
        iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR);
        wbflush();

That is because on physical hardware these are two IOCSR registers, and there is no way to do an atomic read-and-clear on IOCSR registers. However, if the ipi message is stored in DDR memory, atomic read/clear can be used. You can compare the cost of one iocsr_read32 + one iocsr_write32 + wbflush with one atomic_xchg(>messages, 0).

Regards
Bibo Mao

        return action;
}

} cacheline_aligned irq_cpustat_t;

DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 57399d7cf8b7..1bf927e2bfac 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
        u64 idle_exits;
        u64 cpucfg_exits;
        u64 signal_exits;
+       u64 hvcl_exits;

hypercall_exits is better.

yeap, hypercall_exits is better, will fix in next version.
}; #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0) diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 41200e922a82..a25a84e372b9 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -9,6 +9,10 @@ #define HYPERVISOR_VENDOR_SHIFT8 #define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) +#define KVM_HC_CODE_SERVICE0 +#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HC_CODE_SERVICE) +#define KVM_HC_FUNC_IPI 1 Change HC to HCALL is better. will modify in next version. + /* * LoongArch hypcall return code */ @@ -16,6 +20,126 @@ #define KVM_HC_INVALID_CODE-1UL #define KVM_HC_INVALID_PARAMETER -2UL +/* + * Hypercalls interface for KVM hypervisor + * + * a0: function identifier + * a1-a6: args + * Return value will be placed in v0. + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6. + */ +static __always_inline long kvm_hypercall(u64 fid) +{ + register long ret asm("v0"); +
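
For readers following the atomic_t question in this message: below is a minimal user-space sketch of the multi-sender, single-receiver pattern being discussed, written with C11 atomics instead of the kernel's atomic_t. The names ipi_mailbox, mailbox_init(), mailbox_post() and mailbox_drain() are made up for illustration and are not from the patch. The "report whether the mailbox was empty" return value is an assumption about how a sender could decide whether a notification is needed; the thread itself only states that senders use a fetch-or and the receiver uses an exchange with 0.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-CPU "messages" word discussed above. */
struct ipi_mailbox {
    atomic_uint messages;
};

static inline void mailbox_init(struct ipi_mailbox *mb)
{
    atomic_init(&mb->messages, 0u);
}

/*
 * Sender side: set the bit for one action. Any number of senders may race
 * here; fetch-or guarantees that no bit set by another sender is lost.
 * Returns 1 if the mailbox was empty before this post (assumption: only
 * that sender would need to notify the receiver), 0 otherwise.
 */
static inline int mailbox_post(struct ipi_mailbox *mb, unsigned int action)
{
    unsigned int old = atomic_fetch_or(&mb->messages, 1u << action);

    return old == 0;
}

/*
 * Receiver side: read and clear all pending bits in one atomic step, so a
 * bit posted between "read" and "clear" cannot be dropped.
 */
static inline unsigned int mailbox_drain(struct ipi_mailbox *mb)
{
    return atomic_exchange(&mb->messages, 0u);
}
```

With a plain unsigned int, the receiver's separate load and store would leave a window in which a sender's bit could be overwritten; the single exchange closes that window, which appears to be the point of the reply above.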
Re: [syzbot] [virtualization?] linux-next boot error: WARNING: refcount bug in __free_pages_ok
On Sun, Feb 18, 2024 at 09:06:18PM -0800, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit:d37e1e4c52bc Add linux-next specific files for 20240216 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=171ca65218 > kernel config: https://syzkaller.appspot.com/x/.config?x=4bc446d42a7d56c0 > dashboard link: https://syzkaller.appspot.com/bug?extid=6f3c38e8a6a0297caa5a > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) > 2.40 > > Downloadable assets: > disk image: > https://storage.googleapis.com/syzbot-assets/14d0894504b9/disk-d37e1e4c.raw.xz > vmlinux: > https://storage.googleapis.com/syzbot-assets/6cda61e084ee/vmlinux-d37e1e4c.xz > kernel image: > https://storage.googleapis.com/syzbot-assets/720c85283c05/bzImage-d37e1e4c.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+6f3c38e8a6a0297ca...@syzkaller.appspotmail.com > > Key type pkcs7_test registered > Block layer SCSI generic (bsg) driver version 0.4 loaded (major 239) > io scheduler mq-deadline registered > io scheduler kyber registered > io scheduler bfq registered > input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 > ACPI: button: Power Button [PWRF] > input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1 > ACPI: button: Sleep Button [SLPF] > ioatdma: Intel(R) QuickData Technology Driver 5.00 > ACPI: \_SB_.LNKC: Enabled at IRQ 11 > virtio-pci :00:03.0: virtio_pci: leaving for legacy driver > ACPI: \_SB_.LNKD: Enabled at IRQ 10 > virtio-pci :00:04.0: virtio_pci: leaving for legacy driver > ACPI: \_SB_.LNKB: Enabled at IRQ 10 > virtio-pci :00:06.0: virtio_pci: leaving for legacy driver > virtio-pci :00:07.0: virtio_pci: leaving for legacy driver > N_HDLC line discipline registered with maxframe=4096 > Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled > 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A > 00:04: ttyS1 at 
I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A > 00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A > 00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A > Non-volatile memory driver v1.3 > Linux agpgart interface v0.103 > ACPI: bus type drm_connector registered > [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0 > [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1 > Console: switching to colour frame buffer device 128x48 > platform vkms: [drm] fb0: vkmsdrmfb frame buffer device > usbcore: registered new interface driver udl > brd: module loaded > loop: module loaded > zram: Added device: zram0 > null_blk: disk nullb0 created > null_blk: module loaded > Guest personality initialized and is inactive > VMCI host device registered (name=vmci, major=10, minor=118) > Initialized host personality > usbcore: registered new interface driver rtsx_usb > usbcore: registered new interface driver viperboard > usbcore: registered new interface driver dln2 > usbcore: registered new interface driver pn533_usb > nfcsim 0.2 initialized > usbcore: registered new interface driver port100 > usbcore: registered new interface driver nfcmrvl > Loading iSCSI transport class v2.0-870. > virtio_scsi virtio0: 1/0/0 default/read/poll queues > [ cut here ] > refcount_t: decrement hit 0; leaking memory. 
> WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 > lib/refcount.c:31 > Modules linked in: > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-next-20240216-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/25/2024 > RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31 > Code: b2 00 00 00 e8 b7 94 f0 fc 5b 5d c3 cc cc cc cc e8 ab 94 f0 fc c6 05 c6 > 16 ce 0a 01 90 48 c7 c7 a0 5a fe 8b e8 67 69 b4 fc 90 <0f> 0b 90 90 eb d9 e8 > 8b 94 f0 fc c6 05 a3 16 ce 0a 01 90 48 c7 c7 > RSP: :c9066e10 EFLAGS: 00010246 > RAX: 15c2c224c9b50400 RBX: 888020827d2c RCX: 8880162d8000 > RDX: RSI: RDI: > RBP: 0004 R08: 8157b942 R09: fbfff1bf95cc > R10: dc00 R11: fbfff1bf95cc R12: ea000502fdc0 > R13: ea000502fdc8 R14: 1d4000a05fb9 R15: > FS: () GS:8880b940() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 88823000 CR3: 0df32000 CR4: 003506f0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Call Trace: > > reset_page_owner include/linux/page_owner.h:24 [inline] > free_pages_prepare mm/page_alloc.c:1140 [inline] > __free_pages_ok+0xc42/0xd70 mm/page_alloc.c:1269 > make_alloc_exact+0xc4/0x140 mm/page_alloc.c:4847 > vring_alloc_queue drivers/virtio/virtio_ring.c:319
Re: [RFC PATCH v2 1/6] dt-bindings: mfd: add entry for Marvell 88PM886 PMIC
On 18/02/2024 16:10, Karel Balej wrote:
> Rob Herring, 2024-02-15T08:20:52-06:00:
>>> .../bindings/mfd/marvell,88pm88x.yaml | 74 +++
>>
>> Filename should match the compatible.
>>
>> In general, drop the 'x' wildcard.
>
> By "in general", do you mean for the drivers code also?

No, not driver. The rules for wildcard, that they are discouraged, are DT
binding rules.

>
> As I have mentioned in the commit message for the driver, the other
> device is very similar and if the support for it was ever to be added
> (which I personally currently have no interest in), I believe it would
> make sense to extend this driver. Is it then still preferred to call it
> all just 88pm886 now?

Extend the driver, it's unrelated. Binding still should be named like
compatible, because that extension might never happen.

>
>>> +properties:
>>> +  compatible:
>>> +    const: marvell,88pm886-a1
>
> So the file should be called marvell,88pm886-a1.yaml, correct? Again, is
> it preferred to call it like this even if the other revision could
> eventually be added (again, I am not interested in that right now

If you already add two devices, flexible name would be fine. But you do
not add it now and you might never add, so keep the filename=compatible.
It is fine if it has also other compatibles later. We already accepted
many bindings like that.

Best regards,
Krzysztof
Re: [PATCH v2 3/3] arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll
On 18/02/2024 21:57, Luca Weiss wrote:
> Follow the updated bindings and use a QCS404-specific compatible for the
> HFPLL on this SoC.
>
> Signed-off-by: Luca Weiss
> ---
> Please note that this patch should only land after the patch for the
> clock driver.
> ---

This patch should go in the next cycle, after the clock driver is merged to
mainline, to preserve bisectability.

Best regards,
Krzysztof
Re: [PATCH v2 1/3] dt-bindings: clock: qcom,hfpll: Convert to YAML
On 18/02/2024 21:57, Luca Weiss wrote:
> Convert the .txt documentation to .yaml with some adjustments.
>
> * APQ8064/IPQ8064/MSM8960 compatibles are dropped since their HFPLLs are
>   a part of GCC so there is no need for a separate compat entry.
> * Change the MSM8974 compatible to follow the updated naming schema.
>   This compatible is not used upstream yet.
> * Add qcs404-hfpll. QCS404 currently uses qcom,hfpll. Mark that as
>   deprecated since every SoC appears to need different driver data so
>   "qcom,hfpll" makes no sense to keep
>
> Signed-off-by: Luca Weiss
> ---

Reviewed-by: Krzysztof Kozlowski

Best regards,
Krzysztof
Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system
On Mon, Feb 19, 2024 at 12:18 PM maobibo wrote: > > > > On 2024/2/19 上午10:45, Huacai Chen wrote: > > Hi, Bibo, > > > > On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao wrote: > >> > >> On LoongArch system, ipi hw uses iocsr registers, there is one iocsr > >> register access on ipi sending, and two iocsr access on ipi receiving > >> which is ipi interrupt handler. On VM mode all iocsr registers > >> accessing will cause VM to trap into hypervisor. So with ipi hw > >> notification once there will be three times of trap. > >> > >> This patch adds pv ipi support for VM, hypercall instruction is used > >> to ipi sender, and hypervisor will inject SWI on the VM. During SWI > >> interrupt handler, only estat CSR register is written to clear irq. > >> Estat CSR register access will not trap into hypervisor. So with pv ipi > >> supported, pv ipi sender will trap into hypervsor one time, pv ipi > >> revicer will not trap, there is only one time of trap. > >> > >> Also this patch adds ipi multicast support, the method is similar with > >> x86. With ipi multicast support, ipi notification can be sent to at most > >> 128 vcpus at one time. It reduces trap times into hypervisor greatly. 
> >> > >> Signed-off-by: Bibo Mao > >> --- > >> arch/loongarch/include/asm/hardirq.h | 1 + > >> arch/loongarch/include/asm/kvm_host.h | 1 + > >> arch/loongarch/include/asm/kvm_para.h | 124 + > >> arch/loongarch/include/asm/loongarch.h | 1 + > >> arch/loongarch/kernel/irq.c| 2 +- > >> arch/loongarch/kernel/paravirt.c | 113 ++ > >> arch/loongarch/kernel/smp.c| 2 +- > >> arch/loongarch/kvm/exit.c | 73 ++- > >> arch/loongarch/kvm/vcpu.c | 1 + > >> 9 files changed, 314 insertions(+), 4 deletions(-) > >> > >> diff --git a/arch/loongarch/include/asm/hardirq.h > >> b/arch/loongarch/include/asm/hardirq.h > >> index 9f0038e19c7f..8a611843c1f0 100644 > >> --- a/arch/loongarch/include/asm/hardirq.h > >> +++ b/arch/loongarch/include/asm/hardirq.h > >> @@ -21,6 +21,7 @@ enum ipi_msg_type { > >> typedef struct { > >> unsigned int ipi_irqs[NR_IPI]; > >> unsigned int __softirq_pending; > >> + atomic_t messages cacheline_aligned_in_smp; > > Do we really need atomic_t? A plain "unsigned int" can reduce cost > > significantly. > For IPI, there are multiple senders and one receiver, the sender uses > atomic_fetch_or(action, >messages) and the receiver uses > atomic_xchg(>messages, 0) to clear message. > > There needs sync mechanism between senders and receiver, atomic is the > most simple method. 
At least from receiver side, the native IPI doesn't need atomic for read and clear: static u32 ipi_read_clear(int cpu) { u32 action; /* Load the ipi register to figure out what we're supposed to do */ action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS); /* Clear the ipi register to clear the interrupt */ iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR); wbflush(); return action; } > > > >> } cacheline_aligned irq_cpustat_t; > >> > >> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); > >> diff --git a/arch/loongarch/include/asm/kvm_host.h > >> b/arch/loongarch/include/asm/kvm_host.h > >> index 57399d7cf8b7..1bf927e2bfac 100644 > >> --- a/arch/loongarch/include/asm/kvm_host.h > >> +++ b/arch/loongarch/include/asm/kvm_host.h > >> @@ -43,6 +43,7 @@ struct kvm_vcpu_stat { > >> u64 idle_exits; > >> u64 cpucfg_exits; > >> u64 signal_exits; > >> + u64 hvcl_exits; > > hypercall_exits is better. > yeap, hypercall_exits is better, will fix in next version. > > > >> }; > >> > >> #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0) > >> diff --git a/arch/loongarch/include/asm/kvm_para.h > >> b/arch/loongarch/include/asm/kvm_para.h > >> index 41200e922a82..a25a84e372b9 100644 > >> --- a/arch/loongarch/include/asm/kvm_para.h > >> +++ b/arch/loongarch/include/asm/kvm_para.h > >> @@ -9,6 +9,10 @@ > >> #define HYPERVISOR_VENDOR_SHIFT8 > >> #define HYPERCALL_CODE(vendor, code) ((vendor << > >> HYPERVISOR_VENDOR_SHIFT) + code) > >> > >> +#define KVM_HC_CODE_SERVICE0 > >> +#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, > >> KVM_HC_CODE_SERVICE) > >> +#define KVM_HC_FUNC_IPI 1 > > Change HC to HCALL is better. > will modify in next version. > > > >> + > >> /* > >>* LoongArch hypcall return code > >>*/ > >> @@ -16,6 +20,126 @@ > >> #define KVM_HC_INVALID_CODE-1UL > >> #define KVM_HC_INVALID_PARAMETER -2UL > >> > >> +/* > >> + * Hypercalls interface for KVM hypervisor > >> + * > >> + * a0: function identifier > >> + * a1-a6: args > >> + * Return value will be placed in v0. 
> >> + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6. > >> + */ > >> +static __always_inline long kvm_hypercall(u64 fid) > >>
[syzbot] [virtualization?] linux-next boot error: WARNING: refcount bug in __free_pages_ok
Hello, syzbot found the following issue on: HEAD commit:d37e1e4c52bc Add linux-next specific files for 20240216 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=171ca65218 kernel config: https://syzkaller.appspot.com/x/.config?x=4bc446d42a7d56c0 dashboard link: https://syzkaller.appspot.com/bug?extid=6f3c38e8a6a0297caa5a compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/14d0894504b9/disk-d37e1e4c.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/6cda61e084ee/vmlinux-d37e1e4c.xz kernel image: https://storage.googleapis.com/syzbot-assets/720c85283c05/bzImage-d37e1e4c.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+6f3c38e8a6a0297ca...@syzkaller.appspotmail.com Key type pkcs7_test registered Block layer SCSI generic (bsg) driver version 0.4 loaded (major 239) io scheduler mq-deadline registered io scheduler kyber registered io scheduler bfq registered input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 ACPI: button: Power Button [PWRF] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1 ACPI: button: Sleep Button [SLPF] ioatdma: Intel(R) QuickData Technology Driver 5.00 ACPI: \_SB_.LNKC: Enabled at IRQ 11 virtio-pci :00:03.0: virtio_pci: leaving for legacy driver ACPI: \_SB_.LNKD: Enabled at IRQ 10 virtio-pci :00:04.0: virtio_pci: leaving for legacy driver ACPI: \_SB_.LNKB: Enabled at IRQ 10 virtio-pci :00:06.0: virtio_pci: leaving for legacy driver virtio-pci :00:07.0: virtio_pci: leaving for legacy driver N_HDLC line discipline registered with maxframe=4096 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A 00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A 00:06: ttyS3 at I/O 0x2e8 
(irq = 7, base_baud = 115200) is a 16550A Non-volatile memory driver v1.3 Linux agpgart interface v0.103 ACPI: bus type drm_connector registered [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0 [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1 Console: switching to colour frame buffer device 128x48 platform vkms: [drm] fb0: vkmsdrmfb frame buffer device usbcore: registered new interface driver udl brd: module loaded loop: module loaded zram: Added device: zram0 null_blk: disk nullb0 created null_blk: module loaded Guest personality initialized and is inactive VMCI host device registered (name=vmci, major=10, minor=118) Initialized host personality usbcore: registered new interface driver rtsx_usb usbcore: registered new interface driver viperboard usbcore: registered new interface driver dln2 usbcore: registered new interface driver pn533_usb nfcsim 0.2 initialized usbcore: registered new interface driver port100 usbcore: registered new interface driver nfcmrvl Loading iSCSI transport class v2.0-870. virtio_scsi virtio0: 1/0/0 default/read/poll queues [ cut here ] refcount_t: decrement hit 0; leaking memory. 
WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-next-20240216-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31 Code: b2 00 00 00 e8 b7 94 f0 fc 5b 5d c3 cc cc cc cc e8 ab 94 f0 fc c6 05 c6 16 ce 0a 01 90 48 c7 c7 a0 5a fe 8b e8 67 69 b4 fc 90 <0f> 0b 90 90 eb d9 e8 8b 94 f0 fc c6 05 a3 16 ce 0a 01 90 48 c7 c7 RSP: :c9066e10 EFLAGS: 00010246 RAX: 15c2c224c9b50400 RBX: 888020827d2c RCX: 8880162d8000 RDX: RSI: RDI: RBP: 0004 R08: 8157b942 R09: fbfff1bf95cc R10: dc00 R11: fbfff1bf95cc R12: ea000502fdc0 R13: ea000502fdc8 R14: 1d4000a05fb9 R15: FS: () GS:8880b940() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 88823000 CR3: 0df32000 CR4: 003506f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1140 [inline] __free_pages_ok+0xc42/0xd70 mm/page_alloc.c:1269 make_alloc_exact+0xc4/0x140 mm/page_alloc.c:4847 vring_alloc_queue drivers/virtio/virtio_ring.c:319 [inline] vring_alloc_queue_split+0x20a/0x600 drivers/virtio/virtio_ring.c:1108 vring_create_virtqueue_split+0xc6/0x310 drivers/virtio/virtio_ring.c:1158 vring_create_virtqueue+0xca/0x110 drivers/virtio/virtio_ring.c:2683 setup_vq+0xe9/0x2d0
Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system
On 2024/2/19 上午10:45, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao wrote: On LoongArch system, ipi hw uses iocsr registers, there is one iocsr register access on ipi sending, and two iocsr access on ipi receiving which is ipi interrupt handler. On VM mode all iocsr registers accessing will cause VM to trap into hypervisor. So with ipi hw notification once there will be three times of trap. This patch adds pv ipi support for VM, hypercall instruction is used to ipi sender, and hypervisor will inject SWI on the VM. During SWI interrupt handler, only estat CSR register is written to clear irq. Estat CSR register access will not trap into hypervisor. So with pv ipi supported, pv ipi sender will trap into hypervsor one time, pv ipi revicer will not trap, there is only one time of trap. Also this patch adds ipi multicast support, the method is similar with x86. With ipi multicast support, ipi notification can be sent to at most 128 vcpus at one time. It reduces trap times into hypervisor greatly. Signed-off-by: Bibo Mao --- arch/loongarch/include/asm/hardirq.h | 1 + arch/loongarch/include/asm/kvm_host.h | 1 + arch/loongarch/include/asm/kvm_para.h | 124 + arch/loongarch/include/asm/loongarch.h | 1 + arch/loongarch/kernel/irq.c| 2 +- arch/loongarch/kernel/paravirt.c | 113 ++ arch/loongarch/kernel/smp.c| 2 +- arch/loongarch/kvm/exit.c | 73 ++- arch/loongarch/kvm/vcpu.c | 1 + 9 files changed, 314 insertions(+), 4 deletions(-) diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..8a611843c1f0 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t messages cacheline_aligned_in_smp; Do we really need atomic_t? A plain "unsigned int" can reduce cost significantly. 
For IPI, there are multiple senders and one receiver, the sender uses atomic_fetch_or(action, >messages) and the receiver uses atomic_xchg(>messages, 0) to clear message. There needs sync mechanism between senders and receiver, atomic is the most simple method. } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h index 57399d7cf8b7..1bf927e2bfac 100644 --- a/arch/loongarch/include/asm/kvm_host.h +++ b/arch/loongarch/include/asm/kvm_host.h @@ -43,6 +43,7 @@ struct kvm_vcpu_stat { u64 idle_exits; u64 cpucfg_exits; u64 signal_exits; + u64 hvcl_exits; hypercall_exits is better. yeap, hypercall_exits is better, will fix in next version. }; #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0) diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 41200e922a82..a25a84e372b9 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -9,6 +9,10 @@ #define HYPERVISOR_VENDOR_SHIFT8 #define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) +#define KVM_HC_CODE_SERVICE0 +#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HC_CODE_SERVICE) +#define KVM_HC_FUNC_IPI 1 Change HC to HCALL is better. will modify in next version. + /* * LoongArch hypcall return code */ @@ -16,6 +20,126 @@ #define KVM_HC_INVALID_CODE-1UL #define KVM_HC_INVALID_PARAMETER -2UL +/* + * Hypercalls interface for KVM hypervisor + * + * a0: function identifier + * a1-a6: args + * Return value will be placed in v0. + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6. 
+ */ +static __always_inline long kvm_hypercall(u64 fid) +{ + register long ret asm("v0"); + register unsigned long fun asm("a0") = fid; + + __asm__ __volatile__( + "hvcl "__stringify(KVM_HC_SERVICE) + : "=r" (ret) + : "r" (fun) + : "memory" + ); + + return ret; +} + +static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0) +{ + register long ret asm("v0"); + register unsigned long fun asm("a0") = fid; + register unsigned long a1 asm("a1") = arg0; + + __asm__ __volatile__( + "hvcl "__stringify(KVM_HC_SERVICE) + : "=r" (ret) + : "r" (fun), "r" (a1) + : "memory" + ); + + return ret; +} + +static __always_inline long kvm_hypercall2(u64 fid, + unsigned long arg0, unsigned long arg1) +{ + register long ret asm("v0"); + register unsigned long fun asm("a0") = fid; + register
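
As a side note on the hypercall code field used by the hvcl instruction above: the following is a small sketch of how the vendor/service encoding works out numerically. The HYPERCALL_CODE shape, HYPERVISOR_KVM and HYPERVISOR_VENDOR_SHIFT come from the quoted patch; the KVM_HCALL_* spelling follows the rename suggested in review and is only a guess at the final names. The extra parentheses around the macro arguments and the two decode helpers are additions made here for illustration.

```c
#include <stdint.h>

#define HYPERVISOR_KVM            1u
#define HYPERVISOR_VENDOR_SHIFT   8

/* Same shape as the quoted macro; arguments parenthesized defensively. */
#define HYPERCALL_CODE(vendor, code) \
    (((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

/* Assumed post-review names (HC renamed to HCALL). */
#define KVM_HCALL_CODE_SERVICE    0u
#define KVM_HCALL_SERVICE \
    HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HCALL_CODE_SERVICE)

/* Hypothetical helper: vendor id stored above the low 8 bits. */
static inline uint32_t hypercall_vendor(uint32_t code)
{
    return code >> HYPERVISOR_VENDOR_SHIFT;
}

/* Hypothetical helper: per-vendor service code in the low 8 bits. */
static inline uint32_t hypercall_service(uint32_t code)
{
    return code & ((1u << HYPERVISOR_VENDOR_SHIFT) - 1u);
}
```

Under these definitions the KVM service code is 0x100: vendor 1 in the upper part, service 0 in the low byte. That constant is what __stringify() would paste into the hvcl instruction text.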
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On 2024/2/19 上午10:42, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: The patch adds paravirt interface for guest kernel, function pv_guest_initi() firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect whether current VMM is KVM hypervisor. And the paravirt function can work only if current VMM is KVM hypervisor, since there is only KVM hypervisor supported on LoongArch now. This patch only adds paravirt interface for guest kernel, however there is not effective pv functions added here. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 2 + 7 files changed, 88 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 10959e6c3583..817a56dff80f 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 9425d3b7e486..41200e922a82 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypcall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypcall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..b64813592ba0 --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} The steal time code can be removed in this patch, I think. Originally I want to remove this piece of code, but it fails to compile if CONFIG_PARAVIRT is selected. Here is reference code, function paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected. 
static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
        if (static_key_false(&paravirt_steal_enabled)) {
                u64 steal;

                steal = paravirt_steal_clock(smp_processor_id());
                steal -= this_rq()->prev_steal_time;
                steal = min(steal, maxtime);
                account_steal_time(steal);
                this_rq()->prev_steal_time += steal;

                return steal;
        }
#endif
        return 0;
}

+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+       return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-$(CONFIG_SMP) += smp.o
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
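
To make the arithmetic in steal_account_process_time() quoted in this message easier to follow, here is a small user-space model of it. It is only a sketch: rq_sim and account_steal() are invented names, the steal clock reading is passed in as a plain number instead of coming from paravirt_steal_clock(), and the accounted field stands in for whatever account_steal_time() records.

```c
#include <stdint.h>

/* Hypothetical model of the two pieces of per-runqueue state involved. */
struct rq_sim {
    uint64_t prev_steal_time;  /* steal time already accounted */
    uint64_t accounted;        /* running total handed to accounting */
};

/*
 * Mirrors the quoted logic: take the steal clock delta since the last
 * accounting, clamp it to maxtime, account it, and advance the baseline
 * only by the amount actually accounted. Anything above maxtime is left
 * for a later call.
 */
static uint64_t account_steal(struct rq_sim *rq, uint64_t steal_clock_now,
                              uint64_t maxtime)
{
    uint64_t steal = steal_clock_now - rq->prev_steal_time;

    if (steal > maxtime)
        steal = maxtime;

    rq->accounted += steal;
    rq->prev_steal_time += steal;

    return steal;
}
```

For example, with a steal clock reading of 100 and maxtime 30, the first call accounts 30 and the remaining 70 is picked up by a later call with a larger maxtime. In this model the steal clock reading is the only input that a hypervisor-specific hook would supply, which matches the role paravirt_steal_clock() plays in the quoted function.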
Re: [PATCH v4 2/6] LoongArch: KVM: Add hypercall instruction emulation support
On 2024/2/19 上午10:41, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: On LoongArch system, hypercall instruction is supported when system runs on VM mode. This patch adds dummy function with hypercall instruction emulation, rather than inject EXCCODE_INE invalid instruction exception. Signed-off-by: Bibo Mao --- arch/loongarch/include/asm/Kbuild | 1 - arch/loongarch/include/asm/kvm_para.h | 26 ++ arch/loongarch/include/uapi/asm/Kbuild | 2 -- arch/loongarch/kvm/exit.c | 10 ++ 4 files changed, 36 insertions(+), 3 deletions(-) create mode 100644 arch/loongarch/include/asm/kvm_para.h delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild index 93783fa24f6e..22991a6f0e2b 100644 --- a/arch/loongarch/include/asm/Kbuild +++ b/arch/loongarch/include/asm/Kbuild @@ -23,4 +23,3 @@ generic-y += poll.h generic-y += param.h generic-y += posix_types.h generic-y += resource.h -generic-y += kvm_para.h diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h new file mode 100644 index ..9425d3b7e486 --- /dev/null +++ b/arch/loongarch/include/asm/kvm_para.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_KVM_PARA_H +#define _ASM_LOONGARCH_KVM_PARA_H + +/* + * LoongArch hypcall return code Maybe using "hypercall" in comments is better. will modify in next patch. + */ +#define KVM_HC_STATUS_SUCCESS 0 +#define KVM_HC_INVALID_CODE-1UL +#define KVM_HC_INVALID_PARAMETER -2UL Maybe KVM_HCALL_SUCCESS/KVM_HCALL_INVALID_CODE/KVM_HCALL_PARAMETER is better. yes, KVM_HCALL_ sounds better. Will modify it. 
Regards Bibo Mao Huacai + +static inline unsigned int kvm_arch_para_features(void) +{ + return 0; +} + +static inline unsigned int kvm_arch_para_hints(void) +{ + return 0; +} + +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} +#endif /* _ASM_LOONGARCH_KVM_PARA_H */ diff --git a/arch/loongarch/include/uapi/asm/Kbuild b/arch/loongarch/include/uapi/asm/Kbuild deleted file mode 100644 index 4aa680ca2e5f.. --- a/arch/loongarch/include/uapi/asm/Kbuild +++ /dev/null @@ -1,2 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 -generic-y += kvm_para.h diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c index ed1d89d53e2e..d15c71320a11 100644 --- a/arch/loongarch/kvm/exit.c +++ b/arch/loongarch/kvm/exit.c @@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu) return RESUME_GUEST; } +static int kvm_handle_hypcall(struct kvm_vcpu *vcpu) +{ + update_pc(>arch); + + /* Treat it as noop intruction, only set return value */ + vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HC_INVALID_CODE; + return RESUME_GUEST; +} + /* * LoongArch KVM callback handling for unimplemented guest exiting */ @@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = { [EXCCODE_LSXDIS]= kvm_handle_lsx_disabled, [EXCCODE_LASXDIS] = kvm_handle_lasx_disabled, [EXCCODE_GSPR] = kvm_handle_gspr, + [EXCCODE_HVC] = kvm_handle_hypcall, }; int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault) -- 2.39.3
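
The handler added by this patch can be pictured with the following stand-alone sketch of a table-driven exit dispatcher. Everything here is a simplified model, not the KVM code: vcpu_sim, the EXC_*_SIM codes, RESUME_GUEST_SIM and handle_fault() are invented for illustration. Stepping the PC by 4 assumes that update_pc() skips the trapping instruction and relies on LoongArch instructions being 4 bytes wide.

```c
#include <stddef.h>
#include <stdint.h>

#define HCALL_INVALID_CODE ((uint64_t)-1)   /* models the -1UL return code */

enum { RESUME_GUEST_SIM = 1 };
enum { EXC_UNKNOWN_SIM, EXC_HVC_SIM, EXC_MAX_SIM };

/* Hypothetical, heavily reduced vCPU state: guest PC and register a0. */
struct vcpu_sim {
    uint64_t pc;
    uint64_t a0;
};

typedef int (*exit_handler_fn)(struct vcpu_sim *vcpu);

/* Dummy hypercall emulation: skip the instruction, report "invalid code". */
static int handle_hypercall(struct vcpu_sim *vcpu)
{
    vcpu->pc += 4;
    vcpu->a0 = HCALL_INVALID_CODE;
    return RESUME_GUEST_SIM;
}

/* Fallback for exit codes without a registered handler. */
static int handle_unknown(struct vcpu_sim *vcpu)
{
    (void)vcpu;
    return -1;
}

/* Sparse table indexed by exit code; unset slots stay NULL. */
static exit_handler_fn fault_table[EXC_MAX_SIM] = {
    [EXC_HVC_SIM] = handle_hypercall,
};

static int handle_fault(struct vcpu_sim *vcpu, unsigned int code)
{
    exit_handler_fn fn = (code < EXC_MAX_SIM) ? fault_table[code] : NULL;

    return fn ? fn(vcpu) : handle_unknown(vcpu);
}
```

The design point the patch description makes is visible here: the guest gets a well-defined error value in a0 and keeps running, instead of receiving an invalid-instruction exception for an instruction the hardware does support in VM mode.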
Re: [PATCH v4 1/6] LoongArch/smp: Refine ipi ops on LoongArch platform
Huacai,

Thanks for reviewing. My replies are inline.

On 2024/2/19 10:39 AM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote:
>>
>> This patch refines ipi handling on LoongArch platform, there are
>> three changes with this patch.
>> 1. Add generic get_percpu_irq() api, replace some percpu irq functions
>> such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with get_percpu_irq().
>>
>> 2. Change parameter action definition with function
>> loongson_send_ipi_single() and loongson_send_ipi_mask(). Normal decimal
>> encoding is used rather than binary bitmap encoding for ipi action, ipi
>> hw sender uses devimal action code, and ipi receiver will get binary bitmap
>> encoding, the ipi hw will convert it into bitmap in ipi message buffer.
> What is "devimal" here? Maybe decimal?
Yep, it should be decimal.

>>
>> 3. Add structure smp_ops on LoongArch platform so that pv ipi can be used
>> later.
>>
>> Signed-off-by: Bibo Mao
>> ---
>>  arch/loongarch/include/asm/hardirq.h |  4 ++
>>  arch/loongarch/include/asm/irq.h     | 10 -
>>  arch/loongarch/include/asm/smp.h     | 31 +++
>>  arch/loongarch/kernel/irq.c          | 22 +--
>>  arch/loongarch/kernel/perf_event.c   | 14 +--
>>  arch/loongarch/kernel/smp.c          | 58 +++-
>>  arch/loongarch/kernel/time.c         | 12 +-
>>  7 files changed, 71 insertions(+), 80 deletions(-)
>>
>> diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
>> index 0ef3b18f8980..9f0038e19c7f 100644
>> --- a/arch/loongarch/include/asm/hardirq.h
>> +++ b/arch/loongarch/include/asm/hardirq.h
>> @@ -12,6 +12,10 @@
>>  extern void ack_bad_irq(unsigned int irq);
>>  #define ack_bad_irq ack_bad_irq
>>
>> +enum ipi_msg_type {
>> +	IPI_RESCHEDULE,
>> +	IPI_CALL_FUNCTION,
>> +};
>>  #define NR_IPI 2
>>
>>  typedef struct {
>> diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
>> index 218b4da0ea90..00101b6d601e 100644
>> --- a/arch/loongarch/include/asm/irq.h
>> +++ b/arch/loongarch/include/asm/irq.h
>> @@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle;
>>  extern struct fwnode_handle *pch_lpc_handle;
>>  extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];
>>
>> -extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
>> +static inline int get_percpu_irq(int vector)
>> +{
>> +	struct irq_domain *d;
>> +
>> +	d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
>> +	if (d)
>> +		return irq_create_mapping(d, vector);
>>
>> +	return -EINVAL;
>> +}
>>  #include
>>
>>  #endif /* _ASM_IRQ_H */
>> diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
>> index f81e5f01d619..8a42632b038a 100644
>> --- a/arch/loongarch/include/asm/smp.h
>> +++ b/arch/loongarch/include/asm/smp.h
>> @@ -12,6 +12,13 @@
>>  #include
>>  #include
>>
>> +struct smp_ops {
>> +	void (*init_ipi)(void);
>> +	void (*send_ipi_mask)(const struct cpumask *mask, unsigned int action);
>> +	void (*send_ipi_single)(int cpu, unsigned int action);
>> +};
>> +
>> +extern struct smp_ops smp_ops;
>>  extern int smp_num_siblings;
>>  extern int num_processors;
>>  extern int disabled_cpus;
>> @@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
>>  void loongson_boot_secondary(int cpu, struct task_struct *idle);
>>  void loongson_init_secondary(void);
>>  void loongson_smp_finish(void);
>> -void loongson_send_ipi_single(int cpu, unsigned int action);
>> -void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
>>  #ifdef CONFIG_HOTPLUG_CPU
>>  int loongson_cpu_disable(void);
>>  void loongson_cpu_die(unsigned int cpu);
>> @@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];
>>
>>  #define cpu_physical_id(cpu)	cpu_logical_map(cpu)
>>
>> -#define SMP_BOOT_CPU		0x1
>> -#define SMP_RESCHEDULE		0x2
>> -#define SMP_CALL_FUNCTION	0x4
>> +#define ACTTION_BOOT_CPU	0
>> +#define ACTTION_RESCHEDULE	1
>> +#define ACTTION_CALL_FUNCTION	2
> ACTTION? ACTION?
It should be ACTION_xxx, will refresh it in next patch.

Regards
Bibo Mao
>
> Huacai
>> +#define SMP_BOOT_CPU		BIT(ACTTION_BOOT_CPU)
>> +#define SMP_RESCHEDULE		BIT(ACTTION_RESCHEDULE)
>> +#define SMP_CALL_FUNCTION	BIT(ACTTION_CALL_FUNCTION)
>>
>>  struct secondary_data {
>>  	unsigned long stack;
>> @@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data;
>>
>>  extern asmlinkage void smpboot_entry(void);
>>  extern asmlinkage void start_secondary(void);
>> -
>> +extern void arch_send_call_function_single_ipi(int cpu);
>> +extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
>>  extern void calculate_cpu_foreign_map(void);
>>
>>  /*
>> @@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void);
>>   */
>>  extern void show_ipi_list(struct seq_file *p, int prec);
>>
>> -static inline void arch_send_call_function_single_ipi(int cpu)
>> -{
>> -	loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
>> -}
>> -
>> -static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
>> -{
>> -	loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
>> -}
>> -
>>  #ifdef CONFIG_HOTPLUG_CPU
>>  static inline int
Re: [PATCH v18 2/3] vfio/pci: rename and export range_intersect_range
>> +
>> +/**
>> + * vfio_pci_core_range_intersect_range() - Determine overlap between a buffer
>> + *					    and register offset ranges.
>> + * @buf_start:		start offset of the buffer
>> + * @buf_cnt:		number of buffer bytes.
>
> You could drop the '.' at the end to be consistent with the other.

Ok, will make it consistent.

>> +bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
>> +					 loff_t reg_start, size_t reg_cnt,
>> +					 loff_t *buf_offset,
>> +					 size_t *intersect_count,
>> +					 size_t *register_offset);
>>  #define VFIO_IOWRITE_DECLATION(size) \
>>  int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev,	\
>>  			bool test_mem, u##size val, void __iomem *io);
>
> Reviewed-by: Yishai Hadas

Thanks
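For context, here is a user-space sketch of the kind of overlap computation that the prototype quoted above suggests. It is an illustration written for this thread, not the kernel implementation: `toy_range_intersect` is a hypothetical name, `int64_t` stands in for `loff_t`, and the meaning of the three output parameters is inferred from their names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true when [buf_start, buf_start + buf_cnt) overlaps
 * [reg_start, reg_start + reg_cnt), and reports where the overlap
 * begins relative to the start of each range, plus its length.
 */
static bool toy_range_intersect(int64_t buf_start, size_t buf_cnt,
				int64_t reg_start, size_t reg_cnt,
				int64_t *buf_offset,
				size_t *intersect_count,
				size_t *register_offset)
{
	int64_t buf_end = buf_start + (int64_t)buf_cnt;
	int64_t reg_end = reg_start + (int64_t)reg_cnt;
	int64_t start = buf_start > reg_start ? buf_start : reg_start;
	int64_t end = buf_end < reg_end ? buf_end : reg_end;

	if (start >= end)
		return false;	/* disjoint, or one of the ranges is empty */

	*buf_offset = start - buf_start;
	*intersect_count = (size_t)(end - start);
	*register_offset = (size_t)(start - reg_start);
	return true;
}
```

For an 8-byte buffer starting at offset 0 and a 2-byte register at offset 4, the sketch reports a buffer offset of 4, an intersect count of 2, and a register offset of 0. A caller emulating a register would then copy only those 2 bytes.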
Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
Thanks Kevin and Yishai for the reviews. Comments inline. >> +static int nvgrace_gpu_mmap(struct vfio_device *core_vdev, >> + struct vm_area_struct *vma) >> +{ >> + struct nvgrace_gpu_pci_core_device *nvdev = >> + container_of(core_vdev, struct nvgrace_gpu_pci_core_device, >> + core_device.vdev); > > No need for a new line here. Ack. >> +static ssize_t >> +nvgrace_gpu_read_mem(struct nvgrace_gpu_pci_core_device *nvdev, >> + char __user *buf, size_t count, loff_t *ppos) >> +{ >> + u64 offset = *ppos & VFIO_PCI_OFFSET_MASK; >> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); >> + struct mem_region *memregion; >> + size_t mem_count, i; >> + u8 val = 0xFF; >> + int ret; >> + >> + memregion = nvgrace_gpu_memregion(index, nvdev); >> + if (!memregion) > > Can that happen ? it was just tested by the caller. Ok, I can remove it. Will put a comment instead that this has been checked. >> + /* >> + * Determine how many bytes to be actually read from the device memory. >> + * Read request beyond the actual device memory size is filled with ~0, >> + * while those beyond the actual reported size is skipped. >> + */ >> + if (offset >= memregion->memlength) >> + mem_count = 0; >> + else >> + mem_count = min(count, memregion->memlength - (size_t)offset); >> + >> + ret = nvgrace_gpu_map_and_read(nvdev, buf, mem_count, ppos); >> + if (ret) >> + return ret; >> + >> + /* >> + * Only the device memory present on the hardware is mapped, which may >> + * not be power-of-2 aligned. A read to an offset beyond the device >> memory >> + * size is filled with ~0. >> + */ >> + for (i = mem_count; i < count; i++) >> + put_user(val, (unsigned char __user *)(buf + i)); > > Did you condier a failure here ? Yeah, that has to be checked here. Will make the change in the next post. >> +/* >> + * Write count bytes to the device memory at a given offset. The actual >> device >> + * memory size (available) may not be a power-of-2. 
So the driver fakes the >> + * size to a power-of-2 (reported) when exposing to a user space driver. >> + * >> + * Writes extending beyond the reported size are truncated; writes starting >> + * beyond the reported size generate -EINVAL. >> + */ >> +static ssize_t >> +nvgrace_gpu_write_mem(struct nvgrace_gpu_pci_core_device *nvdev, >> + size_t count, loff_t *ppos, const char __user *buf) >> +{ >> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); >> + u64 offset = *ppos & VFIO_PCI_OFFSET_MASK; >> + struct mem_region *memregion; >> + size_t mem_count; >> + int ret = 0; >> + >> + memregion = nvgrace_gpu_memregion(index, nvdev); >> + if (!memregion) > > Same as the above note in nvgrace_gpu_read_mem(). Ack. >> +static const struct vfio_device_ops nvgrace_gpu_pci_ops = { >> + .name = "nvgrace-gpu-vfio-pci", >> + .init = vfio_pci_core_init_dev, >> + .release = vfio_pci_core_release_dev, >> + .open_device = nvgrace_gpu_open_device, >> + .close_device = nvgrace_gpu_close_device, >> + .ioctl = nvgrace_gpu_ioctl, >> + .read = nvgrace_gpu_read, >> + .write = nvgrace_gpu_write, >> + .mmap = nvgrace_gpu_mmap, >> + .request = vfio_pci_core_request, >> + .match = vfio_pci_core_match, >> + .bind_iommufd = vfio_iommufd_physical_bind, >> + .unbind_iommufd = vfio_iommufd_physical_unbind, >> + .attach_ioas = vfio_iommufd_physical_attach_ioas, >> + .detach_ioas = vfio_iommufd_physical_detach_ioas, >> +}; >> + >> +static const struct vfio_device_ops nvgrace_gpu_pci_core_ops = { >> + .name = "nvgrace-gpu-vfio-pci-core", >> + .init = vfio_pci_core_init_dev, >> + .release = vfio_pci_core_release_dev, >> + .open_device = nvgrace_gpu_open_device, >> + .close_device = vfio_pci_core_close_device, >> + .ioctl = vfio_pci_core_ioctl, >> + .device_feature = vfio_pci_core_ioctl_feature, > > This entry is missing above as part of nvgrace_gpu_pci_ops. Yes. Will add. 
>> + .read = vfio_pci_core_read, >> + .write = vfio_pci_core_write, >> + .mmap = vfio_pci_core_mmap, >> + .request = vfio_pci_core_request, >> + .match = vfio_pci_core_match, >> + .bind_iommufd = vfio_iommufd_physical_bind, >> + .unbind_iommufd = vfio_iommufd_physical_unbind, >> + .attach_ioas = vfio_iommufd_physical_attach_ioas, >> + .detach_ioas = vfio_iommufd_physical_detach_ioas, >> +}; >> + >> +static struct >> +nvgrace_gpu_pci_core_device *nvgrace_gpu_drvdata(struct pci_dev *pdev) >> +{ >> + struct vfio_pci_core_device *core_device = dev_get_drvdata(>dev); >> + >> + return
Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system
Hi, Bibo, On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao wrote: > > On LoongArch system, ipi hw uses iocsr registers, there is one iocsr > register access on ipi sending, and two iocsr access on ipi receiving > which is ipi interrupt handler. On VM mode all iocsr registers > accessing will cause VM to trap into hypervisor. So with ipi hw > notification once there will be three times of trap. > > This patch adds pv ipi support for VM, hypercall instruction is used > to ipi sender, and hypervisor will inject SWI on the VM. During SWI > interrupt handler, only estat CSR register is written to clear irq. > Estat CSR register access will not trap into hypervisor. So with pv ipi > supported, pv ipi sender will trap into hypervsor one time, pv ipi > revicer will not trap, there is only one time of trap. > > Also this patch adds ipi multicast support, the method is similar with > x86. With ipi multicast support, ipi notification can be sent to at most > 128 vcpus at one time. It reduces trap times into hypervisor greatly. > > Signed-off-by: Bibo Mao > --- > arch/loongarch/include/asm/hardirq.h | 1 + > arch/loongarch/include/asm/kvm_host.h | 1 + > arch/loongarch/include/asm/kvm_para.h | 124 + > arch/loongarch/include/asm/loongarch.h | 1 + > arch/loongarch/kernel/irq.c| 2 +- > arch/loongarch/kernel/paravirt.c | 113 ++ > arch/loongarch/kernel/smp.c| 2 +- > arch/loongarch/kvm/exit.c | 73 ++- > arch/loongarch/kvm/vcpu.c | 1 + > 9 files changed, 314 insertions(+), 4 deletions(-) > > diff --git a/arch/loongarch/include/asm/hardirq.h > b/arch/loongarch/include/asm/hardirq.h > index 9f0038e19c7f..8a611843c1f0 100644 > --- a/arch/loongarch/include/asm/hardirq.h > +++ b/arch/loongarch/include/asm/hardirq.h > @@ -21,6 +21,7 @@ enum ipi_msg_type { > typedef struct { > unsigned int ipi_irqs[NR_IPI]; > unsigned int __softirq_pending; > + atomic_t messages cacheline_aligned_in_smp; Do we really need atomic_t? A plain "unsigned int" can reduce cost significantly. 
> } cacheline_aligned irq_cpustat_t; > > DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); > diff --git a/arch/loongarch/include/asm/kvm_host.h > b/arch/loongarch/include/asm/kvm_host.h > index 57399d7cf8b7..1bf927e2bfac 100644 > --- a/arch/loongarch/include/asm/kvm_host.h > +++ b/arch/loongarch/include/asm/kvm_host.h > @@ -43,6 +43,7 @@ struct kvm_vcpu_stat { > u64 idle_exits; > u64 cpucfg_exits; > u64 signal_exits; > + u64 hvcl_exits; hypercall_exits is better. > }; > > #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0) > diff --git a/arch/loongarch/include/asm/kvm_para.h > b/arch/loongarch/include/asm/kvm_para.h > index 41200e922a82..a25a84e372b9 100644 > --- a/arch/loongarch/include/asm/kvm_para.h > +++ b/arch/loongarch/include/asm/kvm_para.h > @@ -9,6 +9,10 @@ > #define HYPERVISOR_VENDOR_SHIFT8 > #define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) > + code) > > +#define KVM_HC_CODE_SERVICE0 > +#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, > KVM_HC_CODE_SERVICE) > +#define KVM_HC_FUNC_IPI 1 Change HC to HCALL is better. > + > /* > * LoongArch hypcall return code > */ > @@ -16,6 +20,126 @@ > #define KVM_HC_INVALID_CODE-1UL > #define KVM_HC_INVALID_PARAMETER -2UL > > +/* > + * Hypercalls interface for KVM hypervisor > + * > + * a0: function identifier > + * a1-a6: args > + * Return value will be placed in v0. > + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6. 
> + */ > +static __always_inline long kvm_hypercall(u64 fid) > +{ > + register long ret asm("v0"); > + register unsigned long fun asm("a0") = fid; > + > + __asm__ __volatile__( > + "hvcl "__stringify(KVM_HC_SERVICE) > + : "=r" (ret) > + : "r" (fun) > + : "memory" > + ); > + > + return ret; > +} > + > +static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0) > +{ > + register long ret asm("v0"); > + register unsigned long fun asm("a0") = fid; > + register unsigned long a1 asm("a1") = arg0; > + > + __asm__ __volatile__( > + "hvcl "__stringify(KVM_HC_SERVICE) > + : "=r" (ret) > + : "r" (fun), "r" (a1) > + : "memory" > + ); > + > + return ret; > +} > + > +static __always_inline long kvm_hypercall2(u64 fid, > + unsigned long arg0, unsigned long arg1) > +{ > + register long ret asm("v0"); > + register unsigned long fun asm("a0") = fid; > + register unsigned long a1 asm("a1") = arg0; > + register unsigned long a2 asm("a2") = arg1; > + > + __asm__ __volatile__( > + "hvcl "__stringify(KVM_HC_SERVICE) > +
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: > > The patch adds paravirt interface for guest kernel, function > pv_guest_initi() firstly checks whether system runs on VM mode. If kernel > runs on VM mode, it will call function kvm_para_available() to detect > whether current VMM is KVM hypervisor. And the paravirt function can work > only if current VMM is KVM hypervisor, since there is only KVM hypervisor > supported on LoongArch now. > > This patch only adds paravirt interface for guest kernel, however there > is not effective pv functions added here. > > Signed-off-by: Bibo Mao > --- > arch/loongarch/Kconfig| 9 > arch/loongarch/include/asm/kvm_para.h | 7 > arch/loongarch/include/asm/paravirt.h | 27 > .../include/asm/paravirt_api_clock.h | 1 + > arch/loongarch/kernel/Makefile| 1 + > arch/loongarch/kernel/paravirt.c | 41 +++ > arch/loongarch/kernel/setup.c | 2 + > 7 files changed, 88 insertions(+) > create mode 100644 arch/loongarch/include/asm/paravirt.h > create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > create mode 100644 arch/loongarch/kernel/paravirt.c > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 10959e6c3583..817a56dff80f 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH > bool > default y > > +config PARAVIRT > + bool "Enable paravirtualization code" > + depends on AS_HAS_LVZ_EXTENSION > + help > + This changes the kernel so it can modify itself when it is run > + under a hypervisor, potentially improving performance significantly > + over full virtualization. However, when run without a hypervisor > + the kernel is theoretically slower and slightly larger. 
> + > config ARCH_SUPPORTS_KEXEC > def_bool y > > diff --git a/arch/loongarch/include/asm/kvm_para.h > b/arch/loongarch/include/asm/kvm_para.h > index 9425d3b7e486..41200e922a82 100644 > --- a/arch/loongarch/include/asm/kvm_para.h > +++ b/arch/loongarch/include/asm/kvm_para.h > @@ -2,6 +2,13 @@ > #ifndef _ASM_LOONGARCH_KVM_PARA_H > #define _ASM_LOONGARCH_KVM_PARA_H > > +/* > + * Hypcall code field > + */ > +#define HYPERVISOR_KVM 1 > +#define HYPERVISOR_VENDOR_SHIFT8 > +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) > + code) > + > /* > * LoongArch hypcall return code > */ > diff --git a/arch/loongarch/include/asm/paravirt.h > b/arch/loongarch/include/asm/paravirt.h > new file mode 100644 > index ..b64813592ba0 > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt.h > @@ -0,0 +1,27 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_LOONGARCH_PARAVIRT_H > +#define _ASM_LOONGARCH_PARAVIRT_H > + > +#ifdef CONFIG_PARAVIRT > +#include > +struct static_key; > +extern struct static_key paravirt_steal_enabled; > +extern struct static_key paravirt_steal_rq_enabled; > + > +u64 dummy_steal_clock(int cpu); > +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > + > +static inline u64 paravirt_steal_clock(int cpu) > +{ > + return static_call(pv_steal_clock)(cpu); > +} The steal time code can be removed in this patch, I think. 
> + > +int pv_guest_init(void); > +#else > +static inline int pv_guest_init(void) > +{ > + return 0; > +} > + > +#endif // CONFIG_PARAVIRT > +#endif > diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h > b/arch/loongarch/include/asm/paravirt_api_clock.h > new file mode 100644 > index ..65ac7cee0dad > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt_api_clock.h > @@ -0,0 +1 @@ > +#include > diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile > index 3c808c680370..662e6e9de12d 100644 > --- a/arch/loongarch/kernel/Makefile > +++ b/arch/loongarch/kernel/Makefile > @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o > obj-$(CONFIG_STACKTRACE) += stacktrace.o > > obj-$(CONFIG_PROC_FS) += proc.o > +obj-$(CONFIG_PARAVIRT) += paravirt.o > > obj-$(CONFIG_SMP) += smp.o > > diff --git a/arch/loongarch/kernel/paravirt.c > b/arch/loongarch/kernel/paravirt.c > new file mode 100644 > index ..21d01d05791a > --- /dev/null > +++ b/arch/loongarch/kernel/paravirt.c > @@ -0,0 +1,41 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct static_key paravirt_steal_enabled; > +struct static_key paravirt_steal_rq_enabled; > + > +static u64 native_steal_clock(int cpu) > +{ > + return 0; > +} > + > +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); The steal time code can be removed in this patch, I think. > + > +static bool kvm_para_available(void) > +{ > + static int
Re: [PATCH v4 2/6] LoongArch: KVM: Add hypercall instruction emulation support
Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: > > On LoongArch system, hypercall instruction is supported when system > runs on VM mode. This patch adds dummy function with hypercall > instruction emulation, rather than inject EXCCODE_INE invalid > instruction exception. > > Signed-off-by: Bibo Mao > --- > arch/loongarch/include/asm/Kbuild | 1 - > arch/loongarch/include/asm/kvm_para.h | 26 ++ > arch/loongarch/include/uapi/asm/Kbuild | 2 -- > arch/loongarch/kvm/exit.c | 10 ++ > 4 files changed, 36 insertions(+), 3 deletions(-) > create mode 100644 arch/loongarch/include/asm/kvm_para.h > delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild > > diff --git a/arch/loongarch/include/asm/Kbuild > b/arch/loongarch/include/asm/Kbuild > index 93783fa24f6e..22991a6f0e2b 100644 > --- a/arch/loongarch/include/asm/Kbuild > +++ b/arch/loongarch/include/asm/Kbuild > @@ -23,4 +23,3 @@ generic-y += poll.h > generic-y += param.h > generic-y += posix_types.h > generic-y += resource.h > -generic-y += kvm_para.h > diff --git a/arch/loongarch/include/asm/kvm_para.h > b/arch/loongarch/include/asm/kvm_para.h > new file mode 100644 > index ..9425d3b7e486 > --- /dev/null > +++ b/arch/loongarch/include/asm/kvm_para.h > @@ -0,0 +1,26 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_LOONGARCH_KVM_PARA_H > +#define _ASM_LOONGARCH_KVM_PARA_H > + > +/* > + * LoongArch hypcall return code Maybe using "hypercall" in comments is better. > + */ > +#define KVM_HC_STATUS_SUCCESS 0 > +#define KVM_HC_INVALID_CODE-1UL > +#define KVM_HC_INVALID_PARAMETER -2UL Maybe KVM_HCALL_SUCCESS/KVM_HCALL_INVALID_CODE/KVM_HCALL_PARAMETER is better. 
Huacai > + > +static inline unsigned int kvm_arch_para_features(void) > +{ > + return 0; > +} > + > +static inline unsigned int kvm_arch_para_hints(void) > +{ > + return 0; > +} > + > +static inline bool kvm_check_and_clear_guest_paused(void) > +{ > + return false; > +} > +#endif /* _ASM_LOONGARCH_KVM_PARA_H */ > diff --git a/arch/loongarch/include/uapi/asm/Kbuild > b/arch/loongarch/include/uapi/asm/Kbuild > deleted file mode 100644 > index 4aa680ca2e5f.. > --- a/arch/loongarch/include/uapi/asm/Kbuild > +++ /dev/null > @@ -1,2 +0,0 @@ > -# SPDX-License-Identifier: GPL-2.0 > -generic-y += kvm_para.h > diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c > index ed1d89d53e2e..d15c71320a11 100644 > --- a/arch/loongarch/kvm/exit.c > +++ b/arch/loongarch/kvm/exit.c > @@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu > *vcpu) > return RESUME_GUEST; > } > > +static int kvm_handle_hypcall(struct kvm_vcpu *vcpu) > +{ > + update_pc(>arch); > + > + /* Treat it as noop intruction, only set return value */ > + vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HC_INVALID_CODE; > + return RESUME_GUEST; > +} > + > /* > * LoongArch KVM callback handling for unimplemented guest exiting > */ > @@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] > = { > [EXCCODE_LSXDIS]= kvm_handle_lsx_disabled, > [EXCCODE_LASXDIS] = kvm_handle_lasx_disabled, > [EXCCODE_GSPR] = kvm_handle_gspr, > + [EXCCODE_HVC] = kvm_handle_hypcall, > }; > > int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault) > -- > 2.39.3 >
Re: [PATCH v4 1/6] LoongArch/smp: Refine ipi ops on LoongArch platform
Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: > > This patch refines ipi handling on LoongArch platform, there are > three changes with this patch. > 1. Add generic get_percpu_irq() api, replace some percpu irq functions > such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with get_percpu_irq(). > > 2. Change parameter action definition with function > loongson_send_ipi_single() and loongson_send_ipi_mask(). Normal decimal > encoding is used rather than binary bitmap encoding for ipi action, ipi > hw sender uses devimal action code, and ipi receiver will get binary bitmap > encoding, the ipi hw will convert it into bitmap in ipi message buffer. What is "devimal" here? Maybe decimal? > > 3. Add structure smp_ops on LoongArch platform so that pv ipi can be used > later. > > Signed-off-by: Bibo Mao > --- > arch/loongarch/include/asm/hardirq.h | 4 ++ > arch/loongarch/include/asm/irq.h | 10 - > arch/loongarch/include/asm/smp.h | 31 +++ > arch/loongarch/kernel/irq.c | 22 +-- > arch/loongarch/kernel/perf_event.c | 14 +-- > arch/loongarch/kernel/smp.c | 58 +++- > arch/loongarch/kernel/time.c | 12 +- > 7 files changed, 71 insertions(+), 80 deletions(-) > > diff --git a/arch/loongarch/include/asm/hardirq.h > b/arch/loongarch/include/asm/hardirq.h > index 0ef3b18f8980..9f0038e19c7f 100644 > --- a/arch/loongarch/include/asm/hardirq.h > +++ b/arch/loongarch/include/asm/hardirq.h > @@ -12,6 +12,10 @@ > extern void ack_bad_irq(unsigned int irq); > #define ack_bad_irq ack_bad_irq > > +enum ipi_msg_type { > + IPI_RESCHEDULE, > + IPI_CALL_FUNCTION, > +}; > #define NR_IPI 2 > > typedef struct { > diff --git a/arch/loongarch/include/asm/irq.h > b/arch/loongarch/include/asm/irq.h > index 218b4da0ea90..00101b6d601e 100644 > --- a/arch/loongarch/include/asm/irq.h > +++ b/arch/loongarch/include/asm/irq.h > @@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle; > extern struct fwnode_handle *pch_lpc_handle; > extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS]; 
> > -extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev); > +static inline int get_percpu_irq(int vector) > +{ > + struct irq_domain *d; > + > + d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY); > + if (d) > + return irq_create_mapping(d, vector); > > + return -EINVAL; > +} > #include > > #endif /* _ASM_IRQ_H */ > diff --git a/arch/loongarch/include/asm/smp.h > b/arch/loongarch/include/asm/smp.h > index f81e5f01d619..8a42632b038a 100644 > --- a/arch/loongarch/include/asm/smp.h > +++ b/arch/loongarch/include/asm/smp.h > @@ -12,6 +12,13 @@ > #include > #include > > +struct smp_ops { > + void (*init_ipi)(void); > + void (*send_ipi_mask)(const struct cpumask *mask, unsigned int > action); > + void (*send_ipi_single)(int cpu, unsigned int action); > +}; > + > +extern struct smp_ops smp_ops; > extern int smp_num_siblings; > extern int num_processors; > extern int disabled_cpus; > @@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus); > void loongson_boot_secondary(int cpu, struct task_struct *idle); > void loongson_init_secondary(void); > void loongson_smp_finish(void); > -void loongson_send_ipi_single(int cpu, unsigned int action); > -void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action); > #ifdef CONFIG_HOTPLUG_CPU > int loongson_cpu_disable(void); > void loongson_cpu_die(unsigned int cpu); > @@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS]; > > #define cpu_physical_id(cpu) cpu_logical_map(cpu) > > -#define SMP_BOOT_CPU 0x1 > -#define SMP_RESCHEDULE 0x2 > -#define SMP_CALL_FUNCTION 0x4 > +#define ACTTION_BOOT_CPU 0 > +#define ACTTION_RESCHEDULE 1 > +#define ACTTION_CALL_FUNCTION 2 ACTTION? ACTION? 
Huacai > +#define SMP_BOOT_CPU BIT(ACTTION_BOOT_CPU) > +#define SMP_RESCHEDULE BIT(ACTTION_RESCHEDULE) > +#define SMP_CALL_FUNCTION BIT(ACTTION_CALL_FUNCTION) > > struct secondary_data { > unsigned long stack; > @@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data; > > extern asmlinkage void smpboot_entry(void); > extern asmlinkage void start_secondary(void); > - > +extern void arch_send_call_function_single_ipi(int cpu); > +extern void arch_send_call_function_ipi_mask(const struct cpumask *mask); > extern void calculate_cpu_foreign_map(void); > > /* > @@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void); > */ > extern void show_ipi_list(struct seq_file *p, int prec); > > -static inline void arch_send_call_function_single_ipi(int cpu) > -{ > - loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION); > -} > - > -static inline void arch_send_call_function_ipi_mask(const struct cpumask > *mask) > -{ > - loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION); > -} > - > #ifdef CONFIG_HOTPLUG_CPU > static
Re: [PATCH v2] vdpa/mlx5: Allow CVQ size changes
QE tested V2 of this patch. QEMU no longer prints the error message
"qemu-system-x86_64: Insufficient written data (0)" after multi-queue is
enabled and disabled multiple times inside the guest. The test passes both
with "x-svq=on" and without it.

Tested-by: Lei Yang

On Fri, Feb 16, 2024 at 10:25 PM Jonah Palmer wrote:
>
> The MLX driver was not updating its control virtqueue size at set_vq_num
> and instead always initialized to MLX5_CVQ_MAX_ENT (16) at
> setup_cvq_vring.
>
> Qemu would try to set the size to 64 by default, however, because the
> CVQ size always was initialized to 16, an error would be thrown when
> sending >16 control messages (as used-ring entry 17 is initialized to 0).
> For example, starting a guest with x-svq=on and then executing the
> following command would produce the error below:
>
> # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done
>
> qemu-system-x86_64: Insufficient written data (0)
> [ 435.331223] virtio_net virtio0: Failed to set mac address by vq command.
> SIOCSIFHWADDR: Invalid argument
>
> Acked-by: Dragos Tatulea
> Acked-by: Eugenio Pérez
> Signed-off-by: Jonah Palmer
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 778821bab7d9..ecfc16151d61 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -151,8 +151,6 @@ static void teardown_driver(struct mlx5_vdpa_net *ndev);
>
>  static bool mlx5_vdpa_debug;
>
> -#define MLX5_CVQ_MAX_ENT 16
> -
>  #define MLX5_LOG_VIO_FLAG(_feature) \
>  	do { \
>  		if (features & BIT_ULL(_feature)) \
> @@ -2276,9 +2274,16 @@ static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
>  	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>  	struct mlx5_vdpa_virtqueue *mvq;
>
> -	if (!is_index_valid(mvdev, idx) || is_ctrl_vq_idx(mvdev, idx))
> +	if (!is_index_valid(mvdev, idx))
>  		return;
>
> +	if (is_ctrl_vq_idx(mvdev, idx)) {
> +		struct mlx5_control_vq *cvq = &mvdev->cvq;
> +
> +		cvq->vring.vring.num = num;
> +		return;
> +	}
> +
>  	mvq = &ndev->vqs[idx];
>  	mvq->num_ent = num;
>  }
> @@ -2963,7 +2968,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
>  	u16 idx = cvq->vring.last_avail_idx;
>
>  	err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
> -				MLX5_CVQ_MAX_ENT, false,
> +				cvq->vring.vring.num, false,
>  				(struct vring_desc *)(uintptr_t)cvq->desc_addr,
>  				(struct vring_avail *)(uintptr_t)cvq->driver_addr,
>  				(struct vring_used *)(uintptr_t)cvq->device_addr);
> --
> 2.39.3
>
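The failure mode described in the quoted commit message can be illustrated with a deliberately simplified model. This is a toy, not a description of the real virtio or mlx5 data structures: `toy_used_ring` and `toy_first_failure` are hypothetical, and the sketch assumes that each side simply indexes the used ring modulo the ring size it believes in.

```c
#include <stdint.h>
#include <string.h>

#define TOY_RING_MAX 64

/* Hypothetical model of a used ring: one "written length" per slot. */
struct toy_used_ring {
	uint32_t len[TOY_RING_MAX];
};

/*
 * Send `msgs` control messages. The device side completes message i into
 * slot (i % dev_num); the driver side looks for it in slot (i % drv_num).
 * Both sizes must be in the range 1..TOY_RING_MAX.
 * Returns the 1-based number of the first message whose completion the
 * driver reads as length 0, or 0 if every message completes correctly.
 */
static unsigned int toy_first_failure(unsigned int dev_num,
				      unsigned int drv_num,
				      unsigned int msgs)
{
	struct toy_used_ring ring;
	unsigned int i;

	memset(&ring, 0, sizeof(ring));
	for (i = 0; i < msgs; i++) {
		ring.len[i % dev_num] = 1;	/* device writes a 1-byte status */
		if (ring.len[i % drv_num] == 0)
			return i + 1;		/* driver sees no written data */
		ring.len[i % drv_num] = 0;	/* driver consumes the entry */
	}
	return 0;
}
```

With the device side fixed at 16 entries and the driver side at 64, the model fails on message 17, which lines up with the "used-ring entry 17" remark in the commit message. With both sides at 64, all 20 messages from the reproducer loop complete.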
Re: [PATCH v2 3/3] arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll
On Sun, 18 Feb 2024 at 22:58, Luca Weiss wrote:
>
> Follow the updated bindings and use a QCS404-specific compatible for the
> HFPLL on this SoC.
>
> Signed-off-by: Luca Weiss
> ---
> Please note that this patch should only land after the patch for the
> clock driver.
> ---
>  arch/arm64/boot/dts/qcom/qcs404.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Dmitry Baryshkov

--
With best wishes
Dmitry
Re: [PATCH v3] modules: wait do_free_init correctly
On Sat, 17 Feb 2024 16:18:10 +0800 Changbin Du wrote:

> The synchronization here is just to ensure the module init's been freed
> before doing W+X checking. But the commit 1a7b7d922081 ("modules: Use
> vmalloc special flag") moves do_free_init() into a global workqueue
> instead of call_rcu(). So now rcu_barrier() can not ensure that do_free_init
> has completed. We should wait it via flush_work().
>
> Without this fix, we still could encounter false positive reports in
> W+X checking, and the rcu synchronization is unnecessary which can
> introduce significant delay.
>
> Eric Chanudet reports that the rcu_barrier introduces ~0.1s delay on a
> PREEMPT_RT kernel.
> [0.291444] Freeing unused kernel memory: 5568K
> [0.402442] Run /sbin/init as init process
>
> With this fix, the above delay can be eliminated.

Thanks, I'll queue this as a delta, to be folded into the base patch
prior to upstreaming.

I added a Tested-by: Eric, if that's OK by him?
[PATCH v2 2/3] clk: qcom: hfpll: Add QCS404-specific compatible
It doesn't appear that the configuration for the HFPLL is generic, so add
a qcs404-specific compatible and rename the existing struct to qcs404.

Keep qcom,hfpll in the driver for compatibility with old dtbs.

Signed-off-by: Luca Weiss
---
 drivers/clk/qcom/hfpll.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/qcom/hfpll.c b/drivers/clk/qcom/hfpll.c
index dac27e31ef60..b0b0cb074b4a 100644
--- a/drivers/clk/qcom/hfpll.c
+++ b/drivers/clk/qcom/hfpll.c
@@ -14,7 +14,7 @@
 #include "clk-regmap.h"
 #include "clk-hfpll.h"
 
-static const struct hfpll_data hdata = {
+static const struct hfpll_data qcs404 = {
 	.mode_reg = 0x00,
 	.l_reg = 0x04,
 	.m_reg = 0x08,
@@ -84,10 +84,12 @@ static const struct hfpll_data msm8976_cci = {
 };
 
 static const struct of_device_id qcom_hfpll_match_table[] = {
-	{ .compatible = "qcom,hfpll", .data = &hdata },
 	{ .compatible = "qcom,msm8976-hfpll-a53", .data = &msm8976_a53 },
 	{ .compatible = "qcom,msm8976-hfpll-a72", .data = &msm8976_a72 },
 	{ .compatible = "qcom,msm8976-hfpll-cci", .data = &msm8976_cci },
+	{ .compatible = "qcom,qcs404-hfpll", .data = &qcs404 },
+	/* Deprecated in bindings */
+	{ .compatible = "qcom,hfpll", .data = &qcs404 },
 	{ }
 };
 MODULE_DEVICE_TABLE(of, qcom_hfpll_match_table);
--
2.43.2
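As a rough illustration of why keeping the deprecated entry is harmless, the sketch below models a compatible-string lookup in user space. Everything here is hypothetical (`toy_match`, `toy_hfpll_data`, `toy_match_data`); the real driver relies on the kernel's OF matching code rather than a hand-written loop. The point is only that both names can resolve to the same driver data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the kernel types. */
struct toy_hfpll_data {
	unsigned int l_reg;
};

struct toy_match {
	const char *compatible;
	const struct toy_hfpll_data *data;
};

static const struct toy_hfpll_data toy_qcs404 = { .l_reg = 0x04 };

static const struct toy_match toy_match_table[] = {
	{ "qcom,qcs404-hfpll", &toy_qcs404 },
	/* Deprecated name, kept so that old dtbs still find their data */
	{ "qcom,hfpll", &toy_qcs404 },
	{ NULL, NULL }	/* sentinel */
};

/* Return the driver data for a compatible string, or NULL if unknown. */
static const struct toy_hfpll_data *toy_match_data(const char *compatible)
{
	const struct toy_match *m;

	for (m = toy_match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```

In this model a device tree that still says "qcom,hfpll" gets exactly the same data as one using "qcom,qcs404-hfpll", while an unknown string yields NULL.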
[PATCH v2 1/3] dt-bindings: clock: qcom,hfpll: Convert to YAML
Convert the .txt documentation to .yaml with some adjustments.

* APQ8064/IPQ8064/MSM8960 compatibles are dropped since their HFPLLs are
  a part of GCC so there is no need for a separate compat entry.
* Change the MSM8974 compatible to follow the updated naming schema.
  This compatible is not used upstream yet.
* Add qcs404-hfpll. QCS404 currently uses qcom,hfpll. Mark that as
  deprecated since every SoC appears to need different driver data so
  "qcom,hfpll" makes no sense to keep.

Signed-off-by: Luca Weiss
---
 .../devicetree/bindings/clock/qcom,hfpll.txt | 63 
 .../devicetree/bindings/clock/qcom,hfpll.yaml | 69 ++
 2 files changed, 69 insertions(+), 63 deletions(-)

diff --git a/Documentation/devicetree/bindings/clock/qcom,hfpll.txt b/Documentation/devicetree/bindings/clock/qcom,hfpll.txt
deleted file mode 100644
index 5769cbbe76be..
--- a/Documentation/devicetree/bindings/clock/qcom,hfpll.txt
+++ /dev/null
@@ -1,63 +0,0 @@
-High-Frequency PLL (HFPLL)
-
-PROPERTIES
-
-- compatible:
-	Usage: required
-	Value type: :
-		shall contain only one of the following. The generic
-		compatible "qcom,hfpll" should be also included.
-
-		"qcom,hfpll-ipq8064", "qcom,hfpll"
-		"qcom,hfpll-apq8064", "qcom,hfpll"
-		"qcom,hfpll-msm8974", "qcom,hfpll"
-		"qcom,hfpll-msm8960", "qcom,hfpll"
-		"qcom,msm8976-hfpll-a53", "qcom,hfpll"
-		"qcom,msm8976-hfpll-a72", "qcom,hfpll"
-		"qcom,msm8976-hfpll-cci", "qcom,hfpll"
-
-- reg:
-	Usage: required
-	Value type:
-	Definition: address and size of HPLL registers. An optional second
-		    element specifies the address and size of the alias
-		    register region.
-
-- clocks:
-	Usage: required
-	Value type:
-	Definition: reference to the xo clock.
-
-- clock-names:
-	Usage: required
-	Value type:
-	Definition: must be "xo".
-
-- clock-output-names:
-	Usage: required
-	Value type:
-	Definition: Name of the PLL. Typically hfpllX where X is a CPU number
-		    starting at 0. Otherwise hfpll_Y where Y is more specific
-		    such as "l2".
-
-Example:
-
-1) An HFPLL for the L2 cache.
-
-	clock-controller@f9016000 {
-		compatible = "qcom,hfpll-ipq8064", "qcom,hfpll";
-		reg = <0xf9016000 0x30>;
-		clocks = <_board>;
-		clock-names = "xo";
-		clock-output-names = "hfpll_l2";
-	};
-
-2) An HFPLL for CPU0. This HFPLL has the alias register region.
-
-	clock-controller@f908a000 {
-		compatible = "qcom,hfpll-ipq8064", "qcom,hfpll";
-		reg = <0xf908a000 0x30>, <0xf900a000 0x30>;
-		clocks = <_board>;
-		clock-names = "xo";
-		clock-output-names = "hfpll0";
-	};
diff --git a/Documentation/devicetree/bindings/clock/qcom,hfpll.yaml b/Documentation/devicetree/bindings/clock/qcom,hfpll.yaml
new file mode 100644
index ..8cb1c164f760
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/qcom,hfpll.yaml
@@ -0,0 +1,69 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/qcom,hfpll.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm High-Frequency PLL
+
+maintainers:
+  - Bjorn Andersson
+
+description:
+  The HFPLL is used as CPU PLL on various Qualcomm SoCs.
+
+properties:
+  compatible:
+    oneOf:
+      - enum:
+          - qcom,msm8974-hfpll
+          - qcom,msm8976-hfpll-a53
+          - qcom,msm8976-hfpll-a72
+          - qcom,msm8976-hfpll-cci
+          - qcom,qcs404-hfpll
+      - const: qcom,hfpll
+        deprecated: true
+
+  reg:
+    items:
+      - description: HFPLL registers
+      - description: Alias register region
+    minItems: 1
+
+  '#clock-cells':
+    const: 0
+
+  clocks:
+    items:
+      - description: board XO clock
+
+  clock-names:
+    items:
+      - const: xo
+
+  clock-output-names:
+    description:
+      Name of the PLL. Typically hfpllX where X is a CPU number starting at 0.
+      Otherwise hfpll_Y where Y is more specific such as "l2".
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - '#clock-cells'
+  - clocks
+  - clock-names
+  - clock-output-names
+
+additionalProperties: false
+
+examples:
+  - |
+    clock-controller@f908a000 {
+        compatible = "qcom,msm8974-hfpll";
+        reg = <0xf908a000 0x30>, <0xf900a000 0x30>;
+        #clock-cells = <0>;
+        clock-output-names = "hfpll0";
+        clocks = <_board>;
+        clock-names = "xo";
+    };
-- 
2.43.2
[PATCH v2 3/3] arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll
Follow the updated bindings and use a QCS404-specific compatible for the
HFPLL on this SoC.

Signed-off-by: Luca Weiss
---
Please note that this patch should only land after the patch for the
clock driver.
---
 arch/arm64/boot/dts/qcom/qcs404.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/qcs404.dtsi b/arch/arm64/boot/dts/qcom/qcs404.dtsi
index 2f2eeaf2e945..4133d5a19deb 100644
--- a/arch/arm64/boot/dts/qcom/qcs404.dtsi
+++ b/arch/arm64/boot/dts/qcom/qcs404.dtsi
@@ -1308,7 +1308,7 @@ apcs_glb: mailbox@b011000 {
 		};
 
 		apcs_hfpll: clock-controller@b016000 {
-			compatible = "qcom,hfpll";
+			compatible = "qcom,qcs404-hfpll";
 			reg = <0x0b016000 0x30>;
 			#clock-cells = <0>;
 			clock-output-names = "apcs_hfpll";
-- 
2.43.2
[PATCH v2 0/3] Convert qcom,hfpll documentation to yaml + related changes
Finally touch the hfpll doc and convert it to yaml, and do some related
changes along the way.

Signed-off-by: Luca Weiss
---
Changes in v2:
- Drop APQ8064/IPQ8064/MSM8960 compatibles (Dmitry)
- Update example to MSM8974 since IPQ8064 is dropped
- Clean up dt binding description (Krzysztof)
- Remove second example in docs (Krzysztof)
- Try to clear up the text and content around deprecating qcom,hfpll
- Link to v1: https://lore.kernel.org/r/20231231-hfpll-yaml-v1-0-359d44a4e...@z3ntu.xyz

---
Luca Weiss (3):
      dt-bindings: clock: qcom,hfpll: Convert to YAML
      clk: qcom: hfpll: Add QCS404-specific compatible
      arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll

 .../devicetree/bindings/clock/qcom,hfpll.txt | 63 
 .../devicetree/bindings/clock/qcom,hfpll.yaml | 69 ++
 arch/arm64/boot/dts/qcom/qcs404.dtsi | 2 +-
 drivers/clk/qcom/hfpll.c | 6 +-
 4 files changed, 74 insertions(+), 66 deletions(-)
---
base-commit: 841c35169323cd833294798e58b9bf63fa4fa1de
change-id: 20231231-hfpll-yaml-9266f012365c

Best regards,
-- 
Luca Weiss
Re: [RFC PATCH v2 1/6] dt-bindings: mfd: add entry for Marvell 88PM886 PMIC
Rob Herring, 2024-02-15T08:20:52-06:00:
> > .../bindings/mfd/marvell,88pm88x.yaml | 74 +++
>
> Filename should match the compatible.
>
> In general, drop the 'x' wildcard.

By "in general", do you mean for the driver code also? As I have
mentioned in the commit message for the driver, the other device is very
similar and if the support for it was ever to be added (which I
personally currently have no interest in), I believe it would make sense
to extend this driver. Is it then still preferred to call it all just
88pm886 now?

> > +properties:
> > + compatible:
> > +const: marvell,88pm886-a1

So the file should be called marvell,88pm886-a1.yaml, correct? Again, is
it preferred to call it like this even if the other revision could
eventually be added (again, I am not interested in that right now
personally)? I mean, if I was implementing support for both revisions
right now, it would make sense to name it just marvell,88pm886.yaml, no?

Thank you, kind regards,
K. B.
Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
On 16/02/2024 5:01, ank...@nvidia.com wrote:
From: Ankit Agrawal

NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device
for the on-chip GPU that is the logical OS representation of the
internal proprietary chip-to-chip cache coherent interconnect.

The device is peculiar compared to a real PCI device in that whilst
there is a real 64b PCI BAR1 (comprising region 2 & region 3) on the
device, it is not used to access device memory once the faster
chip-to-chip interconnect is initialized (occurs at the time of host
system boot). The device memory is accessed instead using the
chip-to-chip interconnect that is exposed as a contiguous physically
addressable region on the host. This device memory aperture can be
obtained from host ACPI table using device_property_read_u64(),
according to the FW specification. Since the device memory is cache
coherent with the CPU, it can be mmapped into the user VMA with a
cacheable mapping using remap_pfn_range() and used like regular RAM.
The device memory is not added to the host kernel, but mapped directly
as this reduces memory wastage due to struct pages.

There is also a requirement of a minimum reserved 1G uncached region
(termed as resmem) to support the Multi-Instance GPU (MIG) feature [1].
This is to work around a HW defect. Based on [2], the requisite
properties (uncached, unaligned access) can be achieved through a VM
mapping (S1) of NORMAL_NC and host (S2) mapping with MemAttr[2:0]=0b101.
To provide a different non-cached property to the reserved 1G region,
it needs to be carved out from the device memory and mapped as a
separate region in Qemu VMA with pgprot_writecombine().
pgprot_writecombine() sets the Qemu VMA page properties (pgprot) as
NORMAL_NC.

Provide a VFIO PCI variant driver that adapts the unique device memory
representation into a more standard PCI representation facing
userspace.

The variant driver exposes these two regions - the non-cached reserved
(resmem) and the cached rest of the device memory (termed as usemem) -
as separate VFIO 64b BAR regions. This is divergent from the baremetal
approach, where the device memory is exposed as a device memory region.
The decision for a different approach was taken in view of the fact
that it would necessitate additional code in Qemu to discover and
insert those regions in the VM IPA, along with the additional VM ACPI
DSDT changes to communicate the device memory region IPA to the VM
workloads. Moreover, this behavior would have to be added to a variety
of emulators (beyond top of tree Qemu) out there desiring grace hopper
support.

Since the device implements 64-bit BAR0, the VFIO PCI variant driver
maps the uncached carved out region to the next available PCI BAR (i.e.
comprising regions 2 and 3). The cached device memory aperture is
assigned BAR regions 4 and 5. Qemu will then naturally generate a PCI
device in the VM with the uncached aperture reported as BAR2 region,
the cacheable as BAR4. The variant driver provides emulation for these
fake BARs' PCI config space offset registers.

The hardware ensures that the system does not crash when the memory is
accessed with the memory enable turned off. It synthesizes ~0 reads and
dropped writes on such access. So there is no need to support the
disablement/enablement of BAR through PCI_COMMAND config space register.

The memory layout on the host looks like the following:

               devmem (memlength)
|--------------------------------------------------|
|-------------cached------------------------|--NC--|
|                                           |
usemem.memphys                              resmem.memphys

PCI BARs need to be aligned to a power of 2, but the actual memory on
the device may not be. A read or write access to the physical address
from the last device PFN up to the next power-of-2 aligned physical
address results in reading ~0 and dropped writes. Note that the GPU
device driver [6] is capable of knowing the exact device memory size
through separate means. The device memory size is primarily kept in the
system ACPI tables for use by the VFIO PCI variant module.

Note that the usemem memory is added by the VM Nvidia device driver [5]
to the VM kernel as memblocks. Hence make the usable memory size
memblock (MEMBLK_SIZE) aligned. This is a hardwired ABI value between
the GPU FW and VFIO driver. The VM device driver makes use of the same
value for its calculation to determine USEMEM size.

Currently there is no provision in KVM for a S2 mapping with
MemAttr[2:0]=0b101, but there is an ongoing effort to provide the same
[3]. As previously mentioned, resmem is mapped with
pgprot_writecombine(), which sets the Qemu VMA page properties (pgprot)
as NORMAL_NC. Using the proposed changes in [3] and [4], KVM marks the
region with MemAttr[2:0]=0b101 in S2.

If the device memory properties are not present, the driver registers
the vfio-pci-core function pointers. Since there are no ACPI memory
properties generated for the VM, the variant driver
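
The power-of-2 BAR sizing and the "reads of ~0 past the end of device memory" behaviour described in the quoted commit message can be modelled in a few lines of userspace C. This is a hedged sketch only: `roundup_pow2()` is a stand-in for the kernel's `roundup_pow_of_two()`, and `fake_bar_read8()` is a hypothetical helper invented here for illustration, not a function from the nvgrace-gpu driver.

```c
#include <stdint.h>

/*
 * Round n up to the next power of two. Userspace stand-in for the
 * kernel's roundup_pow_of_two(); returns 0 if the result would not fit
 * in 64 bits.
 */
static uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	if (n > (UINT64_C(1) << 63))
		return 0;
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of a byte read at offset 'off' of an emulated BAR
 * whose size is roundup_pow2(memlength): offsets backed by device memory
 * return the real byte, the padding up to the BAR size reads as ~0.
 */
static uint8_t fake_bar_read8(const uint8_t *mem, uint64_t memlength,
			      uint64_t off)
{
	return off < memlength ? mem[off] : 0xff;
}
```

With a 6 GiB memlength, for example, the emulated BAR would be 8 GiB in this model, and the top 2 GiB would read as all-ones.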
Re: [PATCH v3 25/47] filelock: convert __locks_insert_block, conflict and deadlock checks to use file_lock_core
On Wed, 2024-01-31 at 18:02 -0500, Jeff Layton wrote:
> Have both __locks_insert_block and the deadlock and conflict checking
> functions take a struct file_lock_core pointer instead of a struct
> file_lock one. Also, change posix_locks_deadlock to return bool.
>
> Signed-off-by: Jeff Layton
> ---
>  fs/locks.c | 132
> +
>  1 file changed, 72 insertions(+), 60 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 1e8b943bd7f9..0dc1c9da858c 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -757,39 +757,41 @@ EXPORT_SYMBOL(locks_delete_block);
>   * waiters, and add beneath any waiter that blocks the new waiter.
>   * Thus wakeups don't happen until needed.
>   */
> -static void __locks_insert_block(struct file_lock *blocker,
> -				 struct file_lock *waiter,
> -				 bool conflict(struct file_lock *,
> -					       struct file_lock *))
> +static void __locks_insert_block(struct file_lock *blocker_fl,
> +				 struct file_lock *waiter_fl,
> +				 bool conflict(struct file_lock_core *,
> +					       struct file_lock_core *))
>  {
> -	struct file_lock *fl;
> -	BUG_ON(!list_empty(>c.flc_blocked_member));
> +	struct file_lock_core *blocker = _fl->c;
> +	struct file_lock_core *waiter = _fl->c;
> +	struct file_lock_core *flc;
>
> +	BUG_ON(!list_empty(>flc_blocked_member));
>  new_blocker:
> -	list_for_each_entry(fl, >c.flc_blocked_requests,
> -			    c.flc_blocked_member)
> -		if (conflict(fl, waiter)) {
> -			blocker = fl;
> +	list_for_each_entry(flc, >flc_blocked_requests,
> flc_blocked_member)
> +		if (conflict(flc, waiter)) {
> +			blocker = flc;
>  			goto new_blocker;
>  		}
> -	waiter->c.flc_blocker = blocker;
> -	list_add_tail(>c.flc_blocked_member,
> -		      >c.flc_blocked_requests);
> -	if ((blocker->c.flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
> -		locks_insert_global_blocked(>c);
> +	waiter->flc_blocker = file_lock(blocker);
> +	list_add_tail(>flc_blocked_member,
> +		      >flc_blocked_requests);
>
> -	/* The requests in waiter->fl_blocked are known to conflict with
> +	if ((blocker->flc_flags & (FL_POSIX|FL_OFDLCK)) ==
(FL_POSIX|FL_OFDLCK))

Christian,

There is a bug in the above delta. That should read:

	if ((blocker->flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)

I suspect that is the cause of the performance regression noted by the
KTR.

I believe the bug is fairly harmless -- it's just putting OFD locks into
the global hash when it doesn't need to, which probably slows down
deadlock checking. I'm going to spin up a patch and test it today, but I
wanted to give you a heads up.

I'll send the patch later today or tomorrow.

> +		locks_insert_global_blocked(waiter);
> +
> +	/* The requests in waiter->flc_blocked are known to conflict with
>  	 * waiter, but might not conflict with blocker, or the requests
>  	 * and lock which block it. So they all need to be woken.
>  	 */
> -	__locks_wake_up_blocks(>c);
> +	__locks_wake_up_blocks(waiter);
>  }
>
>  /* Must be called with flc_lock held. */
>  static void locks_insert_block(struct file_lock *blocker,
>  			       struct file_lock *waiter,
> -			       bool conflict(struct file_lock *,
> -					     struct file_lock *))
> +			       bool conflict(struct file_lock_core *,
> +					     struct file_lock_core *))
>  {
>  	spin_lock(_lock_lock);
>  	__locks_insert_block(blocker, waiter, conflict);
> @@ -846,12 +848,12 @@ locks_delete_lock_ctx(struct file_lock *fl, struct
> list_head *dispose)
>  /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
>   * checks for shared/exclusive status of overlapping locks.
>   */
> -static bool locks_conflict(struct file_lock *caller_fl,
> -			   struct file_lock *sys_fl)
> +static bool locks_conflict(struct file_lock_core *caller_flc,
> +			   struct file_lock_core *sys_flc)
>  {
> -	if (lock_is_write(sys_fl))
> +	if (sys_flc->flc_type == F_WRLCK)
>  		return true;
> -	if (lock_is_write(caller_fl))
> +	if (caller_flc->flc_type == F_WRLCK)
>  		return true;
>  	return false;
>  }
> @@ -859,20 +861,23 @@ static bool locks_conflict(struct file_lock *caller_fl,
>  /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
>   * checking before calling the locks_conflict().
>   */
> -static bool posix_locks_conflict(struct file_lock *caller_fl,
> -				 struct file_lock *sys_fl)
> +static
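
The difference between the two comparisons discussed in the reply above can be checked with a small userspace sketch. The flag values below are assumptions made for illustration (the kernel defines its own values in its headers), and the two helper names are hypothetical; only the mask-and-compare expressions are taken from the thread.

```c
#include <stdbool.h>

/* Assumed, illustrative flag values; only their distinctness matters here. */
#define FL_POSIX	1u
#define FL_OFDLCK	1024u

/* Comparison as it should read: POSIX bit set, OFD bit clear. */
static bool intended_check(unsigned int flc_flags)
{
	return (flc_flags & (FL_POSIX | FL_OFDLCK)) == FL_POSIX;
}

/* Comparison as it appears in the quoted delta: both bits set. */
static bool delta_check(unsigned int flc_flags)
{
	return (flc_flags & (FL_POSIX | FL_OFDLCK)) == (FL_POSIX | FL_OFDLCK);
}
```

In this model a blocker carrying only FL_POSIX passes `intended_check()` but not `delta_check()`, while a blocker carrying both FL_POSIX and FL_OFDLCK does the opposite, so the two expressions select disjoint sets of locks.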
Re: [PATCH v18 2/3] vfio/pci: rename and export range_intersect_range
On 16/02/2024 5:01, ank...@nvidia.com wrote:
From: Ankit Agrawal

range_intersect_range determines an overlap between two ranges. If there
is an overlap, the helper function returns the overlapping offset and
size.

The VFIO PCI variant driver emulates the PCI config space BAR offset
registers. These offsets may be accessed for read/write with a variety
of lengths including sub-word sizes from sub-word offsets. The driver
makes use of this helper function to read/write the targeted part of
the emulated register.

Make this a vfio_pci_core function, rename and export as GPL. Also
update references in virtio driver.

Reviewed-by: Kevin Tian
Signed-off-by: Ankit Agrawal
---
 drivers/vfio/pci/vfio_pci_config.c | 42 +
 drivers/vfio/pci/virtio/main.c | 72 +++---
 include/linux/vfio_pci_core.h | 5 +++
 3 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 672a1804af6a..e2e6173a3375 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1966,3 +1966,45 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 
 	return done;
 }
+
+/**
+ * vfio_pci_core_range_intersect_range() - Determine overlap between a buffer
+ *					    and register offset ranges.
+ * @buf_start:		start offset of the buffer
+ * @buf_cnt:		number of buffer bytes.

You could drop the '.' at the end to be consistent with the other.

+ * @reg_start:		start register offset
+ * @reg_cnt:		number of register bytes
+ * @buf_offset:	start offset of overlap in the buffer
+ * @intersect_count:	number of overlapping bytes
+ * @register_offset:	start offset of overlap in register
+ *
+ * Returns: true if there is overlap, false if not.
+ * The overlap start and size is returned through function args.
+ */
+bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
+					 loff_t reg_start, size_t reg_cnt,
+					 loff_t *buf_offset,
+					 size_t *intersect_count,
+					 size_t *register_offset)
+{
+	if (buf_start <= reg_start &&
+	    buf_start + buf_cnt > reg_start) {
+		*buf_offset = reg_start - buf_start;
+		*intersect_count = min_t(size_t, reg_cnt,
+					 buf_start + buf_cnt - reg_start);
+		*register_offset = 0;
+		return true;
+	}
+
+	if (buf_start > reg_start &&
+	    buf_start < reg_start + reg_cnt) {
+		*buf_offset = 0;
+		*intersect_count = min_t(size_t, buf_cnt,
+					 reg_start + reg_cnt - buf_start);
+		*register_offset = buf_start - reg_start;
+		return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_range_intersect_range);
diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
index d5af683837d3..b5d3a8c5bbc9 100644
--- a/drivers/vfio/pci/virtio/main.c
+++ b/drivers/vfio/pci/virtio/main.c
@@ -132,33 +132,6 @@ virtiovf_pci_bar0_rw(struct virtiovf_pci_core_device *virtvdev,
 	return ret ? ret : count;
 }
 
-static bool range_intersect_range(loff_t range1_start, size_t count1,
-				  loff_t range2_start, size_t count2,
-				  loff_t *start_offset,
-				  size_t *intersect_count,
-				  size_t *register_offset)
-{
-	if (range1_start <= range2_start &&
-	    range1_start + count1 > range2_start) {
-		*start_offset = range2_start - range1_start;
-		*intersect_count = min_t(size_t, count2,
-					 range1_start + count1 - range2_start);
-		*register_offset = 0;
-		return true;
-	}
-
-	if (range1_start > range2_start &&
-	    range1_start < range2_start + count2) {
-		*start_offset = 0;
-		*intersect_count = min_t(size_t, count1,
-					 range2_start + count2 - range1_start);
-		*register_offset = range1_start - range2_start;
-		return true;
-	}
-
-	return false;
-}
-
 static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
 					char __user *buf, size_t count,
 					loff_t *ppos)
@@ -178,16 +151,18 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
 	if (ret < 0)
 		return ret;
 
-	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
-				  _offset, _count, _offset)) {
+	if
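
To make the helper's contract concrete, here is a userspace port of the same two-branch logic with a couple of worked inputs. It is a sketch under stated assumptions: `range_intersect()` and `min_size()` are names chosen here, `uint64_t` replaces `loff_t`, and a plain helper replaces the kernel's `min_t()`; the arithmetic itself follows the quoted function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Userspace port of the overlap logic: returns true when the buffer range
 * [buf_start, buf_start + buf_cnt) overlaps the register range
 * [reg_start, reg_start + reg_cnt), and reports where the overlap starts
 * in each range and how many bytes it covers.
 */
static bool range_intersect(uint64_t buf_start, size_t buf_cnt,
			    uint64_t reg_start, size_t reg_cnt,
			    uint64_t *buf_offset, size_t *intersect_count,
			    size_t *register_offset)
{
	/* Buffer starts at or before the register and reaches into it. */
	if (buf_start <= reg_start && buf_start + buf_cnt > reg_start) {
		*buf_offset = reg_start - buf_start;
		*intersect_count = min_size(reg_cnt,
				(size_t)(buf_start + buf_cnt - reg_start));
		*register_offset = 0;
		return true;
	}

	/* Buffer starts inside the register. */
	if (buf_start > reg_start && buf_start < reg_start + reg_cnt) {
		*buf_offset = 0;
		*intersect_count = min_size(buf_cnt,
				(size_t)(reg_start + reg_cnt - buf_start));
		*register_offset = (size_t)(buf_start - reg_start);
		return true;
	}

	return false;
}
```

For instance, a 1-byte access at offset 3 against a 2-byte register at offset 2 overlaps in one byte, at buffer offset 0 and register offset 1, which is the sub-word case the commit message mentions.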
Re: [PATCH v18 1/3] vfio/pci: rename and export do_io_rw()
On 16/02/2024 5:01, ank...@nvidia.com wrote:
From: Ankit Agrawal

do_io_rw() is used to read/write to the device MMIO. The grace hopper
VFIO PCI variant driver requires this functionality to read/write to
its memory.

Rename this as vfio_pci_core functions and export as GPL.

Reviewed-by: Kevin Tian
Signed-off-by: Ankit Agrawal
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 16 +---
 include/linux/vfio_pci_core.h| 5 -
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 07fea08ea8a2..03b8f7ada1ac 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -96,10 +96,10 @@ VFIO_IOREAD(32)
  * reads with -1. This is intended for handling MSI-X vector tables and
  * leftover space for ROM BARs.
  */
-static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
-			void __iomem *io, char __user *buf,
-			loff_t off, size_t count, size_t x_start,
-			size_t x_end, bool iswrite)
+ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
+			       void __iomem *io, char __user *buf,
+			       loff_t off, size_t count, size_t x_start,
+			       size_t x_end, bool iswrite)
 {
 	ssize_t done = 0;
 	int ret;
@@ -201,6 +201,7 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 
 	return done;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
 
 int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
 {
@@ -279,8 +280,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		x_end = vdev->msix_offset + vdev->msix_size;
 	}
 
-	done = do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, buf, pos,
-			count, x_start, x_end, iswrite);
+	done = vfio_pci_core_do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, buf, pos,
+				      count, x_start, x_end, iswrite);
 
 	if (done >= 0)
 		*ppos += done;
@@ -348,7 +349,8 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	 * probing, so we don't currently worry about access in relation
 	 * to the memory enable bit in the command register.
 	 */
-	done = do_io_rw(vdev, false, iomem, buf, off, count, 0, 0, iswrite);
+	done = vfio_pci_core_do_io_rw(vdev, false, iomem, buf, off, count,
+				      0, 0, iswrite);
 
 	vga_put(vdev->pdev, rsrc);
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 85e84b92751b..cf9480a31f3e 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -130,7 +130,10 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
 int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
-
+ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
+			       void __iomem *io, char __user *buf,
+			       loff_t off, size_t count, size_t x_start,
+			       size_t x_end, bool iswrite);
 #define VFIO_IOWRITE_DECLATION(size) \
 int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev,\
 			bool test_mem, u##size val, void __iomem *io);

Reviewed-by: Yishai Hadas
Re: [PATCH 1/4] iommu: constify pointer to bus_type
On 2024/2/16 22:40, Krzysztof Kozlowski wrote:
Make pointer to bus_type a pointer to const for code safety.

Signed-off-by: Krzysztof Kozlowski
---
 drivers/iommu/iommu-priv.h | 5 +++--
 drivers/iommu/iommu.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

Reviewed-by: Lu Baolu

Best regards,
baolu
[PATCH] bus: mhi: host: Change the trace string for the userspace tools mapping
User space tools can't map strings if we pass them to the trace points
directly, as the string address is internal to the kernel.

So add trace point strings for the user space tools to map strings
properly.

Signed-off-by: Krishna chaitanya chundru
---
 drivers/bus/mhi/host/main.c | 4 ++--
 drivers/bus/mhi/host/trace.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index 2d38f6005da6..15d657af9b5b 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -1340,7 +1340,7 @@ static int mhi_update_channel_state(struct mhi_controller *mhi_cntrl,
 	enum mhi_cmd_type cmd = MHI_CMD_NOP;
 	int ret;
 
-	trace_mhi_channel_command_start(mhi_cntrl, mhi_chan, to_state, "Updating");
+	trace_mhi_channel_command_start(mhi_cntrl, mhi_chan, to_state, TPS("Updating"));
 	switch (to_state) {
 	case MHI_CH_STATE_TYPE_RESET:
 		write_lock_irq(_chan->lock);
@@ -1407,7 +1407,7 @@ static int mhi_update_channel_state(struct mhi_controller *mhi_cntrl,
 		write_unlock_irq(_chan->lock);
 	}
 
-	trace_mhi_channel_command_end(mhi_cntrl, mhi_chan, to_state, "Updated");
+	trace_mhi_channel_command_end(mhi_cntrl, mhi_chan, to_state, TPS("Updated"));
 exit_channel_update:
 	mhi_cntrl->runtime_put(mhi_cntrl);
 	mhi_device_put(mhi_cntrl->mhi_dev);
diff --git a/drivers/bus/mhi/host/trace.h b/drivers/bus/mhi/host/trace.h
index d12a98d44272..368515dcb22d 100644
--- a/drivers/bus/mhi/host/trace.h
+++ b/drivers/bus/mhi/host/trace.h
@@ -84,6 +84,8 @@ DEV_ST_TRANSITION_LIST
 #define dev_st_trans(a, b) { DEV_ST_TRANSITION_##a, b },
 #define dev_st_trans_end(a, b) { DEV_ST_TRANSITION_##a, b }
 
+#define TPS(x) tracepoint_string(x)
+
 TRACE_EVENT(mhi_gen_tre,
 
 	TP_PROTO(struct mhi_controller *mhi_cntrl, struct mhi_chan *mhi_chan,

---
base-commit: ceeb64f41fe6a1eb9fc56d583983a81f8f3dd058
change-id: 20240218-ftrace_string-7677762aa63c

Best regards,
-- 
Krishna chaitanya chundru