Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-02-18 Thread maobibo




On 2024/2/19 3:16 PM, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:18 PM maobibo  wrote:




On 2024/2/19 10:45 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao  wrote:


On LoongArch system, ipi hw uses iocsr registers, there is one iocsr
register access on ipi sending, and two iocsr access on ipi receiving
which is ipi interrupt handler. On VM mode all iocsr registers
accessing will cause VM to trap into hypervisor. So with ipi hw
notification once there will be three times of trap.

This patch adds pv ipi support for VM, hypercall instruction is used
to ipi sender, and hypervisor will inject SWI on the VM. During SWI
interrupt handler, only estat CSR register is written to clear irq.
Estat CSR register access will not trap into hypervisor. So with pv ipi
supported, pv ipi sender will trap into hypervisor one time, pv ipi
receiver will not trap, there is only one time of trap.

Also this patch adds ipi multicast support, the method is similar with
x86. With ipi multicast support, ipi notification can be sent to at most
128 vcpus at one time. It reduces trap times into hypervisor greatly.

Signed-off-by: Bibo Mao 
---
   arch/loongarch/include/asm/hardirq.h   |   1 +
   arch/loongarch/include/asm/kvm_host.h  |   1 +
   arch/loongarch/include/asm/kvm_para.h  | 124 +
   arch/loongarch/include/asm/loongarch.h |   1 +
   arch/loongarch/kernel/irq.c|   2 +-
   arch/loongarch/kernel/paravirt.c   | 113 ++
   arch/loongarch/kernel/smp.c|   2 +-
   arch/loongarch/kvm/exit.c  |  73 ++-
   arch/loongarch/kvm/vcpu.c  |   1 +
   9 files changed, 314 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..8a611843c1f0 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
   typedef struct {
  unsigned int ipi_irqs[NR_IPI];
  unsigned int __softirq_pending;
+   atomic_t messages cacheline_aligned_in_smp;

Do we really need atomic_t? A plain "unsigned int" can reduce cost
significantly.

For IPI, there are multiple senders and one receiver: the sender uses
atomic_fetch_or(action, >messages) and the receiver uses
atomic_xchg(>messages, 0) to clear the message.

A sync mechanism is needed between the senders and the receiver, and
atomic is the simplest method.

At least from receiver side, the native IPI doesn't need atomic for
read and clear:
static u32 ipi_read_clear(int cpu)
{
 u32 action;

 /* Load the ipi register to figure out what we're supposed to do */
 action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS);
 /* Clear the ipi register to clear the interrupt */
 iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR);
 wbflush();
That is because on physical hardware these are two separate IOCSR
registers, and there is no way to do an atomic read-and-clear on IOCSR
registers.

However, if the ipi message is stored in DDR memory, an atomic
read-and-clear can be used. You can compare the cost of one
iocsr_read32 + one iocsr_write32 + wbflush with that of one
atomic_xchg(>messages, 0).


Regards
Bibo Mao


 return action;
}




   } cacheline_aligned irq_cpustat_t;

   DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 57399d7cf8b7..1bf927e2bfac 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
  u64 idle_exits;
  u64 cpucfg_exits;
  u64 signal_exits;
+   u64 hvcl_exits;

hypercall_exits is better.

yeap, hypercall_exits is better, will fix in next version.



   };

   #define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 41200e922a82..a25a84e372b9 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -9,6 +9,10 @@
   #define HYPERVISOR_VENDOR_SHIFT8
   #define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) 
+ code)

+#define KVM_HC_CODE_SERVICE0
+#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, 
KVM_HC_CODE_SERVICE)
+#define  KVM_HC_FUNC_IPI   1

Change HC to HCALL is better.

will modify in next version.



+
   /*
* LoongArch hypcall return code
*/
@@ -16,6 +20,126 @@
   #define KVM_HC_INVALID_CODE-1UL
   #define KVM_HC_INVALID_PARAMETER   -2UL

+/*
+ * Hypercalls interface for KVM hypervisor
+ *
+ * a0: function identifier
+ * a1-a6: args
+ * Return value will be placed in v0.
+ * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
+ */
+static __always_inline long kvm_hypercall(u64 fid)
+{
+   register long ret asm("v0");
+ 

Re: [syzbot] [virtualization?] linux-next boot error: WARNING: refcount bug in __free_pages_ok

2024-02-18 Thread Michael S. Tsirkin
On Sun, Feb 18, 2024 at 09:06:18PM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:d37e1e4c52bc Add linux-next specific files for 20240216
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=171ca65218
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4bc446d42a7d56c0
> dashboard link: https://syzkaller.appspot.com/bug?extid=6f3c38e8a6a0297caa5a
> compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
> 2.40
> 
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/14d0894504b9/disk-d37e1e4c.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/6cda61e084ee/vmlinux-d37e1e4c.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/720c85283c05/bzImage-d37e1e4c.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+6f3c38e8a6a0297ca...@syzkaller.appspotmail.com
> 
> Key type pkcs7_test registered
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 239)
> io scheduler mq-deadline registered
> io scheduler kyber registered
> io scheduler bfq registered
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> ACPI: button: Power Button [PWRF]
> input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
> ACPI: button: Sleep Button [SLPF]
> ioatdma: Intel(R) QuickData Technology Driver 5.00
> ACPI: \_SB_.LNKC: Enabled at IRQ 11
> virtio-pci :00:03.0: virtio_pci: leaving for legacy driver
> ACPI: \_SB_.LNKD: Enabled at IRQ 10
> virtio-pci :00:04.0: virtio_pci: leaving for legacy driver
> ACPI: \_SB_.LNKB: Enabled at IRQ 10
> virtio-pci :00:06.0: virtio_pci: leaving for legacy driver
> virtio-pci :00:07.0: virtio_pci: leaving for legacy driver
> N_HDLC line discipline registered with maxframe=4096
> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
> 00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
> 00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
> Non-volatile memory driver v1.3
> Linux agpgart interface v0.103
> ACPI: bus type drm_connector registered
> [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
> [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
> Console: switching to colour frame buffer device 128x48
> platform vkms: [drm] fb0: vkmsdrmfb frame buffer device
> usbcore: registered new interface driver udl
> brd: module loaded
> loop: module loaded
> zram: Added device: zram0
> null_blk: disk nullb0 created
> null_blk: module loaded
> Guest personality initialized and is inactive
> VMCI host device registered (name=vmci, major=10, minor=118)
> Initialized host personality
> usbcore: registered new interface driver rtsx_usb
> usbcore: registered new interface driver viperboard
> usbcore: registered new interface driver dln2
> usbcore: registered new interface driver pn533_usb
> nfcsim 0.2 initialized
> usbcore: registered new interface driver port100
> usbcore: registered new interface driver nfcmrvl
> Loading iSCSI transport class v2.0-870.
> virtio_scsi virtio0: 1/0/0 default/read/poll queues
> [ cut here ]
> refcount_t: decrement hit 0; leaking memory.
> WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 
> lib/refcount.c:31
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-next-20240216-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/25/2024
> RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> Code: b2 00 00 00 e8 b7 94 f0 fc 5b 5d c3 cc cc cc cc e8 ab 94 f0 fc c6 05 c6 
> 16 ce 0a 01 90 48 c7 c7 a0 5a fe 8b e8 67 69 b4 fc 90 <0f> 0b 90 90 eb d9 e8 
> 8b 94 f0 fc c6 05 a3 16 ce 0a 01 90 48 c7 c7
> RSP: :c9066e10 EFLAGS: 00010246
> RAX: 15c2c224c9b50400 RBX: 888020827d2c RCX: 8880162d8000
> RDX:  RSI:  RDI: 
> RBP: 0004 R08: 8157b942 R09: fbfff1bf95cc
> R10: dc00 R11: fbfff1bf95cc R12: ea000502fdc0
> R13: ea000502fdc8 R14: 1d4000a05fb9 R15: 
> FS:  () GS:8880b940() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 88823000 CR3: 0df32000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  reset_page_owner include/linux/page_owner.h:24 [inline]
>  free_pages_prepare mm/page_alloc.c:1140 [inline]
>  __free_pages_ok+0xc42/0xd70 mm/page_alloc.c:1269
>  make_alloc_exact+0xc4/0x140 mm/page_alloc.c:4847
>  vring_alloc_queue drivers/virtio/virtio_ring.c:319 

Re: [RFC PATCH v2 1/6] dt-bindings: mfd: add entry for Marvell 88PM886 PMIC

2024-02-18 Thread Krzysztof Kozlowski
On 18/02/2024 16:10, Karel Balej wrote:
> Rob Herring, 2024-02-15T08:20:52-06:00:
>>>  .../bindings/mfd/marvell,88pm88x.yaml | 74 +++
>>
>> Filename should match the compatible.
>>
>> In general, drop the 'x' wildcard.
> 
> By "in general", do you mean for the drivers code also?

No, not driver. The rules for wildcard, that they are discouraged, are
DT binding rules.

> 
> As I have mentioned in the commit message for the driver, the other
> device is very similar and if the support for it was ever to be added
> (which I personally currently have no interest in), I believe it would
> make sense to extend this driver. Is it then still prefered to call it
> all just 88pm886 now?

Extend the driver, it's unrelated. Binding still should be named like
compatible, because that extension might never happen.

> 
>>> +properties:
>>> +  compatible:
>>> +const: marvell,88pm886-a1
> 
> So the file should be called marvell,88pm886-a1.yaml, correct? Again, is
> it prefered to call it like this even if the other revision could
> eventually be added (again, I am not interested in that right now

If you already add two devices, flexible name would be fine. But you do
not add it now and you might never add, so keep the filename=compatible.
It is fine if it has also other compatibles later. We already accepted
many bindings like that.


Best regards,
Krzysztof




Re: [PATCH v2 3/3] arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll

2024-02-18 Thread Krzysztof Kozlowski
On 18/02/2024 21:57, Luca Weiss wrote:
> Follow the updated bindings and use a QCS404-specific compatible for the
> HFPLL on this SoC.
> 
> Signed-off-by: Luca Weiss 
> ---
> Please note that this patch should only land after the patch for the
> clock driver.
> ---

This patch should go in the next cycle, after clock driver is merged to
mainline, to preserve bisectability.

Best regards,
Krzysztof




Re: [PATCH v2 1/3] dt-bindings: clock: qcom,hfpll: Convert to YAML

2024-02-18 Thread Krzysztof Kozlowski
On 18/02/2024 21:57, Luca Weiss wrote:
> Convert the .txt documentation to .yaml with some adjustments.
> 
> * APQ8064/IPQ8064/MSM8960 compatibles are dropped since their HFPLLs are
>   a part of GCC so there is no need for a separate compat entry.
> * Change the MSM8974 compatible to follow the updated naming schema.
>   Theis compatible is not used upstream yet.
> * Add qcs404-hfpll. QCS404 currently uses qcom,hfpll. Mark that as
>   deprecated since every SoC appears to need different driver data so
>   "qcom,hfpll" makes no sense to keep
> 
> Signed-off-by: Luca Weiss 
> ---


Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof




Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-02-18 Thread Huacai Chen
On Mon, Feb 19, 2024 at 12:18 PM maobibo  wrote:
>
>
>
> On 2024/2/19 10:45 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao  wrote:
> >>
> >> On LoongArch system, ipi hw uses iocsr registers, there is one iocsr
> >> register access on ipi sending, and two iocsr access on ipi receiving
> >> which is ipi interrupt handler. On VM mode all iocsr registers
> >> accessing will cause VM to trap into hypervisor. So with ipi hw
> >> notification once there will be three times of trap.
> >>
> >> This patch adds pv ipi support for VM, hypercall instruction is used
> >> to ipi sender, and hypervisor will inject SWI on the VM. During SWI
> >> interrupt handler, only estat CSR register is written to clear irq.
> >> Estat CSR register access will not trap into hypervisor. So with pv ipi
> >> supported, pv ipi sender will trap into hypervisor one time, pv ipi
> >> receiver will not trap, there is only one time of trap.
> >>
> >> Also this patch adds ipi multicast support, the method is similar with
> >> x86. With ipi multicast support, ipi notification can be sent to at most
> >> 128 vcpus at one time. It reduces trap times into hypervisor greatly.
> >>
> >> Signed-off-by: Bibo Mao 
> >> ---
> >>   arch/loongarch/include/asm/hardirq.h   |   1 +
> >>   arch/loongarch/include/asm/kvm_host.h  |   1 +
> >>   arch/loongarch/include/asm/kvm_para.h  | 124 +
> >>   arch/loongarch/include/asm/loongarch.h |   1 +
> >>   arch/loongarch/kernel/irq.c|   2 +-
> >>   arch/loongarch/kernel/paravirt.c   | 113 ++
> >>   arch/loongarch/kernel/smp.c|   2 +-
> >>   arch/loongarch/kvm/exit.c  |  73 ++-
> >>   arch/loongarch/kvm/vcpu.c  |   1 +
> >>   9 files changed, 314 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/loongarch/include/asm/hardirq.h 
> >> b/arch/loongarch/include/asm/hardirq.h
> >> index 9f0038e19c7f..8a611843c1f0 100644
> >> --- a/arch/loongarch/include/asm/hardirq.h
> >> +++ b/arch/loongarch/include/asm/hardirq.h
> >> @@ -21,6 +21,7 @@ enum ipi_msg_type {
> >>   typedef struct {
> >>  unsigned int ipi_irqs[NR_IPI];
> >>  unsigned int __softirq_pending;
> >> +   atomic_t messages cacheline_aligned_in_smp;
> > Do we really need atomic_t? A plain "unsigned int" can reduce cost
> > significantly.
> For IPI, there are multiple senders and one receiver: the sender uses
> atomic_fetch_or(action, >messages) and the receiver uses
> atomic_xchg(>messages, 0) to clear the message.
>
> A sync mechanism is needed between the senders and the receiver, and
> atomic is the simplest method.
At least from receiver side, the native IPI doesn't need atomic for
read and clear:
static u32 ipi_read_clear(int cpu)
{
u32 action;

/* Load the ipi register to figure out what we're supposed to do */
action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS);
/* Clear the ipi register to clear the interrupt */
iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR);
wbflush();

return action;
}

> >
> >>   } cacheline_aligned irq_cpustat_t;
> >>
> >>   DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> >> diff --git a/arch/loongarch/include/asm/kvm_host.h 
> >> b/arch/loongarch/include/asm/kvm_host.h
> >> index 57399d7cf8b7..1bf927e2bfac 100644
> >> --- a/arch/loongarch/include/asm/kvm_host.h
> >> +++ b/arch/loongarch/include/asm/kvm_host.h
> >> @@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
> >>  u64 idle_exits;
> >>  u64 cpucfg_exits;
> >>  u64 signal_exits;
> >> +   u64 hvcl_exits;
> > hypercall_exits is better.
> yeap, hypercall_exits is better, will fix in next version.
> >
> >>   };
> >>
> >>   #define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
> >> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> >> b/arch/loongarch/include/asm/kvm_para.h
> >> index 41200e922a82..a25a84e372b9 100644
> >> --- a/arch/loongarch/include/asm/kvm_para.h
> >> +++ b/arch/loongarch/include/asm/kvm_para.h
> >> @@ -9,6 +9,10 @@
> >>   #define HYPERVISOR_VENDOR_SHIFT8
> >>   #define HYPERCALL_CODE(vendor, code)   ((vendor << 
> >> HYPERVISOR_VENDOR_SHIFT) + code)
> >>
> >> +#define KVM_HC_CODE_SERVICE0
> >> +#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, 
> >> KVM_HC_CODE_SERVICE)
> >> +#define  KVM_HC_FUNC_IPI   1
> > Change HC to HCALL is better.
> will modify in next version.
> >
> >> +
> >>   /*
> >>* LoongArch hypcall return code
> >>*/
> >> @@ -16,6 +20,126 @@
> >>   #define KVM_HC_INVALID_CODE-1UL
> >>   #define KVM_HC_INVALID_PARAMETER   -2UL
> >>
> >> +/*
> >> + * Hypercalls interface for KVM hypervisor
> >> + *
> >> + * a0: function identifier
> >> + * a1-a6: args
> >> + * Return value will be placed in v0.
> >> + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
> >> + */
> >> +static __always_inline long kvm_hypercall(u64 fid)
> >> 

[syzbot] [virtualization?] linux-next boot error: WARNING: refcount bug in __free_pages_ok

2024-02-18 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:d37e1e4c52bc Add linux-next specific files for 20240216
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=171ca65218
kernel config:  https://syzkaller.appspot.com/x/.config?x=4bc446d42a7d56c0
dashboard link: https://syzkaller.appspot.com/bug?extid=6f3c38e8a6a0297caa5a
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/14d0894504b9/disk-d37e1e4c.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/6cda61e084ee/vmlinux-d37e1e4c.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/720c85283c05/bzImage-d37e1e4c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+6f3c38e8a6a0297ca...@syzkaller.appspotmail.com

Key type pkcs7_test registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 239)
io scheduler mq-deadline registered
io scheduler kyber registered
io scheduler bfq registered
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: button: Power Button [PWRF]
input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
ACPI: button: Sleep Button [SLPF]
ioatdma: Intel(R) QuickData Technology Driver 5.00
ACPI: \_SB_.LNKC: Enabled at IRQ 11
virtio-pci :00:03.0: virtio_pci: leaving for legacy driver
ACPI: \_SB_.LNKD: Enabled at IRQ 10
virtio-pci :00:04.0: virtio_pci: leaving for legacy driver
ACPI: \_SB_.LNKB: Enabled at IRQ 10
virtio-pci :00:06.0: virtio_pci: leaving for legacy driver
virtio-pci :00:07.0: virtio_pci: leaving for legacy driver
N_HDLC line discipline registered with maxframe=4096
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
ACPI: bus type drm_connector registered
[drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
Console: switching to colour frame buffer device 128x48
platform vkms: [drm] fb0: vkmsdrmfb frame buffer device
usbcore: registered new interface driver udl
brd: module loaded
loop: module loaded
zram: Added device: zram0
null_blk: disk nullb0 created
null_blk: module loaded
Guest personality initialized and is inactive
VMCI host device registered (name=vmci, major=10, minor=118)
Initialized host personality
usbcore: registered new interface driver rtsx_usb
usbcore: registered new interface driver viperboard
usbcore: registered new interface driver dln2
usbcore: registered new interface driver pn533_usb
nfcsim 0.2 initialized
usbcore: registered new interface driver port100
usbcore: registered new interface driver nfcmrvl
Loading iSCSI transport class v2.0-870.
virtio_scsi virtio0: 1/0/0 default/read/poll queues
[ cut here ]
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 
lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-next-20240216-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/25/2024
RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
Code: b2 00 00 00 e8 b7 94 f0 fc 5b 5d c3 cc cc cc cc e8 ab 94 f0 fc c6 05 c6 
16 ce 0a 01 90 48 c7 c7 a0 5a fe 8b e8 67 69 b4 fc 90 <0f> 0b 90 90 eb d9 e8 8b 
94 f0 fc c6 05 a3 16 ce 0a 01 90 48 c7 c7
RSP: :c9066e10 EFLAGS: 00010246
RAX: 15c2c224c9b50400 RBX: 888020827d2c RCX: 8880162d8000
RDX:  RSI:  RDI: 
RBP: 0004 R08: 8157b942 R09: fbfff1bf95cc
R10: dc00 R11: fbfff1bf95cc R12: ea000502fdc0
R13: ea000502fdc8 R14: 1d4000a05fb9 R15: 
FS:  () GS:8880b940() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 88823000 CR3: 0df32000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1140 [inline]
 __free_pages_ok+0xc42/0xd70 mm/page_alloc.c:1269
 make_alloc_exact+0xc4/0x140 mm/page_alloc.c:4847
 vring_alloc_queue drivers/virtio/virtio_ring.c:319 [inline]
 vring_alloc_queue_split+0x20a/0x600 drivers/virtio/virtio_ring.c:1108
 vring_create_virtqueue_split+0xc6/0x310 drivers/virtio/virtio_ring.c:1158
 vring_create_virtqueue+0xca/0x110 drivers/virtio/virtio_ring.c:2683
 setup_vq+0xe9/0x2d0 

Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-02-18 Thread maobibo




On 2024/2/19 10:45 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao  wrote:


On LoongArch system, ipi hw uses iocsr registers, there is one iocsr
register access on ipi sending, and two iocsr access on ipi receiving
which is ipi interrupt handler. On VM mode all iocsr registers
accessing will cause VM to trap into hypervisor. So with ipi hw
notification once there will be three times of trap.

This patch adds pv ipi support for VM, hypercall instruction is used
to ipi sender, and hypervisor will inject SWI on the VM. During SWI
interrupt handler, only estat CSR register is written to clear irq.
Estat CSR register access will not trap into hypervisor. So with pv ipi
supported, pv ipi sender will trap into hypervisor one time, pv ipi
receiver will not trap, there is only one time of trap.

Also this patch adds ipi multicast support, the method is similar with
x86. With ipi multicast support, ipi notification can be sent to at most
128 vcpus at one time. It reduces trap times into hypervisor greatly.
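
To illustrate the multicast idea only: the sketch below is not the code
from this patch (that hunk is not shown in this thread). It assumes the
x86-like scheme of a 128-bit bitmap relative to a minimum cpu id, and it
assumes cpus are added in ascending order. The names pv_ipi_batch and
pv_ipi_batch_add are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* From the commit message: at most 128 vcpus per notification. */
#define PV_IPI_CLUSTER_SIZE	128

/* Hypothetical batch descriptor; must start zero-initialized. */
struct pv_ipi_batch {
	uint64_t bitmap[2];	/* bit n set => target cpu is (min + n) */
	int min;		/* lowest cpu id covered by the bitmap */
	int used;		/* number of successful adds so far */
};

/*
 * Add one cpu to the batch. Returns 0 on success, or -1 when the cpu
 * does not fit into the 128-cpu window starting at min. In that case
 * the caller would issue one hypercall for the current batch and start
 * a new one.
 */
int pv_ipi_batch_add(struct pv_ipi_batch *b, int cpu)
{
	int off;

	if (!b->used)
		b->min = cpu;
	if (cpu < b->min || cpu - b->min >= PV_IPI_CLUSTER_SIZE)
		return -1;

	off = cpu - b->min;
	b->bitmap[off / 64] |= UINT64_C(1) << (off % 64);
	b->used++;
	return 0;
}
```

With such a batch, one trap into the hypervisor can cover every cpu
whose id lies within 128 of the first target, instead of one trap per
target.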

Signed-off-by: Bibo Mao 
---
  arch/loongarch/include/asm/hardirq.h   |   1 +
  arch/loongarch/include/asm/kvm_host.h  |   1 +
  arch/loongarch/include/asm/kvm_para.h  | 124 +
  arch/loongarch/include/asm/loongarch.h |   1 +
  arch/loongarch/kernel/irq.c|   2 +-
  arch/loongarch/kernel/paravirt.c   | 113 ++
  arch/loongarch/kernel/smp.c|   2 +-
  arch/loongarch/kvm/exit.c  |  73 ++-
  arch/loongarch/kvm/vcpu.c  |   1 +
  9 files changed, 314 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..8a611843c1f0 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
  typedef struct {
 unsigned int ipi_irqs[NR_IPI];
 unsigned int __softirq_pending;
+   atomic_t messages cacheline_aligned_in_smp;

Do we really need atomic_t? A plain "unsigned int" can reduce cost
significantly.
For IPI, there are multiple senders and one receiver: the sender uses
atomic_fetch_or(action, >messages) and the receiver uses
atomic_xchg(>messages, 0) to clear the message.

A sync mechanism is needed between the senders and the receiver, and
atomic is the simplest method.
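
A minimal user-space sketch of that sender/receiver pattern, for
illustration only. It uses C11 <stdatomic.h> in place of the kernel
atomic_t helpers, and the names pv_messages, pv_send and pv_receive are
hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-CPU "messages" word. */
atomic_uint pv_messages;

/*
 * Sender side: OR the action bit into the word. The returned old value
 * tells the sender whether something was already pending.
 */
unsigned int pv_send(unsigned int action_bit)
{
	return atomic_fetch_or(&pv_messages, action_bit);
}

/*
 * Receiver side: fetch all pending bits and clear them in one atomic
 * step, so a bit set concurrently by another sender is either returned
 * now or left pending for the next call, never lost.
 */
unsigned int pv_receive(void)
{
	return atomic_exchange(&pv_messages, 0);
}
```

With a plain "unsigned int", a separate load followed by a store of zero
on the receiver could overwrite a bit that a sender set in between the
two steps.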



  } cacheline_aligned irq_cpustat_t;

  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 57399d7cf8b7..1bf927e2bfac 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
 u64 idle_exits;
 u64 cpucfg_exits;
 u64 signal_exits;
+   u64 hvcl_exits;

hypercall_exits is better.

yeap, hypercall_exits is better, will fix in next version.



  };

  #define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 41200e922a82..a25a84e372b9 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -9,6 +9,10 @@
  #define HYPERVISOR_VENDOR_SHIFT8
  #define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)

+#define KVM_HC_CODE_SERVICE0
+#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, 
KVM_HC_CODE_SERVICE)
+#define  KVM_HC_FUNC_IPI   1

Change HC to HCALL is better.

will modify in next version.
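
For reference, a small sketch of how the hypercall code is encoded by
the macros quoted above. HYPERVISOR_KVM is 1 as defined in patch 4/6;
the macro arguments are parenthesized here, and the two helper functions
are hypothetical, added only to show the field split.

```c
#include <assert.h>

#define HYPERVISOR_KVM			1
#define HYPERVISOR_VENDOR_SHIFT		8
/* Same encoding as in the patch, with the arguments parenthesized. */
#define HYPERCALL_CODE(vendor, code) \
	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

#define KVM_HC_CODE_SERVICE		0
#define KVM_HC_SERVICE \
	HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HC_CODE_SERVICE)

/* Hypothetical helper: vendor field, the bits above the shift. */
unsigned int hypercall_vendor(unsigned int hcode)
{
	return hcode >> HYPERVISOR_VENDOR_SHIFT;
}

/* Hypothetical helper: service code, the low 8 bits. */
unsigned int hypercall_service(unsigned int hcode)
{
	return hcode & ((1u << HYPERVISOR_VENDOR_SHIFT) - 1);
}
```

So KVM_HC_SERVICE evaluates to 0x100: vendor 1 in the upper field and
service code 0 in the low 8 bits.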



+
  /*
   * LoongArch hypcall return code
   */
@@ -16,6 +20,126 @@
  #define KVM_HC_INVALID_CODE-1UL
  #define KVM_HC_INVALID_PARAMETER   -2UL

+/*
+ * Hypercalls interface for KVM hypervisor
+ *
+ * a0: function identifier
+ * a1-a6: args
+ * Return value will be placed in v0.
+ * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
+ */
+static __always_inline long kvm_hypercall(u64 fid)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register unsigned long a1  asm("a1") = arg0;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun), "r" (a1)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall2(u64 fid,
+   unsigned long arg0, unsigned long arg1)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register 

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-18 Thread maobibo




On 2024/2/19 10:42 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


The patch adds paravirt interface for guest kernel, function
pv_guest_init() firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available() to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

This patch only adds paravirt interface for guest kernel, however there
are no effective pv functions added here.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|  9 
  arch/loongarch/include/asm/kvm_para.h |  7 
  arch/loongarch/include/asm/paravirt.h | 27 
  .../include/asm/paravirt_api_clock.h  |  1 +
  arch/loongarch/kernel/Makefile|  1 +
  arch/loongarch/kernel/paravirt.c  | 41 +++
  arch/loongarch/kernel/setup.c |  2 +
  7 files changed, 88 insertions(+)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+ the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
  #ifndef _ASM_LOONGARCH_KVM_PARA_H
  #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
  /*
   * LoongArch hypcall return code
   */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.

Originally I wanted to remove this piece of code, but the kernel fails
to compile if CONFIG_PARAVIRT is selected. Here is the reference code;
function paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is
selected.


static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
        if (static_key_false(&paravirt_steal_enabled)) {
                u64 steal;

                steal = paravirt_steal_clock(smp_processor_id());
                steal -= this_rq()->prev_steal_time;
                steal = min(steal, maxtime);
                account_steal_time(steal);
                this_rq()->prev_steal_time += steal;

                return steal;
        }
#endif
        return 0;
}
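
The build dependency can be reproduced in a reduced user-space sketch.
This is an illustration under assumptions, not kernel code: a plain
function pointer stands in for the static_call, the static key check is
omitted, and steal_account() is a hypothetical simplification of the
caller shown above.

```c
#include <assert.h>
#include <stdint.h>

#define CONFIG_PARAVIRT 1	/* assumption: pretend the option is selected */

#ifdef CONFIG_PARAVIRT
/* Arch side: a dummy hook that reports no stolen time. */
uint64_t dummy_steal_clock(int cpu)
{
	(void)cpu;
	return 0;
}

/* Plain function pointer standing in for the kernel static_call. */
uint64_t (*pv_steal_clock)(int cpu) = dummy_steal_clock;

uint64_t paravirt_steal_clock(int cpu)
{
	return pv_steal_clock(cpu);
}
#endif

/* Generic side: it references paravirt_steal_clock() under the ifdef. */
uint64_t steal_account(int cpu, uint64_t prev_steal, uint64_t maxtime)
{
#ifdef CONFIG_PARAVIRT
	uint64_t steal = paravirt_steal_clock(cpu) - prev_steal;

	return steal < maxtime ? steal : maxtime;
#else
	(void)cpu;
	(void)prev_steal;
	(void)maxtime;
	return 0;
#endif
}
```

If the arch block is dropped while CONFIG_PARAVIRT stays defined, the
generic function no longer builds, which is the failure described above.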


+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null

Re: [PATCH v4 2/6] LoongArch: KVM: Add hypercall instruction emulation support

2024-02-18 Thread maobibo



On 2024/2/19 10:41 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


On LoongArch system, hypercall instruction is supported when system
runs on VM mode. This patch adds a dummy function for hypercall
instruction emulation, rather than injecting an EXCCODE_INE invalid
instruction exception.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/include/asm/Kbuild  |  1 -
  arch/loongarch/include/asm/kvm_para.h  | 26 ++
  arch/loongarch/include/uapi/asm/Kbuild |  2 --
  arch/loongarch/kvm/exit.c  | 10 ++
  4 files changed, 36 insertions(+), 3 deletions(-)
  create mode 100644 arch/loongarch/include/asm/kvm_para.h
  delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild

diff --git a/arch/loongarch/include/asm/Kbuild 
b/arch/loongarch/include/asm/Kbuild
index 93783fa24f6e..22991a6f0e2b 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -23,4 +23,3 @@ generic-y += poll.h
  generic-y += param.h
  generic-y += posix_types.h
  generic-y += resource.h
-generic-y += kvm_para.h
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
new file mode 100644
index ..9425d3b7e486
--- /dev/null
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_KVM_PARA_H
+#define _ASM_LOONGARCH_KVM_PARA_H
+
+/*
+ * LoongArch hypcall return code

Maybe using "hypercall" in comments is better.

Will modify it in the next patch.




+ */
+#define KVM_HC_STATUS_SUCCESS  0
+#define KVM_HC_INVALID_CODE-1UL
+#define KVM_HC_INVALID_PARAMETER   -2UL

Maybe KVM_HCALL_SUCCESS/KVM_HCALL_INVALID_CODE/KVM_HCALL_PARAMETER is better.

Yes, KVM_HCALL_ sounds better. Will modify it.

Regards
Bibo Mao


Huacai


+
+static inline unsigned int kvm_arch_para_features(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+   return 0;
+}
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+#endif /* _ASM_LOONGARCH_KVM_PARA_H */
diff --git a/arch/loongarch/include/uapi/asm/Kbuild 
b/arch/loongarch/include/uapi/asm/Kbuild
deleted file mode 100644
index 4aa680ca2e5f..
--- a/arch/loongarch/include/uapi/asm/Kbuild
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-generic-y += kvm_para.h
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index ed1d89d53e2e..d15c71320a11 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
 return RESUME_GUEST;
  }

+static int kvm_handle_hypcall(struct kvm_vcpu *vcpu)
+{
+   update_pc(&vcpu->arch);
+
+   /* Treat it as noop instruction, only set return value */
+   vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HC_INVALID_CODE;
+   return RESUME_GUEST;
+}
+
  /*
   * LoongArch KVM callback handling for unimplemented guest exiting
   */
@@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = 
{
 [EXCCODE_LSXDIS]= kvm_handle_lsx_disabled,
 [EXCCODE_LASXDIS]   = kvm_handle_lasx_disabled,
 [EXCCODE_GSPR]  = kvm_handle_gspr,
+   [EXCCODE_HVC]   = kvm_handle_hypcall,
  };

  int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
--
2.39.3






Re: [PATCH v4 1/6] LoongArch/smp: Refine ipi ops on LoongArch platform

2024-02-18 Thread maobibo

Huacai,

Thanks for your review. My replies are inline.

On 2024/2/19 10:39 AM, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


This patch refines ipi handling on LoongArch platform, there are
three changes with this patch.
1. Add generic get_percpu_irq() api, replace some percpu irq functions
such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with get_percpu_irq().

2. Change parameter action definition with function
loongson_send_ipi_single() and loongson_send_ipi_mask(). Normal decimal
encoding is used rather than binary bitmap encoding for ipi action, ipi
hw sender uses devimal action code, and ipi receiver will get binary bitmap
encoding, the ipi hw will convert it into bitmap in ipi message buffer.

What is "devimal" here? Maybe decimal?

Yep, it should be decimal.
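
To illustrate the encoding change in point 2, here is a minimal userspace sketch, assuming the corrected ACTION_ spelling. The sender passes a plain decimal action number and the receiver sees one bit per action. `ipi_pending_after_send()` and `ipi_action_pending()` are hypothetical helpers that only model what the ipi hw does; they are not functions from the patch.

```c
#include <stdint.h>

/* Decimal action codes, as passed by the IPI sender (names assume the
 * ACTTION_ -> ACTION_ spelling fix). */
#define ACTION_BOOT_CPU       0
#define ACTION_RESCHEDULE     1
#define ACTION_CALL_FUNCTION  2

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(nr)               (1U << (nr))

/* Bitmap encoding, as seen by the IPI receiver. */
#define SMP_BOOT_CPU          BIT(ACTION_BOOT_CPU)
#define SMP_RESCHEDULE        BIT(ACTION_RESCHEDULE)
#define SMP_CALL_FUNCTION     BIT(ACTION_CALL_FUNCTION)

/* Hypothetical model of the hardware: sending decimal action `action`
 * sets the matching bit in the receiver's pending bitmap. */
static inline uint32_t ipi_pending_after_send(uint32_t pending,
					      unsigned int action)
{
	return pending | BIT(action);
}

/* Hypothetical receiver-side check against the bitmap encoding. */
static inline int ipi_action_pending(uint32_t pending, uint32_t smp_mask)
{
	return (pending & smp_mask) != 0;
}
```

So a sender passing ACTION_RESCHEDULE (1) leaves SMP_RESCHEDULE (0x2) set in the receiver's bitmap.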





3. Add structure smp_ops on LoongArch platform so that pv ipi can be used
later.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/include/asm/hardirq.h |  4 ++
  arch/loongarch/include/asm/irq.h | 10 -
  arch/loongarch/include/asm/smp.h | 31 +++
  arch/loongarch/kernel/irq.c  | 22 +--
  arch/loongarch/kernel/perf_event.c   | 14 +--
  arch/loongarch/kernel/smp.c  | 58 +++-
  arch/loongarch/kernel/time.c | 12 +-
  7 files changed, 71 insertions(+), 80 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 0ef3b18f8980..9f0038e19c7f 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -12,6 +12,10 @@
  extern void ack_bad_irq(unsigned int irq);
  #define ack_bad_irq ack_bad_irq

+enum ipi_msg_type {
+   IPI_RESCHEDULE,
+   IPI_CALL_FUNCTION,
+};
  #define NR_IPI 2

  typedef struct {
diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
index 218b4da0ea90..00101b6d601e 100644
--- a/arch/loongarch/include/asm/irq.h
+++ b/arch/loongarch/include/asm/irq.h
@@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle;
  extern struct fwnode_handle *pch_lpc_handle;
  extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];

-extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
+static inline int get_percpu_irq(int vector)
+{
+   struct irq_domain *d;
+
+   d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
+   if (d)
+   return irq_create_mapping(d, vector);

+   return -EINVAL;
+}
  #include 

  #endif /* _ASM_IRQ_H */
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index f81e5f01d619..8a42632b038a 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -12,6 +12,13 @@
  #include 
  #include 

+struct smp_ops {
+   void (*init_ipi)(void);
+   void (*send_ipi_mask)(const struct cpumask *mask, unsigned int action);
+   void (*send_ipi_single)(int cpu, unsigned int action);
+};
+
+extern struct smp_ops smp_ops;
  extern int smp_num_siblings;
  extern int num_processors;
  extern int disabled_cpus;
@@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
  void loongson_boot_secondary(int cpu, struct task_struct *idle);
  void loongson_init_secondary(void);
  void loongson_smp_finish(void);
-void loongson_send_ipi_single(int cpu, unsigned int action);
-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
  #ifdef CONFIG_HOTPLUG_CPU
  int loongson_cpu_disable(void);
  void loongson_cpu_die(unsigned int cpu);
@@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];

  #define cpu_physical_id(cpu)   cpu_logical_map(cpu)

-#define SMP_BOOT_CPU   0x1
-#define SMP_RESCHEDULE 0x2
-#define SMP_CALL_FUNCTION  0x4
+#define ACTTION_BOOT_CPU   0
+#define ACTTION_RESCHEDULE 1
+#define ACTTION_CALL_FUNCTION  2

ACTTION? ACTION?

It should be ACTION_xxx; I will fix it in the next patch.

Regards
Bibo Mao


Huacai


+#define SMP_BOOT_CPU   BIT(ACTTION_BOOT_CPU)
+#define SMP_RESCHEDULE BIT(ACTTION_RESCHEDULE)
+#define SMP_CALL_FUNCTION  BIT(ACTTION_CALL_FUNCTION)

  struct secondary_data {
 unsigned long stack;
@@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data;

  extern asmlinkage void smpboot_entry(void);
  extern asmlinkage void start_secondary(void);
-
+extern void arch_send_call_function_single_ipi(int cpu);
+extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
  extern void calculate_cpu_foreign_map(void);

  /*
@@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void);
   */
  extern void show_ipi_list(struct seq_file *p, int prec);

-static inline void arch_send_call_function_single_ipi(int cpu)
-{
-   loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
-}
-
-static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
-{
-   loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
-}
-
  #ifdef CONFIG_HOTPLUG_CPU
  static inline int 

Re: [PATCH v18 2/3] vfio/pci: rename and export range_intersect_range

2024-02-18 Thread Ankit Agrawal
>> +
>> +/**
>> + * vfio_pci_core_range_intersect_range() - Determine overlap between a 
>> buffer
>> + *  and register offset ranges.
>> + * @buf_start:   start offset of the buffer
>> + * @buf_cnt: number of buffer bytes.
>
> You could drop the '.' at the end to be consistent with the other.

Ok, will make it consistent.

>> +bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
>> +  loff_t reg_start, size_t reg_cnt,
>> +  loff_t *buf_offset,
>> +  size_t *intersect_count,
>> +  size_t *register_offset);
>>   #define VFIO_IOWRITE_DECLATION(size) \
>>   int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev,  \
>>    bool test_mem, u##size val, void __iomem *io);
>
> Reviewed-by: Yishai Hadas 

Thanks


Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-18 Thread Ankit Agrawal
Thanks Kevin and Yishai for the reviews. Comments inline.

>> +static int nvgrace_gpu_mmap(struct vfio_device *core_vdev,
>> + struct vm_area_struct *vma)
>> +{
>> + struct nvgrace_gpu_pci_core_device *nvdev =
>> + container_of(core_vdev, struct nvgrace_gpu_pci_core_device,
>> +  core_device.vdev);
>
> No need for a new line here.

Ack.

>> +static ssize_t
>> +nvgrace_gpu_read_mem(struct nvgrace_gpu_pci_core_device *nvdev,
>> +  char __user *buf, size_t count, loff_t *ppos)
>> +{
>> + u64 offset = *ppos & VFIO_PCI_OFFSET_MASK;
>> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> + struct mem_region *memregion;
>> + size_t mem_count, i;
>> + u8 val = 0xFF;
>> + int ret;
>> +
>> + memregion = nvgrace_gpu_memregion(index, nvdev);
>> + if (!memregion)
>
> Can that happen ? it was just tested by the caller.

Ok, I can remove it. Will put a comment instead that this has been checked.

>> + /*
>> +  * Determine how many bytes to be actually read from the device memory.
>> +  * Read request beyond the actual device memory size is filled with ~0,
>> +  * while those beyond the actual reported size is skipped.
>> +  */
>> + if (offset >= memregion->memlength)
>> + mem_count = 0;
>> + else
>> + mem_count = min(count, memregion->memlength - (size_t)offset);
>> +
>> + ret = nvgrace_gpu_map_and_read(nvdev, buf, mem_count, ppos);
>> + if (ret)
>> + return ret;
>> +
>> + /*
>> +  * Only the device memory present on the hardware is mapped, which may
>> +  * not be power-of-2 aligned. A read to an offset beyond the device 
>> memory
>> +  * size is filled with ~0.
>> +  */
>> + for (i = mem_count; i < count; i++)
>> + put_user(val, (unsigned char __user *)(buf + i));
>
> Did you consider a failure here?

Yeah, that has to be checked here. Will make the change in the next post.
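
For reference, a minimal userspace sketch of such a check. This is not the actual change for the next post. `fake_put_user()` is a hypothetical stand-in for put_user() that fails once a chosen index is reached, and `mem_read_count()` and `read_fill_tail()` are illustrative helpers that mirror the mem_count computation and the ~0 fill from the quoted code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for put_user(): writes one byte, and returns
 * -EFAULT once the destination index reaches fault_at. */
static int fake_put_user(unsigned char val, unsigned char *dst,
			 size_t idx, size_t fault_at)
{
	if (idx >= fault_at)
		return -EFAULT;
	dst[idx] = val;
	return 0;
}

/* Bytes to read from real device memory: zero when the offset is beyond
 * memlength, otherwise capped at the remaining device memory. */
static size_t mem_read_count(size_t offset, size_t count, size_t memlength)
{
	size_t remaining;

	if (offset >= memlength)
		return 0;
	remaining = memlength - offset;
	return count < remaining ? count : remaining;
}

/* Fill buf[mem_count..count) with ~0 and propagate the first failure
 * instead of ignoring it. */
static int read_fill_tail(unsigned char *buf, size_t mem_count, size_t count,
			  size_t fault_at)
{
	unsigned char val = 0xFF;
	size_t i;
	int ret;

	for (i = mem_count; i < count; i++) {
		ret = fake_put_user(val, buf, i, fault_at);
		if (ret)
			return ret;
	}
	return 0;
}
```

The point is only that the loop returns the error from the first failed write rather than continuing.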

>> +/*
>> + * Write count bytes to the device memory at a given offset. The actual 
>> device
>> + * memory size (available) may not be a power-of-2. So the driver fakes the
>> + * size to a power-of-2 (reported) when exposing to a user space driver.
>> + *
>> + * Writes extending beyond the reported size are truncated; writes starting
>> + * beyond the reported size generate -EINVAL.
>> + */
>> +static ssize_t
>> +nvgrace_gpu_write_mem(struct nvgrace_gpu_pci_core_device *nvdev,
>> +   size_t count, loff_t *ppos, const char __user *buf)
>> +{
>> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> + u64 offset = *ppos & VFIO_PCI_OFFSET_MASK;
>> + struct mem_region *memregion;
>> + size_t mem_count;
>> + int ret = 0;
>> +
>> + memregion = nvgrace_gpu_memregion(index, nvdev);
>> + if (!memregion)
>
> Same as the above note in nvgrace_gpu_read_mem().

Ack.

>> +static const struct vfio_device_ops nvgrace_gpu_pci_ops = {
>> + .name   = "nvgrace-gpu-vfio-pci",
>> + .init   = vfio_pci_core_init_dev,
>> + .release    = vfio_pci_core_release_dev,
>> + .open_device    = nvgrace_gpu_open_device,
>> + .close_device   = nvgrace_gpu_close_device,
>> + .ioctl  = nvgrace_gpu_ioctl,
>> + .read   = nvgrace_gpu_read,
>> + .write  = nvgrace_gpu_write,
>> + .mmap   = nvgrace_gpu_mmap,
>> + .request    = vfio_pci_core_request,
>> + .match  = vfio_pci_core_match,
>> + .bind_iommufd   = vfio_iommufd_physical_bind,
>> + .unbind_iommufd = vfio_iommufd_physical_unbind,
>> + .attach_ioas    = vfio_iommufd_physical_attach_ioas,
>> + .detach_ioas    = vfio_iommufd_physical_detach_ioas,
>> +};
>> +
>> +static const struct vfio_device_ops nvgrace_gpu_pci_core_ops = {
>> + .name   = "nvgrace-gpu-vfio-pci-core",
>> + .init   = vfio_pci_core_init_dev,
>> + .release    = vfio_pci_core_release_dev,
>> + .open_device    = nvgrace_gpu_open_device,
>> + .close_device   = vfio_pci_core_close_device,
>> + .ioctl  = vfio_pci_core_ioctl,
>> + .device_feature = vfio_pci_core_ioctl_feature,
>
> This entry is missing above as part of nvgrace_gpu_pci_ops.
Yes. Will add.
>> + .read   = vfio_pci_core_read,
>> + .write  = vfio_pci_core_write,
>> + .mmap   = vfio_pci_core_mmap,
>> + .request    = vfio_pci_core_request,
>> + .match  = vfio_pci_core_match,
>> + .bind_iommufd   = vfio_iommufd_physical_bind,
>> + .unbind_iommufd = vfio_iommufd_physical_unbind,
>> + .attach_ioas    = vfio_iommufd_physical_attach_ioas,
>> + .detach_ioas    = vfio_iommufd_physical_detach_ioas,
>> +};
>> +
>> +static struct
>> +nvgrace_gpu_pci_core_device *nvgrace_gpu_drvdata(struct pci_dev *pdev)
>> +{
>> + struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
>> +
>> + return 

Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-02-18 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao  wrote:
>
> On LoongArch system, ipi hw uses iocsr registers, there is one iocsr
> register access on ipi sending, and two iocsr access on ipi receiving
> which is ipi interrupt handler. On VM mode all iocsr registers
> accessing will cause VM to trap into hypervisor. So with ipi hw
> notification once there will be three times of trap.
>
> This patch adds pv ipi support for VM, hypercall instruction is used
> to ipi sender, and hypervisor will inject SWI on the VM. During SWI
> interrupt handler, only estat CSR register is written to clear irq.
> Estat CSR register access will not trap into hypervisor. So with pv ipi
> supported, pv ipi sender will trap into hypervisor one time, pv ipi
> receiver will not trap, there is only one time of trap.
>
> Also this patch adds ipi multicast support, the method is similar with
> x86. With ipi multicast support, ipi notification can be sent to at most
> 128 vcpus at one time. It reduces trap times into hypervisor greatly.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/include/asm/hardirq.h   |   1 +
>  arch/loongarch/include/asm/kvm_host.h  |   1 +
>  arch/loongarch/include/asm/kvm_para.h  | 124 +
>  arch/loongarch/include/asm/loongarch.h |   1 +
>  arch/loongarch/kernel/irq.c|   2 +-
>  arch/loongarch/kernel/paravirt.c   | 113 ++
>  arch/loongarch/kernel/smp.c|   2 +-
>  arch/loongarch/kvm/exit.c  |  73 ++-
>  arch/loongarch/kvm/vcpu.c  |   1 +
>  9 files changed, 314 insertions(+), 4 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/hardirq.h 
> b/arch/loongarch/include/asm/hardirq.h
> index 9f0038e19c7f..8a611843c1f0 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -21,6 +21,7 @@ enum ipi_msg_type {
>  typedef struct {
> unsigned int ipi_irqs[NR_IPI];
> unsigned int __softirq_pending;
> +   atomic_t messages cacheline_aligned_in_smp;
Do we really need atomic_t? A plain "unsigned int" can reduce cost
significantly.

>  } cacheline_aligned irq_cpustat_t;
>
>  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/loongarch/include/asm/kvm_host.h 
> b/arch/loongarch/include/asm/kvm_host.h
> index 57399d7cf8b7..1bf927e2bfac 100644
> --- a/arch/loongarch/include/asm/kvm_host.h
> +++ b/arch/loongarch/include/asm/kvm_host.h
> @@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
> u64 idle_exits;
> u64 cpucfg_exits;
> u64 signal_exits;
> +   u64 hvcl_exits;
hypercall_exits is better.

>  };
>
>  #define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> index 41200e922a82..a25a84e372b9 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -9,6 +9,10 @@
>  #define HYPERVISOR_VENDOR_SHIFT8
>  #define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) 
> + code)
>
> +#define KVM_HC_CODE_SERVICE0
> +#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, 
> KVM_HC_CODE_SERVICE)
> +#define  KVM_HC_FUNC_IPI   1
Changing HC to HCALL would be better.
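
A quick sketch of how the renamed constants could look, purely as an illustration. The KVM_HCALL_ names below are hypothetical and do not exist in the patch. The macro arguments are also parenthesized here, which the original HYPERCALL_CODE() does not do.

```c
#define HYPERVISOR_KVM            1
#define HYPERVISOR_VENDOR_SHIFT   8
/* Vendor id goes in the bits above HYPERVISOR_VENDOR_SHIFT, the code below. */
#define HYPERCALL_CODE(vendor, code) \
	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

/* Hypothetical HCALL-style names replacing the KVM_HC_ ones. */
#define KVM_HCALL_CODE_SERVICE    0
#define KVM_HCALL_SERVICE \
	HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HCALL_CODE_SERVICE)
#define KVM_HCALL_FUNC_IPI        1
```

With HYPERVISOR_KVM as 1 and a shift of 8, the service code evaluates to 0x100.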

> +
>  /*
>   * LoongArch hypcall return code
>   */
> @@ -16,6 +20,126 @@
>  #define KVM_HC_INVALID_CODE-1UL
>  #define KVM_HC_INVALID_PARAMETER   -2UL
>
> +/*
> + * Hypercalls interface for KVM hypervisor
> + *
> + * a0: function identifier
> + * a1-a6: args
> + * Return value will be placed in v0.
> + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
> + */
> +static __always_inline long kvm_hypercall(u64 fid)
> +{
> +   register long ret asm("v0");
> +   register unsigned long fun asm("a0") = fid;
> +
> +   __asm__ __volatile__(
> +   "hvcl "__stringify(KVM_HC_SERVICE)
> +   : "=r" (ret)
> +   : "r" (fun)
> +   : "memory"
> +   );
> +
> +   return ret;
> +}
> +
> +static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
> +{
> +   register long ret asm("v0");
> +   register unsigned long fun asm("a0") = fid;
> +   register unsigned long a1  asm("a1") = arg0;
> +
> +   __asm__ __volatile__(
> +   "hvcl "__stringify(KVM_HC_SERVICE)
> +   : "=r" (ret)
> +   : "r" (fun), "r" (a1)
> +   : "memory"
> +   );
> +
> +   return ret;
> +}
> +
> +static __always_inline long kvm_hypercall2(u64 fid,
> +   unsigned long arg0, unsigned long arg1)
> +{
> +   register long ret asm("v0");
> +   register unsigned long fun asm("a0") = fid;
> +   register unsigned long a1  asm("a1") = arg0;
> +   register unsigned long a2  asm("a2") = arg1;
> +
> +   __asm__ __volatile__(
> +   "hvcl "__stringify(KVM_HC_SERVICE)
> +  

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-18 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
>
> The patch adds paravirt interface for guest kernel, function
> pv_guest_init() firstly checks whether system runs on VM mode. If kernel
> runs on VM mode, it will call function kvm_para_available() to detect
> whether current VMM is KVM hypervisor. And the paravirt function can work
> only if current VMM is KVM hypervisor, since there is only KVM hypervisor
> supported on LoongArch now.
>
> This patch only adds paravirt interface for guest kernel, however there
> is not effective pv functions added here.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|  9 
>  arch/loongarch/include/asm/kvm_para.h |  7 
>  arch/loongarch/include/asm/paravirt.h | 27 
>  .../include/asm/paravirt_api_clock.h  |  1 +
>  arch/loongarch/kernel/Makefile|  1 +
>  arch/loongarch/kernel/paravirt.c  | 41 +++
>  arch/loongarch/kernel/setup.c |  2 +
>  7 files changed, 88 insertions(+)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 10959e6c3583..817a56dff80f 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> index 9425d3b7e486..41200e922a82 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -2,6 +2,13 @@
>  #ifndef _ASM_LOONGARCH_KVM_PARA_H
>  #define _ASM_LOONGARCH_KVM_PARA_H
>
> +/*
> + * Hypcall code field
> + */
> +#define HYPERVISOR_KVM 1
> +#define HYPERVISOR_VENDOR_SHIFT8
> +#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) 
> + code)
> +
>  /*
>   * LoongArch hypcall return code
>   */
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..b64813592ba0
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
The steal time code can be removed in this patch, I think.

> +
> +int pv_guest_init(void);
> +#else
> +static inline int pv_guest_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3c808c680370..662e6e9de12d 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
>  obj-$(CONFIG_SMP)  += smp.o
>
> diff --git a/arch/loongarch/kernel/paravirt.c 
> b/arch/loongarch/kernel/paravirt.c
> new file mode 100644
> index ..21d01d05791a
> --- /dev/null
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct static_key paravirt_steal_enabled;
> +struct static_key paravirt_steal_rq_enabled;
> +
> +static u64 native_steal_clock(int cpu)
> +{
> +   return 0;
> +}
> +
> +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
The steal time code can be removed in this patch, I think.

> +
> +static bool kvm_para_available(void)
> +{
> +   static int 

Re: [PATCH v4 2/6] LoongArch: KVM: Add hypercall instruction emulation support

2024-02-18 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
>
> On LoongArch system, hypercall instruction is supported when system
> runs on VM mode. This patch adds dummy function with hypercall
> instruction emulation, rather than inject EXCCODE_INE invalid
> instruction exception.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/include/asm/Kbuild  |  1 -
>  arch/loongarch/include/asm/kvm_para.h  | 26 ++
>  arch/loongarch/include/uapi/asm/Kbuild |  2 --
>  arch/loongarch/kvm/exit.c  | 10 ++
>  4 files changed, 36 insertions(+), 3 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/kvm_para.h
>  delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
>
> diff --git a/arch/loongarch/include/asm/Kbuild 
> b/arch/loongarch/include/asm/Kbuild
> index 93783fa24f6e..22991a6f0e2b 100644
> --- a/arch/loongarch/include/asm/Kbuild
> +++ b/arch/loongarch/include/asm/Kbuild
> @@ -23,4 +23,3 @@ generic-y += poll.h
>  generic-y += param.h
>  generic-y += posix_types.h
>  generic-y += resource.h
> -generic-y += kvm_para.h
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> new file mode 100644
> index ..9425d3b7e486
> --- /dev/null
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_KVM_PARA_H
> +#define _ASM_LOONGARCH_KVM_PARA_H
> +
> +/*
> + * LoongArch hypcall return code
Maybe using "hypercall" in comments is better.

> + */
> +#define KVM_HC_STATUS_SUCCESS  0
> +#define KVM_HC_INVALID_CODE-1UL
> +#define KVM_HC_INVALID_PARAMETER   -2UL
Maybe KVM_HCALL_SUCCESS/KVM_HCALL_INVALID_CODE/KVM_HCALL_PARAMETER is better.

Huacai

> +
> +static inline unsigned int kvm_arch_para_features(void)
> +{
> +   return 0;
> +}
> +
> +static inline unsigned int kvm_arch_para_hints(void)
> +{
> +   return 0;
> +}
> +
> +static inline bool kvm_check_and_clear_guest_paused(void)
> +{
> +   return false;
> +}
> +#endif /* _ASM_LOONGARCH_KVM_PARA_H */
> diff --git a/arch/loongarch/include/uapi/asm/Kbuild 
> b/arch/loongarch/include/uapi/asm/Kbuild
> deleted file mode 100644
> index 4aa680ca2e5f..
> --- a/arch/loongarch/include/uapi/asm/Kbuild
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -# SPDX-License-Identifier: GPL-2.0
> -generic-y += kvm_para.h
> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
> index ed1d89d53e2e..d15c71320a11 100644
> --- a/arch/loongarch/kvm/exit.c
> +++ b/arch/loongarch/kvm/exit.c
> @@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu 
> *vcpu)
> return RESUME_GUEST;
>  }
>
> +static int kvm_handle_hypcall(struct kvm_vcpu *vcpu)
> +{
> +   update_pc(&vcpu->arch);
> +
> +   /* Treat it as noop instruction, only set return value */
> +   vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HC_INVALID_CODE;
> +   return RESUME_GUEST;
> +}
> +
>  /*
>   * LoongArch KVM callback handling for unimplemented guest exiting
>   */
> @@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] 
> = {
> [EXCCODE_LSXDIS]= kvm_handle_lsx_disabled,
> [EXCCODE_LASXDIS]   = kvm_handle_lasx_disabled,
> [EXCCODE_GSPR]  = kvm_handle_gspr,
> +   [EXCCODE_HVC]   = kvm_handle_hypcall,
>  };
>
>  int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
> --
> 2.39.3
>



Re: [PATCH v4 1/6] LoongArch/smp: Refine ipi ops on LoongArch platform

2024-02-18 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
>
> This patch refines ipi handling on LoongArch platform, there are
> three changes with this patch.
> 1. Add generic get_percpu_irq() api, replace some percpu irq functions
> such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with get_percpu_irq().
>
> 2. Change parameter action definition with function
> loongson_send_ipi_single() and loongson_send_ipi_mask(). Normal decimal
> encoding is used rather than binary bitmap encoding for ipi action, ipi
> hw sender uses devimal action code, and ipi receiver will get binary bitmap
> encoding, the ipi hw will convert it into bitmap in ipi message buffer.
What is "devimal" here? Maybe decimal?

>
> 3. Add structure smp_ops on LoongArch platform so that pv ipi can be used
> later.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/include/asm/hardirq.h |  4 ++
>  arch/loongarch/include/asm/irq.h | 10 -
>  arch/loongarch/include/asm/smp.h | 31 +++
>  arch/loongarch/kernel/irq.c  | 22 +--
>  arch/loongarch/kernel/perf_event.c   | 14 +--
>  arch/loongarch/kernel/smp.c  | 58 +++-
>  arch/loongarch/kernel/time.c | 12 +-
>  7 files changed, 71 insertions(+), 80 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/hardirq.h 
> b/arch/loongarch/include/asm/hardirq.h
> index 0ef3b18f8980..9f0038e19c7f 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -12,6 +12,10 @@
>  extern void ack_bad_irq(unsigned int irq);
>  #define ack_bad_irq ack_bad_irq
>
> +enum ipi_msg_type {
> +   IPI_RESCHEDULE,
> +   IPI_CALL_FUNCTION,
> +};
>  #define NR_IPI 2
>
>  typedef struct {
> diff --git a/arch/loongarch/include/asm/irq.h 
> b/arch/loongarch/include/asm/irq.h
> index 218b4da0ea90..00101b6d601e 100644
> --- a/arch/loongarch/include/asm/irq.h
> +++ b/arch/loongarch/include/asm/irq.h
> @@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle;
>  extern struct fwnode_handle *pch_lpc_handle;
>  extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];
>
> -extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
> +static inline int get_percpu_irq(int vector)
> +{
> +   struct irq_domain *d;
> +
> +   d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
> +   if (d)
> +   return irq_create_mapping(d, vector);
>
> +   return -EINVAL;
> +}
>  #include 
>
>  #endif /* _ASM_IRQ_H */
> diff --git a/arch/loongarch/include/asm/smp.h 
> b/arch/loongarch/include/asm/smp.h
> index f81e5f01d619..8a42632b038a 100644
> --- a/arch/loongarch/include/asm/smp.h
> +++ b/arch/loongarch/include/asm/smp.h
> @@ -12,6 +12,13 @@
>  #include 
>  #include 
>
> +struct smp_ops {
> +   void (*init_ipi)(void);
> +   void (*send_ipi_mask)(const struct cpumask *mask, unsigned int 
> action);
> +   void (*send_ipi_single)(int cpu, unsigned int action);
> +};
> +
> +extern struct smp_ops smp_ops;
>  extern int smp_num_siblings;
>  extern int num_processors;
>  extern int disabled_cpus;
> @@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
>  void loongson_boot_secondary(int cpu, struct task_struct *idle);
>  void loongson_init_secondary(void);
>  void loongson_smp_finish(void);
> -void loongson_send_ipi_single(int cpu, unsigned int action);
> -void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
>  #ifdef CONFIG_HOTPLUG_CPU
>  int loongson_cpu_disable(void);
>  void loongson_cpu_die(unsigned int cpu);
> @@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];
>
>  #define cpu_physical_id(cpu)   cpu_logical_map(cpu)
>
> -#define SMP_BOOT_CPU   0x1
> -#define SMP_RESCHEDULE 0x2
> -#define SMP_CALL_FUNCTION  0x4
> +#define ACTTION_BOOT_CPU   0
> +#define ACTTION_RESCHEDULE 1
> +#define ACTTION_CALL_FUNCTION  2
ACTTION? ACTION?

Huacai

> +#define SMP_BOOT_CPU   BIT(ACTTION_BOOT_CPU)
> +#define SMP_RESCHEDULE BIT(ACTTION_RESCHEDULE)
> +#define SMP_CALL_FUNCTION  BIT(ACTTION_CALL_FUNCTION)
>
>  struct secondary_data {
> unsigned long stack;
> @@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data;
>
>  extern asmlinkage void smpboot_entry(void);
>  extern asmlinkage void start_secondary(void);
> -
> +extern void arch_send_call_function_single_ipi(int cpu);
> +extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
>  extern void calculate_cpu_foreign_map(void);
>
>  /*
> @@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void);
>   */
>  extern void show_ipi_list(struct seq_file *p, int prec);
>
> -static inline void arch_send_call_function_single_ipi(int cpu)
> -{
> -   loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
> -}
> -
> -static inline void arch_send_call_function_ipi_mask(const struct cpumask 
> *mask)
> -{
> -   loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
> -}
> -
>  #ifdef CONFIG_HOTPLUG_CPU
>  static 

Re: [PATCH v2] vdpa/mlx5: Allow CVQ size changes

2024-02-18 Thread Lei Yang
QE tested V2 of this patch: qemu no longer prints the error message
"qemu-system-x86_64: Insufficient written data (0)" after enabling and
disabling multiple queues several times inside the guest. The tests pass
both with "x-svq=on" and without it.

Tested-by: Lei Yang 

On Fri, Feb 16, 2024 at 10:25 PM Jonah Palmer  wrote:
>
> The MLX driver was not updating its control virtqueue size at set_vq_num
> and instead always initialized to MLX5_CVQ_MAX_ENT (16) at
> setup_cvq_vring.
>
> Qemu would try to set the size to 64 by default, however, because the
> CVQ size always was initialized to 16, an error would be thrown when
> sending >16 control messages (as used-ring entry 17 is initialized to 0).
> For example, starting a guest with x-svq=on and then executing the
> following command would produce the error below:
>
>  # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done
>
>  qemu-system-x86_64: Insufficient written data (0)
>  [  435.331223] virtio_net virtio0: Failed to set mac address by vq command.
>  SIOCSIFHWADDR: Invalid argument
>
> Acked-by: Dragos Tatulea 
> Acked-by: Eugenio Pérez 
> Signed-off-by: Jonah Palmer 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 778821bab7d9..ecfc16151d61 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -151,8 +151,6 @@ static void teardown_driver(struct mlx5_vdpa_net *ndev);
>
>  static bool mlx5_vdpa_debug;
>
> -#define MLX5_CVQ_MAX_ENT 16
> -
>  #define MLX5_LOG_VIO_FLAG(_feature)  
>   \
> do {  
>  \
> if (features & BIT_ULL(_feature)) 
>  \
> @@ -2276,9 +2274,16 @@ static void mlx5_vdpa_set_vq_num(struct vdpa_device 
> *vdev, u16 idx, u32 num)
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> struct mlx5_vdpa_virtqueue *mvq;
>
> -   if (!is_index_valid(mvdev, idx) || is_ctrl_vq_idx(mvdev, idx))
> +   if (!is_index_valid(mvdev, idx))
> return;
>
> +if (is_ctrl_vq_idx(mvdev, idx)) {
> +struct mlx5_control_vq *cvq = >cvq;
> +
> +cvq->vring.vring.num = num;
> +return;
> +}
> +
> mvq = >vqs[idx];
> mvq->num_ent = num;
>  }
> @@ -2963,7 +2968,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
> u16 idx = cvq->vring.last_avail_idx;
>
> err = vringh_init_iotlb(>vring, mvdev->actual_features,
> -   MLX5_CVQ_MAX_ENT, false,
> +   cvq->vring.vring.num, false,
> (struct vring_desc 
> *)(uintptr_t)cvq->desc_addr,
> (struct vring_avail 
> *)(uintptr_t)cvq->driver_addr,
> (struct vring_used 
> *)(uintptr_t)cvq->device_addr);
> --
> 2.39.3
>
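
The size mismatch described in the quoted commit message can be pictured
with a toy model. This is only a sketch, not driver code: toy_ring,
toy_device_complete() and toy_driver_next_len() are hypothetical names, the
used ring is reduced to an array of "written length" values, and it assumes
the device side wraps at the size it was initialised with (16) while the
driver wraps at the size it asked for (64).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_DRIVER_NUM 64 /* ring size the driver side asked for */

/* Hypothetical, simplified used ring: one "written length" per slot. */
struct toy_ring {
    uint32_t used_len[TOY_DRIVER_NUM];
    uint16_t device_num; /* ring size the device side was initialised with */
    uint16_t device_idx; /* next completion index on the device side */
    uint16_t driver_idx; /* next completion index the driver will read */
};

static void toy_ring_init(struct toy_ring *r, uint16_t device_num)
{
    assert(device_num > 0 && device_num <= TOY_DRIVER_NUM);
    memset(r, 0, sizeof(*r));
    r->device_num = device_num;
}

/* Device side: record a completion, wrapping at the device's ring size. */
static void toy_device_complete(struct toy_ring *r, uint32_t len)
{
    r->used_len[r->device_idx % r->device_num] = len;
    r->device_idx++;
}

/* Driver side: read the next completion, wrapping at the driver's ring size. */
static uint32_t toy_driver_next_len(struct toy_ring *r)
{
    uint32_t len = r->used_len[r->driver_idx % TOY_DRIVER_NUM];

    r->driver_idx++;
    return len;
}

/*
 * Send n commands. Return the 1-based number of the first command whose
 * completion reads back as 0 bytes written, or 0 if all of them look fine.
 */
static unsigned int toy_first_failure(uint16_t device_num, unsigned int n)
{
    struct toy_ring r;

    toy_ring_init(&r, device_num);
    for (unsigned int i = 1; i <= n; i++) {
        toy_device_complete(&r, 1);
        if (toy_driver_next_len(&r) == 0)
            return i;
    }
    return 0;
}
```

In this model, toy_first_failure(16, 20) reports command 17, while
toy_first_failure(64, 20) reports no failure, which is the effect of letting
set_vq_num() size the CVQ instead of a fixed constant.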




Re: [PATCH v2 3/3] arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll

2024-02-18 Thread Dmitry Baryshkov
On Sun, 18 Feb 2024 at 22:58, Luca Weiss  wrote:
>
> Follow the updated bindings and use a QCS404-specific compatible for the
> HFPLL on this SoC.
>
> Signed-off-by: Luca Weiss 
> ---
> Please note that this patch should only land after the patch for the
> clock driver.
> ---
>  arch/arm64/boot/dts/qcom/qcs404.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Dmitry Baryshkov 


-- 
With best wishes
Dmitry



Re: [PATCH v3] modules: wait do_free_init correctly

2024-02-18 Thread Andrew Morton
On Sat, 17 Feb 2024 16:18:10 +0800 Changbin Du  wrote:

> The synchronization here is just to ensure the module init's been freed
> before doing W+X checking. But the commit 1a7b7d922081 ("modules: Use
> vmalloc special flag") moves do_free_init() into a global workqueue
> instead of call_rcu(). So now rcu_barrier() cannot ensure that do_free_init
> has completed. We should wait for it via flush_work().
> 
> Without this fix, we still could encounter false positive reports in
> W+X checking, and the rcu synchronization is unnecessary which can
> introduce significant delay.
> 
> Eric Chanudet reports that the rcu_barrier introduces ~0.1s delay on a
> PREEMPT_RT kernel.
>   [0.291444] Freeing unused kernel memory: 5568K
>   [0.402442] Run /sbin/init as init process
> 
> With this fix, the above delay can be eliminated.

Thanks, I'll queue this as a delta, to be folded into the base patch
prior to upstreaming.

I added a Tested-by: Eric, if that's OK by him?



[PATCH v2 2/3] clk: qcom: hfpll: Add QCS404-specific compatible

2024-02-18 Thread Luca Weiss
The configuration for the HFPLL does not appear to be generic, so
add a qcs404-specific compatible and rename the existing struct to
qcs404.

Keep qcom,hfpll in the driver for compatibility with old dtbs.
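
For illustration, the compatibility fallback can be sketched in user space
as follows. This is not the driver's code: toy_hfpll_data, toy_match and
toy_lookup() are hypothetical stand-ins for struct hfpll_data, struct
of_device_id and the OF match lookup, and only the three register offsets
visible in the diff are modelled. The point is that the deprecated string
and the new one resolve to the same data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's per-SoC data. */
struct toy_hfpll_data {
    unsigned int mode_reg;
    unsigned int l_reg;
    unsigned int m_reg;
};

static const struct toy_hfpll_data toy_qcs404 = {
    .mode_reg = 0x00,
    .l_reg = 0x04,
    .m_reg = 0x08,
};

/* Hypothetical stand-in for one match table entry. */
struct toy_match {
    const char *compatible;
    const struct toy_hfpll_data *data;
};

static const struct toy_match toy_match_table[] = {
    { "qcom,qcs404-hfpll", &toy_qcs404 },
    /* Deprecated in bindings, kept so that old dtbs still match */
    { "qcom,hfpll", &toy_qcs404 },
    { NULL, NULL } /* sentinel */
};

/* Return the data for a compatible string, or NULL if nothing matches. */
static const struct toy_hfpll_data *toy_lookup(const char *compatible)
{
    assert(compatible != NULL);
    for (const struct toy_match *m = toy_match_table; m->compatible; m++)
        if (strcmp(m->compatible, compatible) == 0)
            return m->data;
    return NULL;
}
```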

Signed-off-by: Luca Weiss 
---
 drivers/clk/qcom/hfpll.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/qcom/hfpll.c b/drivers/clk/qcom/hfpll.c
index dac27e31ef60..b0b0cb074b4a 100644
--- a/drivers/clk/qcom/hfpll.c
+++ b/drivers/clk/qcom/hfpll.c
@@ -14,7 +14,7 @@
 #include "clk-regmap.h"
 #include "clk-hfpll.h"
 
-static const struct hfpll_data hdata = {
+static const struct hfpll_data qcs404 = {
.mode_reg = 0x00,
.l_reg = 0x04,
.m_reg = 0x08,
@@ -84,10 +84,12 @@ static const struct hfpll_data msm8976_cci = {
 };
 
 static const struct of_device_id qcom_hfpll_match_table[] = {
-   { .compatible = "qcom,hfpll", .data =  },
{ .compatible = "qcom,msm8976-hfpll-a53", .data = _a53 },
{ .compatible = "qcom,msm8976-hfpll-a72", .data = _a72 },
{ .compatible = "qcom,msm8976-hfpll-cci", .data = _cci },
+   { .compatible = "qcom,qcs404-hfpll", .data =  },
+   /* Deprecated in bindings */
+   { .compatible = "qcom,hfpll", .data =  },
{ }
 };
 MODULE_DEVICE_TABLE(of, qcom_hfpll_match_table);

-- 
2.43.2




[PATCH v2 1/3] dt-bindings: clock: qcom,hfpll: Convert to YAML

2024-02-18 Thread Luca Weiss
Convert the .txt documentation to .yaml with some adjustments.

* APQ8064/IPQ8064/MSM8960 compatibles are dropped since their HFPLLs are
  a part of GCC so there is no need for a separate compat entry.
* Change the MSM8974 compatible to follow the updated naming schema.
  This compatible is not used upstream yet.
* Add qcs404-hfpll. QCS404 currently uses qcom,hfpll. Mark that as
  deprecated since every SoC appears to need different driver data so
  "qcom,hfpll" makes no sense to keep.

Signed-off-by: Luca Weiss 
---
 .../devicetree/bindings/clock/qcom,hfpll.txt   | 63 
 .../devicetree/bindings/clock/qcom,hfpll.yaml  | 69 ++
 2 files changed, 69 insertions(+), 63 deletions(-)

diff --git a/Documentation/devicetree/bindings/clock/qcom,hfpll.txt 
b/Documentation/devicetree/bindings/clock/qcom,hfpll.txt
deleted file mode 100644
index 5769cbbe76be..
--- a/Documentation/devicetree/bindings/clock/qcom,hfpll.txt
+++ /dev/null
@@ -1,63 +0,0 @@
-High-Frequency PLL (HFPLL)
-
-PROPERTIES
-
-- compatible:
-   Usage: required
-   Value type: :
-   shall contain only one of the following. The generic
-   compatible "qcom,hfpll" should be also included.
-
-"qcom,hfpll-ipq8064", "qcom,hfpll"
-"qcom,hfpll-apq8064", "qcom,hfpll"
-"qcom,hfpll-msm8974", "qcom,hfpll"
-"qcom,hfpll-msm8960", "qcom,hfpll"
-"qcom,msm8976-hfpll-a53", "qcom,hfpll"
-"qcom,msm8976-hfpll-a72", "qcom,hfpll"
-"qcom,msm8976-hfpll-cci", "qcom,hfpll"
-
-- reg:
-   Usage: required
-   Value type: 
-   Definition: address and size of HPLL registers. An optional second
-   element specifies the address and size of the alias
-   register region.
-
-- clocks:
-   Usage: required
-   Value type: 
-   Definition: reference to the xo clock.
-
-- clock-names:
-   Usage: required
-   Value type: 
-   Definition: must be "xo".
-
-- clock-output-names:
-   Usage: required
-   Value type: 
-   Definition: Name of the PLL. Typically hfpllX where X is a CPU number
-   starting at 0. Otherwise hfpll_Y where Y is more specific
-   such as "l2".
-
-Example:
-
-1) An HFPLL for the L2 cache.
-
-   clock-controller@f9016000 {
-   compatible = "qcom,hfpll-ipq8064", "qcom,hfpll";
-   reg = <0xf9016000 0x30>;
-   clocks = <_board>;
-   clock-names = "xo";
-   clock-output-names = "hfpll_l2";
-   };
-
-2) An HFPLL for CPU0. This HFPLL has the alias register region.
-
-   clock-controller@f908a000 {
-   compatible = "qcom,hfpll-ipq8064", "qcom,hfpll";
-   reg = <0xf908a000 0x30>, <0xf900a000 0x30>;
-   clocks = <_board>;
-   clock-names = "xo";
-   clock-output-names = "hfpll0";
-   };
diff --git a/Documentation/devicetree/bindings/clock/qcom,hfpll.yaml 
b/Documentation/devicetree/bindings/clock/qcom,hfpll.yaml
new file mode 100644
index ..8cb1c164f760
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/qcom,hfpll.yaml
@@ -0,0 +1,69 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/qcom,hfpll.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm High-Frequency PLL
+
+maintainers:
+  - Bjorn Andersson 
+
+description:
+  The HFPLL is used as CPU PLL on various Qualcomm SoCs.
+
+properties:
+  compatible:
+oneOf:
+  - enum:
+  - qcom,msm8974-hfpll
+  - qcom,msm8976-hfpll-a53
+  - qcom,msm8976-hfpll-a72
+  - qcom,msm8976-hfpll-cci
+  - qcom,qcs404-hfpll
+  - const: qcom,hfpll
+deprecated: true
+
+  reg:
+items:
+  - description: HFPLL registers
+  - description: Alias register region
+minItems: 1
+
+  '#clock-cells':
+const: 0
+
+  clocks:
+items:
+  - description: board XO clock
+
+  clock-names:
+items:
+  - const: xo
+
+  clock-output-names:
+description:
+  Name of the PLL. Typically hfpllX where X is a CPU number starting at 0.
+  Otherwise hfpll_Y where Y is more specific such as "l2".
+maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - '#clock-cells'
+  - clocks
+  - clock-names
+  - clock-output-names
+
+additionalProperties: false
+
+examples:
+  - |
+clock-controller@f908a000 {
+compatible = "qcom,msm8974-hfpll";
+reg = <0xf908a000 0x30>, <0xf900a000 0x30>;
+#clock-cells = <0>;
+clock-output-names = "hfpll0";
+clocks = <_board>;
+clock-names = "xo";
+};

-- 
2.43.2




[PATCH v2 3/3] arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll

2024-02-18 Thread Luca Weiss
Follow the updated bindings and use a QCS404-specific compatible for the
HFPLL on this SoC.

Signed-off-by: Luca Weiss 
---
Please note that this patch should only land after the patch for the
clock driver.
---
 arch/arm64/boot/dts/qcom/qcs404.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/qcs404.dtsi 
b/arch/arm64/boot/dts/qcom/qcs404.dtsi
index 2f2eeaf2e945..4133d5a19deb 100644
--- a/arch/arm64/boot/dts/qcom/qcs404.dtsi
+++ b/arch/arm64/boot/dts/qcom/qcs404.dtsi
@@ -1308,7 +1308,7 @@ apcs_glb: mailbox@b011000 {
};
 
apcs_hfpll: clock-controller@b016000 {
-   compatible = "qcom,hfpll";
+   compatible = "qcom,qcs404-hfpll";
reg = <0x0b016000 0x30>;
#clock-cells = <0>;
clock-output-names = "apcs_hfpll";

-- 
2.43.2




[PATCH v2 0/3] Convert qcom,hfpll documentation to yaml + related changes

2024-02-18 Thread Luca Weiss
Finally touch the hfpll doc and convert it to yaml, and do some related
changes along the way.

Signed-off-by: Luca Weiss 
---
Changes in v2:
- Drop APQ8064/IPQ8064/MSM8960 compatibles (Dmitry)
- Update example to MSM8974 since IPQ8064 is dropped
- Clean up dt binding description (Krzysztof)
- Remove second example in docs (Krzysztof)
- Try to clear up the text and content around deprecating qcom,hfpll
- Link to v1: 
https://lore.kernel.org/r/20231231-hfpll-yaml-v1-0-359d44a4e...@z3ntu.xyz

---
Luca Weiss (3):
  dt-bindings: clock: qcom,hfpll: Convert to YAML
  clk: qcom: hfpll: Add QCS404-specific compatible
  arm64: dts: qcom: qcs404: Use qcs404-hfpll compatible for hfpll

 .../devicetree/bindings/clock/qcom,hfpll.txt   | 63 
 .../devicetree/bindings/clock/qcom,hfpll.yaml  | 69 ++
 arch/arm64/boot/dts/qcom/qcs404.dtsi   |  2 +-
 drivers/clk/qcom/hfpll.c   |  6 +-
 4 files changed, 74 insertions(+), 66 deletions(-)
---
base-commit: 841c35169323cd833294798e58b9bf63fa4fa1de
change-id: 20231231-hfpll-yaml-9266f012365c

Best regards,
-- 
Luca Weiss 




Re: [RFC PATCH v2 1/6] dt-bindings: mfd: add entry for Marvell 88PM886 PMIC

2024-02-18 Thread Karel Balej
Rob Herring, 2024-02-15T08:20:52-06:00:
> >  .../bindings/mfd/marvell,88pm88x.yaml | 74 +++
>
> Filename should match the compatible.
>
> In general, drop the 'x' wildcard.

By "in general", do you mean for the driver code also?

As I have mentioned in the commit message for the driver, the other
device is very similar and if the support for it was ever to be added
(which I personally currently have no interest in), I believe it would
make sense to extend this driver. Is it then still preferred to call it
all just 88pm886 now?

> > +properties:
> > +  compatible:
> > +const: marvell,88pm886-a1

So the file should be called marvell,88pm886-a1.yaml, correct? Again, is
it prefered to call it like this even if the other revision could
eventually be added (again, I am not interested in that right now
personally)? I mean, if I was implementing support for both revisions
right now, it would make sense to name it just marvell,88pm886.yaml, no?

Thank you, kind regards,
K. B.



Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-18 Thread Yishai Hadas

On 16/02/2024 5:01, ank...@nvidia.com wrote:

From: Ankit Agrawal 

NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device
for the on-chip GPU that is the logical OS representation of the
internal proprietary chip-to-chip cache coherent interconnect.

The device is peculiar compared to a real PCI device in that whilst
there is a real 64b PCI BAR1 (comprising region 2 & region 3) on the
device, it is not used to access device memory once the faster
chip-to-chip interconnect is initialized (occurs at the time of host
system boot). The device memory is accessed instead using the chip-to-chip
interconnect that is exposed as a contiguous physically addressable
region on the host. This device memory aperture can be obtained from host
ACPI table using device_property_read_u64(), according to the FW
specification. Since the device memory is cache coherent with the CPU,
it can be mmap'ed into the user VMA with a cacheable mapping using
remap_pfn_range() and used like regular RAM. The device memory
is not added to the host kernel, but mapped directly as this reduces
memory wastage due to struct pages.

There is also a requirement of a minimum reserved 1G uncached region
(termed as resmem) to support the Multi-Instance GPU (MIG) feature [1].
This is to work around a HW defect. Based on [2], the requisite properties
(uncached, unaligned access) can be achieved through a VM mapping (S1)
of NORMAL_NC and host (S2) mapping with MemAttr[2:0]=0b101. To provide
a different non-cached property to the reserved 1G region, it needs to
be carved out from the device memory and mapped as a separate region
in Qemu VMA with pgprot_writecombine(). pgprot_writecombine() sets the
Qemu VMA page properties (pgprot) as NORMAL_NC.

Provide a VFIO PCI variant driver that adapts the unique device memory
representation into a more standard PCI representation facing userspace.

The variant driver exposes these two regions - the non-cached reserved
(resmem) and the cached rest of the device memory (termed as usemem) as
separate VFIO 64b BAR regions. This is divergent from the baremetal
approach, where the device memory is exposed as a device memory region.
The decision for a different approach was taken in view of the fact that
it would necessitate additional code in Qemu to discover and insert those
regions in the VM IPA, along with the additional VM ACPI DSDT changes to
communicate the device memory region IPA to the VM workloads. Moreover,
this behavior would have to be added to a variety of emulators (beyond
top of tree Qemu) out there desiring grace hopper support.

Since the device implements 64-bit BAR0, the VFIO PCI variant driver
maps the uncached carved out region to the next available PCI BAR (i.e.
comprising regions 2 and 3). The cached device memory aperture is
assigned BAR region 4 and 5. Qemu will then naturally generate a PCI
device in the VM with the uncached aperture reported as BAR2 region,
the cacheable as BAR4. The variant driver provides emulation for these
fake BARs' PCI config space offset registers.

The hardware ensures that the system does not crash when the memory
is accessed with the memory enable turned off. It synthesizes ~0 reads
and dropped writes on such access. So there is no need to support the
disablement/enablement of BAR through PCI_COMMAND config space register.

The memory layout on the host looks like the following:
               devmem (memlength)
|--------------------------------------------------|
|-------------cached------------------------|--NC--|
|                                           |
usemem.memphys                              resmem.memphys

PCI BARs need to be power-of-2 aligned, but the actual memory on the
device may not be. A read or write access to the physical address from the
last device PFN up to the next power-of-2 aligned physical address
results in reading ~0 and dropped writes. Note that the GPU device
driver [6] is capable of knowing the exact device memory size through
separate means. The device memory size is primarily kept in the system
ACPI tables for use by the VFIO PCI variant module.
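
As a rough user-space illustration of the paragraph above (not the driver's
code): the BAR size is the device memory size rounded up to a power of 2,
and accesses in the gap between the real memory and the BAR end read as ~0
and drop writes. toy_roundup_pow2(), toy_bar_read8() and toy_bar_write8()
are hypothetical helpers, and the device memory is modelled as a plain byte
array.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next power of two (x must be non-zero and the
 * result must fit in 64 bits). */
static uint64_t toy_roundup_pow2(uint64_t x)
{
    uint64_t p = 1;

    assert(x != 0 && x <= (UINT64_C(1) << 63));
    while (p < x)
        p <<= 1;
    return p;
}

/* Read one byte at BAR offset off; offsets past the real memory read ~0. */
static uint8_t toy_bar_read8(const uint8_t *mem, uint64_t memlength,
                             uint64_t off)
{
    assert(off < toy_roundup_pow2(memlength));
    return off < memlength ? mem[off] : 0xff;
}

/* Write one byte at BAR offset off; returns 1 if stored, 0 if dropped. */
static int toy_bar_write8(uint8_t *mem, uint64_t memlength, uint64_t off,
                          uint8_t val)
{
    assert(off < toy_roundup_pow2(memlength));
    if (off >= memlength)
        return 0; /* in the padding up to the power-of-2 size: dropped */
    mem[off] = val;
    return 1;
}
```

With a 3-byte "device memory", the BAR is 4 bytes long: offset 3 reads back
as 0xff and a write to it is dropped, while offsets 0..2 behave normally.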

Note that the usemem memory is added by the VM Nvidia device driver [5]
to the VM kernel as memblocks. Hence make the usable memory size memblock
(MEMBLK_SIZE) aligned. This is a hardwired ABI value between the GPU FW and
VFIO driver. The VM device driver makes use of the same value for its
calculation to determine USEMEM size.
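
A minimal sketch of how such a split could be computed, under stated
assumptions: the 1G reserved size comes from the description above, the
512M value used for TOY_MEMBLK_SIZE is an assumption made only for this
example, and resmem is placed directly after usemem as in the layout
diagram. toy_region and toy_split_devmem() are hypothetical names, not the
driver's API.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RESMEM_SIZE (UINT64_C(1) << 30)   /* 1G reserved region */
#define TOY_MEMBLK_SIZE (UINT64_C(512) << 20) /* assumed value, example only */

/* Hypothetical description of one memory region. */
struct toy_region {
    uint64_t memphys;   /* start physical address */
    uint64_t memlength; /* length in bytes */
};

/*
 * Split the device memory into a cached usable part (usemem), rounded
 * down to a memblock multiple, followed by the non-cached reserved
 * part (resmem), which takes whatever is left (at least 1G).
 */
static void toy_split_devmem(uint64_t memphys, uint64_t memlength,
                             struct toy_region *usemem,
                             struct toy_region *resmem)
{
    uint64_t usable;

    assert(usemem && resmem);
    assert(memlength > TOY_RESMEM_SIZE);

    usable = memlength - TOY_RESMEM_SIZE;
    usable -= usable % TOY_MEMBLK_SIZE; /* round down to memblock size */

    usemem->memphys = memphys;
    usemem->memlength = usable;
    resmem->memphys = memphys + usable;
    resmem->memlength = memlength - usable;
}
```

For example, 5G + 100M of device memory yields a 4G usemem and a 1G + 100M
resmem starting right behind it.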

Currently there is no provision in KVM for a S2 mapping with
MemAttr[2:0]=0b101, but there is an ongoing effort to provide the same [3].
As previously mentioned, resmem is mapped with pgprot_writecombine(), which
sets the Qemu VMA page properties (pgprot) as NORMAL_NC. Using the
proposed changes in [3] and [4], KVM marks the region with
MemAttr[2:0]=0b101 in S2.

If the device memory properties are not present, the driver registers the
vfio-pci-core function pointers. Since there are no ACPI memory properties
generated for the VM, the variant driver 

Re: [PATCH v3 25/47] filelock: convert __locks_insert_block, conflict and deadlock checks to use file_lock_core

2024-02-18 Thread Jeff Layton
On Wed, 2024-01-31 at 18:02 -0500, Jeff Layton wrote:
> Have both __locks_insert_block and the deadlock and conflict checking
> functions take a struct file_lock_core pointer instead of a struct
> file_lock one. Also, change posix_locks_deadlock to return bool.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/locks.c | 132 
> +
>  1 file changed, 72 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 1e8b943bd7f9..0dc1c9da858c 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -757,39 +757,41 @@ EXPORT_SYMBOL(locks_delete_block);
>   * waiters, and add beneath any waiter that blocks the new waiter.
>   * Thus wakeups don't happen until needed.
>   */
> -static void __locks_insert_block(struct file_lock *blocker,
> -  struct file_lock *waiter,
> -  bool conflict(struct file_lock *,
> -struct file_lock *))
> +static void __locks_insert_block(struct file_lock *blocker_fl,
> +  struct file_lock *waiter_fl,
> +  bool conflict(struct file_lock_core *,
> +struct file_lock_core *))
>  {
> - struct file_lock *fl;
> - BUG_ON(!list_empty(>c.flc_blocked_member));
> + struct file_lock_core *blocker = _fl->c;
> + struct file_lock_core *waiter = _fl->c;
> + struct file_lock_core *flc;
>  
> + BUG_ON(!list_empty(>flc_blocked_member));
>  new_blocker:
> - list_for_each_entry(fl, >c.flc_blocked_requests,
> - c.flc_blocked_member)
> - if (conflict(fl, waiter)) {
> - blocker =  fl;
> + list_for_each_entry(flc, >flc_blocked_requests, 
> flc_blocked_member)
> + if (conflict(flc, waiter)) {
> + blocker =  flc;
>   goto new_blocker;
>   }
> - waiter->c.flc_blocker = blocker;
> - list_add_tail(>c.flc_blocked_member,
> -   >c.flc_blocked_requests);
> - if ((blocker->c.flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
> - locks_insert_global_blocked(>c);
> + waiter->flc_blocker = file_lock(blocker);
> + list_add_tail(>flc_blocked_member,
> +   >flc_blocked_requests);
>  
> - /* The requests in waiter->fl_blocked are known to conflict with
> + if ((blocker->flc_flags & (FL_POSIX|FL_OFDLCK)) == (FL_POSIX|FL_OFDLCK))

Christian,

There is a bug in the above delta. That should read:

if ((blocker->flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)

I suspect that is the cause of the performance regression noted by the
KTR.

I believe the bug is fairly harmless -- it's just putting OFD locks into
the global hash when it doesn't need to, which probably slows down
deadlock checking. I'm going to spin up a patch and test it today, but I
wanted to give you a heads up.

I'll send the patch later today or tomorrow.
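
To spell out the difference between the two comparisons, here is a small
user-space sketch. The TOY_FL_* values are illustrative assumptions, not
taken from the kernel headers, and it assumes an OFD lock carries both
flags, which is what the mask in the check implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the exact values are an assumption. */
#define TOY_FL_POSIX  0x0001u
#define TOY_FL_OFDLCK 0x0400u

/* Intended check: true only for POSIX locks that are not OFD locks. */
static bool toy_needs_global_hash_fixed(unsigned int flags)
{
    return (flags & (TOY_FL_POSIX | TOY_FL_OFDLCK)) == TOY_FL_POSIX;
}

/* Check as written in the delta: true only when both bits are set. */
static bool toy_needs_global_hash_buggy(unsigned int flags)
{
    return (flags & (TOY_FL_POSIX | TOY_FL_OFDLCK)) ==
           (TOY_FL_POSIX | TOY_FL_OFDLCK);
}

/* Usage: a traditional POSIX lock versus an OFD lock. */
static void toy_demo(void)
{
    unsigned int posix = TOY_FL_POSIX;
    unsigned int ofd = TOY_FL_POSIX | TOY_FL_OFDLCK;

    /* Intended: only the traditional POSIX lock goes into the hash. */
    assert(toy_needs_global_hash_fixed(posix));
    assert(!toy_needs_global_hash_fixed(ofd));

    /* As written: the OFD lock is the one that gets inserted. */
    assert(!toy_needs_global_hash_buggy(posix));
    assert(toy_needs_global_hash_buggy(ofd));
}
```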
 
> + locks_insert_global_blocked(waiter);
> +
> + /* The requests in waiter->flc_blocked are known to conflict with
>* waiter, but might not conflict with blocker, or the requests
>* and lock which block it.  So they all need to be woken.
>*/
> - __locks_wake_up_blocks(>c);
> + __locks_wake_up_blocks(waiter);
>  }
>  
>  /* Must be called with flc_lock held. */
>  static void locks_insert_block(struct file_lock *blocker,
>  struct file_lock *waiter,
> -bool conflict(struct file_lock *,
> -  struct file_lock *))
> +bool conflict(struct file_lock_core *,
> +  struct file_lock_core *))
>  {
>   spin_lock(_lock_lock);
>   __locks_insert_block(blocker, waiter, conflict);
> @@ -846,12 +848,12 @@ locks_delete_lock_ctx(struct file_lock *fl, struct 
> list_head *dispose)
>  /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
>   * checks for shared/exclusive status of overlapping locks.
>   */
> -static bool locks_conflict(struct file_lock *caller_fl,
> -struct file_lock *sys_fl)
> +static bool locks_conflict(struct file_lock_core *caller_flc,
> +struct file_lock_core *sys_flc)
>  {
> - if (lock_is_write(sys_fl))
> + if (sys_flc->flc_type == F_WRLCK)
>   return true;
> - if (lock_is_write(caller_fl))
> + if (caller_flc->flc_type == F_WRLCK)
>   return true;
>   return false;
>  }
> @@ -859,20 +861,23 @@ static bool locks_conflict(struct file_lock *caller_fl,
>  /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
>   * checking before calling the locks_conflict().
>   */
> -static bool posix_locks_conflict(struct file_lock *caller_fl,
> -  struct file_lock *sys_fl)
> +static 

Re: [PATCH v18 2/3] vfio/pci: rename and export range_intersect_range

2024-02-18 Thread Yishai Hadas

On 16/02/2024 5:01, ank...@nvidia.com wrote:

From: Ankit Agrawal 

range_intersect_range determines whether two ranges overlap. If there is
an overlap, the helper function returns the overlapping offset and size.

The VFIO PCI variant driver emulates the PCI config space BAR offset
registers. These offsets may be accessed for read/write with a variety
of lengths including sub-word sizes from sub-word offsets. The driver
makes use of this helper function to read/write the targeted part of
the emulated register.

Make this a vfio_pci_core function, rename and export as GPL. Also
update references in virtio driver.
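
For readers who want to try the overlap logic outside the kernel, below is
a user-space port of the helper from this patch. It is a sketch: the toy_
names are hypothetical, loff_t is replaced by int64_t and min_t() by a
small local helper, but the two branches follow the function in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static size_t toy_min(size_t a, size_t b)
{
    return a < b ? a : b;
}

/*
 * Determine the overlap between a buffer range and a register range.
 * Returns true if they overlap; the overlap is reported as an offset
 * into the buffer, a byte count and an offset into the register.
 */
static bool toy_range_intersect_range(int64_t buf_start, size_t buf_cnt,
                                      int64_t reg_start, size_t reg_cnt,
                                      int64_t *buf_offset,
                                      size_t *intersect_count,
                                      size_t *register_offset)
{
    int64_t buf_end = buf_start + (int64_t)buf_cnt;
    int64_t reg_end = reg_start + (int64_t)reg_cnt;

    assert(buf_offset && intersect_count && register_offset);

    /* Buffer starts at or before the register and reaches into it. */
    if (buf_start <= reg_start && buf_end > reg_start) {
        *buf_offset = reg_start - buf_start;
        *intersect_count = toy_min(reg_cnt, (size_t)(buf_end - reg_start));
        *register_offset = 0;
        return true;
    }

    /* Buffer starts inside the register. */
    if (buf_start > reg_start && buf_start < reg_end) {
        *buf_offset = 0;
        *intersect_count = toy_min(buf_cnt, (size_t)(reg_end - buf_start));
        *register_offset = (size_t)(buf_start - reg_start);
        return true;
    }

    return false;
}
```

For instance, a 4-byte access at config offset 0x12 against a 4-byte
register at 0x10 overlaps in 2 bytes, starting at buffer offset 0 and
register offset 2.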

Reviewed-by: Kevin Tian 
Signed-off-by: Ankit Agrawal 
---
  drivers/vfio/pci/vfio_pci_config.c | 42 +
  drivers/vfio/pci/virtio/main.c | 72 +++---
  include/linux/vfio_pci_core.h  |  5 +++
  3 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index 672a1804af6a..e2e6173a3375 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1966,3 +1966,45 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device 
*vdev, char __user *buf,
  
  	return done;

  }
+
+/**
+ * vfio_pci_core_range_intersect_range() - Determine overlap between a buffer
+ *and register offset ranges.
+ * @buf_start: start offset of the buffer
+ * @buf_cnt:   number of buffer bytes.


You could drop the '.' at the end to be consistent with the other.


+ * @reg_start: start register offset
+ * @reg_cnt:   number of register bytes
+ * @buf_offset:start offset of overlap in the buffer
+ * @intersect_count:   number of overlapping bytes
+ * @register_offset:   start offset of overlap in register
+ *
+ * Returns: true if there is overlap, false if not.
+ * The overlap start and size is returned through function args.
+ */
+bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
+loff_t reg_start, size_t reg_cnt,
+loff_t *buf_offset,
+size_t *intersect_count,
+size_t *register_offset)
+{
+   if (buf_start <= reg_start &&
+   buf_start + buf_cnt > reg_start) {
+   *buf_offset = reg_start - buf_start;
+   *intersect_count = min_t(size_t, reg_cnt,
+buf_start + buf_cnt - reg_start);
+   *register_offset = 0;
+   return true;
+   }
+
+   if (buf_start > reg_start &&
+   buf_start < reg_start + reg_cnt) {
+   *buf_offset = 0;
+   *intersect_count = min_t(size_t, buf_cnt,
+reg_start + reg_cnt - buf_start);
+   *register_offset = buf_start - reg_start;
+   return true;
+   }
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_range_intersect_range);
diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
index d5af683837d3..b5d3a8c5bbc9 100644
--- a/drivers/vfio/pci/virtio/main.c
+++ b/drivers/vfio/pci/virtio/main.c
@@ -132,33 +132,6 @@ virtiovf_pci_bar0_rw(struct virtiovf_pci_core_device 
*virtvdev,
return ret ? ret : count;
  }
  
-static bool range_intersect_range(loff_t range1_start, size_t count1,

- loff_t range2_start, size_t count2,
- loff_t *start_offset,
- size_t *intersect_count,
- size_t *register_offset)
-{
-   if (range1_start <= range2_start &&
-   range1_start + count1 > range2_start) {
-   *start_offset = range2_start - range1_start;
-   *intersect_count = min_t(size_t, count2,
-range1_start + count1 - range2_start);
-   *register_offset = 0;
-   return true;
-   }
-
-   if (range1_start > range2_start &&
-   range1_start < range2_start + count2) {
-   *start_offset = 0;
-   *intersect_count = min_t(size_t, count1,
-range2_start + count2 - range1_start);
-   *register_offset = range1_start - range2_start;
-   return true;
-   }
-
-   return false;
-}
-
  static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
char __user *buf, size_t count,
loff_t *ppos)
@@ -178,16 +151,18 @@ static ssize_t virtiovf_pci_read_config(struct 
vfio_device *core_vdev,
if (ret < 0)
return ret;
  
-	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),

- _offset, _count, _offset)) 
{
+   if 

Re: [PATCH v18 1/3] vfio/pci: rename and export do_io_rw()

2024-02-18 Thread Yishai Hadas

On 16/02/2024 5:01, ank...@nvidia.com wrote:

From: Ankit Agrawal 

do_io_rw() is used to read/write to the device MMIO. The grace hopper
VFIO PCI variant driver requires this functionality to read/write to
its memory.

Rename it as a vfio_pci_core function and export it as GPL.

Reviewed-by: Kevin Tian 
Signed-off-by: Ankit Agrawal 
---
  drivers/vfio/pci/vfio_pci_rdwr.c | 16 +---
  include/linux/vfio_pci_core.h|  5 -
  2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 07fea08ea8a2..03b8f7ada1ac 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -96,10 +96,10 @@ VFIO_IOREAD(32)
   * reads with -1.  This is intended for handling MSI-X vector tables and
   * leftover space for ROM BARs.
   */
-static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
-   void __iomem *io, char __user *buf,
-   loff_t off, size_t count, size_t x_start,
-   size_t x_end, bool iswrite)
+ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool 
test_mem,
+  void __iomem *io, char __user *buf,
+  loff_t off, size_t count, size_t x_start,
+  size_t x_end, bool iswrite)
  {
ssize_t done = 0;
int ret;
@@ -201,6 +201,7 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, 
bool test_mem,
  
  	return done;

  }
+EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
  
  int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)

  {
@@ -279,8 +280,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, 
char __user *buf,
x_end = vdev->msix_offset + vdev->msix_size;
}
  
-	done = do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, buf, pos,

-   count, x_start, x_end, iswrite);
+   done = vfio_pci_core_do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, 
buf, pos,
+ count, x_start, x_end, iswrite);
  
  	if (done >= 0)

*ppos += done;
@@ -348,7 +349,8 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, 
char __user *buf,
 * probing, so we don't currently worry about access in relation
 * to the memory enable bit in the command register.
 */
-   done = do_io_rw(vdev, false, iomem, buf, off, count, 0, 0, iswrite);
+   done = vfio_pci_core_do_io_rw(vdev, false, iomem, buf, off, count,
+ 0, 0, iswrite);
  
  	vga_put(vdev->pdev, rsrc);
  
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h

index 85e84b92751b..cf9480a31f3e 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -130,7 +130,10 @@ void vfio_pci_core_finish_enable(struct 
vfio_pci_core_device *vdev);
  int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
  pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state);
-
+ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool 
test_mem,
+  void __iomem *io, char __user *buf,
+  loff_t off, size_t count, size_t x_start,
+  size_t x_end, bool iswrite);
  #define VFIO_IOWRITE_DECLATION(size) \
  int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev,\
bool test_mem, u##size val, void __iomem *io);


Reviewed-by: Yishai Hadas 



Re: [PATCH 1/4] iommu: constify pointer to bus_type

2024-02-18 Thread Baolu Lu

On 2024/2/16 22:40, Krzysztof Kozlowski wrote:

Make pointer to bus_type a pointer to const for code safety.

Signed-off-by: Krzysztof Kozlowski
---
  drivers/iommu/iommu-priv.h | 5 +++--
  drivers/iommu/iommu.c  | 5 +++--
  2 files changed, 6 insertions(+), 4 deletions(-)


Reviewed-by: Lu Baolu 

Best regards,
baolu



[PATCH] bus: mhi: host: Change the trace string for the userspace tools mapping

2024-02-18 Thread Krishna chaitanya chundru
User space tools can't map the strings if we pass them directly, as the
string addresses are internal to the kernel.

So add tracepoint strings so that user space tools can map the strings
properly.

Signed-off-by: Krishna chaitanya chundru 
---
 drivers/bus/mhi/host/main.c  | 4 ++--
 drivers/bus/mhi/host/trace.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index 2d38f6005da6..15d657af9b5b 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -1340,7 +1340,7 @@ static int mhi_update_channel_state(struct mhi_controller 
*mhi_cntrl,
enum mhi_cmd_type cmd = MHI_CMD_NOP;
int ret;
 
-   trace_mhi_channel_command_start(mhi_cntrl, mhi_chan, to_state, 
"Updating");
+   trace_mhi_channel_command_start(mhi_cntrl, mhi_chan, to_state, 
TPS("Updating"));
switch (to_state) {
case MHI_CH_STATE_TYPE_RESET:
write_lock_irq(_chan->lock);
@@ -1407,7 +1407,7 @@ static int mhi_update_channel_state(struct mhi_controller 
*mhi_cntrl,
write_unlock_irq(_chan->lock);
}
 
-   trace_mhi_channel_command_end(mhi_cntrl, mhi_chan, to_state, "Updated");
+   trace_mhi_channel_command_end(mhi_cntrl, mhi_chan, to_state, 
TPS("Updated"));
 exit_channel_update:
mhi_cntrl->runtime_put(mhi_cntrl);
mhi_device_put(mhi_cntrl->mhi_dev);
diff --git a/drivers/bus/mhi/host/trace.h b/drivers/bus/mhi/host/trace.h
index d12a98d44272..368515dcb22d 100644
--- a/drivers/bus/mhi/host/trace.h
+++ b/drivers/bus/mhi/host/trace.h
@@ -84,6 +84,8 @@ DEV_ST_TRANSITION_LIST
 #define dev_st_trans(a, b) { DEV_ST_TRANSITION_##a, b },
 #define dev_st_trans_end(a, b) { DEV_ST_TRANSITION_##a, b }
 
+#define TPS(x) tracepoint_string(x)
+
 TRACE_EVENT(mhi_gen_tre,
 
TP_PROTO(struct mhi_controller *mhi_cntrl, struct mhi_chan *mhi_chan,

---
base-commit: ceeb64f41fe6a1eb9fc56d583983a81f8f3dd058
change-id: 20240218-ftrace_string-7677762aa63c

Best regards,
-- 
Krishna chaitanya chundru