date:20240408

Re: [PATCH v3] vp_vdpa: fix the method of calculating vectors

2024-04-08 Thread Michael S. Tsirkin

better subject:

 vp_vdpa: don't allocate unused msix vectors

to make it clear it's not a bugfix.




more comments below, but most importantly this
looks like it adds a bug.

On Tue, Apr 09, 2024 at 09:49:35AM +0800, lyx634449800 wrote:
> When there is a ctlq and it doesn't require interrupt
> callbacks,the original method of calculating vectors
> wastes hardware msi or msix resources as well as system
> IRQ resources.
> 
> When conducting performance testing using testpmd in the
> guest os, it was found that the performance was lower compared
> to directly using vfio-pci to passthrough the device
> 
> In scenarios where the virtio device in the guest os does
> not utilize interrupts, the vdpa driver still configures
> the hardware's msix vector. Therefore, the hardware still
> sends interrupts to the host os. Because of this unnecessary
> action by the hardware, hardware performance decreases, and
> it also affects the performance of the host os.
> 
> Before modification:(interrupt mode)
>  32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
>  33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
>  34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
>  35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config
> 
> After modification:(interrupt mode)
>  32:  0  0  1  7   PCI-MSI 32768-edge  vp-vdpa[:00:02.0]-0
>  33: 36  0  3  0   PCI-MSI 32769-edge  vp-vdpa[:00:02.0]-1
>  34:  0  0  0  0   PCI-MSI 32770-edge  vp-vdpa[:00:02.0]-config
> 
> Before modification:(virtio pmd mode for guest os)
>  32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
>  33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
>  34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
>  35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config
> 
> After modification:(virtio pmd mode for guest os)
>  32: 0  0  0   0   PCI-MSI 32768-edge   vp-vdpa[:00:02.0]-config
> 
> To verify the use of the virtio PMD mode in the guest operating
> system, the following patch needs to be applied to QEMU:
> https://lore.kernel.org/all/20240408073311.2049-1-yuxue@jaguarmicro.com
> 
> Signed-off-by: lyx634449800 


Bad S.O.B format. Should be

Signed-off-by: Real Name 


> ---
> 
> V3: delete unused variables and add validation records
> V2: fix when allocating IRQs, scan all queues
> 
>  drivers/vdpa/virtio_pci/vp_vdpa.c | 35 +++
>  1 file changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
> b/drivers/vdpa/virtio_pci/vp_vdpa.c
> index df5f4a3bccb5..cd3aeb3b8f21 100644
> --- a/drivers/vdpa/virtio_pci/vp_vdpa.c
> +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
> @@ -160,22 +160,31 @@ static int vp_vdpa_request_irq(struct vp_vdpa *vp_vdpa)
>   struct pci_dev *pdev = mdev->pci_dev;
>   int i, ret, irq;
>   int queues = vp_vdpa->queues;
> - int vectors = queues + 1;
> + int msix_vec, allocated_vectors = 0;


I would actually call allocated_vectors -> vectors, make the patch
smaller.

>  
> - ret = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
> - if (ret != vectors) {
> + for (i = 0; i < queues; i++) {
> + if (vp_vdpa->vring[i].cb.callback)
> + allocated_vectors++;
> + }
> + allocated_vectors = allocated_vectors + 1;

better: 
allocated_vectors++; /* extra one for config */

> +
> + ret = pci_alloc_irq_vectors(pdev, allocated_vectors, allocated_vectors,
> + PCI_IRQ_MSIX);
> + if (ret != allocated_vectors) {
>   dev_err(>dev,
>   "vp_vdpa: fail to allocate irq vectors want %d but 
> %d\n",
> - vectors, ret);
> + allocated_vectors, ret);
>   return ret;
>   }
> -
> - vp_vdpa->vectors = vectors;
> + vp_vdpa->vectors = allocated_vectors;
>  
>   for (i = 0; i < queues; i++) {
> + if (!vp_vdpa->vring[i].cb.callback)
> + continue;
> +
>   snprintf(vp_vdpa->vring[i].msix_name, VP_VDPA_NAME_SIZE,
>   "vp-vdpa[%s]-%d\n", pci_name(pdev), i);
> - irq = pci_irq_vector(pdev, i);
> + irq = pci_irq_vector(pdev, msix_vec);

using uninitialized msix_vec here?

I would expect compiler to warn about it.


pay attention to compiler warnings pls.


>   ret = devm_request_irq(>dev, irq,
>  vp_vdpa_vq_handler,
>  0, vp_vdpa->vring[i].msix_name,
> @@ -185,23 +194,23 @@ static int vp_vdpa_request_irq(struct vp_vdpa *vp_vdpa)
>   "vp_vdpa: fail to request irq for vq %d\n", i);
>   goto err;
>   }
> - vp_modern_queue_vector(mdev, i, i);
> + vp_modern_queue_vector(mdev, i, msix_vec);
>

Re: [PATCH v3 11/25] media: i2c: imx258: Add get_selection for pixel array information

2024-04-08 Thread Luis Garcia

On 4/3/24 12:46, Pavel Machek wrote:
> Hi!
> 
>> Libcamera requires the cropping information for each mode, so
>> add this information to the driver.
> 
>> @@ -116,6 +124,9 @@ struct imx258_mode {
>>  u32 link_freq_index;
>>  /* Default register values */
>>  struct imx258_reg_list reg_list;
>> +
>> +/* Analog crop rectangle. */
> 
> No need for "." at the end, as it is not above.
> 
>> +struct v4l2_rect crop;
>>  };
> 
> If the crop is same in all modes, should we have it in common place?
> 
> Best regards,
>   Pavel

I gave this a try similar to what was done on imx219 but its having
issues and makes a bunch of changes to the following patches so we
might need to hold off on that change for now.

[PATCH v2 fs/proc/bootconfig 1/2] fs/proc: remove redundant comments from /proc/bootconfig

2024-04-08 Thread Paul E. McKenney

From: Zhenhua Huang 

commit 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to
/proc/bootconfig") adds bootloader argument comments into /proc/bootconfig.

/proc/bootconfig shows boot_command_line[] multiple times following
every xbc key value pair, that's duplicated and not necessary.
Remove redundant ones.

Output before and after the fix is like:
key1 = value1
*bootloader argument comments*
key2 = value2
*bootloader argument comments*
key3 = value3
*bootloader argument comments*
...

key1 = value1
key2 = value2
key3 = value3
*bootloader argument comments*
...

Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to 
/proc/bootconfig")
Signed-off-by: Zhenhua Huang 
Signed-off-by: Paul E. McKenney 
Cc: 
Cc: 
Acked-by: Masami Hiramatsu (Google) 
---
 fs/proc/bootconfig.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
index 902b326e1e560..e5635a6b127b0 100644
--- a/fs/proc/bootconfig.c
+++ b/fs/proc/bootconfig.c
@@ -62,12 +62,12 @@ static int __init copy_xbc_key_value_list(char *dst, size_t 
size)
break;
dst += ret;
}
-   if (ret >= 0 && boot_command_line[0]) {
-   ret = snprintf(dst, rest(dst, end), "# Parameters from 
bootloader:\n# %s\n",
-  boot_command_line);
-   if (ret > 0)
-   dst += ret;
-   }
+   }
+   if (ret >= 0 && boot_command_line[0]) {
+   ret = snprintf(dst, rest(dst, end), "# Parameters from 
bootloader:\n# %s\n",
+  boot_command_line);
+   if (ret > 0)
+   dst += ret;
}
 out:
kfree(key);
-- 
2.40.1

[PATCH v2 fs/proc/bootconfig 2/2] fs/proc: Skip bootloader comment if no embedded kernel parameters

2024-04-08 Thread Paul E. McKenney

From: Masami Hiramatsu 

If the "bootconfig" kernel command-line argument was specified or if
the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but if there are
no embedded kernel parameter, omit the "# Parameters from bootloader:"
comment from the /proc/bootconfig file.  This will cause automation
to fall back to the /proc/cmdline file, which will be identical to the
comment in this no-embedded-kernel-parameters case.

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Paul E. McKenney 
---
 fs/proc/bootconfig.c   | 2 +-
 include/linux/bootconfig.h | 1 +
 init/main.c| 5 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
index e5635a6b127b0..87dcaae32ff87 100644
--- a/fs/proc/bootconfig.c
+++ b/fs/proc/bootconfig.c
@@ -63,7 +63,7 @@ static int __init copy_xbc_key_value_list(char *dst, size_t 
size)
dst += ret;
}
}
-   if (ret >= 0 && boot_command_line[0]) {
+   if (cmdline_has_extra_options() && ret >= 0 && boot_command_line[0]) {
ret = snprintf(dst, rest(dst, end), "# Parameters from 
bootloader:\n# %s\n",
   boot_command_line);
if (ret > 0)
diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index ca73940e26df8..e5ee2c694401e 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -10,6 +10,7 @@
 #ifdef __KERNEL__
 #include 
 #include 
+bool __init cmdline_has_extra_options(void);
 #else /* !__KERNEL__ */
 /*
  * NOTE: This is only for tools/bootconfig, because tools/bootconfig will
diff --git a/init/main.c b/init/main.c
index 2ca52474d0c30..881f6230ee59e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -487,6 +487,11 @@ static int __init warn_bootconfig(char *str)
 
 early_param("bootconfig", warn_bootconfig);
 
+bool __init cmdline_has_extra_options(void)
+{
+   return extra_command_line || extra_init_args;
+}
+
 /* Change NUL term back to "=", to make "param" the whole string. */
 static void __init repair_env_string(char *param, char *val)
 {
-- 
2.40.1

[PATCH v2 fs/proc/bootconfig 0/2] remove redundant comments from /proc/bootconfig

2024-04-08 Thread Paul E. McKenney

Hello!

This series removes redundant comments from /proc/bootconfig:

1.  fs/proc: remove redundant comments from /proc/bootconfig,
courtesy of Zhenhua Huang.

2.  fs/proc: Skip bootloader comment if no embedded kernel parameters,
courtesy of Masami Hiramatsu.

Thanx, Paul



 b/fs/proc/bootconfig.c   |   12 ++--
 b/include/linux/bootconfig.h |1 +
 b/init/main.c|5 +
 fs/proc/bootconfig.c |2 +-
 4 files changed, 13 insertions(+), 7 deletions(-)

Re: [PATCH v10 12/14] x86/sgx: Turn on per-cgroup EPC reclamation

2024-04-08 Thread Haitao Huang


On Mon, 08 Apr 2024 17:37:10 -0500, Huang, Kai  wrote:




On 9/04/2024 6:03 am, Haitao Huang wrote:


 The misc root cgroup is a static similar to sgx_cg_root. So  
misc_cg_root()  won't be NULL
However, based on how css_misc() was check NULL, I suppose  
sgx_get_current_cg() may be NULL when cgroup is disabled (again not  
100% sure but we handle it anyway)


Could you help to check?  Sorry I am busy on something else thus won't  
be able to do any actual check.



It's always non-NULL based on testing.

It's hard for me to say definitely by reading the code. But IIUC  
cgroup_disable command-line only blocks operations in /sys/fs/cgroup so  
user space can't set up controllers and config limits, etc., for the  
diasabled ones. Each task->cgroups would still have a non-NULL pointer to  
the static root object for each cgroup that is enabled by KConfig, so  
get_current_misc_cg() thus  sgx_get_current_cg() should not return NULL  
regardless 'cgroup_disable=misc'.


Maybe @Michal or @tj can confirm?

Thanks
Haitao

Re: [PATCH] drivers/virtio: delayed configuration descriptor flags

2024-04-08 Thread Jason Wang

On Tue, Apr 9, 2024 at 1:27 AM ni.liqiang  wrote:
>
> In our testing of the virtio hardware accelerator, we found that
> configuring the flags of the descriptor after addr and len,
> as implemented in DPDK, seems to be more friendly to the hardware.
>
> In our Virtio hardware implementation tests, using the default
> open-source code, the hardware's bulk reads ensure performance
> but correctness is compromised. If we refer to the implementation code
> of DPDK, placing the flags configuration of the descriptor
> after addr and len, virtio backend can function properly based on
> our hardware accelerator.
>
> I am somewhat puzzled by this. From a software process perspective,
> it seems that there should be no difference whether
> the flags configuration of the descriptor is before or after addr and len.
> However, this is not the case according to experimental test results.
> We would like to know if such a change in the configuration order
> is reasonable and acceptable?

Harmless but a hint that there's a bug in your hardware?

More below

>
> Thanks.
>
> Signed-off-by: ni.liqiang 
> Reviewed-by: jin.qi 
> Tested-by: jin.qi 
> Cc: ni.liqiang 
> ---
>  drivers/virtio/virtio_ring.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 6f7e5010a673..bea2c2fb084e 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1472,15 +1472,16 @@ static inline int virtqueue_add_packed(struct 
> virtqueue *_vq,
> flags = cpu_to_le16(vq->packed.avail_used_flags |
> (++c == total_sg ? 0 : VRING_DESC_F_NEXT) 
> |
> (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
> -   if (i == head)
> -   head_flags = flags;
> -   else
> -   desc[i].flags = flags;
>
> desc[i].addr = cpu_to_le64(addr);
> desc[i].len = cpu_to_le32(sg->length);
> desc[i].id = cpu_to_le16(id);
>
> +   if (i == head)
> +   head_flags = flags;
> +   else
> +   desc[i].flags = flags;
> +

The head_flags are not updated at this time, so descriptors are not
available, the device should not start to read the chain:

/*
 * A driver MUST NOT make the first descriptor in the list
 * available before all subsequent descriptors comprising
 * the list are made available.
 */
virtio_wmb(vq->weak_barriers);
vq->packed.vring.desc[head].flags = head_flags;
vq->num_added += descs_used;

It looks like your device does speculation reading on the descriptors
that are not available?

Thanks

> if (unlikely(vq->use_dma_api)) {
> vq->packed.desc_extra[curr].addr = addr;
> vq->packed.desc_extra[curr].len = sg->length;
> --
> 2.34.1
>
>

Re: [PATCH v3] vp_vdpa: fix the method of calculating vectors

2024-04-08 Thread Jason Wang

On Tue, Apr 9, 2024 at 9:49 AM lyx634449800  wrote:
>
> When there is a ctlq and it doesn't require interrupt
> callbacks,the original method of calculating vectors
> wastes hardware msi or msix resources as well as system
> IRQ resources.
>
> When conducting performance testing using testpmd in the
> guest os, it was found that the performance was lower compared
> to directly using vfio-pci to passthrough the device
>
> In scenarios where the virtio device in the guest os does
> not utilize interrupts, the vdpa driver still configures
> the hardware's msix vector. Therefore, the hardware still
> sends interrupts to the host os. Because of this unnecessary
> action by the hardware, hardware performance decreases, and
> it also affects the performance of the host os.
>
> Before modification:(interrupt mode)
>  32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
>  33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
>  34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
>  35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config
>
> After modification:(interrupt mode)
>  32:  0  0  1  7   PCI-MSI 32768-edge  vp-vdpa[:00:02.0]-0
>  33: 36  0  3  0   PCI-MSI 32769-edge  vp-vdpa[:00:02.0]-1
>  34:  0  0  0  0   PCI-MSI 32770-edge  vp-vdpa[:00:02.0]-config
>
> Before modification:(virtio pmd mode for guest os)
>  32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
>  33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
>  34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
>  35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config
>
> After modification:(virtio pmd mode for guest os)
>  32: 0  0  0   0   PCI-MSI 32768-edge   vp-vdpa[:00:02.0]-config
>
> To verify the use of the virtio PMD mode in the guest operating
> system, the following patch needs to be applied to QEMU:
> https://lore.kernel.org/all/20240408073311.2049-1-yuxue@jaguarmicro.com
>
> Signed-off-by: lyx634449800 
> ---
>
> V3: delete unused variables and add validation records
> V2: fix when allocating IRQs, scan all queues
>
>  drivers/vdpa/virtio_pci/vp_vdpa.c | 35 +++
>  1 file changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
> b/drivers/vdpa/virtio_pci/vp_vdpa.c
> index df5f4a3bccb5..cd3aeb3b8f21 100644
> --- a/drivers/vdpa/virtio_pci/vp_vdpa.c
> +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
> @@ -160,22 +160,31 @@ static int vp_vdpa_request_irq(struct vp_vdpa *vp_vdpa)
> struct pci_dev *pdev = mdev->pci_dev;
> int i, ret, irq;
> int queues = vp_vdpa->queues;
> -   int vectors = queues + 1;
> +   int msix_vec, allocated_vectors = 0;
>
> -   ret = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
> -   if (ret != vectors) {
> +   for (i = 0; i < queues; i++) {
> +   if (vp_vdpa->vring[i].cb.callback)
> +   allocated_vectors++;
> +   }
> +   allocated_vectors = allocated_vectors + 1;
> +
> +   ret = pci_alloc_irq_vectors(pdev, allocated_vectors, 
> allocated_vectors,
> +   PCI_IRQ_MSIX);
> +   if (ret != allocated_vectors) {
> dev_err(>dev,
> "vp_vdpa: fail to allocate irq vectors want %d but 
> %d\n",
> -   vectors, ret);
> +   allocated_vectors, ret);
> return ret;
> }
> -
> -   vp_vdpa->vectors = vectors;
> +   vp_vdpa->vectors = allocated_vectors;
>
> for (i = 0; i < queues; i++) {
> +   if (!vp_vdpa->vring[i].cb.callback)
> +   continue;
> +
> snprintf(vp_vdpa->vring[i].msix_name, VP_VDPA_NAME_SIZE,
> "vp-vdpa[%s]-%d\n", pci_name(pdev), i);
> -   irq = pci_irq_vector(pdev, i);
> +   irq = pci_irq_vector(pdev, msix_vec);
> ret = devm_request_irq(>dev, irq,
>vp_vdpa_vq_handler,
>0, vp_vdpa->vring[i].msix_name,
> @@ -185,23 +194,23 @@ static int vp_vdpa_request_irq(struct vp_vdpa *vp_vdpa)
> "vp_vdpa: fail to request irq for vq %d\n", 
> i);
> goto err;
> }
> -   vp_modern_queue_vector(mdev, i, i);
> +   vp_modern_queue_vector(mdev, i, msix_vec);
> vp_vdpa->vring[i].irq = irq;
> +   msix_vec++;
> }
>
> snprintf(vp_vdpa->msix_name, VP_VDPA_NAME_SIZE, 
> "vp-vdpa[%s]-config\n",
> -pci_name(pdev));
> -   irq = pci_irq_vector(pdev, queues);
> +   pci_name(pdev));
> +   irq = pci_irq_vector(pdev, msix_vec);
> ret = devm_request_irq(>dev, irq, vp_vdpa_config_handler, 0,
>

[PATCH v3] vp_vdpa: fix the method of calculating vectors

2024-04-08 Thread lyx634449800

When there is a ctlq and it doesn't require interrupt
callbacks,the original method of calculating vectors
wastes hardware msi or msix resources as well as system
IRQ resources.

When conducting performance testing using testpmd in the
guest os, it was found that the performance was lower compared
to directly using vfio-pci to passthrough the device

In scenarios where the virtio device in the guest os does
not utilize interrupts, the vdpa driver still configures
the hardware's msix vector. Therefore, the hardware still
sends interrupts to the host os. Because of this unnecessary
action by the hardware, hardware performance decreases, and
it also affects the performance of the host os.

Before modification:(interrupt mode)
 32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
 33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
 34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
 35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config

After modification:(interrupt mode)
 32:  0  0  1  7   PCI-MSI 32768-edge  vp-vdpa[:00:02.0]-0
 33: 36  0  3  0   PCI-MSI 32769-edge  vp-vdpa[:00:02.0]-1
 34:  0  0  0  0   PCI-MSI 32770-edge  vp-vdpa[:00:02.0]-config

Before modification:(virtio pmd mode for guest os)
 32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
 33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
 34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
 35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config

After modification:(virtio pmd mode for guest os)
 32: 0  0  0   0   PCI-MSI 32768-edge   vp-vdpa[:00:02.0]-config

To verify the use of the virtio PMD mode in the guest operating
system, the following patch needs to be applied to QEMU:
https://lore.kernel.org/all/20240408073311.2049-1-yuxue@jaguarmicro.com

Signed-off-by: lyx634449800 
---

V3: delete unused variables and add validation records
V2: fix when allocating IRQs, scan all queues

 drivers/vdpa/virtio_pci/vp_vdpa.c | 35 +++
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index df5f4a3bccb5..cd3aeb3b8f21 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -160,22 +160,31 @@ static int vp_vdpa_request_irq(struct vp_vdpa *vp_vdpa)
struct pci_dev *pdev = mdev->pci_dev;
int i, ret, irq;
int queues = vp_vdpa->queues;
-   int vectors = queues + 1;
+   int msix_vec, allocated_vectors = 0;
 
-   ret = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
-   if (ret != vectors) {
+   for (i = 0; i < queues; i++) {
+   if (vp_vdpa->vring[i].cb.callback)
+   allocated_vectors++;
+   }
+   allocated_vectors = allocated_vectors + 1;
+
+   ret = pci_alloc_irq_vectors(pdev, allocated_vectors, allocated_vectors,
+   PCI_IRQ_MSIX);
+   if (ret != allocated_vectors) {
dev_err(>dev,
"vp_vdpa: fail to allocate irq vectors want %d but 
%d\n",
-   vectors, ret);
+   allocated_vectors, ret);
return ret;
}
-
-   vp_vdpa->vectors = vectors;
+   vp_vdpa->vectors = allocated_vectors;
 
for (i = 0; i < queues; i++) {
+   if (!vp_vdpa->vring[i].cb.callback)
+   continue;
+
snprintf(vp_vdpa->vring[i].msix_name, VP_VDPA_NAME_SIZE,
"vp-vdpa[%s]-%d\n", pci_name(pdev), i);
-   irq = pci_irq_vector(pdev, i);
+   irq = pci_irq_vector(pdev, msix_vec);
ret = devm_request_irq(>dev, irq,
   vp_vdpa_vq_handler,
   0, vp_vdpa->vring[i].msix_name,
@@ -185,23 +194,23 @@ static int vp_vdpa_request_irq(struct vp_vdpa *vp_vdpa)
"vp_vdpa: fail to request irq for vq %d\n", i);
goto err;
}
-   vp_modern_queue_vector(mdev, i, i);
+   vp_modern_queue_vector(mdev, i, msix_vec);
vp_vdpa->vring[i].irq = irq;
+   msix_vec++;
}
 
snprintf(vp_vdpa->msix_name, VP_VDPA_NAME_SIZE, "vp-vdpa[%s]-config\n",
-pci_name(pdev));
-   irq = pci_irq_vector(pdev, queues);
+   pci_name(pdev));
+   irq = pci_irq_vector(pdev, msix_vec);
ret = devm_request_irq(>dev, irq, vp_vdpa_config_handler, 0,
   vp_vdpa->msix_name, vp_vdpa);
if (ret) {
dev_err(>dev,
-   "vp_vdpa: fail to request irq for vq %d\n", i);
+   "vp_vdpa: fail to request irq for config\n");
goto err;
}
-

Re: [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio

2024-04-08 Thread Hou Tao

Hi,

On 4/8/2024 3:45 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2024 at 10:41:20PM +0800, Hou Tao wrote:
>> From: Hou Tao 
>>
>> Hi,
>>
>> The patch set aims to fix the warning related to an abnormal size
>> parameter of kmalloc() in virtiofs. The warning occurred when attempting
>> to insert a 10MB sized kernel module kept in a virtiofs with cache
>> disabled. As analyzed in patch #1, the root cause is that the length of
>> the read buffer is no limited, and the read buffer is passed directly to
>> virtiofs through out_args[0].value. Therefore patch #1 limits the
>> length of the read buffer passed to virtiofs by using max_pages. However
>> it is not enough, because now the maximal value of max_pages is 256.
>> Consequently, when reading a 10MB-sized kernel module, the length of the
>> bounce buffer in virtiofs will be 40 + (256 * 4096), and kmalloc will
>> try to allocate 2MB from memory subsystem. The request for 2MB of
>> physically contiguous memory significantly stress the memory subsystem
>> and may fail indefinitely on hosts with fragmented memory. To address
>> this, patch #2~#5 use scattered pages in a bio_vec to replace the
>> kmalloc-allocated bounce buffer when the length of the bounce buffer for
>> KVEC_ITER dio is larger than PAGE_SIZE. The final issue with the
>> allocation of the bounce buffer and sg array in virtiofs is that
>> GFP_ATOMIC is used even when the allocation occurs in a kworker context.
>> Therefore the last patch uses GFP_NOFS for the allocation of both sg
>> array and bounce buffer when initiated by the kworker. For more details,
>> please check the individual patches.
>>
>> As usual, comments are always welcome.
>>
>> Change Log:
> Bernd should I just merge the patchset as is?
> It seems to fix a real problem and no one has the
> time to work on a better fix  WDYT?

Sorry for the long delay. I am just start to prepare for v3. In v3, I
plan to avoid the unnecessary memory copy between fuse args and bio_vec.
Will post it before next week.
>
>
>> v2:
>>   * limit the length of ITER_KVEC dio by max_pages instead of the
>> newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
>> dio being consistent with other rw operations.
>>   * replace kmalloc-allocated bounce buffer by using a bounce buffer
>> backed by scattered pages when the length of the bounce buffer for
>> KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
>> fragmented memory, the KVEC_ITER dio can be handled normally by
>> virtiofs. (Bernd Schubert)
>>   * merge the GFP_NOFS patch [1] into this patch-set and use
>> memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
>> (Benjamin Coddington)
>>
>> v1: 
>> https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/
>>
>> [1]: 
>> https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/
>>
>> Hou Tao (6):
>>   fuse: limit the length of ITER_KVEC dio by max_pages
>>   virtiofs: move alloc/free of argbuf into separated helpers
>>   virtiofs: factor out more common methods for argbuf
>>   virtiofs: support bounce buffer backed by scattered pages
>>   virtiofs: use scattered bounce buffer for ITER_KVEC dio
>>   virtiofs: use GFP_NOFS when enqueuing request through kworker
>>
>>  fs/fuse/file.c  |  12 +-
>>  fs/fuse/virtio_fs.c | 336 +---
>>  2 files changed, 296 insertions(+), 52 deletions(-)
>>
>> -- 
>> 2.29.2

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread kernel test robot

Hi Michal,

kernel test robot noticed the following build errors:

[auto build test ERROR on fec50db7033ea478773b159e0e2efb135270e3b7]

url:
https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:
https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: arm-allnoconfig 
(https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 
8b3b4a92adee40483c27f26c478a384cd69c6f05)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404090903.3jz667sn-...@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/sysctl.c:23:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:522:36: warning: arithmetic between different 
enumeration types ('enum node_stat_item' and 'enum lru_list') 
[-Wenum-enum-conversion]
 522 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
 |   ~~~ ^ ~~~
>> kernel/sysctl.c:1819:14: error: initializing 'void *' with an expression of 
>> type 'const int *' discards qualifiers 
>> [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
1819 | .extra2 = _max_max,
 |   ^~~~
   1 warning and 1 error generated.


vim +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig   2020-04-24  1617  
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1618  static struct ctl_table 
kern_table[] = {
^1da177e4c3f41 Linus Torvalds  2005-04-16  1619 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1620 .procname   
= "panic",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1621 .data   
= _timeout,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1622 .maxlen 
= sizeof(int),
49f0ce5f92321c Jerome Marchand 2014-01-21  1623 .mode   
= 0644,
6d4561110a3e9f Eric W. Biederman   2009-11-16  1624 .proc_handler   
= proc_dointvec,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1625 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds  2005-04-16  1627 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1628 .procname   
= "tainted",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1629 .maxlen 
= sizeof(long),
^1da177e4c3f41 Linus Torvalds  2005-04-16  1630 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1631 .proc_handler   
= proc_taint,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1632 },
2da02997e08d3e David Rientjes  2009-01-06  1633 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1634 .procname   
= "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1635 .data   
= _writes_strict,
9e3961a0979817 Prarit Bhargava 2014-12-10  1636 .maxlen 
= sizeof(int),
2da02997e08d3e David Rientjes  2009-01-06  1637 .mode   
= 0644,
9e3961a0979817 Prarit Bhargava 2014-12-10  1638 .proc_handler   
= proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni 2022-01-21  1639 .extra1 
= SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce2019-07-18  1640 .extra2 
= SYSCTL_ONE,
2da02997e08d3e David Rientjes  2009-01-06  1641 },
964c9dff009189 Alexander Popov 2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o   2015-03-17  1643 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1644 .procname   
= "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1645 .data   
= _fatal_signals,
964c9dff009189 Alexander Popov 2018-08-17  1646 .maxlen 
= sizeof(int),
1efff914afac8a Theodore Ts'o   2015-03-17  1647 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1648 .proc_handler   
= proc_dointvec,
1efff914afac8a Theodore Ts'o   2015-03-17  1649 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds  2005-04-16  1651 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1652 .procname   
= "reb

Re: [PATCH fs/proc/bootconfig] remove redundant comments from /proc/bootconfig

2024-04-08 Thread Paul E. McKenney

On Tue, Apr 09, 2024 at 09:25:40AM +0900, Masami Hiramatsu wrote:
> On Mon, 8 Apr 2024 12:18:19 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Sat, Apr 06, 2024 at 11:11:08AM +0900, Masami Hiramatsu wrote:
> > > On Thu, 4 Apr 2024 21:25:41 -0700
> > > "Paul E. McKenney"  wrote:
> > > 
> > > > On Fri, Apr 05, 2024 at 11:57:45AM +0900, Masami Hiramatsu wrote:
> > > > > On Fri, 5 Apr 2024 10:23:24 +0900
> > > > > Masami Hiramatsu (Google)  wrote:
> > > > > 
> > > > > > On Thu, 4 Apr 2024 10:43:14 -0700
> > > > > > "Paul E. McKenney"  wrote:
> > > > > > 
> > > > > > > On Thu, Apr 04, 2024 at 08:55:22AM +0900, Masami Hiramatsu wrote:
> > > > > > > > On Wed, 3 Apr 2024 12:16:28 -0700
> > > > > > > > "Paul E. McKenney"  wrote:
> > > > > > > > 
> > > > > > > > > commit 717c7c894d4b ("fs/proc: Add boot loader arguments as 
> > > > > > > > > comment to
> > > > > > > > > /proc/bootconfig") adds bootloader argument comments into 
> > > > > > > > > /proc/bootconfig.
> > > > > > > > > 
> > > > > > > > > /proc/bootconfig shows boot_command_line[] multiple times 
> > > > > > > > > following
> > > > > > > > > every xbc key value pair, that's duplicated and not necessary.
> > > > > > > > > Remove redundant ones.
> > > > > > > > > 
> > > > > > > > > Output before and after the fix is like:
> > > > > > > > > key1 = value1
> > > > > > > > > *bootloader argument comments*
> > > > > > > > > key2 = value2
> > > > > > > > > *bootloader argument comments*
> > > > > > > > > key3 = value3
> > > > > > > > > *bootloader argument comments*
> > > > > > > > > ...
> > > > > > > > > 
> > > > > > > > > key1 = value1
> > > > > > > > > key2 = value2
> > > > > > > > > key3 = value3
> > > > > > > > > *bootloader argument comments*
> > > > > > > > > ...
> > > > > > > > > 
> > > > > > > > > Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as 
> > > > > > > > > comment to /proc/bootconfig")
> > > > > > > > > Signed-off-by: Zhenhua Huang 
> > > > > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > > > > Cc: Masami Hiramatsu 
> > > > > > > > > Cc: 
> > > > > > > > > Cc: 
> > > > > > > > 
> > > > > > > > OOps, good catch! Let me pick it.
> > > > > > > > 
> > > > > > > > Acked-by: Masami Hiramatsu (Google) 
> > > > > > > 
> > > > > > > Thank you, and I have applied your ack and pulled this into its 
> > > > > > > own
> > > > > > > bootconfig.2024.04.04a.
> > > > > > > 
> > > > > > > My guess is that you will push this via your own tree, and so I 
> > > > > > > will
> > > > > > > drop my copy as soon as yours hits -next.
> > > > > > 
> > > > > > Thanks! I would like to make PR this soon as bootconfig fixes for 
> > > > > > v6.9-rc2.
> > > > > 
> > > > > Hmm I found that this always shows the command line comment in
> > > > > /proc/bootconfig even without "bootconfig" option.
> > > > > I think that is easier for user-tools but changes the behavior and
> > > > > a bit redundant.
> > > > > 
> > > > > We should skip showing this original argument comment if bootconfig is
> > > > > not initialized (no "bootconfig" in cmdline) as it is now.
> > > > 
> > > > So something like this folded into that patch?
> > > 
> > > Hm, I expected just checking it in the loop as below.
> > > 
> > > 
> > > diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
> > > index e5635a6b127b..98e0780f7e07 100644
> > > --- a/fs/proc/bootconfig.c
> > > +++ b/fs/proc/bootconfig.c
> > > @@ -27,6 +27,7 @@ static int __init copy_xbc_key_value_list(char *dst, 
> > > size_t size)
> > >  {
> > >   struct xbc_node *leaf, *vnode;
> > >   char *key, *end = dst + size;
> > > + bool empty = true;
> > >   const char *val;
> > >   char q;
> > >   int ret = 0;
> > > @@ -62,8 +63,9 @@ static int __init copy_xbc_key_value_list(char *dst, 
> > > size_t size)
> > >   break;
> > >   dst += ret;
> > >   }
> > > + empty = false;
> > >   }
> > > - if (ret >= 0 && boot_command_line[0]) {
> > > + if (!empty && ret >= 0 && boot_command_line[0]) {
> > >   ret = snprintf(dst, rest(dst, end), "# Parameters from 
> > > bootloader:\n# %s\n",
> > >  boot_command_line);
> > >   if (ret > 0)
> > > 
> > > 
> > > 
> > > The difference is checking "bootconfig" cmdline option or checking
> > > the "bootconfig" is actually empty. So the behaviors are different
> > > when the "bootconfig" is specified but there is no bootconfig data.
> > 
> > Ah, understood, the point is to avoid the comment in cases where its
> > content would be identical to /proc/cmdline.
> > 
> > > Another idea is to check whether the cmdline is actually updated by
> > > bootconfig and show original one only if it is updated.
> > > (I think this fits the purpose of the original patch better.)
> > > 
> > > 
> > > diff --git

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread kernel test robot

Hi Michal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on fec50db7033ea478773b159e0e2efb135270e3b7]

url:
https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:
https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: alpha-allnoconfig 
(https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404090849.mgj3z0xi-...@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/sysctl.c:1819:35: warning: initialization discards 'const' qualifier 
>> from pointer target type [-Wdiscarded-qualifiers]
1819 | .extra2 = _max_max,
 |   ^


vim +/const +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig   2020-04-24  1617  
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1618  static struct ctl_table 
kern_table[] = {
^1da177e4c3f41 Linus Torvalds  2005-04-16  1619 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1620 .procname   
= "panic",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1621 .data   
= _timeout,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1622 .maxlen 
= sizeof(int),
49f0ce5f92321c Jerome Marchand 2014-01-21  1623 .mode   
= 0644,
6d4561110a3e9f Eric W. Biederman   2009-11-16  1624 .proc_handler   
= proc_dointvec,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1625 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds  2005-04-16  1627 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1628 .procname   
= "tainted",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1629 .maxlen 
= sizeof(long),
^1da177e4c3f41 Linus Torvalds  2005-04-16  1630 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1631 .proc_handler   
= proc_taint,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1632 },
2da02997e08d3e David Rientjes  2009-01-06  1633 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1634 .procname   
= "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1635 .data   
= _writes_strict,
9e3961a0979817 Prarit Bhargava 2014-12-10  1636 .maxlen 
= sizeof(int),
2da02997e08d3e David Rientjes  2009-01-06  1637 .mode   
= 0644,
9e3961a0979817 Prarit Bhargava 2014-12-10  1638 .proc_handler   
= proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni 2022-01-21  1639 .extra1 
= SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce2019-07-18  1640 .extra2 
= SYSCTL_ONE,
2da02997e08d3e David Rientjes  2009-01-06  1641 },
964c9dff009189 Alexander Popov 2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o   2015-03-17  1643 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1644 .procname   
= "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1645 .data   
= _fatal_signals,
964c9dff009189 Alexander Popov 2018-08-17  1646 .maxlen 
= sizeof(int),
1efff914afac8a Theodore Ts'o   2015-03-17  1647 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1648 .proc_handler   
= proc_dointvec,
1efff914afac8a Theodore Ts'o   2015-03-17  1649 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds  2005-04-16  1651 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1652 .procname   
= "reboot-cmd",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1653 .data   
= reboot_command,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1654 .maxlen 
= 256,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1655 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1656 .proc_handler   
= proc_dostring,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1657 },
^1da177e4c3f41 Linus Torvalds  2005-04-16  1658 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1659 .procname

Re: [PATCHv2 1/3] uprobe: Add uretprobe syscall to speed up return probe

2024-04-08 Thread Google

On Mon, 8 Apr 2024 18:02:13 +0200
Jiri Olsa  wrote:

> On Fri, Apr 05, 2024 at 01:02:30PM +0200, Oleg Nesterov wrote:
> > On 04/05, Jiri Olsa wrote:
> > >
> > > On Fri, Apr 05, 2024 at 10:22:03AM +0900, Masami Hiramatsu wrote:
> > > >
> > > > I think this expects setjmp/longjmp as below
> > > >
> > > > foo() { <- retprobe1
> > > > setjmp()
> > > > bar() { <- retprobe2
> > > > longjmp()
> > > > }
> > > > } <- return to trampoline
> > > >
> > > > In this case, we need to skip retprobe2's instance.
> > 
> > Yes,
> > 
> > > > My concern is, if we can not find appropriate return instance, what 
> > > > happen?
> > > > e.g.
> > > >
> > > > foo() { <-- retprobe1
> > > >bar() { # sp is decremented
> > > >sys_uretprobe() <-- ??
> > > > }
> > > > }
> > > >
> > > > It seems sys_uretprobe() will handle retprobe1 at that point instead of
> > > > SIGILL.
> > >
> > > yes, and I think it's fine, you get the consumer called in wrong place,
> > > but it's your fault and kernel won't crash
> > 
> > Agreed.
> > 
> > With or without this patch userpace can also do
> > 
> > foo() { <-- retprobe1
> > bar() {
> > jump to xol_area
> > }
> > }
> > 
> > handle_trampoline() will handle retprobe1.
> > 
> > > this can be fixed by checking the syscall is called from the trampoline
> > > and prevent handle_trampoline call if it's not
> > 
> > Yes, but I still do not think this makes a lot of sense. But I won't argue.
> > 
> > And what should sys_uretprobe() do if it is not called from the trampoline?
> > I'd prefer force_sig(SIGILL) to punish the abuser ;) OK, OK, EINVAL.
> 
> so the similar behaviour with int3 ends up with immediate SIGTRAP
> and not invoking pending uretprobe consumers, like:
> 
>   - setup uretprobe for foo
>   - foo() {
>   executes int 3 -> sends SIGTRAP
> }
> 
> because the int3 handler checks if it got executed from the uretprobe's
> trampoline.. if not it treats that int3 as regular trap

Yeah, that is consistent behavior. Sounds good to me.

> 
> while for uretprobe syscall we have at the moment following behaviour:
> 
>   - setup uretprobe for foo
>   - foo() {
>  uretprobe_syscall -> executes foo's uretprobe consumers
> }
>   - at some point we get to the 'ret' instruction that jump into uretprobe
> trampoline and the uretprobe_syscall won't find pending uretprobe and
> will send SIGILL
> 
> 
> so I think we should mimic int3 behaviour and:
> 
>   - setup uretprobe for foo
>   - foo() {
>  uretprobe_syscall -> check if we got executed from uretprobe's
>  trampoline and send SIGILL if that's not the case

OK, this looks good to me.

> 
> I think it's better to have the offending process killed right away,
> rather than having more undefined behaviour, waiting for final 'ret'
> instruction that jumps to uretprobe trampoline and causes SIGILL
> 
> > 
> > I agree very much with Andrii,
> > 
> >sigreturn()  exists only to allow the implementation of signal 
> > handlers.  It should never be
> >called directly.  Details of the arguments (if any) passed to 
> > sigreturn() vary depending  on
> >the architecture.
> > 
> > this is how sys_uretprobe() should be treated/documented.
> 
> yes, will include man page patch in new version

And please follow Documentation/process/adding-syscalls.rst in new version,
then we can avoid repeating the same discussion :-)

Thank you!

> 
> jirka
> 
> > 
> > sigreturn() can be "improved" too. Say, it could validate sigcontext->ip
> > and return -EINVAL if this addr is not valid. But why?
> > 
> > Oleg.
> > 


-- 
Masami Hiramatsu (Google)

Re: [PATCH fs/proc/bootconfig] remove redundant comments from /proc/bootconfig

2024-04-08 Thread Google

On Mon, 8 Apr 2024 12:18:19 -0700
"Paul E. McKenney"  wrote:

> On Sat, Apr 06, 2024 at 11:11:08AM +0900, Masami Hiramatsu wrote:
> > On Thu, 4 Apr 2024 21:25:41 -0700
> > "Paul E. McKenney"  wrote:
> > 
> > > On Fri, Apr 05, 2024 at 11:57:45AM +0900, Masami Hiramatsu wrote:
> > > > On Fri, 5 Apr 2024 10:23:24 +0900
> > > > Masami Hiramatsu (Google)  wrote:
> > > > 
> > > > > On Thu, 4 Apr 2024 10:43:14 -0700
> > > > > "Paul E. McKenney"  wrote:
> > > > > 
> > > > > > On Thu, Apr 04, 2024 at 08:55:22AM +0900, Masami Hiramatsu wrote:
> > > > > > > On Wed, 3 Apr 2024 12:16:28 -0700
> > > > > > > "Paul E. McKenney"  wrote:
> > > > > > > 
> > > > > > > > commit 717c7c894d4b ("fs/proc: Add boot loader arguments as 
> > > > > > > > comment to
> > > > > > > > /proc/bootconfig") adds bootloader argument comments into 
> > > > > > > > /proc/bootconfig.
> > > > > > > > 
> > > > > > > > /proc/bootconfig shows boot_command_line[] multiple times 
> > > > > > > > following
> > > > > > > > every xbc key value pair, that's duplicated and not necessary.
> > > > > > > > Remove redundant ones.
> > > > > > > > 
> > > > > > > > Output before and after the fix is like:
> > > > > > > > key1 = value1
> > > > > > > > *bootloader argument comments*
> > > > > > > > key2 = value2
> > > > > > > > *bootloader argument comments*
> > > > > > > > key3 = value3
> > > > > > > > *bootloader argument comments*
> > > > > > > > ...
> > > > > > > > 
> > > > > > > > key1 = value1
> > > > > > > > key2 = value2
> > > > > > > > key3 = value3
> > > > > > > > *bootloader argument comments*
> > > > > > > > ...
> > > > > > > > 
> > > > > > > > Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as 
> > > > > > > > comment to /proc/bootconfig")
> > > > > > > > Signed-off-by: Zhenhua Huang 
> > > > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > > > Cc: Masami Hiramatsu 
> > > > > > > > Cc: 
> > > > > > > > Cc: 
> > > > > > > 
> > > > > > > OOps, good catch! Let me pick it.
> > > > > > > 
> > > > > > > Acked-by: Masami Hiramatsu (Google) 
> > > > > > 
> > > > > > Thank you, and I have applied your ack and pulled this into its own
> > > > > > bootconfig.2024.04.04a.
> > > > > > 
> > > > > > My guess is that you will push this via your own tree, and so I will
> > > > > > drop my copy as soon as yours hits -next.
> > > > > 
> > > > > Thanks! I would like to make PR this soon as bootconfig fixes for 
> > > > > v6.9-rc2.
> > > > 
> > > > Hmm I found that this always shows the command line comment in
> > > > /proc/bootconfig even without "bootconfig" option.
> > > > I think that is easier for user-tools but changes the behavior and
> > > > a bit redundant.
> > > > 
> > > > We should skip showing this original argument comment if bootconfig is
> > > > not initialized (no "bootconfig" in cmdline) as it is now.
> > > 
> > > So something like this folded into that patch?
> > 
> > Hm, I expected just checking it in the loop as below.
> > 
> > 
> > diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
> > index e5635a6b127b..98e0780f7e07 100644
> > --- a/fs/proc/bootconfig.c
> > +++ b/fs/proc/bootconfig.c
> > @@ -27,6 +27,7 @@ static int __init copy_xbc_key_value_list(char *dst, 
> > size_t size)
> >  {
> > struct xbc_node *leaf, *vnode;
> > char *key, *end = dst + size;
> > +   bool empty = true;
> > const char *val;
> > char q;
> > int ret = 0;
> > @@ -62,8 +63,9 @@ static int __init copy_xbc_key_value_list(char *dst, 
> > size_t size)
> > break;
> > dst += ret;
> > }
> > +   empty = false;
> > }
> > -   if (ret >= 0 && boot_command_line[0]) {
> > +   if (!empty && ret >= 0 && boot_command_line[0]) {
> > ret = snprintf(dst, rest(dst, end), "# Parameters from 
> > bootloader:\n# %s\n",
> >boot_command_line);
> > if (ret > 0)
> > 
> > 
> > 
> > The difference is checking "bootconfig" cmdline option or checking
> > the "bootconfig" is actually empty. So the behaviors are different
> > when the "bootconfig" is specified but there is no bootconfig data.
> 
> Ah, understood, the point is to avoid the comment in cases where its
> content would be identical to /proc/cmdline.
> 
> > Another idea is to check whether the cmdline is actually updated by
> > bootconfig and show original one only if it is updated.
> > (I think this fits the purpose of the original patch better.)
> > 
> > 
> > diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
> > index e5635a6b127b..95d6a231210c 100644
> > --- a/fs/proc/bootconfig.c
> > +++ b/fs/proc/bootconfig.c
> > @@ -10,6 +10,9 @@
> >  #include 
> >  #include 
> >  
> > +/* defined in main/init.c */
> > +bool __init cmdline_has_extra_options(void);
> > +
> >  static

Re: [PATCH v2] rethook: Remove warning messages printed for finding return address of a frame.

2024-04-08 Thread Google

On Mon,  8 Apr 2024 10:51:40 -0700
Kui-Feng Lee  wrote:

> The function rethook_find_ret_addr() prints a warning message and returns 0
> when the target task is running and is not the "current" task in order to
> prevent the incorrect return address, although it still may return an
> incorrect address.
> 
> However, the warning message turns into noise when BPF profiling programs
> call bpf_get_task_stack() on running tasks in a firm with a large number of
> hosts.
> 
> The callers should be aware and willing to take the risk of receiving an
> incorrect return address from a task that is currently running other than
> the "current" one. A warning is not needed here as the callers are intent
> on it.
> 

OK, looks good to me. Let me pick it to probes/for-next. Thanks!

> Acked-by: Andrii Nakryiko 
> Acked-by: John Fastabend 
> Signed-off-by: Kui-Feng Lee 
> 
> ---
> Changes from v1:
> 
>  - Rephrased the commit log.
> 
>- Removed the confusing last part of the first paragraph.
> 
>- Removed "frequently" from the 2nd paragraph, replaced by "a firm with
>  a large number of hosts".
> 
> v1: https://lore.kernel.org/all/20240401191621.758056-1-thinker...@gmail.com/
> ---
>  kernel/trace/rethook.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> index fa03094e9e69..4297a132a7ae 100644
> --- a/kernel/trace/rethook.c
> +++ b/kernel/trace/rethook.c
> @@ -248,7 +248,7 @@ unsigned long rethook_find_ret_addr(struct task_struct 
> *tsk, unsigned long frame
>   if (WARN_ON_ONCE(!cur))
>   return 0;
>  
> - if (WARN_ON_ONCE(tsk != current && task_is_running(tsk)))
> + if (tsk != current && task_is_running(tsk))
>   return 0;
>  
>   do {
> -- 
> 2.34.1
> 


-- 
Masami Hiramatsu (Google)

Re: [PATCH v10 12/14] x86/sgx: Turn on per-cgroup EPC reclamation

2024-04-08 Thread Huang, Kai





On 9/04/2024 6:03 am, Haitao Huang wrote:




The misc root cgroup is a static similar to sgx_cg_root. So 
misc_cg_root()  won't be NULL
However, based on how css_misc() was check NULL, I suppose 
sgx_get_current_cg() may be NULL when cgroup is disabled (again not 100% 
sure but we handle it anyway)


Could you help to check?  Sorry I am busy on something else thus won't 
be able to do any actual check.

Re: UBSAN array-index-out-of-bounds in read_blocklist

2024-04-08 Thread Phillip Lougher


On 08/04/2024 01:56, Ubisectech Sirius wrote:

Hello.
We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
Recently, our team has discovered a issue in Linux kernel 6.8. Attached to the 
email were a PoC file of the issue.

Hello,

I have found the cause of the problem and fixed it.

A patch will be sent to the LKML shortly.

Thanks

Phillip

Squashfs maintainer.



Stack dump:
[ cut here ]
UBSAN: array-index-out-of-bounds in fs/squashfs/file.c:256:34
index 4294967295 is out of range for type 'meta_entry [127]'
CPU: 1 PID: 16927 Comm: syz-executor.0 Not tainted 6.8.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
  
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x136/0x150 lib/dump_stack.c:106
  ubsan_epilogue lib/ubsan.c:217 [inline]
  __ubsan_handle_out_of_bounds+0xd5/0x130 lib/ubsan.c:347
  fill_meta_index fs/squashfs/file.c:256 [inline]
  read_blocklist+0x175e/0x1790 fs/squashfs/file.c:333
  squashfs_readahead+0x14fa/0x22d0 fs/squashfs/file.c:591
  read_pages+0x1a2/0xd70 mm/readahead.c:160
  page_cache_ra_unbounded+0x477/0x5f0 mm/readahead.c:269
  do_page_cache_ra mm/readahead.c:299 [inline]
  page_cache_ra_order+0x772/0xa00 mm/readahead.c:544
  do_sync_mmap_readahead mm/filemap.c:3153 [inline]
  filemap_fault+0x1691/0x3390 mm/filemap.c:3245
  __do_fault+0x108/0x490 mm/memory.c:4396
  do_read_fault mm/memory.c:4758 [inline]
  do_fault mm/memory.c:4888 [inline]
  do_pte_missing mm/memory.c:3745 [inline]
  handle_pte_fault mm/memory.c:5164 [inline]
  __handle_mm_fault+0x340a/0x48b0 mm/memory.c:5305
  handle_mm_fault+0x3c2/0xa40 mm/memory.c:5470
  do_user_addr_fault+0x2ed/0x1010 arch/x86/mm/fault.c:1355
  handle_page_fault arch/x86/mm/fault.c:1498 [inline]
  exc_page_fault+0x99/0x180 arch/x86/mm/fault.c:1554
  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0033:0x7f8446a28202
Code: 48 63 c3 5b c3 0f 1f 00 f3 0f 1e fa 48 89 7c 24 f0 48 89 74 24 e8 48 89 54 24 
e0 48 8b 4c 24 f0 48 8b 54 24 e8 48 8b 74 24 e0 <8b> 41 40 23 81 00 01 00 00 f3 
0f 6f 06 c1 e0 06 48 01 d0 0f 11 00
RSP: 002b:7f8447755018 EFLAGS: 00010212
RAX: 7f8446a281e0 RBX: 7f8446bcbf80 RCX: 20ff
RDX:  RSI: 2000 RDI: 20ff
RBP: 7f8446af14a6 R08:  R09: 
R10: 20ff R11: 2000 R12: 
R13: 000b R14: 7f8446bcbf80 R15: 7f8447735000
  
---[ end trace ]---

Thank you for taking the time to read this email and we look forward to working 
with you further.

[PATCH v14 4/4] remoteproc: zynqmp: parse TCM from device tree

2024-04-08 Thread Tanmay Shah

ZynqMP TCM information was fixed in driver. Now ZynqMP TCM information
is available in device-tree. Parse TCM information in driver
as per new bindings.

Signed-off-by: Tanmay Shah 
---

Changes in v14:
  - Add Versal platform support
  - Add Versal-NET platform support
  - Maintain backward compatibility for ZynqMP platform and use hardcode
TCM addresses
  - Configure TCM based on xlnx,tcm-mode property for Versal
  - Avoid TCM configuration if that property isn't available in DT 

 drivers/remoteproc/xlnx_r5_remoteproc.c | 173 ++--
 1 file changed, 132 insertions(+), 41 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 0f942440b4e2..504492f930ac 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -74,8 +74,8 @@ struct mbox_info {
 };
 
 /*
- * Hardcoded TCM bank values. This will be removed once TCM bindings are
- * accepted for system-dt specifications and upstreamed in linux kernel
+ * Hardcoded TCM bank values. This will stay in driver to maintain backward
+ * compatibility with device-tree that does not have TCM information.
  */
 static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
{0xffe0UL, 0x0, 0x1UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each 
*/
@@ -300,36 +300,6 @@ static void zynqmp_r5_rproc_kick(struct rproc *rproc, int 
vqid)
dev_warn(dev, "failed to send message\n");
 }
 
-/*
- * zynqmp_r5_set_mode()
- *
- * set RPU cluster and TCM operation mode
- *
- * @r5_core: pointer to zynqmp_r5_core type object
- * @fw_reg_val: value expected by firmware to configure RPU cluster mode
- * @tcm_mode: value expected by fw to configure TCM mode (lockstep or split)
- *
- * Return: 0 for success and < 0 for failure
- */
-static int zynqmp_r5_set_mode(struct zynqmp_r5_core *r5_core,
- enum rpu_oper_mode fw_reg_val,
- enum rpu_tcm_comb tcm_mode)
-{
-   int ret;
-
-   ret = zynqmp_pm_set_rpu_mode(r5_core->pm_domain_id, fw_reg_val);
-   if (ret < 0) {
-   dev_err(r5_core->dev, "failed to set RPU mode\n");
-   return ret;
-   }
-
-   ret = zynqmp_pm_set_tcm_config(r5_core->pm_domain_id, tcm_mode);
-   if (ret < 0)
-   dev_err(r5_core->dev, "failed to configure TCM\n");
-
-   return ret;
-}
-
 /*
  * zynqmp_r5_rproc_start()
  * @rproc: single R5 core's corresponding rproc instance
@@ -761,6 +731,103 @@ static struct zynqmp_r5_core 
*zynqmp_r5_add_rproc_core(struct device *cdev)
return ERR_PTR(ret);
 }
 
+static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster)
+{
+   int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count;
+   struct of_phandle_args out_args;
+   struct zynqmp_r5_core *r5_core;
+   struct platform_device *cpdev;
+   struct mem_bank_data *tcm;
+   struct device_node *np;
+   struct resource *res;
+   u64 abs_addr, size;
+   struct device *dev;
+
+   for (i = 0; i < cluster->core_count; i++) {
+   r5_core = cluster->r5_cores[i];
+   dev = r5_core->dev;
+   np = r5_core->np;
+
+   pd_count = of_count_phandle_with_args(np, "power-domains",
+ "#power-domain-cells");
+
+   if (pd_count <= 0) {
+   dev_err(dev, "invalid power-domains property, %d\n", 
pd_count);
+   return -EINVAL;
+   }
+
+   /* First entry in power-domains list is for r5 core, rest for 
TCM. */
+   tcm_bank_count = pd_count - 1;
+
+   if (tcm_bank_count <= 0) {
+   dev_err(dev, "invalid TCM count %d\n", tcm_bank_count);
+   return -EINVAL;
+   }
+
+   r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count,
+ sizeof(struct mem_bank_data 
*),
+ GFP_KERNEL);
+   if (!r5_core->tcm_banks)
+   return -ENOMEM;
+
+   r5_core->tcm_bank_count = tcm_bank_count;
+   for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, 
tcm_pd_idx++) {
+   tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data),
+  GFP_KERNEL);
+   if (!tcm)
+   return -ENOMEM;
+
+   r5_core->tcm_banks[j] = tcm;
+
+   /* Get power-domains id of TCM. */
+   ret = of_parse_phandle_with_args(np, "power-domains",
+"#power-domain-cells",
+tcm_pd_idx, _args);
+   if (ret) {
+   dev_err(r5_core->dev,
+

[PATCH v14 3/4] dts: zynqmp: add properties for TCM in remoteproc

2024-04-08 Thread Tanmay Shah

Add properties as per new bindings in zynqmp remoteproc node
to represent TCM address and size.

This patch also adds alternative remoteproc node to represent
remoteproc cluster in split mode. By default lockstep mode is
enabled and users should disable it before using split mode
dts. Both device-tree nodes can't be used simultaneously one
of them must be disabled. For zcu102-1.0 and zcu102-1.1 board
remoteproc split mode dts node is enabled and lockstep mode
dts is disabled.

Signed-off-by: Tanmay Shah 
---

Changes in v14:
  - Add xlnx,tcm-mode property in remoteproc node

 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |  8 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 67 +--
 2 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts 
b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
index c8f71a1aec89..495ca94b45db 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
@@ -14,6 +14,14 @@ / {
compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", 
"xlnx,zynqmp";
 };
 
+_split {
+   status = "okay";
+};
+
+_lockstep {
+   status = "disabled";
+};
+
  {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi 
b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 25d20d803230..ef31b0fc73d1 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -260,19 +260,76 @@ fpga_full: fpga-full {
ranges;
};
 
-   remoteproc {
+   rproc_lockstep: remoteproc@ffe0 {
compatible = "xlnx,zynqmp-r5fss";
xlnx,cluster-mode = <1>;
+   xlnx,tcm-mode = <1>;
 
-   r5f-0 {
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>,
+<0x0 0x2 0x0 0xffe2 0x0 0x1>,
+<0x0 0x1 0x0 0xffe1 0x0 0x1>,
+<0x0 0x3 0x0 0xffe3 0x0 0x1>;
+
+   r5f@0 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x0 0x0 0x0 0x1>,
+ <0x0 0x2 0x0 0x1>,
+ <0x0 0x1 0x0 0x1>,
+ <0x0 0x3 0x0 0x1>;
+   reg-names = "atcm0", "btcm0", "atcm1", "btcm1";
+   power-domains = <_firmware PD_RPU_0>,
+   <_firmware PD_R5_0_ATCM>,
+   <_firmware PD_R5_0_BTCM>,
+   <_firmware PD_R5_1_ATCM>,
+   <_firmware PD_R5_1_BTCM>;
+   memory-region = <_0_fw_image>;
+   };
+
+   r5f@1 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <_firmware PD_RPU_1>,
+   <_firmware PD_R5_1_ATCM>,
+   <_firmware PD_R5_1_BTCM>;
+   memory-region = <_1_fw_image>;
+   };
+   };
+
+   rproc_split: remoteproc-split@ffe0 {
+   status = "disabled";
+   compatible = "xlnx,zynqmp-r5fss";
+   xlnx,cluster-mode = <0>;
+   xlnx,tcm-mode = <0>;
+
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>,
+<0x0 0x2 0x0 0xffe2 0x0 0x1>,
+<0x1 0x0 0x0 0xffe9 0x0 0x1>,
+<0x1 0x2 0x0 0xffeb 0x0 0x1>;
+
+   r5f@0 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <_firmware PD_RPU_0>;
+   reg = <0x0 0x0 0x0 0x1>, <0x0 0x2 0x0 0x1>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <_firmware PD_RPU_0>,
+   <_firmware PD_R5_0_ATCM>,
+   <_firmware PD_R5_0_BTCM>;
memory-region = <_0_fw_image>;
};
 
-   r5f-1 {
+   r5f@1 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <_firmware PD_RPU_1>;
+   reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <_firmware PD_RPU_1>,
+   <_firmware PD_R5_1_ATCM>,
+

[PATCH v14 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-04-08 Thread Tanmay Shah

From: Radhey Shyam Pandey 

Introduce bindings for TCM memory address space on AMD-xilinx Zynq
UltraScale+ platform. It will help in defining TCM in device-tree
and make it's access platform agnostic and data-driven.

Tightly-coupled memories(TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.

The TCM resources(reg, reg-names and power-domain) are documented for
each TCM in the R5 node. The reg and reg-names are made as required
properties as we don't want to hardcode TCM addresses for future
platforms and for zu+ legacy implementation will ensure that the
old dts w/o reg/reg-names works and stable ABI is maintained.

It also extends the examples for TCM split and lockstep modes.

Signed-off-by: Radhey Shyam Pandey 
Signed-off-by: Tanmay Shah 

---

Changes in v14:
  - Remove previous RB tag
  - Add xlnx,tcm-mode property
  - Add Versal platform support
  - Add Versal-NET platform support

 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 279 --
 1 file changed, 257 insertions(+), 22 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml 
b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
index 78aac69f1060..6f13da11f593 100644
--- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
@@ -18,11 +18,26 @@ description: |
 
 properties:
   compatible:
-const: xlnx,zynqmp-r5fss
+enum:
+  - xlnx,zynqmp-r5fss
+  - xlnx,versal-r5fss
+  - xlnx,versal-net-r52fss
+
+  "#address-cells":
+const: 2
+
+  "#size-cells":
+const: 2
+
+  ranges:
+description: |
+  Standard ranges definition providing address translations for
+  local R5F TCM address spaces to bus addresses.
 
   xlnx,cluster-mode:
 $ref: /schemas/types.yaml#/definitions/uint32
 enum: [0, 1, 2]
+default: 1
 description: |
   The RPU MPCore can operate in split mode (Dual-processor performance), 
Safety
   lock-step mode(Both RPU cores execute the same code in lock-step,
@@ -36,8 +51,16 @@ properties:
   1: lockstep mode (default)
   2: single cpu mode
 
+  xlnx,tcm-mode:
+$ref: /schemas/types.yaml#/definitions/uint32
+enum: [0, 1]
+description: |
+  Configure RPU TCM
+  0: split mode
+  1: lockstep mode
+
 patternProperties:
-  "^r5f-[a-f0-9]+$":
+  "^r(.*)@[0-9a-f]+$":
 type: object
 description: |
   The RPU is located in the Low Power Domain of the Processor Subsystem.
@@ -52,10 +75,22 @@ patternProperties:
 
 properties:
   compatible:
-const: xlnx,zynqmp-r5f
+enum:
+  - xlnx,zynqmp-r5f
+  - xlnx,versal-r5f
+  - xlnx,versal-net-r52f
+
+  reg:
+minItems: 1
+maxItems: 4
+
+  reg-names:
+minItems: 1
+maxItems: 4
 
   power-domains:
-maxItems: 1
+minItems: 2
+maxItems: 5
 
   mboxes:
 minItems: 1
@@ -101,35 +136,235 @@ patternProperties:
 
 required:
   - compatible
+  - reg
+  - reg-names
   - power-domains
 
-unevaluatedProperties: false
-
 required:
   - compatible
+  - "#address-cells"
+  - "#size-cells"
+  - ranges
+
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - xlnx,versal-net-r52fss
+then:
+  properties:
+xlnx,tcm-mode: false
+
+  patternProperties:
+"^r52f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+- description: CTCM internal memory
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+- const: ctcm0
+
+power-domains:
+  minItems: 2
+  items:
+- description: RPU core power domain
+- description: ATCM power domain
+- description: BTCM power domain
+- description: CTCM power domain
+
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - xlnx,zynqmp-r5fss
+  - xlnx,versal-r5fss
+then:
+  if:
+properties:
+  xlnx,cluster-mode:
+enum: [1, 2]
+  then:
+properties:
+  xlnx,tcm-mode:
+enum: [1]
+
+patternProperties:
+  "^r5f@[0-9a-f]+$":
+type: object
+
+properties:
+  reg:
+minItems: 1
+items:
+  - description: ATCM

[PATCH v14 0/4] add zynqmp TCM bindings

2024-04-08 Thread Tanmay Shah

Tightly-Coupled Memories(TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains exclusive two 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
In lockstep mode, both 128KB memory is accessible to the cluster.

As per ZynqMP Ultrascale+ Technical Reference Manual UG1085, following
is address space of TCM memory. The bindings in this patch series
introduces properties to accommodate following address space with
address translation between Linux and Cortex-R5 views.

| | | |
| --- | --- | --- |
|  *Mode*|   *R5 View* | *Linux view* |  Notes   |
| *Split Mode*   | *start addr*| *start addr* |  |
| R5_0 ATCM (64 KB)  | 0x_ | 0xFFE0_  |  |
| R5_0 BTCM (64 KB)  | 0x0002_ | 0xFFE2_  |  |
| R5_1 ATCM (64 KB)  | 0x_ | 0xFFE9_  | alias of 0xFFE1_ |
| R5_1 BTCM (64 KB)  | 0x0002_ | 0xFFEB_  | alias of 0xFFE3_ |
|  ___   | ___ |___   |  |
| *Lockstep Mode*| |  |  |
| R5_0 ATCM (128 KB) | 0x_ | 0xFFE0_  |  |
| R5_0 BTCM (128 KB) | 0x0002_ | 0xFFE2_  |  |

References:
UG1085 TCM address space:
https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map

---

prerequisite-patch-link: 
https://lore.kernel.org/all/d4556268-8274-4089-949f-3b97d6779...@gmail.com/
Base Branch: 6.9.rc2

Changes in v14:
  - Add xlnx,tcm-mode property and use it for TCM configuration
  - Add Versal and Versal-NET platform support
  - Maintain backward compatibility for ZynqMP platform and use hardcode
TCM addresses

Changes in v13:
  - Have power-domains property for lockstep case instead of
keeping it flexible.
  - Add "items:" list in power-domains property

Changes in v12:
  - add "reg", "reg-names" and "power-domains" in pattern properties
  - add "reg" and "reg-names" in required list
  - keep "power-domains" in required list as it was before the change

Changes in v11:
  - Fix yamllint warning and reduce indentation as needed
  - Remove redundant initialization of the variable
  - Return correct error code if memory allocation failed

Changs in v10:
  - Add new patch (1/4) to series that changes hardcode TCM addresses in
lockstep mode and removes separate handling of TCM in lockstep and
split mode
  - modify number of "reg", "reg-names" and "power-domains" entries
based on cluster mode
  - Add extra optional atcm and btcm in "reg" property for lockstep mode
  - Add "reg-names" for extra optional atcm and btcm for lockstep mode
  - Drop previous Ack as bindings has new change
  - Add individual tcm regions via "reg" and "reg-names" for lockstep mode
  - Add each tcm's power-domains in lockstep mode
  - Drop previous Ack as new change in dts patchset
  - Remove redundant changes in driver to handle TCM in lockstep mode

Changes in v9:
  - Fix rproc lockstep dts
  - Introduce new API to request and release core1 TCM power-domains in
lockstep mode. This will be used during prepare -> add_tcm_banks
callback to enable TCM in lockstep mode.
  - Parse TCM from device-tree in lockstep mode and split mode in
uniform way.
  - Fix TCM representation in device-tree in lockstep mode.
  - Fix comments as suggested

Changes in v8:
  - Remove use of pm_domains framework
  - Remove checking of pm_domain_id validation to power on/off tcm
  - Remove spurious change
  - parse power-domains property from device-tree and use EEMI calls
to power on/off TCM instead of using pm domains framework

Changes in v7:
  - %s/pm_dev1/pm_dev_core0/r
  - %s/pm_dev_link1/pm_dev_core0_link/r
  - %s/pm_dev2/pm_dev_core1/r
  - %s/pm_dev_link2/pm_dev_core1_link/r
  - remove pm_domain_id check to move next patch
  - add comment about how 1st entry in pm domain list is used
  - fix loop when jump to fail_add_pm_domains loop
  - move checking of pm_domain_id from previous patch
  - fix mem_bank_data memory allocation

Changes in v6:
  - Introduce new node entry for r5f cluster split mode dts and
keep it disabled by default.
  - Keep remoteproc lockstep mode enabled by default to maintian
back compatibility.
  - Enable split mode only for zcu102 board to demo split mode use
  - Remove spurious change
  - Handle errors in add_pm_domains function
  - Remove redundant code to handle errors from remove_pm_domains
  - Missing . at the end of the commit message
  - remove redundant initialization of variables
  - remove fail_tcm label and relevant code to free memory
acquired using devm_* API. As this will be freed when device free it
  - add extra check to see if "reg" property is supported or not

Changes in v5:
  - maintain Rob's Ack on bindings patch as no changes in bindings
  -

[PATCH v14 1/4] remoteproc: zynqmp: fix lockstep mode memory region

2024-04-08 Thread Tanmay Shah

In lockstep mode, r5 core0 uses TCM of R5 core1. Following is lockstep
mode memory region as per hardware reference manual.

|  *TCM* |   *R5 View* | *Linux view* |
| R5_0 ATCM (128 KB) | 0x_ | 0xFFE0_  |
| R5_0 BTCM (128 KB) | 0x0002_ | 0xFFE2_  |

However, driver shouldn't model it as above because R5 core0 TCM and core1
TCM has different power-domains mapped to it.
Hence, TCM address space in lockstep mode should be modeled as 64KB
regions only where each region has its own power-domain as following:

|  *TCM* |   *R5 View* | *Linux view* |
| R5_0 ATCM0 (64 KB) | 0x_ | 0xFFE0_  |
| R5_0 BTCM0 (64 KB) | 0x0002_ | 0xFFE2_  |
| R5_0 ATCM1 (64 KB) | 0x0001_ | 0xFFE1_  |
| R5_0 BTCM1 (64 KB) | 0x0003_ | 0xFFE3_  |

This makes driver maintanance easy and makes design robust for future
platorms as well.

Signed-off-by: Tanmay Shah 
---
 drivers/remoteproc/xlnx_r5_remoteproc.c | 146 ++--
 1 file changed, 12 insertions(+), 134 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index cfbd97b89c26..0f942440b4e2 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -84,12 +84,12 @@ static const struct mem_bank_data zynqmp_tcm_banks_split[] 
= {
{0xffebUL, 0x2, 0x1UL, PD_R5_1_BTCM, "btcm1"},
 };
 
-/* In lockstep mode cluster combines each 64KB TCM and makes 128KB TCM */
+/* In lockstep mode cluster uses each 64KB TCM from second core as well */
 static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
-   {0xffe0UL, 0x0, 0x2UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 128KB 
each */
-   {0xffe2UL, 0x2, 0x2UL, PD_R5_0_BTCM, "btcm0"},
-   {0, 0, 0, PD_R5_1_ATCM, ""},
-   {0, 0, 0, PD_R5_1_BTCM, ""},
+   {0xffe0UL, 0x0, 0x1UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each 
*/
+   {0xffe2UL, 0x2, 0x1UL, PD_R5_0_BTCM, "btcm0"},
+   {0xffe1UL, 0x1, 0x1UL, PD_R5_1_ATCM, "atcm1"},
+   {0xffe3UL, 0x3, 0x1UL, PD_R5_1_BTCM, "btcm1"},
 };
 
 /**
@@ -541,14 +541,14 @@ static int tcm_mem_map(struct rproc *rproc,
 }
 
 /*
- * add_tcm_carveout_split_mode()
+ * add_tcm_banks()
  * @rproc: single R5 core's corresponding rproc instance
  *
- * allocate and add remoteproc carveout for TCM memory in split mode
+ * allocate and add remoteproc carveout for TCM memory
  *
  * return 0 on success, otherwise non-zero value on failure
  */
-static int add_tcm_carveout_split_mode(struct rproc *rproc)
+static int add_tcm_banks(struct rproc *rproc)
 {
struct rproc_mem_entry *rproc_mem;
struct zynqmp_r5_core *r5_core;
@@ -581,10 +581,10 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
 ZYNQMP_PM_REQUEST_ACK_BLOCKING);
if (ret < 0) {
dev_err(dev, "failed to turn on TCM 0x%x", 
pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
-   dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, 
size=0x%lx",
+   dev_dbg(dev, "TCM carveout %s addr=%llx, da=0x%x, size=0x%lx",
bank_name, bank_addr, da, bank_size);
 
rproc_mem = rproc_mem_entry_init(dev, NULL, bank_addr,
@@ -594,99 +594,16 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
if (!rproc_mem) {
ret = -ENOMEM;
zynqmp_pm_release_node(pm_domain_id);
-   goto release_tcm_split;
-   }
-
-   rproc_add_carveout(rproc, rproc_mem);
-   rproc_coredump_add_segment(rproc, da, bank_size);
-   }
-
-   return 0;
-
-release_tcm_split:
-   /* If failed, Turn off all TCM banks turned on before */
-   for (i--; i >= 0; i--) {
-   pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id;
-   zynqmp_pm_release_node(pm_domain_id);
-   }
-   return ret;
-}
-
-/*
- * add_tcm_carveout_lockstep_mode()
- * @rproc: single R5 core's corresponding rproc instance
- *
- * allocate and add remoteproc carveout for TCM memory in lockstep mode
- *
- * return 0 on success, otherwise non-zero value on failure
- */
-static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
-{
-   struct rproc_mem_entry *rproc_mem;
-   struct zynqmp_r5_core *r5_core;
-   int i, num_banks, ret;
-   phys_addr_t bank_addr;
-   size_t bank_size = 0;
-   struct device *dev;
-   u32 pm_domain_id;
-   char *bank_name;
-   u32 da;
-
-   r5_core = rproc->priv;
-   dev = r5_core->dev;
-
-   /* Go through zynqmp banks for r5 node */
-   num_banks = r5_core->tcm_bank_count;
-
-   /*
-* In lockstep mode, TCM is contiguous memory

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Andrew Morton

On Mon,  8 Apr 2024 16:58:18 +0200 Michal Koutný  wrote:

> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
> 
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).

That seems like a large change.

It isn't clear why we'd want to merge this patchset.  Does it improve
anyone's life and if so, how?

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Krzysztof Kozlowski

On 08/04/2024 17:17, Ondřej Jirman wrote:
> On Mon, Apr 08, 2024 at 03:27:00PM GMT, Krzysztof Kozlowski wrote:
>> On 08/04/2024 14:48, Ondřej Jirman wrote:
>>> Yeah, I understand where the confusion is. The driver is not for anx7688 
>>> chip
>>> really. The driver is named anx7688, but that's mostly a historical 
>>> accident at
>>> this point.
>>>
>>> I guess there can be a driver for anx7688 chip that can directly use the 
>>> chip's
>>> resources from the host by directly manipulating its registers and 
>>> implementing
>>> type-c functionality via eg. Linux's TCPM or TCPCI stack, etc. (eg. like
>>> fusb302 driver, or various tcpci subdrivers).
>>>
>>> But in this case the chip is driven by an optional on-chip microcontroller's
>>> firmware and *this driver* is specifically for *the Type-C port on 
>>> Pinephone*
>>
>> We do not talk here about the driver, but bindings, so hardware.
> 
> Got it. Bindings should be the same regardless of what driver would be used,
> whether this OCM based one, or some future one based on the above mentioned
> TCPCI in-kernel implementation. Hardware is the same in both cases.
> 
> Just trying to imagine how to actually solve the issues...
> 
> Basic thing with the I2C regulator thing is that needs to be enabled as long
> as anx7688 needs to communicate over I2C. Other user of this power rail is
> touchscreen controller for its normal power supply, and it needs to be able
> to disable it during system suspend.

This does not look like anything specific to this particular device...
but even without this, please think how you want it to be solved. Same
supply which has to be on always, because your anx7688 can talk over
I2C, and in the same time sometimes off, so touchscreen can be shutdown.

If this is regulator for the I2C bus, then I think we already had this
discussion some time ago. I think it is not a property of the I2C
device, but the controller.

> 
> Now for things to not fail during suspend/resume based on PM callbacks
> invocation order, anx7688 driver needs to enable this regulator too, as long
> as it needs it.

No, the I2C bus driver needs to manage it. Not one individual I2C
device. Again, why anx7688 is specific? If you next phone has anx8867,
using different driver, you also add there i2c-supply? And if it is
nxp,ptn5100 as well?

> 
> I can put bus-supply to I2C controller node, and read it from the ANX7688 
> driver
> I guess, by going up a DT node. Whether that's going to be acceptable, I don't
> know. 
> 
> 
> VCONN regulator I don't know where else to put either. It doesn't seem to 
> belong
> anywhere. It's not something directly connected to Type-C connector, so
> not part of connector bindings, and there's nothing else I can see, other
> than anx7688 device which needs it for core functionality.

That sounds like a GPIO, not regulator. anx7688 has GPIOs, right? On
Pinephone they go to regulator, but on FooPhone also using anx7688 they
go somewhere else, so why this anx7688 assumes this is a regulator?

You need to properly represent the hardware, not bend it to your one
use-case for one hardware.


> 
> ANX7688 chip desing doesn't have integrated VCONN mosfet switches so it always
> needs external supply + switches that are controlled by the chip itself. 
> There's
> no sensible design where someone would not want this and the driver needs
> to get this regulator reference from somewhere. The switches are sort of an
> extension of the chip.


Best regards,
Krzysztof

Re: [PATCH] dt-bindings: iio: imu: mpu6050: Improve i2c-gate disallow list

2024-04-08 Thread Jean-Baptiste Maneyrol

Hello Luca,

good catch, thanks for the patch!

Acked-by: Jean-Baptiste Maneyrol 


From: Luca Weiss 
Sent: Monday, April 8, 2024 18:34
To: ~postmarketos/upstream...@lists.sr.ht 
<~postmarketos/upstream...@lists.sr.ht>; phone-de...@vger.kernel.org 
; Jonathan Cameron ; Lars-Peter 
Clausen ; Rob Herring ; Krzysztof Kozlowski 
; Conor Dooley ; Jean-Baptiste 
Maneyrol 
Cc: linux-...@vger.kernel.org ; 
devicet...@vger.kernel.org ; 
linux-kernel@vger.kernel.org ; Luca Weiss 

Subject: [PATCH] dt-bindings: iio: imu: mpu6050: Improve i2c-gate disallow list
 
This Message Is From an Untrusted Sender
You have not previously corresponded with this sender.
 
Before all supported sensors except for MPU{9150,9250,9255} were not
allowed to use i2c-gate in the bindings which excluded quite a few
supported sensors where this functionality is supported.

Switch the list of sensors to ones where the Linux driver explicitly
disallows support for the auxiliary bus ("inv_mpu_i2c_aux_bus"). Since
the driver is also based on "default: return true" this should scale
better into the future.

Signed-off-by: Luca Weiss 
---
This fixes dt validation error on qcom-msm8974-lge-nexus5-hammerhead
which uses mpu6515

arch/arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dtb: mpu6515@68: 
i2c-gate: False schema does not allow {'#address-cells': [[1]], '#size-cells': 
[[0]], 'ak8963@f': {'compatible': ['asahi-kasei,ak8963'], 'reg': [[15]], 
'gpios': [[40, 67, 0]], 'vid-supply': [[50]], 'vdd-supply': [[49]]}, 
'bmp280@76': {'compatible': ['bosch,bmp280'], 'reg': [[118]], 'vdda-supply': 
[[50]], 'vddd-supply': [[49]]}}
from schema $id: 
https://urldefense.com/v3/__http://devicetree.org/schemas/iio/imu/invensense,mpu6050.yaml*__;Iw!!FtrhtPsWDhZ6tw!Athn1pwCL_LPpZ97exHEFSkirApIqFF2ISY01IuyHtFBxpbPkcPWh_FmzB_TiCzb8uv1HO0AHY4IeIlv1-o$[devicetree[.]org]
---
 .../devicetree/bindings/iio/imu/invensense,mpu6050.yaml | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml 
b/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml
index 297b8a1a7ffb..587ff2bced2d 100644
--- a/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml
+++ b/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml
@@ -62,14 +62,15 @@ properties:
 allOf:
   - $ref: /schemas/spi/spi-peripheral-props.yaml#
   - if:
-  not:
-properties:
-  compatible:
-contains:
-  enum:
-- invensense,mpu9150
-- invensense,mpu9250
-- invensense,mpu9255
+  properties:
+compatible:
+  contains:
+enum:
+  - invensense,iam20680
+  - invensense,icm20602
+  - invensense,icm20608
+  - invensense,icm20609
+  - invensense,icm20689
 then:
   properties:
 i2c-gate: false

---
base-commit: 8568bb2ccc278f344e6ac44af6ed010a90aa88dc
change-id: 20240408-mpu6050-i2c-gate-4ea473e492f4

Best regards,
-- 
Luca Weiss

[PATCH 2/2] ARM: dts: qcom: msm8974: Use proper compatible for APCS syscon

2024-04-08 Thread Luca Weiss

Use the apcs-kpss-global compatible for the APCS global mailbox block
found on this SoC.

This also resolves a dt-binding checker warning:

  arch/arm/boot/dts/qcom/qcom-msm8974pro-fairphone-fp2.dtb: syscon@f9011000: 
compatible: 'anyOf' conditional failed, one must be fixed:
  ['syscon'] is too short
  'syscon' is not one of ['allwinner,sun8i-a83t-system-controller', 
'allwinner,sun8i-h3-system-controller', 
'allwinner,sun8i-v3s-system-controller', 
'allwinner,sun50i-a64-system-controller', 'amd,pensando-elba-syscon', 
'brcm,cru-clkset', 'freecom,fsg-cs2-system-controller', 
'fsl,imx93-aonmix-ns-syscfg', 'fsl,imx93-wakeupmix-syscfg', 
'hisilicon,dsa-subctrl', 'hisilicon,hi6220-sramctrl', 
'hisilicon,pcie-sas-subctrl', 'hisilicon,peri-subctrl', 'hpe,gxp-sysreg', 
'intel,lgm-syscon', 'loongson,ls1b-syscon', 'loongson,ls1c-syscon', 
'marvell,armada-3700-usb2-host-misc', 'mediatek,mt8135-pctl-a-syscfg', 
'mediatek,mt8135-pctl-b-syscfg', 'mediatek,mt8365-syscfg', 
'microchip,lan966x-cpu-syscon', 'microchip,sparx5-cpu-syscon', 
'mstar,msc313-pmsleep', 'nuvoton,ma35d1-sys', 'nuvoton,wpcm450-shm', 
'rockchip,px30-qos', 'rockchip,rk3036-qos', 'rockchip,rk3066-qos', 
'rockchip,rk3128-qos', 'rockchip,rk3228-qos', 'rockchip,rk3288-qos', 
'rockchip,rk3368-qos', 'rockchip,rk3399-qos', 'rockchip,rk356
 8-qos', 'rockchip,rk3588-qos', 'rockchip,rv1126-qos', 
'starfive,jh7100-sysmain', 'ti,am62-usb-phy-ctrl', 'ti,am654-dss-oldi-io-ctrl', 
'ti,am654-serdes-ctrl', 'ti,j784s4-pcie-ctrl']
  from schema $id: http://devicetree.org/schemas/mfd/syscon.yaml#

Signed-off-by: Luca Weiss 
---
 arch/arm/boot/dts/qcom/qcom-msm8974.dtsi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/qcom/qcom-msm8974.dtsi 
b/arch/arm/boot/dts/qcom/qcom-msm8974.dtsi
index 233d9bf42298..7e0224006b1f 100644
--- a/arch/arm/boot/dts/qcom/qcom-msm8974.dtsi
+++ b/arch/arm/boot/dts/qcom/qcom-msm8974.dtsi
@@ -341,9 +341,11 @@ intc: interrupt-controller@f900 {
  <0xf9002000 0x1000>;
};
 
-   apcs: syscon@f9011000 {
-   compatible = "syscon";
+   apcs: mailbox@f9011000 {
+   compatible = "qcom,msm8974-apcs-kpss-global",
+"qcom,msm8994-apcs-kpss-global", "syscon";
reg = <0xf9011000 0x1000>;
+   #mbox-cells = <1>;
};
 
saw_l2: power-manager@f9012000 {

-- 
2.44.0

[PATCH 1/2] dt-bindings: mailbox: qcom: Add MSM8974 APCS compatible

2024-04-08 Thread Luca Weiss

Add compatible for the Qualcomm MSM8974 APCS block.

Signed-off-by: Luca Weiss 
---
 Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.yaml 
b/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.yaml
index 79eb523b8436..982c741e6225 100644
--- a/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.yaml
+++ b/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.yaml
@@ -30,6 +30,7 @@ properties:
   - const: syscon
   - items:
   - enum:
+  - qcom,msm8974-apcs-kpss-global
   - qcom,msm8976-apcs-kpss-global
   - const: qcom,msm8994-apcs-kpss-global
   - const: syscon

-- 
2.44.0

[PATCH 0/2] Fix msm8974 apcs syscon compatible

2024-04-08 Thread Luca Weiss

Finally fix a warning about the apcs-global syscon used on msm8974 that
has been around forever.

Signed-off-by: Luca Weiss 
---
Luca Weiss (2):
  dt-bindings: mailbox: qcom: Add MSM8974 APCS compatible
  ARM: dts: qcom: msm8974: Use proper compatible for APCS syscon

 .../devicetree/bindings/mailbox/qcom,apcs-kpss-global.yaml  | 1 +
 arch/arm/boot/dts/qcom/qcom-msm8974.dtsi| 6 --
 2 files changed, 5 insertions(+), 2 deletions(-)
---
base-commit: 8568bb2ccc278f344e6ac44af6ed010a90aa88dc
change-id: 20240408-msm8974-apcs-b7765f6bab99

Best regards,
-- 
Luca Weiss

Re: [PATCH fs/proc/bootconfig] remove redundant comments from /proc/bootconfig

2024-04-08 Thread Paul E. McKenney

On Sat, Apr 06, 2024 at 11:11:08AM +0900, Masami Hiramatsu wrote:
> On Thu, 4 Apr 2024 21:25:41 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Fri, Apr 05, 2024 at 11:57:45AM +0900, Masami Hiramatsu wrote:
> > > On Fri, 5 Apr 2024 10:23:24 +0900
> > > Masami Hiramatsu (Google)  wrote:
> > > 
> > > > On Thu, 4 Apr 2024 10:43:14 -0700
> > > > "Paul E. McKenney"  wrote:
> > > > 
> > > > > On Thu, Apr 04, 2024 at 08:55:22AM +0900, Masami Hiramatsu wrote:
> > > > > > On Wed, 3 Apr 2024 12:16:28 -0700
> > > > > > "Paul E. McKenney"  wrote:
> > > > > > 
> > > > > > > commit 717c7c894d4b ("fs/proc: Add boot loader arguments as 
> > > > > > > comment to
> > > > > > > /proc/bootconfig") adds bootloader argument comments into 
> > > > > > > /proc/bootconfig.
> > > > > > > 
> > > > > > > /proc/bootconfig shows boot_command_line[] multiple times 
> > > > > > > following
> > > > > > > every xbc key value pair, that's duplicated and not necessary.
> > > > > > > Remove redundant ones.
> > > > > > > 
> > > > > > > Output before and after the fix is like:
> > > > > > > key1 = value1
> > > > > > > *bootloader argument comments*
> > > > > > > key2 = value2
> > > > > > > *bootloader argument comments*
> > > > > > > key3 = value3
> > > > > > > *bootloader argument comments*
> > > > > > > ...
> > > > > > > 
> > > > > > > key1 = value1
> > > > > > > key2 = value2
> > > > > > > key3 = value3
> > > > > > > *bootloader argument comments*
> > > > > > > ...
> > > > > > > 
> > > > > > > Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as 
> > > > > > > comment to /proc/bootconfig")
> > > > > > > Signed-off-by: Zhenhua Huang 
> > > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > > Cc: Masami Hiramatsu 
> > > > > > > Cc: 
> > > > > > > Cc: 
> > > > > > 
> > > > > > OOps, good catch! Let me pick it.
> > > > > > 
> > > > > > Acked-by: Masami Hiramatsu (Google) 
> > > > > 
> > > > > Thank you, and I have applied your ack and pulled this into its own
> > > > > bootconfig.2024.04.04a.
> > > > > 
> > > > > My guess is that you will push this via your own tree, and so I will
> > > > > drop my copy as soon as yours hits -next.
> > > > 
> > > > Thanks! I would like to make PR this soon as bootconfig fixes for 
> > > > v6.9-rc2.
> > > 
> > > Hmm I found that this always shows the command line comment in
> > > /proc/bootconfig even without "bootconfig" option.
> > > I think that is easier for user-tools but changes the behavior and
> > > a bit redundant.
> > > 
> > > We should skip showing this original argument comment if bootconfig is
> > > not initialized (no "bootconfig" in cmdline) as it is now.
> > 
> > So something like this folded into that patch?
> 
> Hm, I expected just checking it in the loop as below.
> 
> 
> diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
> index e5635a6b127b..98e0780f7e07 100644
> --- a/fs/proc/bootconfig.c
> +++ b/fs/proc/bootconfig.c
> @@ -27,6 +27,7 @@ static int __init copy_xbc_key_value_list(char *dst, size_t 
> size)
>  {
>   struct xbc_node *leaf, *vnode;
>   char *key, *end = dst + size;
> + bool empty = true;
>   const char *val;
>   char q;
>   int ret = 0;
> @@ -62,8 +63,9 @@ static int __init copy_xbc_key_value_list(char *dst, size_t 
> size)
>   break;
>   dst += ret;
>   }
> + empty = false;
>   }
> - if (ret >= 0 && boot_command_line[0]) {
> + if (!empty && ret >= 0 && boot_command_line[0]) {
>   ret = snprintf(dst, rest(dst, end), "# Parameters from 
> bootloader:\n# %s\n",
>  boot_command_line);
>   if (ret > 0)
> 
> 
> 
> The difference is checking "bootconfig" cmdline option or checking
> the "bootconfig" is actually empty. So the behaviors are different
> when the "bootconfig" is specified but there is no bootconfig data.

Ah, understood, the point is to avoid the comment in cases where its
content would be identical to /proc/cmdline.

> Another idea is to check whether the cmdline is actually updated by
> bootconfig and show original one only if it is updated.
> (I think this fits the purpose of the original patch better.)
> 
> 
> diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
> index e5635a6b127b..95d6a231210c 100644
> --- a/fs/proc/bootconfig.c
> +++ b/fs/proc/bootconfig.c
> @@ -10,6 +10,9 @@
>  #include 
>  #include 
>  
> +/* defined in main/init.c */
> +bool __init cmdline_has_extra_options(void);
> +
>  static char *saved_boot_config;
>  
>  static int boot_config_proc_show(struct seq_file *m, void *v)
> @@ -63,7 +66,7 @@ static int __init copy_xbc_key_value_list(char *dst, size_t 
> size)
>   dst += ret;
>   }
>   }
> - if (ret >= 0 &&

Re: [PATCH] drivers/virtio: delayed configuration descriptor flags

2024-04-08 Thread Chaitanya Kulkarni

On 4/8/24 10:02, ni.liqiang wrote:
> In our testing of the virtio hardware accelerator, we found that
> configuring the flags of the descriptor after addr and len,
> as implemented in DPDK, seems to be more friendly to the hardware.
please describe in detail "friendly to the hardware" means ..

> In our Virtio hardware implementation tests, using the default
> open-source code, the hardware's bulk reads ensure performance
> but correctness is compromised. If we refer to the implementation code
> of DPDK, placing the flags configuration of the descriptor
> after addr and len, virtio backend can function properly based on
> our hardware accelerator.

you are not specifying how the correctness is compromised ..
again what the "properly" mean here ? what is the exact failure that
you are seeing ? please document ..

> I am somewhat puzzled by this. From a software process perspective,
> it seems that there should be no difference whether
> the flags configuration of the descriptor is before or after addr and len.
> However, this is not the case according to experimental test results.
> We would like to know if such a change in the configuration order
> is reasonable and acceptable?

where are the experimental results ? any particular reason those
results are not documented here ?

>
> Thanks.
>
> Signed-off-by: ni.liqiang 
> Reviewed-by: jin.qi 
> Tested-by: jin.qi 
> Cc: ni.liqiang 
> ---
>   drivers/virtio/virtio_ring.c | 9 +
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 6f7e5010a673..bea2c2fb084e 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1472,15 +1472,16 @@ static inline int virtqueue_add_packed(struct 
> virtqueue *_vq,
>   flags = cpu_to_le16(vq->packed.avail_used_flags |
>   (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
>   (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
> - if (i == head)
> - head_flags = flags;
> - else
> - desc[i].flags = flags;
>   
>   desc[i].addr = cpu_to_le64(addr);
>   desc[i].len = cpu_to_le32(sg->length);
>   desc[i].id = cpu_to_le16(id);
>   
> + if (i == head)
> + head_flags = flags;
> + else
> + desc[i].flags = flags;
> +
>   if (unlikely(vq->use_dma_api)) {
>   vq->packed.desc_extra[curr].addr = addr;
>   vq->packed.desc_extra[curr].len = sg->length;

-ck

Re: [PATCH 1/2] dt-bindings: pinctrl: qcom,pmic-gpio: Allow gpio-hog nodes

2024-04-08 Thread Luca Weiss

On Montag, 8. April 2024 19:26:49 CEST Konrad Dybcio wrote:
> 
> On 4/8/24 18:39, Luca Weiss wrote:
> > Allow specifying a GPIO hog, as already used on
> > qcom-msm8974-lge-nexus5-hammerhead.dts.
> > 
> > Signed-off-by: Luca Weiss 
> > ---
> >   .../devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml  | 12 
> > 
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml 
> > b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
> > index a786357ed1af..510a05369dbb 100644
> > --- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
> > +++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
> > @@ -424,6 +424,10 @@ patternProperties:
> >   $ref: "#/$defs/qcom-pmic-gpio-state"
> >   additionalProperties: false
> >   
> > +  "^(hog-[0-9]+|.+-hog(-[0-9]+)?)$":
> 
> I see a couple bindings do this, but I'm not sure if we want two
> allow two styles for no reason.. Rob?

This regex is actually from the gpio-hog.yaml base
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/gpio/gpio-hog.yaml#L23

Why it's made this way I cannot tell you, but I didn't want to 'artifically'
restrict the pattern for qcom,pmic-gpio.

> 
> Konrad
>

Re: [PATCH v10 12/14] x86/sgx: Turn on per-cgroup EPC reclamation

2024-04-08 Thread Haitao Huang


On Mon, 08 Apr 2024 07:20:23 -0500, Huang, Kai  wrote:




--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -28,6 +28,10 @@ static inline int sgx_cgroup_try_charge(struct  
sgx_cgroup *sgx_cg, enum sgx_recl

 static inline void sgx_cgroup_uncharge(struct sgx_cgroup *sgx_cg) { }

 static inline void sgx_cgroup_init(void) { }
+
+static inline void sgx_cgroup_reclaim_pages(struct misc_cg *root,  
struct mm_struct *charge_mm)

+{
+}
 #else
 struct sgx_cgroup {
struct misc_cg *cg;
@@ -65,6 +69,9 @@ static inline void sgx_put_cg(struct sgx_cgroup  
*sgx_cg)


 int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, enum sgx_reclaim  
reclaim);

 void sgx_cgroup_uncharge(struct sgx_cgroup *sgx_cg);
+bool sgx_cgroup_lru_empty(struct misc_cg *root);
+bool sgx_cgroup_should_reclaim(struct sgx_cgroup *sgx_cg);
+void sgx_cgroup_reclaim_pages(struct misc_cg *root,  struct mm_struct  
*charge_mm);


Seems the decision to choose whether to implement a stub function for  
some

function is quite random to me.

My impression is people in general don't like #ifdef in the C file but  
prefer to

implementing it in the header in some helper function.

I guess you might want to just implement a stub function for each of the  
3
functions exposed, so that we can eliminate some #ifdefs in the  
sgx/main.c (see

below).


 void sgx_cgroup_init(void);

 #endif
diff --git a/arch/x86/kernel/cpu/sgx/main.c  
b/arch/x86/kernel/cpu/sgx/main.c

index 7f92455d957d..68f28ff2d5ef 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -34,6 +34,16 @@ static struct sgx_epc_lru_list sgx_global_lru;

 static inline struct sgx_epc_lru_list *sgx_lru_list(struct  
sgx_epc_page *epc_page)

 {
+#ifdef CONFIG_CGROUP_SGX_EPC
+   if (epc_page->sgx_cg)
+   return _page->sgx_cg->lru;
+
+   /*
+* This should not happen when cgroup is enabled: Every page belongs
+* to a cgroup, or the root by default.
+*/
+   WARN_ON_ONCE(1);


In the case MISC cgroup is enabled in Kconfig but disabled by command  
line, I

think this becomes legal now?



I'm not sure actually. Never saw the warning even if I set  
"cgroup_disable=misc" in commandlibe. Tried both v1 and v2. Anyway, I  
think we can remove this warning and we handle the NULL sgx_cg now.



+#endif
return _global_lru;
 }

@@ -42,7 +52,11 @@ static inline struct sgx_epc_lru_list  
*sgx_lru_list(struct sgx_epc_page *epc_pag

  */
 static inline bool sgx_can_reclaim(void)
 {
+#ifdef CONFIG_CGROUP_SGX_EPC
+   return !sgx_cgroup_lru_empty(misc_cg_root());
+#else
return !list_empty(_global_lru.reclaimable);
+#endif
 }



Here you are using #ifdef  CONFIG_CGRUP_SGX_EPC, but ...


 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
@@ -404,7 +418,10 @@ static bool sgx_should_reclaim(unsigned long  
watermark)


 static void sgx_reclaim_pages_global(struct mm_struct *charge_mm)
 {
-   sgx_reclaim_pages(_global_lru, charge_mm);
+   if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
+   sgx_cgroup_reclaim_pages(misc_cg_root(), charge_mm);
+   else
+   sgx_reclaim_pages(_global_lru, charge_mm);
 }


... here you are using IS_ENABLED(CONFIG_CGROUP_SGX_EPC).

Any reason they are not consistent?


I was hesitant to expose sgx_global_lru needed for implementing  
sgx_cgroup_lru_empty() stub which would also be a random decision and  
overkill to just remove couple of #ifdefs in short functions.




Also, in the case where MISC cgroup is disabled via commandline, I think  
it

won't work, because misc_cg_root() should be NULL in this case while
IS_ENABLED(CONFIG_CGROUP_SGX_EPC) is true.





The misc root cgroup is a static similar to sgx_cg_root. So  
misc_cg_root()  won't be NULL
However, based on how css_misc() was check NULL, I suppose  
sgx_get_current_cg() may be NULL when cgroup is disabled (again not 100%  
sure but we handle it anyway)



 /*
@@ -414,6 +431,16 @@ static void sgx_reclaim_pages_global(struct  
mm_struct *charge_mm)

  */
 void sgx_reclaim_direct(void)
 {
+#ifdef CONFIG_CGROUP_SGX_EPC
+   struct sgx_cgroup *sgx_cg = sgx_get_current_cg();
+
+   /* Make sure there are some free pages at cgroup level */
+   if (sgx_cg && sgx_cgroup_should_reclaim(sgx_cg)) {
+   sgx_cgroup_reclaim_pages(sgx_cg->cg, current->mm);
+   sgx_put_cg(sgx_cg);
+   }
+#endif


This #ifdef CONFIG_CGROUP_SGX_EPC can be removed if we implement a stub  
function

for sgx_cgroup_should_reclaim().


Yes.

+   /* Make sure there are some free pages at global level */
if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
sgx_reclaim_pages_global(current->mm);
 }




Thanks
Haitao

[PATCH v2] rethook: Remove warning messages printed for finding return address of a frame.

2024-04-08 Thread Kui-Feng Lee

The function rethook_find_ret_addr() prints a warning message and returns 0
when the target task is running and is not the "current" task in order to
prevent the incorrect return address, although it still may return an
incorrect address.

However, the warning message turns into noise when BPF profiling programs
call bpf_get_task_stack() on running tasks in a firm with a large number of
hosts.

The callers should be aware and willing to take the risk of receiving an
incorrect return address from a task that is currently running other than
the "current" one. A warning is not needed here as the callers are intent
on it.

Acked-by: Andrii Nakryiko 
Acked-by: John Fastabend 
Signed-off-by: Kui-Feng Lee 

---
Changes from v1:

 - Rephrased the commit log.

   - Removed the confusing last part of the first paragraph.

   - Removed "frequently" from the 2nd paragraph, replaced by "a firm with
 a large number of hosts".

v1: https://lore.kernel.org/all/20240401191621.758056-1-thinker...@gmail.com/
---
 kernel/trace/rethook.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
index fa03094e9e69..4297a132a7ae 100644
--- a/kernel/trace/rethook.c
+++ b/kernel/trace/rethook.c
@@ -248,7 +248,7 @@ unsigned long rethook_find_ret_addr(struct task_struct 
*tsk, unsigned long frame
if (WARN_ON_ONCE(!cur))
return 0;
 
-   if (WARN_ON_ONCE(tsk != current && task_is_running(tsk)))
+   if (tsk != current && task_is_running(tsk))
return 0;
 
do {
-- 
2.34.1

Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

2024-04-08 Thread SeongJae Park

On Mon,  8 Apr 2024 21:06:44 +0900 Honggyu Kim  wrote:

> On Fri,  5 Apr 2024 12:24:30 -0700 SeongJae Park  wrote:
> > On Fri,  5 Apr 2024 15:08:54 +0900 Honggyu Kim  wrote:
[...]
> > > Here is one of the example usage of this 'migrate_cold' action.
> > > 
> > >   $ cd /sys/kernel/mm/damon/admin/kdamonds/
> > >   $ cat contexts//schemes//action
> > >   migrate_cold
> > >   $ echo 2 > contexts//schemes//target_nid
> > >   $ echo commit > state
> > >   $ numactl -p 0 ./hot_cold 500M 600M &
> > >   $ numastat -c -p hot_cold
> > > 
> > >   Per-node process memory usage (in MBs)
> > >   PID Node 0 Node 1 Node 2 Total
> > >   --  -- -- -- -
> > >   701 (hot_cold) 501  0601  1101
> > > 
> > > Since there are some common routines with pageout, many functions have
> > > similar logics between pageout and migrate cold.
> > > 
> > > damon_pa_migrate_folio_list() is a minimized version of
> > > shrink_folio_list(), but it's minified only for demotion.
> > 
> > MIGRATE_COLD is not only for demotion, right?  I think the last two words 
> > are
> > better to be removed for reducing unnecessary confuses.
> 
> You mean the last two sentences?  I will remove them if you feel it's
> confusing.

Yes.  My real intended suggestion was 's/only for demotion/only for
migration/', but entirely removing the sentences is also ok for me.

> 
> > > 
> > > Signed-off-by: Honggyu Kim 
> > > Signed-off-by: Hyeongtak Ji 
> > > ---
> > >  include/linux/damon.h|   2 +
> > >  mm/damon/paddr.c | 146 ++-
> > >  mm/damon/sysfs-schemes.c |   4 ++
> > >  3 files changed, 151 insertions(+), 1 deletion(-)
[...]
> > > --- a/mm/damon/paddr.c
> > > +++ b/mm/damon/paddr.c
[...]
> > > +{
> > > + unsigned int nr_succeeded;
> > > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > > +
> > 
> > I personally prefer not having empty lines in the middle of variable
> > declarations/definitions.  Could we remove this empty line?
> 
> I can remove it, but I would like to have more discussion about this
> issue.  The current implementation allows only a single migration
> target with "target_nid", but users might want to provide fall back
> migration target nids.
> 
> For example, if more than two CXL nodes exist in the system, users might
> want to migrate cold pages to any CXL nodes.  In such cases, we might
> have to make "target_nid" accept comma separated node IDs.  nodemask can
> be better but we should provide a way to change the scanning order.
> 
> I would like to hear how you think about this.

Good point.  I think we could later extend the sysfs file to receive the
comma-separated numbers, or even mask.  For simplicity, adding sysfs files
dedicated for the different format of inputs could also be an option (e.g.,
target_nids_list, target_nids_mask).  But starting from this single node as is
now looks ok to me.

[...]
> > > + /* 'folio_list' is always empty here */
> > > +
> > > + /* Migrate folios selected for migration */
> > > + nr_migrated += migrate_folio_list(_folios, pgdat, target_nid);
> > > + /* Folios that could not be migrated are still in @migrate_folios */
> > > + if (!list_empty(_folios)) {
> > > + /* Folios which weren't migrated go back on @folio_list */
> > > + list_splice_init(_folios, folio_list);
> > > + }
> > 
> > Let's not use braces for single statement
> > (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
> 
> Hmm.. I know the convention but left it as is because of the comment.
> If I remove the braces, it would have a weird alignment for the two
> lines for comment and statement lines.

I don't really hate such alignment.  But if you don't like it, how about moving
the comment out of the if statement?  Having one comment for one-line if
statement looks not bad to me.

> 
> > > +
> > > + try_to_unmap_flush();
> > > +
> > > + list_splice(_folios, folio_list);
> > 
> > Can't we move remaining folios in migrate_folios to ret_folios at once?
> 
> I will see if it's possible.

Thank you.  Not a strict request, though.

[...]
> > > + nid = folio_nid(lru_to_folio(folio_list));
> > > + do {
> > > + struct folio *folio = lru_to_folio(folio_list);
> > > +
> > > + if (nid == folio_nid(folio)) {
> > > + folio_clear_active(folio);
> > 
> > I think this was necessary for demotion, but now this should be removed 
> > since
> > this function is no more for demotion but for migrating random pages, right?
> 
> Yeah,  it can be removed because we do migration instead of demotion,
> but I need to make sure if it doesn't change the performance evaluation
> results.

Yes, please ensure the test results are valid :)


Thanks,
SJ

[...]

Re: [PATCH 1/2] dt-bindings: pinctrl: qcom,pmic-gpio: Allow gpio-hog nodes

2024-04-08 Thread Konrad Dybcio





On 4/8/24 18:39, Luca Weiss wrote:

Allow specifying a GPIO hog, as already used on
qcom-msm8974-lge-nexus5-hammerhead.dts.

Signed-off-by: Luca Weiss 
---
  .../devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml  | 12 
  1 file changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml 
b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
index a786357ed1af..510a05369dbb 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
@@ -424,6 +424,10 @@ patternProperties:
  $ref: "#/$defs/qcom-pmic-gpio-state"
  additionalProperties: false
  
+  "^(hog-[0-9]+|.+-hog(-[0-9]+)?)$":


I see a couple bindings do this, but I'm not sure if we want two
allow two styles for no reason.. Rob?

Konrad

[PATCH] drivers/virtio: delayed configuration descriptor flags

2024-04-08 Thread ni.liqiang

In our testing of the virtio hardware accelerator, we found that
configuring the flags of the descriptor after addr and len,
as implemented in DPDK, seems to be more friendly to the hardware.

In our Virtio hardware implementation tests, using the default
open-source code, the hardware's bulk reads ensure performance
but correctness is compromised. If we refer to the implementation code
of DPDK, placing the flags configuration of the descriptor
after addr and len, virtio backend can function properly based on
our hardware accelerator.

I am somewhat puzzled by this. From a software process perspective,
it seems that there should be no difference whether
the flags configuration of the descriptor is before or after addr and len.
However, this is not the case according to experimental test results.
We would like to know if such a change in the configuration order
is reasonable and acceptable?

Thanks.

Signed-off-by: ni.liqiang 
Reviewed-by: jin.qi 
Tested-by: jin.qi 
Cc: ni.liqiang 
---
 drivers/virtio/virtio_ring.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6f7e5010a673..bea2c2fb084e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1472,15 +1472,16 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
flags = cpu_to_le16(vq->packed.avail_used_flags |
(++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
(n < out_sgs ? 0 : VRING_DESC_F_WRITE));
-   if (i == head)
-   head_flags = flags;
-   else
-   desc[i].flags = flags;
 
desc[i].addr = cpu_to_le64(addr);
desc[i].len = cpu_to_le32(sg->length);
desc[i].id = cpu_to_le16(id);
 
+   if (i == head)
+   head_flags = flags;
+   else
+   desc[i].flags = flags;
+
if (unlikely(vq->use_dma_api)) {
vq->packed.desc_extra[curr].addr = addr;
vq->packed.desc_extra[curr].len = sg->length;
-- 
2.34.1

Re: [PATCH bpf-next] rethook: Remove warning messages printed for finding return address of a frame.

2024-04-08 Thread Kui-Feng Lee





On 4/7/24 18:13, Masami Hiramatsu (Google) wrote:

On Wed, 3 Apr 2024 16:36:25 +0200
Daniel Borkmann  wrote:


On 4/2/24 6:58 PM, Andrii Nakryiko wrote:

On Mon, Apr 1, 2024 at 12:16 PM Kui-Feng Lee  wrote:


rethook_find_ret_addr() prints a warning message and returns 0 when the
target task is running and not the "current" task to prevent returning an
incorrect return address. However, this check is incomplete as the target
task can still transition to the running state when finding the return
address, although it is safe with RCU.


Could you tell me more about this last part? This change just remove
WARN_ON_ONCE() which warns that the user tries to unwind stack of a running
task. This means the task can change the stack in parallel if the task is
running on other CPU.
Does the BPF stop the task? or do you have any RCU magic to copy the stack?



No, the BPF doesn't stop the task or copy the stack. The last part tries
to explain that this function can still return an incorrect address even
with this check. And calling this function on a target task that is not
"current" is safe.  Since you think it is confusing. I will remove this
part.





The issue we encounter is that the kernel frequently prints warning
messages when BPF profiling programs call to bpf_get_task_stack() on
running tasks.


Hmm, WARN_ON_ONCE should print it once, not frequently.


You are right! I should rephrase it. In a firm with a large number of 
hosts, this warning message become a noise.






The callers should be aware and willing to take the risk of receiving an
incorrect return address from a task that is currently running other than
the "current" one. A warning is not needed here as the callers are intent
on it.

Signed-off-by: Kui-Feng Lee 
---
   kernel/trace/rethook.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
index fa03094e9e69..4297a132a7ae 100644
--- a/kernel/trace/rethook.c
+++ b/kernel/trace/rethook.c
@@ -248,7 +248,7 @@ unsigned long rethook_find_ret_addr(struct task_struct 
*tsk, unsigned long frame
  if (WARN_ON_ONCE(!cur))
  return 0;

-   if (WARN_ON_ONCE(tsk != current && task_is_running(tsk)))
+   if (tsk != current && task_is_running(tsk))
  return 0;



This should probably go through Masami's tree, but the change makes
sense to me, given this is an expected condition.

Acked-by: Andrii Nakryiko 


Masami, I assume you'll pick this up?


OK, anyway it will just return 0 if this situation happens, and caller will
get the trampoline address instead of correct return address in this case.
I think it does not do any unsafe things. So I agree removing it.
But I think the explanation is a bit confusing.

Thank you,



Thanks,
Daniel

[PATCH 2/2] ARM: dts: qcom: msm8974-hammerhead: Update gpio hog node name

2024-04-08 Thread Luca Weiss

Follow the gpio-hog bindings and use otg-hog as node name.

Signed-off-by: Luca Weiss 
---
 arch/arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dts 
b/arch/arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dts
index 4aaae8537a3f..06549051be50 100644
--- a/arch/arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dts
+++ b/arch/arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dts
@@ -328,7 +328,7 @@ wlan_regulator_pin: wl-reg-active-state {
power-source = ;
};
 
-   otg {
+   otg-hog {
gpio-hog;
gpios = <35 GPIO_ACTIVE_HIGH>;
output-high;

-- 
2.44.0

[PATCH 0/2] Allow gpio-hog nodes in qcom,pmic-gpio bindings (& dt fixup)

2024-04-08 Thread Luca Weiss

Resolve the dt validation failure on Nexus 5.

Signed-off-by: Luca Weiss 
---
Luca Weiss (2):
  dt-bindings: pinctrl: qcom,pmic-gpio: Allow gpio-hog nodes
  ARM: dts: qcom: msm8974-hammerhead: Update gpio hog node name

 .../devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml  | 12 
 .../arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dts |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)
---
base-commit: 8568bb2ccc278f344e6ac44af6ed010a90aa88dc
change-id: 20240408-qcom-pmic-gpio-hog-2b4c5f103126

Best regards,
-- 
Luca Weiss

[PATCH 1/2] dt-bindings: pinctrl: qcom,pmic-gpio: Allow gpio-hog nodes

2024-04-08 Thread Luca Weiss

Allow specifying a GPIO hog, as already used on
qcom-msm8974-lge-nexus5-hammerhead.dts.

Signed-off-by: Luca Weiss 
---
 .../devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml  | 12 
 1 file changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml 
b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
index a786357ed1af..510a05369dbb 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
@@ -424,6 +424,10 @@ patternProperties:
 $ref: "#/$defs/qcom-pmic-gpio-state"
 additionalProperties: false
 
+  "^(hog-[0-9]+|.+-hog(-[0-9]+)?)$":
+required:
+  - gpio-hog
+
 $defs:
   qcom-pmic-gpio-state:
 type: object
@@ -571,6 +575,7 @@ $defs:
 
 examples:
   - |
+#include 
 #include 
 
 pm8921_gpio: gpio@150 {
@@ -594,5 +599,12 @@ examples:
   power-source = ;
 };
   };
+
+  otg-hog {
+gpio-hog;
+gpios = <35 GPIO_ACTIVE_HIGH>;
+output-high;
+line-name = "otg-gpio";
+  };
 };
 ...

-- 
2.44.0

[PATCH] dt-bindings: iio: imu: mpu6050: Improve i2c-gate disallow list

2024-04-08 Thread Luca Weiss

Before all supported sensors except for MPU{9150,9250,9255} were not
allowed to use i2c-gate in the bindings which excluded quite a few
supported sensors where this functionality is supported.

Switch the list of sensors to ones where the Linux driver explicitly
disallows support for the auxiliary bus ("inv_mpu_i2c_aux_bus"). Since
the driver is also based on "default: return true" this should scale
better into the future.

Signed-off-by: Luca Weiss 
---
This fixes dt validation error on qcom-msm8974-lge-nexus5-hammerhead
which uses mpu6515

arch/arm/boot/dts/qcom/qcom-msm8974-lge-nexus5-hammerhead.dtb: mpu6515@68: 
i2c-gate: False schema does not allow {'#address-cells': [[1]], '#size-cells': 
[[0]], 'ak8963@f': {'compatible': ['asahi-kasei,ak8963'], 'reg': [[15]], 
'gpios': [[40, 67, 0]], 'vid-supply': [[50]], 'vdd-supply': [[49]]}, 
'bmp280@76': {'compatible': ['bosch,bmp280'], 'reg': [[118]], 'vdda-supply': 
[[50]], 'vddd-supply': [[49]]}}
from schema $id: 
http://devicetree.org/schemas/iio/imu/invensense,mpu6050.yaml#
---
 .../devicetree/bindings/iio/imu/invensense,mpu6050.yaml | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml 
b/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml
index 297b8a1a7ffb..587ff2bced2d 100644
--- a/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml
+++ b/Documentation/devicetree/bindings/iio/imu/invensense,mpu6050.yaml
@@ -62,14 +62,15 @@ properties:
 allOf:
   - $ref: /schemas/spi/spi-peripheral-props.yaml#
   - if:
-  not:
-properties:
-  compatible:
-contains:
-  enum:
-- invensense,mpu9150
-- invensense,mpu9250
-- invensense,mpu9255
+  properties:
+compatible:
+  contains:
+enum:
+  - invensense,iam20680
+  - invensense,icm20602
+  - invensense,icm20608
+  - invensense,icm20609
+  - invensense,icm20689
 then:
   properties:
 i2c-gate: false

---
base-commit: 8568bb2ccc278f344e6ac44af6ed010a90aa88dc
change-id: 20240408-mpu6050-i2c-gate-4ea473e492f4

Best regards,
-- 
Luca Weiss

Re: [PATCHv2 1/3] uprobe: Add uretprobe syscall to speed up return probe

2024-04-08 Thread Oleg Nesterov

On 04/08, Jiri Olsa wrote:
>
> On Fri, Apr 05, 2024 at 01:02:30PM +0200, Oleg Nesterov wrote:
> >
> > And what should sys_uretprobe() do if it is not called from the trampoline?
> > I'd prefer force_sig(SIGILL) to punish the abuser ;) OK, OK, EINVAL.
>
> so the similar behaviour with int3 ends up with immediate SIGTRAP
> and not invoking pending uretprobe consumers, like:
>
>   - setup uretprobe for foo
>   - foo() {
>   executes int 3 -> sends SIGTRAP
> }
>
> because the int3 handler checks if it got executed from the uretprobe's
> trampoline.

... or the task has uprobe at this address

> if not it treats that int3 as regular trap

Yes this mimics the "default" behaviour without uprobes/uretprobes

> so I think we should mimic int3 behaviour and:
>
>   - setup uretprobe for foo
>   - foo() {
>  uretprobe_syscall -> check if we got executed from uretprobe's
>  trampoline and send SIGILL if that's not the case

Agreed,

> I think it's better to have the offending process killed right away,
> rather than having more undefined behaviour, waiting for final 'ret'
> instruction that jumps to uretprobe trampoline and causes SIGILL

Agreed. In fact I think it should be also killed if copy_to/from_user()
fails by the same reason.

Oleg.

Re: [PATCHv2 1/3] uprobe: Add uretprobe syscall to speed up return probe

2024-04-08 Thread Jiri Olsa

On Fri, Apr 05, 2024 at 01:02:30PM +0200, Oleg Nesterov wrote:
> On 04/05, Jiri Olsa wrote:
> >
> > On Fri, Apr 05, 2024 at 10:22:03AM +0900, Masami Hiramatsu wrote:
> > >
> > > I think this expects setjmp/longjmp as below
> > >
> > > foo() { <- retprobe1
> > >   setjmp()
> > >   bar() { <- retprobe2
> > >   longjmp()
> > >   }
> > > } <- return to trampoline
> > >
> > > In this case, we need to skip retprobe2's instance.
> 
> Yes,
> 
> > > My concern is, if we can not find appropriate return instance, what 
> > > happen?
> > > e.g.
> > >
> > > foo() { <-- retprobe1
> > >bar() { # sp is decremented
> > >sys_uretprobe() <-- ??
> > > }
> > > }
> > >
> > > It seems sys_uretprobe() will handle retprobe1 at that point instead of
> > > SIGILL.
> >
> > yes, and I think it's fine, you get the consumer called in wrong place,
> > but it's your fault and kernel won't crash
> 
> Agreed.
> 
> With or without this patch userpace can also do
> 
>   foo() { <-- retprobe1
>   bar() {
>   jump to xol_area
>   }
>   }
> 
> handle_trampoline() will handle retprobe1.
> 
> > this can be fixed by checking the syscall is called from the trampoline
> > and prevent handle_trampoline call if it's not
> 
> Yes, but I still do not think this makes a lot of sense. But I won't argue.
> 
> And what should sys_uretprobe() do if it is not called from the trampoline?
> I'd prefer force_sig(SIGILL) to punish the abuser ;) OK, OK, EINVAL.

so the similar behaviour with int3 ends up with immediate SIGTRAP
and not invoking pending uretprobe consumers, like:

  - setup uretprobe for foo
  - foo() {
  executes int 3 -> sends SIGTRAP
}

because the int3 handler checks if it got executed from the uretprobe's
trampoline.. if not it treats that int3 as regular trap

while for uretprobe syscall we have at the moment following behaviour:

  - setup uretprobe for foo
  - foo() {
 uretprobe_syscall -> executes foo's uretprobe consumers
}
  - at some point we get to the 'ret' instruction that jump into uretprobe
trampoline and the uretprobe_syscall won't find pending uretprobe and
will send SIGILL


so I think we should mimic int3 behaviour and:

  - setup uretprobe for foo
  - foo() {
 uretprobe_syscall -> check if we got executed from uretprobe's
 trampoline and send SIGILL if that's not the case

I think it's better to have the offending process killed right away,
rather than having more undefined behaviour, waiting for final 'ret'
instruction that jumps to uretprobe trampoline and causes SIGILL

> 
> I agree very much with Andrii,
> 
>sigreturn()  exists only to allow the implementation of signal 
> handlers.  It should never be
>called directly.  Details of the arguments (if any) passed to 
> sigreturn() vary depending  on
>the architecture.
> 
> this is how sys_uretprobe() should be treated/documented.

yes, will include man page patch in new version

jirka

> 
> sigreturn() can be "improved" too. Say, it could validate sigcontext->ip
> and return -EINVAL if this addr is not valid. But why?
> 
> Oleg.
>

Re: [PATCH] [v4] module: don't ignore sysfs_create_link() failures

2024-04-08 Thread Luis Chamberlain

On Mon, Apr 08, 2024 at 10:05:58AM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The sysfs_create_link() return code is marked as __must_check, but the
> module_add_driver() function tries hard to not care, by assigning the
> return code to a variable. When building with 'make W=1', gcc still
> warns because this variable is only assigned but not used:
> 
> drivers/base/module.c: In function 'module_add_driver':
> drivers/base/module.c:36:6: warning: variable 'no_warn' set but not used 
> [-Wunused-but-set-variable]
> 
> Rework the code to properly unwind and return the error code to the
> caller. My reading of the original code was that it tries to
> not fail when the links already exist, so keep ignoring -EEXIST
> errors.
> 
> Fixes: e17e0f51aeea ("Driver core: show drivers in /sys/module/")
> Reviewed-by: Greg Kroah-Hartman 

Reviewed-by: Luis Chamberlain 

  Luis

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Ondřej Jirman

On Mon, Apr 08, 2024 at 03:27:00PM GMT, Krzysztof Kozlowski wrote:
> On 08/04/2024 14:48, Ondřej Jirman wrote:
> > Yeah, I understand where the confusion is. The driver is not for anx7688 
> > chip
> > really. The driver is named anx7688, but that's mostly a historical 
> > accident at
> > this point.
> > 
> > I guess there can be a driver for anx7688 chip that can directly use the 
> > chip's
> > resources from the host by directly manipulating its registers and 
> > implementing
> > type-c functionality via eg. Linux's TCPM or TCPCI stack, etc. (eg. like
> > fusb302 driver, or various tcpci subdrivers).
> > 
> > But in this case the chip is driven by an optional on-chip microcontroller's
> > firmware and *this driver* is specifically for *the Type-C port on 
> > Pinephone*
> 
> We do not talk here about the driver, but bindings, so hardware.

Got it. Bindings should be the same regardless of what driver would be used,
whether this OCM based one, or some future one based on the above mentioned
TCPCI in-kernel implementation. Hardware is the same in both cases.

Just trying to imagine how to actually solve the issues...

Basic thing with the I2C regulator thing is that needs to be enabled as long
as anx7688 needs to communicate over I2C. Other user of this power rail is
touchscreen controller for its normal power supply, and it needs to be able
to disable it during system suspend.

Now for things to not fail during suspend/resume based on PM callbacks
invocation order, anx7688 driver needs to enable this regulator too, as long
as it needs it.

I can put bus-supply to I2C controller node, and read it from the ANX7688 driver
I guess, by going up a DT node. Whether that's going to be acceptable, I don't
know. 

VCONN regulator I don't know where else to put either. It doesn't seem to belong
anywhere. It's not something directly connected to Type-C connector, so
not part of connector bindings, and there's nothing else I can see, other
than anx7688 device which needs it for core functionality.

ANX7688 chip desing doesn't have integrated VCONN mosfet switches so it always
needs external supply + switches that are controlled by the chip itself. There's
no sensible design where someone would not want this and the driver needs
to get this regulator reference from somewhere. The switches are sort of an
extension of the chip.

kind regards,
o.

> > and serves as an integration driver for quite a bunch of things that need to
> > work together on Pinephone for all of the Type-C port's features to operate
> > reasonably well (and one of those is some communication with anx7688 
> > firmware
> > that we use, and enabling power to this chip and other things as 
> > appropriate,
> > based on the communication from the firmware).
> 
> That's still looking like putting board design into particular device
> binding.
> 
> > 
> > It handles the specific needs of the Pinephone's Type-C implementation, all 
> > of
> > its quirks (of which there are many over several HW revisions) that can't be
> > handled by the particular implementation of on-chip microcontroller firmware
> > directly and need host side interaction.
> > 
> > In an ideal world, many of the things this driver handles would be handled 
> > by
> > embedded microcontroller on the board (like it is with some RK3399 based 
> > Google
> > devices), but Pinephone has no such thing and this glue needs to be 
> > implemented
> > somewhere in the kernel.
> 
> You might need multiple schemas, because this is for anx7688, not for
> Pinephone type-c implementation.
> 
> However I still do not see yet a limitation of DTS requiring stuffing
> some other properties into anx7688 or creating some other, virtual entity.
> 
> 
> Best regards,
> Krzysztof
>

[PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Michal Koutný

pid_max is a per-pidns (thus global too) limit on a number of tasks the
kernel admits. The knob can be configured by admin in the range between
pid_max_min and pid_max_max (sic). The default value sits between
those and it typically equals max(32k, 1k*nr_cpus).

The nr_cpu scaling was introduced in commit 72680a191b93 ("pids:
increase pid_max based on num_possible_cpus") to accommodate kernel's own
helper tasks (before workqueues). Generally, 1024 tasks/cpu cap is too
much if they were all running and it is also too little when they are
idle (memory being bottleneck).

The kernel also provides other mechanisms to restrict number of tasks --
threads-max sysctl and RLIMIT_NPROC with memory-scaled defaults and
generic pids cgroup controller (the last one being the solution of
fork-bombs, with qualified limits set up by admin).

The kernel provides mechanisms, while it should not imply policies --
default pid_max seems to be an example of the policy that does not fit
all. At the same time pid_max must have some value assigned, so use the
end of the allowed range -- pid_max_max.

This change thus increases initial pid_max from 32k to 4M (x86_64
defconfig).

This has effect on size of structure that alloc_pid/idr_alloc_cyclic
eventually uses and structure that kernel tracing uses with
'record-tgid' (~16 MiB).

Signed-off-by: Michal Koutný 
---
 include/linux/pid.h |  4 ++--
 include/linux/threads.h | 15 ---
 kernel/pid.c|  8 +++-
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index a3aad9b4074c..0d191ac02958 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -106,8 +106,8 @@ extern void exchange_tids(struct task_struct *task, struct 
task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 enum pid_type);
 
-extern int pid_max;
-extern int pid_max_min, pid_max_max;
+extern int pid_max_min, pid_max;
+extern const int pid_max_max;
 
 /*
  * look up a PID in the hash table. Must be called with the tasklist_lock
diff --git a/include/linux/threads.h b/include/linux/threads.h
index c34173e6c5f1..43f8f38a0c13 100644
--- a/include/linux/threads.h
+++ b/include/linux/threads.h
@@ -22,25 +22,18 @@
 
 #define MIN_THREADS_LEFT_FOR_ROOT 4
 
-/*
- * This controls the default maximum pid allocated to a process
- */
-#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
-
 /*
  * A maximum of 4 million PIDs should be enough for a while.
  * [NOTE: PID/TIDs are limited to 2^30 ~= 1 billion, see FUTEX_TID_MASK.]
  */
 #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
-   (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
+   (sizeof(long) > 4 ? 4 * 1024 * 1024 : 0x8000))
 
 /*
- * Define a minimum number of pids per cpu.  Heuristically based
- * on original pid max of 32k for 32 cpus.  Also, increase the
- * minimum settable value for pid_max on the running system based
- * on similar defaults.  See kernel/pid.c:pid_idr_init() for details.
+ * Define a minimum number of pids per cpu. Mainly to accommodate
+ * smpboot_register_percpu_thread() kernel threads.
+ * See kernel/pid.c:pid_idr_init() for details.
  */
-#define PIDS_PER_CPU_DEFAULT   1024
 #define PIDS_PER_CPU_MIN   8
 
 #endif
diff --git a/kernel/pid.c b/kernel/pid.c
index da76ed1873f7..24ae505ac3b0 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -60,10 +60,10 @@ struct pid init_struct_pid = {
}, }
 };
 
-int pid_max = PID_MAX_DEFAULT;
+int pid_max = PID_MAX_LIMIT;
 
 int pid_max_min = RESERVED_PIDS + 1;
-int pid_max_max = PID_MAX_LIMIT;
+const int pid_max_max = PID_MAX_LIMIT;
 /*
  * Pseudo filesystems start inode numbering after one. We use Reserved
  * PIDs as a natural offset.
@@ -652,9 +652,7 @@ void __init pid_idr_init(void)
/* Verify no one has done anything silly: */
BUILD_BUG_ON(PID_MAX_LIMIT >= PIDNS_ADDING);
 
-   /* bump default and minimum pid_max based on number of cpus */
-   pid_max = min(pid_max_max, max_t(int, pid_max,
-   PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+   /* bump minimum pid_max based on number of cpus */
pid_max_min = max_t(int, pid_max_min,
PIDS_PER_CPU_MIN * num_possible_cpus());
pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
-- 
2.44.0

[PATCH 1/3] tracing: Remove dependency of saved_cmdlines_buffer on PID_MAX_DEFAULT

2024-04-08 Thread Michal Koutný

Calculations into map_pid_to_cmdline use PID_MAX_DEFAULT but they
actually depend on the size of map_pid_to_cmdline. The size of the map
may be arbitrary. First, refer to the map size where necessary, second,
pick a good value for the size of the map.
Since the buffer is allocated at boot (i.e. user cannot affect its size
later), accounting for full PID_MAX_LIMIT would inflate map's size
unnecessarily (4*4M) for all users. Stick to the original value of
4*32k, the commit 785e3c0a3a87 ("tracing: Map all PIDs to command
lines") explains why it still works for higher pids.

The point of this exercise is to remove dependency on PID_MAX_DEFAULT.

Signed-off-by: Michal Koutný 
---
 kernel/trace/trace_sched_switch.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_sched_switch.c 
b/kernel/trace/trace_sched_switch.c
index 8a407adb0e1c..aca2dafdd97a 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -161,6 +161,7 @@ static size_t tgid_map_max;
 
 #define SAVED_CMDLINES_DEFAULT 128
 #define NO_CMDLINE_MAP UINT_MAX
+#define PID_MAP_SIZE (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
 /*
  * Preemption must be disabled before acquiring trace_cmdline_lock.
  * The various trace_arrays' max_lock must be acquired in a context
@@ -168,7 +169,7 @@ static size_t tgid_map_max;
  */
 static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 struct saved_cmdlines_buffer {
-   unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
+   unsigned map_pid_to_cmdline[PID_MAP_SIZE];
unsigned *map_cmdline_to_pid;
unsigned cmdline_num;
int cmdline_idx;
@@ -248,7 +249,7 @@ int trace_save_cmdline(struct task_struct *tsk)
if (!tsk->pid)
return 1;
 
-   tpid = tsk->pid & (PID_MAX_DEFAULT - 1);
+   tpid = tsk->pid % PID_MAP_SIZE;
 
/*
 * It's not the end of the world if we don't get
@@ -294,7 +295,7 @@ static void __trace_find_cmdline(int pid, char comm[])
return;
}
 
-   tpid = pid & (PID_MAX_DEFAULT - 1);
+   tpid = pid % PID_MAP_SIZE;
map = savedcmd->map_pid_to_cmdline[tpid];
if (map != NO_CMDLINE_MAP) {
tpid = savedcmd->map_cmdline_to_pid[map];
@@ -645,8 +646,8 @@ tracing_saved_cmdlines_size_write(struct file *filp, const 
char __user *ubuf,
if (ret)
return ret;
 
-   /* must have at least 1 entry or less than PID_MAX_DEFAULT */
-   if (!val || val > PID_MAX_DEFAULT)
+   /* must have at least 1 entry or fit into map_pid_to_cmdline */
+   if (!val || val >= PID_MAP_SIZE)
return -EINVAL;
 
ret = tracing_resize_saved_cmdlines((unsigned int)val);
-- 
2.44.0

[PATCH 3/3] tracing: Compare pid_max against pid_list capacity

2024-04-08 Thread Michal Koutný

trace_pid_list_alloc() checks pid_max against a magic number referencing
an (obsolete) source file when it actually should check against the
capacity of pid_list tree. Turn definition of MAX_PID around -- derive
it from tree parameters and replace references to magic value and
header files with so defined MAX_PID.
Should PID_MAX_LIMIT change in future or pid_max escapes PID_MAX_LIMIT,
appropriate checks remain in place. No functional change intended.

Signed-off-by: Michal Koutný 
---
 kernel/trace/pid_list.c | 6 +++---
 kernel/trace/pid_list.h | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/pid_list.c b/kernel/trace/pid_list.c
index 95106d02b32d..b968f0b65dc1 100644
--- a/kernel/trace/pid_list.c
+++ b/kernel/trace/pid_list.c
@@ -93,7 +93,7 @@ static inline bool upper_empty(union upper_chunk *chunk)
 static inline int pid_split(unsigned int pid, unsigned int *upper1,
 unsigned int *upper2, unsigned int *lower)
 {
-   /* MAX_PID should cover all pids */
+   /* MAX_PID must cover all possible pids */
BUILD_BUG_ON(MAX_PID < PID_MAX_LIMIT);
 
/* In case a bad pid is passed in, then fail */
@@ -413,8 +413,8 @@ struct trace_pid_list *trace_pid_list_alloc(void)
struct trace_pid_list *pid_list;
int i;
 
-   /* According to linux/thread.h, pids can be no bigger that 30 bits */
-   WARN_ON_ONCE(pid_max > (1 << 30));
+   /* See pid_split(), equal to pid_max > PID_MAX_LIMIT */
+   WARN_ON_ONCE(pid_max > MAX_PID);
 
pid_list = kzalloc(sizeof(*pid_list), GFP_KERNEL);
if (!pid_list)
diff --git a/kernel/trace/pid_list.h b/kernel/trace/pid_list.h
index 62e73f1ac85f..28562a9a3d01 100644
--- a/kernel/trace/pid_list.h
+++ b/kernel/trace/pid_list.h
@@ -56,8 +56,8 @@
 
 #define UPPER_MASK (UPPER_MAX - 1)
 
-/* According to linux/thread.h pids can not be bigger than or equal to 1 << 30 
*/
-#define MAX_PID(1 << 30)
+/* Structure can hold only pids strictly below this limit */
+#define MAX_PID(1 << (UPPER_BITS + UPPER_BITS + LOWER_BITS))
 
 /* Just keep 6 chunks of both upper and lower in the cache on alloc */
 #define CHUNK_ALLOC 6
-- 
2.44.0

[PATCH 0/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Michal Koutný

TL;DR excerpt from commit 02/03:

The kernel provides mechanisms, while it should not imply policies --
default pid_max seems to be an example of the policy that does not fit
all. At the same time pid_max must have some value assigned, so use the
end of the allowed range -- pid_max_max.

More details are in that commit's message. The other two commits are
related preparation and less related refresh in code that somewhat
references pid_max.

Michal Koutný (3):
  tracing: Remove dependency of saved_cmdlines_buffer on PID_MAX_DEFAULT
  kernel/pid: Remove default pid_max value
  tracing: Compare pid_max against pid_list capacity

 include/linux/pid.h   |  4 ++--
 include/linux/threads.h   | 15 ---
 kernel/pid.c  |  8 +++-
 kernel/trace/pid_list.c   |  6 +++---
 kernel/trace/pid_list.h   |  4 ++--
 kernel/trace/trace_sched_switch.c | 11 ++-
 6 files changed, 20 insertions(+), 28 deletions(-)


base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
-- 
2.44.0

Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-08 Thread Peter Xu

On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> The .change_pte() MMU notifier callback was intended as an
> optimization. The original point of it was that KSM could tell KVM to flip
> its secondary PTE to a new location without having to first zap it. At
> the time there was also an .invalidate_page() callback; both of them were
> *not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
> and .invalidate_page() also doubled as a fallback implementation of
> .change_pte().
> 
> Later on, however, both callbacks were changed to occur within an
> invalidate_range_start/end() block.
> 
> In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to
> set_pte_at_notify with invalidate_range_start and invalidate_range_end",
> 2012-10-09) did so to remove the fallback from .invalidate_page() to
> .change_pte() and allow sleepable .invalidate_page() hooks.
> 
> This however made KVM's usage of the .change_pte() callback completely
> moot, because KVM unmaps the sPTEs during .invalidate_range_start()
> and therefore .change_pte() has no hope of finding a sPTE to change.
> Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
> well as all the architecture specific implementations.

Paolo,

I may miss a bunch of details here (as I still remember some change_pte
patches previously on the list..), however not sure whether we considered
enable it?  Asked because I remember Andrea used to have a custom tree
maintaining that part:

https://github.com/aagit/aa/commit/c761078df7a77d13ddfaeebe56a0f4bc128b1968

Maybe it can't be enabled for some reason that I overlooked in the current
tree, or we just decided to not to?

Thanks,

-- 
Peter Xu

Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL

2024-04-08 Thread Honggyu Kim

Hi Gregory,

On Fri, 5 Apr 2024 12:56:14 -0400 Gregory Price  
wrote:
> On Fri, Apr 05, 2024 at 03:08:49PM +0900, Honggyu Kim wrote:
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> > 
> >   1. YCSB zipfian distribution read only workload
> >   memory pressure with cold memory on node0 with 512GB of local DRAM.
> >   =++=
> >|   cold memory occupied by mmap and memset  |
> >|   0G  440G  450G  460G  470G  480G  490G  500G |
> >   =++=
> >   Execution time normalized to DRAM-only values | GEOMEAN
> >   -++-
> >   DRAM-only| 1.00 - - - - - - - | 1.00
> >   CXL-only | 1.22 - - - - - - - | 1.22
> >   default  |-  1.12  1.13  1.14  1.16  1.19  1.21  1.21 | 1.17 
> >   DAMON tiered |-  1.04  1.03  1.04  1.06  1.05  1.05  1.05 | 1.05 
> >   =++=
> >   CXL usage of redis-server in GB   | AVERAGE
> >   -++-
> >   DRAM-only|  0.0 - - - - - - - |  0.0
> >   CXL-only | 52.6 - - - - - - - | 52.6
> >   default  |-  20.4  27.0  33.1  39.5  45.6  50.5  50.3 | 38.1
> >   DAMON tiered |-   0.1   0.3   0.8   0.6   0.7   1.3   0.9 |  0.7
> >   =++=
> > 
> > Each test result is based on the exeuction environment as follows.
> > 
> >   DRAM-only   : redis-server uses only local DRAM memory.
> >   CXL-only: redis-server uses only CXL memory.
> >   default : default memory policy(MPOL_DEFAULT).
> > numa balancing disabled.
> >   DAMON tiered: DAMON enabled with DAMOS_MIGRATE_COLD for DRAM nodes and
> > DAMOS_MIGRATE_HOT for CXL nodes.
> > 
> > The above result shows the "default" execution time goes up as the size
> > of cold memory is increased from 440G to 500G because the more cold
> > memory used, the more CXL memory is used for the target redis workload
> > and this makes the execution time increase.
> > 
> > However, "DAMON tiered" result shows less slowdown because the
> > DAMOS_MIGRATE_COLD action at DRAM node proactively demotes pre-allocated
> > cold memory to CXL node and this free space at DRAM increases more
> > chance to allocate hot or warm pages of redis-server to fast DRAM node.
> > Moreover, DAMOS_MIGRATE_HOT action at CXL node also promotes hot pages
> > of redis-server to DRAM node actively.
> > 
> > As a result, it makes more memory of redis-server stay in DRAM node
> > compared to "default" memory policy and this makes the performance
> > improvement.
> > 
> > The following result of latest distribution workload shows similar data.
> > 
> >   2. YCSB latest distribution read only workload
> >   memory pressure with cold memory on node0 with 512GB of local DRAM.
> >   =++=
> >|   cold memory occupied by mmap and memset  |
> >|   0G  440G  450G  460G  470G  480G  490G  500G |
> >   =++=
> >   Execution time normalized to DRAM-only values | GEOMEAN
> >   -++-
> >   DRAM-only| 1.00 - - - - - - - | 1.00
> >   CXL-only | 1.18 - - - - - - - | 1.18
> >   default  |-  1.18  1.19  1.18  1.18  1.17  1.19  1.18 | 1.18 
> >   DAMON tiered |-  1.04  1.04  1.04  1.05  1.04  1.05  1.05 | 1.04 
> >   =++=
> >   CXL usage of redis-server in GB   | AVERAGE
> >   -++-
> >   DRAM-only|  0.0 - - - - - - - |  0.0
> >   CXL-only | 52.6 - - - - - - - | 52.6
> >   default  |-  20.5  27.1  33.2  39.5  45.5  50.4  50.5 | 38.1
> >   DAMON tiered |-   0.2   0.4   0.7   1.6   1.2   1.1   3.4 |  1.2
> >   =++=
> > 
> > In summary of both results, our evaluation shows that "DAMON tiered"
> > memory management reduces the performance slowdown compared to the
> > "default" memory policy from 17~18% to 4~5% when the system runs with
> > high memory pressure on its fast tier DRAM nodes.
> > 
> > Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
> > tiered memory systems run more

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Krzysztof Kozlowski

On 08/04/2024 14:48, Ondřej Jirman wrote:
> On Mon, Apr 08, 2024 at 01:59:12PM GMT, Krzysztof Kozlowski wrote:
>> On 08/04/2024 13:52, Ondřej Jirman wrote:
>>> On Mon, Apr 08, 2024 at 01:24:03PM GMT, Krzysztof Kozlowski wrote:
 On 08/04/2024 13:21, Pavel Machek wrote:
> Hi!
>
>>> Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
>>> but I did best I could.
>>>
>>> Signed-off-by: Pavel Machek 
>>
>> ...
>>
>>> +  cabledet-gpios:
>>> +maxItems: 1
>>> +description: GPIO controlling CABLE_DET (C3) pin.
>>> +
>>> +  avdd10-supply:
>>> +description: 1.0V power supply going to AVDD10 (A4, ...) pins
>>> +
>>> +  dvdd10-supply:
>>> +description: 1.0V power supply going to DVDD10 (D6, ...) pins
>>> +
>>> +  avdd18-supply:
>>> +description: 1.8V power supply going to AVDD18 (E3, ...) pins
>>> +
>>> +  dvdd18-supply:
>>> +description: 1.8V power supply going to DVDD18 (G4, ...) pins
>>> +
>>> +  avdd33-supply:
>>> +description: 3.3V power supply going to AVDD33 (C4, ...) pins
>>> +
>>> +  i2c-supply: true
>>> +  vconn-supply: true
>>
>> There are no such supplies like i2c and vconn on the schematics.
>>
>> I think this represents some other part of component which was added
>> here only for convenience.
>
> Can you give me pointer to documentation you are looking at?

 The schematics you linked in the document at the beginning. Page 13. Do
 you see these pins there? I saw only VCONN1_EN, but that's not a supply.
>>>
>>> The supply is U1308.
>>
>> That's not a supply to anx7688.
> 
> Yeah, I understand where the confusion is. The driver is not for anx7688 chip
> really. The driver is named anx7688, but that's mostly a historical accident 
> at
> this point.
> 
> I guess there can be a driver for anx7688 chip that can directly use the 
> chip's
> resources from the host by directly manipulating its registers and 
> implementing
> type-c functionality via eg. Linux's TCPM or TCPCI stack, etc. (eg. like
> fusb302 driver, or various tcpci subdrivers).
> 
> But in this case the chip is driven by an optional on-chip microcontroller's
> firmware and *this driver* is specifically for *the Type-C port on Pinephone*

We do not talk here about the driver, but bindings, so hardware.

> and serves as an integration driver for quite a bunch of things that need to
> work together on Pinephone for all of the Type-C port's features to operate
> reasonably well (and one of those is some communication with anx7688 firmware
> that we use, and enabling power to this chip and other things as appropriate,
> based on the communication from the firmware).

That's still looking like putting board design into particular device
binding.

> 
> It handles the specific needs of the Pinephone's Type-C implementation, all of
> its quirks (of which there are many over several HW revisions) that can't be
> handled by the particular implementation of on-chip microcontroller firmware
> directly and need host side interaction.
> 
> In an ideal world, many of the things this driver handles would be handled by
> embedded microcontroller on the board (like it is with some RK3399 based 
> Google
> devices), but Pinephone has no such thing and this glue needs to be 
> implemented
> somewhere in the kernel.

You might need multiple schemas, because this is for anx7688, not for
Pinephone type-c implementation.

However I still do not see yet a limitation of DTS requiring stuffing
some other properties into anx7688 or creating some other, virtual entity.


Best regards,
Krzysztof

Re: [PATCH v3] Documentation: Add reconnect process for VDUSE

2024-04-08 Thread Michael S. Tsirkin

On Mon, Apr 08, 2024 at 08:39:21PM +0800, Cindy Lu wrote:
> On Mon, Apr 8, 2024 at 3:40 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Apr 04, 2024 at 01:56:31PM +0800, Cindy Lu wrote:
> > > Add a document explaining the reconnect process, including what the
> > > Userspace App needs to do and how it works with the kernel.
> > >
> > > Signed-off-by: Cindy Lu 
> > > ---
> > >  Documentation/userspace-api/vduse.rst | 41 +++
> > >  1 file changed, 41 insertions(+)
> > >
> > > diff --git a/Documentation/userspace-api/vduse.rst 
> > > b/Documentation/userspace-api/vduse.rst
> > > index bdb880e01132..7faa83462e78 100644
> > > --- a/Documentation/userspace-api/vduse.rst
> > > +++ b/Documentation/userspace-api/vduse.rst
> > > @@ -231,3 +231,44 @@ able to start the dataplane processing as follows:
> > > after the used ring is filled.
> > >
> > >  For more details on the uAPI, please see include/uapi/linux/vduse.h.
> > > +
> > > +HOW VDUSE devices reconnection works
> > > +
> > > +1. What is reconnection?
> > > +
> > > +   When the userspace application loads, it should establish a connection
> > > +   to the vduse kernel device. Sometimes,the userspace application 
> > > exists,
> > > +   and we want to support its restart and connect to the kernel device 
> > > again
> > > +
> > > +2. How can I support reconnection in a userspace application?
> > > +
> > > +2.1 During initialization, the userspace application should first verify 
> > > the
> > > +existence of the device "/dev/vduse/vduse_name".
> > > +If it doesn't exist, it means this is the first-time for connection. 
> > > goto step 2.2
> > > +If it exists, it means this is a reconnection, and we should goto 
> > > step 2.3
> > > +
> > > +2.2 Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
> > > +/dev/vduse/control.
> > > +When ioctl(VDUSE_CREATE_DEV) is called, kernel allocates memory for
> > > +the reconnect information. The total memory size is 
> > > PAGE_SIZE*vq_mumber.
> >
> > Confused. Where is that allocation, in code?
> >
> > Thanks!
> >
> this should allocated in function vduse_create_dev(),

I mean, it's not allocated there ATM right? This is just doc patch
to become part of a larger patchset?

> I will rewrite
> this part  to make it more clearer
> will send a new version soon
> Thanks
> cindy
> 
> > > +2.3 Check if the information is suitable for reconnect
> > > +If this is reconnection :
> > > +Before attempting to reconnect, The userspace application needs to 
> > > use the
> > > +ioctl(VDUSE_DEV_GET_CONFIG, VDUSE_DEV_GET_STATUS, 
> > > VDUSE_DEV_GET_FEATURES...)
> > > +to get the information from kernel.
> > > +Please review the information and confirm if it is suitable to 
> > > reconnect.
> > > +
> > > +2.4 Userspace application needs to mmap the memory to userspace
> > > +The userspace application requires mapping one page for every vq. 
> > > These pages
> > > +should be used to save vq-related information during system running. 
> > > Additionally,
> > > +the application must define its own structure to store information 
> > > for reconnection.
> > > +
> > > +2.5 Completed the initialization and running the application.
> > > +While the application is running, it is important to store relevant 
> > > information
> > > +about reconnections in mapped pages. When calling the ioctl 
> > > VDUSE_VQ_GET_INFO to
> > > +get vq information, it's necessary to check whether it's a 
> > > reconnection. If it is
> > > +a reconnection, the vq-related information must be get from the 
> > > mapped pages.
> > > +
> > > +2.6 When the Userspace application exits, it is necessary to unmap all 
> > > the
> > > +pages for reconnection
> > > --
> > > 2.43.0
> >

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Ondřej Jirman

On Mon, Apr 08, 2024 at 01:59:12PM GMT, Krzysztof Kozlowski wrote:
> On 08/04/2024 13:52, Ondřej Jirman wrote:
> > On Mon, Apr 08, 2024 at 01:24:03PM GMT, Krzysztof Kozlowski wrote:
> >> On 08/04/2024 13:21, Pavel Machek wrote:
> >>> Hi!
> >>>
> > Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
> > but I did best I could.
> >
> > Signed-off-by: Pavel Machek 
> 
>  ...
> 
> > +  cabledet-gpios:
> > +maxItems: 1
> > +description: GPIO controlling CABLE_DET (C3) pin.
> > +
> > +  avdd10-supply:
> > +description: 1.0V power supply going to AVDD10 (A4, ...) pins
> > +
> > +  dvdd10-supply:
> > +description: 1.0V power supply going to DVDD10 (D6, ...) pins
> > +
> > +  avdd18-supply:
> > +description: 1.8V power supply going to AVDD18 (E3, ...) pins
> > +
> > +  dvdd18-supply:
> > +description: 1.8V power supply going to DVDD18 (G4, ...) pins
> > +
> > +  avdd33-supply:
> > +description: 3.3V power supply going to AVDD33 (C4, ...) pins
> > +
> > +  i2c-supply: true
> > +  vconn-supply: true
> 
>  There are no such supplies like i2c and vconn on the schematics.
> 
>  I think this represents some other part of component which was added
>  here only for convenience.
> >>>
> >>> Can you give me pointer to documentation you are looking at?
> >>
> >> The schematics you linked in the document at the beginning. Page 13. Do
> >> you see these pins there? I saw only VCONN1_EN, but that's not a supply.
> > 
> > The supply is U1308.
> 
> That's not a supply to anx7688.

Yeah, I understand where the confusion is. The driver is not for anx7688 chip
really. The driver is named anx7688, but that's mostly a historical accident at
this point.

I guess there can be a driver for anx7688 chip that can directly use the chip's
resources from the host by directly manipulating its registers and implementing
type-c functionality via eg. Linux's TCPM or TCPCI stack, etc. (eg. like
fusb302 driver, or various tcpci subdrivers).

But in this case the chip is driven by an optional on-chip microcontroller's
firmware and *this driver* is specifically for *the Type-C port on Pinephone*
and serves as an integration driver for quite a bunch of things that need to
work together on Pinephone for all of the Type-C port's features to operate
reasonably well (and one of those is some communication with anx7688 firmware
that we use, and enabling power to this chip and other things as appropriate,
based on the communication from the firmware).

It handles the specific needs of the Pinephone's Type-C implementation, all of
its quirks (of which there are many over several HW revisions) that can't be
handled by the particular implementation of on-chip microcontroller firmware
directly and need host side interaction.

In an ideal world, many of the things this driver handles would be handled by
embedded microcontroller on the board (like it is with some RK3399 based Google
devices), but Pinephone has no such thing and this glue needs to be implemented
somewhere in the kernel.

Kind regards,
o.

> Best regards,
> Krzysztof
>

Re: [PATCH v2] kprobes: Avoid possible warn in __arm_kprobe_ftrace()

2024-04-08 Thread Google

Hi Zheng,

On Mon, 8 Apr 2024 16:34:03 +0800
Zheng Yejian  wrote:

> There is once warn in __arm_kprobe_ftrace() on:
> 
>  ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 0, 0);
>  if (WARN_ONCE(..., "Failed to arm kprobe-ftrace at %pS (error %d)\n", ...)
>return ret;
> 
> This warning is generated because 'p->addr' is detected to be not a valid
> ftrace location in ftrace_set_filter_ip(). The ftrace address check is done
> by check_ftrace_location() at the beginning of check_kprobe_address_safe().
> At that point, ftrace_location(addr) == addr should return true if the
> module is loaded. Then the module is searched twice:
>   1. in is_module_text_address(), we find that 'p->addr' is in a module;
>   2. in __module_text_address(), we find the module;
> 
> If the module has just been unloaded before the second search, then
> '*probed_mod' is NULL and we would not go to get the module refcount,
> then the return value of check_kprobe_address_safe() would be 0, but
> actually we need to return -EINVAL.

OK, so you found a race window in check_kprobe_address_safe().

It does something like below.

check_kprobe_address_safe() {
...

/* Timing [A] */

if (!(core_kernel_text(p->addr) ||
is_module_text_address(p->addr)) ||
...(other reserved address check)) {
return -EINVAL;
}

/* Timing [B] */

*probed_mod = __module_text_address(p->addr):
if (*probe_mod) {
if (!try_module_get(*probed_mod)) {
return -ENOENT;
}
... 
}
}

So, if p->addr is in a module which is alive at the timing [A], but
unloaded at timing [B], 'p->addr' is passed the
'is_module_text_address(p->addr)' check, but *probed_mod becomes NULL.
Thus the corresponding module is not referenced and kprobe_arm(p) will
access a wrong address (use after free).
This happens either kprobe on ftrace is enabled or not.

To fix this problem, we should move the mutex_lock(kprobe_mutex) before
check_kprobe_address_safe() because kprobe_module_callback() also lock it
so it can stop module unloading.

Can you ensure this will fix your problem?
I think your patch is just optimizing but not fixing the fundamental
problem, which is we don't have an atomic search symbol and get module
API. In that case, we should stop a whole module unloading system until
registering a new kprobe on a module. (After registering the kprobe,
the callback can mark it gone and disarm_kprobe does not work anymore.)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 9d9095e81792..94eaefd1bc51 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1633,11 +1633,11 @@ int register_kprobe(struct kprobe *p)
p->nmissed = 0;
INIT_LIST_HEAD(>list);
 
+   mutex_lock(_mutex);
+
ret = check_kprobe_address_safe(p, _mod);
if (ret)
-   return ret;
-
-   mutex_lock(_mutex);
+   goto out;
 
if (on_func_entry)
p->flags |= KPROBE_FLAG_ON_FUNC_ENTRY;



Thank you,

> 
> To fix it, originally we can simply check 'p->addr' is out of text again,
> like below. But that would check twice respectively in kernel text and
> module text, so finally I reduce them to be once.
> 
>   if (!(core_kernel_text((unsigned long) p->addr) ||
>   is_module_text_address((unsigned long) p->addr)) || ...) {
>   ret = -EINVAL;
>   goto out;
>   }
>   ...
>   *probed_mod = __module_text_address((unsigned long) p->addr);
>   if (*probed_mod) {
>   ...
>   } else if (!core_kernel_text((unsigned long) p->addr)) { // check again!
>   ret = -EINVAL;
>   goto out;
>   }
> 
> Signed-off-by: Zheng Yejian 
> ---
>  kernel/kprobes.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> v2:
>  - Update commit messages and comments as suggested by Masami.
>Link: 
> https://lore.kernel.org/all/20240408115038.b0c85767bf1f249eccc32...@kernel.org/
> 
> v1:
>  - Link: 
> https://lore.kernel.org/all/20240407035904.2556645-1-zhengyeji...@huawei.com/
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 9d9095e81792..65adc815fc6e 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1567,10 +1567,17 @@ static int check_kprobe_address_safe(struct kprobe *p,
>   jump_label_lock();
>   preempt_disable();
>  
> - /* Ensure it is not in reserved area nor out of text */
> - if (!(core_kernel_text((unsigned long) p->addr) ||
> - is_module_text_address((unsigned long) p->addr)) ||
> - in_gate_area_no_mm((unsigned long) p->addr) ||
> + /* Ensure the address is in a text area, and find a module if exists. */
> + *probed_mod = NULL;
> + if (!core_kernel_text((unsigned long) p->addr)) {
> + *probed_mod = __module_text_address((unsigned long) p->addr);
> + if (!(*probed_mod)) {
> + ret = -EINVAL;
> +

Re: [PATCH v3] Documentation: Add reconnect process for VDUSE

2024-04-08 Thread Cindy Lu

On Mon, Apr 8, 2024 at 3:40 PM Michael S. Tsirkin  wrote:
>
> On Thu, Apr 04, 2024 at 01:56:31PM +0800, Cindy Lu wrote:
> > Add a document explaining the reconnect process, including what the
> > Userspace App needs to do and how it works with the kernel.
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >  Documentation/userspace-api/vduse.rst | 41 +++
> >  1 file changed, 41 insertions(+)
> >
> > diff --git a/Documentation/userspace-api/vduse.rst 
> > b/Documentation/userspace-api/vduse.rst
> > index bdb880e01132..7faa83462e78 100644
> > --- a/Documentation/userspace-api/vduse.rst
> > +++ b/Documentation/userspace-api/vduse.rst
> > @@ -231,3 +231,44 @@ able to start the dataplane processing as follows:
> > after the used ring is filled.
> >
> >  For more details on the uAPI, please see include/uapi/linux/vduse.h.
> > +
> > +HOW VDUSE devices reconnection works
> > +
> > +1. What is reconnection?
> > +
> > +   When the userspace application loads, it should establish a connection
> > +   to the vduse kernel device. Sometimes,the userspace application exists,
> > +   and we want to support its restart and connect to the kernel device 
> > again
> > +
> > +2. How can I support reconnection in a userspace application?
> > +
> > +2.1 During initialization, the userspace application should first verify 
> > the
> > +existence of the device "/dev/vduse/vduse_name".
> > +If it doesn't exist, it means this is the first-time for connection. 
> > goto step 2.2
> > +If it exists, it means this is a reconnection, and we should goto step 
> > 2.3
> > +
> > +2.2 Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
> > +/dev/vduse/control.
> > +When ioctl(VDUSE_CREATE_DEV) is called, kernel allocates memory for
> > +the reconnect information. The total memory size is 
> > PAGE_SIZE*vq_mumber.
>
> Confused. Where is that allocation, in code?
>
> Thanks!
>
this should allocated in function vduse_create_dev(), I will rewrite
this part  to make it more clearer
will send a new version soon
Thanks
cindy

> > +2.3 Check if the information is suitable for reconnect
> > +If this is reconnection :
> > +Before attempting to reconnect, The userspace application needs to use 
> > the
> > +ioctl(VDUSE_DEV_GET_CONFIG, VDUSE_DEV_GET_STATUS, 
> > VDUSE_DEV_GET_FEATURES...)
> > +to get the information from kernel.
> > +Please review the information and confirm if it is suitable to 
> > reconnect.
> > +
> > +2.4 Userspace application needs to mmap the memory to userspace
> > +The userspace application requires mapping one page for every vq. 
> > These pages
> > +should be used to save vq-related information during system running. 
> > Additionally,
> > +the application must define its own structure to store information for 
> > reconnection.
> > +
> > +2.5 Completed the initialization and running the application.
> > +While the application is running, it is important to store relevant 
> > information
> > +about reconnections in mapped pages. When calling the ioctl 
> > VDUSE_VQ_GET_INFO to
> > +get vq information, it's necessary to check whether it's a 
> > reconnection. If it is
> > +a reconnection, the vq-related information must be get from the mapped 
> > pages.
> > +
> > +2.6 When the Userspace application exits, it is necessary to unmap all the
> > +pages for reconnection
> > --
> > 2.43.0
>

Re: [PATCH v10 12/14] x86/sgx: Turn on per-cgroup EPC reclamation

2024-04-08 Thread Huang, Kai


> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
> @@ -28,6 +28,10 @@ static inline int sgx_cgroup_try_charge(struct sgx_cgroup 
> *sgx_cg, enum sgx_recl
>  static inline void sgx_cgroup_uncharge(struct sgx_cgroup *sgx_cg) { }
>  
>  static inline void sgx_cgroup_init(void) { }
> +
> +static inline void sgx_cgroup_reclaim_pages(struct misc_cg *root, struct 
> mm_struct *charge_mm)
> +{
> +}
>  #else
>  struct sgx_cgroup {
>   struct misc_cg *cg;
> @@ -65,6 +69,9 @@ static inline void sgx_put_cg(struct sgx_cgroup *sgx_cg)
>  
>  int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, enum sgx_reclaim 
> reclaim);
>  void sgx_cgroup_uncharge(struct sgx_cgroup *sgx_cg);
> +bool sgx_cgroup_lru_empty(struct misc_cg *root);
> +bool sgx_cgroup_should_reclaim(struct sgx_cgroup *sgx_cg);
> +void sgx_cgroup_reclaim_pages(struct misc_cg *root,  struct mm_struct 
> *charge_mm);

Seems the decision to choose whether to implement a stub function for some
function is quite random to me.

My impression is people in general don't like #ifdef in the C file but prefer to
implementing it in the header in some helper function.

I guess you might want to just implement a stub function for each of the 3
functions exposed, so that we can eliminate some #ifdefs in the sgx/main.c (see
below).

>  void sgx_cgroup_init(void);
>  
>  #endif
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 7f92455d957d..68f28ff2d5ef 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -34,6 +34,16 @@ static struct sgx_epc_lru_list sgx_global_lru;
>  
>  static inline struct sgx_epc_lru_list *sgx_lru_list(struct sgx_epc_page 
> *epc_page)
>  {
> +#ifdef CONFIG_CGROUP_SGX_EPC
> + if (epc_page->sgx_cg)
> + return _page->sgx_cg->lru;
> +
> + /*
> +  * This should not happen when cgroup is enabled: Every page belongs
> +  * to a cgroup, or the root by default.
> +  */
> + WARN_ON_ONCE(1);

In the case MISC cgroup is enabled in Kconfig but disabled by command line, I
think this becomes legal now?

> +#endif
>   return _global_lru;
>  }
>  
> @@ -42,7 +52,11 @@ static inline struct sgx_epc_lru_list *sgx_lru_list(struct 
> sgx_epc_page *epc_pag
>   */
>  static inline bool sgx_can_reclaim(void)
>  {
> +#ifdef CONFIG_CGROUP_SGX_EPC
> + return !sgx_cgroup_lru_empty(misc_cg_root());
> +#else
>   return !list_empty(_global_lru.reclaimable);
> +#endif
>  }
>  

Here you are using #ifdef  CONFIG_CGRUP_SGX_EPC, but ...

>  static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
> @@ -404,7 +418,10 @@ static bool sgx_should_reclaim(unsigned long watermark)
>  
>  static void sgx_reclaim_pages_global(struct mm_struct *charge_mm)
>  {
> - sgx_reclaim_pages(_global_lru, charge_mm);
> + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
> + sgx_cgroup_reclaim_pages(misc_cg_root(), charge_mm);
> + else
> + sgx_reclaim_pages(_global_lru, charge_mm);
>  }

... here you are using IS_ENABLED(CONFIG_CGROUP_SGX_EPC).

Any reason they are not consistent?

Also, in the case where MISC cgroup is disabled via commandline, I think it
won't work, because misc_cg_root() should be NULL in this case while
IS_ENABLED(CONFIG_CGROUP_SGX_EPC) is true.

>  
>  /*
> @@ -414,6 +431,16 @@ static void sgx_reclaim_pages_global(struct mm_struct 
> *charge_mm)
>   */
>  void sgx_reclaim_direct(void)
>  {
> +#ifdef CONFIG_CGROUP_SGX_EPC
> + struct sgx_cgroup *sgx_cg = sgx_get_current_cg();
> +
> + /* Make sure there are some free pages at cgroup level */
> + if (sgx_cg && sgx_cgroup_should_reclaim(sgx_cg)) {
> + sgx_cgroup_reclaim_pages(sgx_cg->cg, current->mm);
> + sgx_put_cg(sgx_cg);
> + }
> +#endif

This #ifdef CONFIG_CGROUP_SGX_EPC can be removed if we implement a stub function
for sgx_cgroup_should_reclaim().

> + /* Make sure there are some free pages at global level */
>   if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
>   sgx_reclaim_pages_global(current->mm);
>  }

Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

2024-04-08 Thread Honggyu Kim

On Fri,  5 Apr 2024 12:24:30 -0700 SeongJae Park  wrote:
> On Fri,  5 Apr 2024 15:08:54 +0900 Honggyu Kim  wrote:
> 
> > This patch introduces DAMOS_MIGRATE_COLD action, which is similar to
> > DAMOS_PAGEOUT, but migrate folios to the given 'target_nid' in the sysfs
> > instead of swapping them out.
> > 
> > The 'target_nid' sysfs knob is created by this patch to inform the
> > migration target node ID.
> 
> Isn't it created by the previous patch?

Right.  I didn't fix the commit message after split this patch.  I will
fix it.

> > 
> > Here is one of the example usage of this 'migrate_cold' action.
> > 
> >   $ cd /sys/kernel/mm/damon/admin/kdamonds/
> >   $ cat contexts//schemes//action
> >   migrate_cold
> >   $ echo 2 > contexts//schemes//target_nid
> >   $ echo commit > state
> >   $ numactl -p 0 ./hot_cold 500M 600M &
> >   $ numastat -c -p hot_cold
> > 
> >   Per-node process memory usage (in MBs)
> >   PID Node 0 Node 1 Node 2 Total
> >   --  -- -- -- -
> >   701 (hot_cold) 501  0601  1101
> > 
> > Since there are some common routines with pageout, many functions have
> > similar logics between pageout and migrate cold.
> > 
> > damon_pa_migrate_folio_list() is a minimized version of
> > shrink_folio_list(), but it's minified only for demotion.
> 
> MIGRATE_COLD is not only for demotion, right?  I think the last two words are
> better to be removed for reducing unnecessary confuses.

You mean the last two sentences?  I will remove them if you feel it's
confusing.

> > 
> > Signed-off-by: Honggyu Kim 
> > Signed-off-by: Hyeongtak Ji 
> > ---
> >  include/linux/damon.h|   2 +
> >  mm/damon/paddr.c | 146 ++-
> >  mm/damon/sysfs-schemes.c |   4 ++
> >  3 files changed, 151 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/damon.h b/include/linux/damon.h
> > index 24ea33a03d5d..df8671e69a70 100644
> > --- a/include/linux/damon.h
> > +++ b/include/linux/damon.h
> > @@ -105,6 +105,7 @@ struct damon_target {
> >   * @DAMOS_NOHUGEPAGE:  Call ``madvise()`` for the region with 
> > MADV_NOHUGEPAGE.
> >   * @DAMOS_LRU_PRIO:Prioritize the region on its LRU lists.
> >   * @DAMOS_LRU_DEPRIO:  Deprioritize the region on its LRU lists.
> > + * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
> 
> Whether it will be for cold region or not is depending on the target access
> pattern.  What about 'Migrate the regions in coldest regions first manner.'?
> Or, simply 'Migrate the regions (prioritize cold)' here, and explain about the
> prioritization under quota on the detailed comments part?

"Migrate the regions in coldest regions first manner under quota" sounds
better.  I will change it.

> Also, let's use tab consistently.

Yeah, it's a mistake.  will fix it.

> >   * @DAMOS_STAT:Do nothing but count the stat.
> >   * @NR_DAMOS_ACTIONS:  Total number of DAMOS actions
> >   *
> > @@ -122,6 +123,7 @@ enum damos_action {
> > DAMOS_NOHUGEPAGE,
> > DAMOS_LRU_PRIO,
> > DAMOS_LRU_DEPRIO,
> > +   DAMOS_MIGRATE_COLD,
> > DAMOS_STAT, /* Do nothing but only record the stat */
> > NR_DAMOS_ACTIONS,
> >  };
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index 277a1c4d833c..fe217a26f788 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> > @@ -12,6 +12,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> > +#include 
> >  
> >  #include "../internal.h"
> >  #include "ops-common.h"
> > @@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, 
> > struct folio *folio)
> >  
> >  enum migration_mode {
> > MIG_PAGEOUT,
> > +   MIG_MIGRATE_COLD,
> >  };
> >  
> > +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> > +  struct pglist_data *pgdat,
> > +  int target_nid)
> 
> To avoid name collisions, I'd prefer having damon_pa_prefix.  I show this 
> patch
> is defining damon_pa_migrate_folio_list() below, though.  What about
> __damon_pa_migrate_folio_list()?

Ack.  I will change it to __damon_pa_migrate_folio_list().

> > +{
> > +   unsigned int nr_succeeded;
> > +   nodemask_t allowed_mask = NODE_MASK_NONE;
> > +
> 
> I personally prefer not having empty lines in the middle of variable
> declarations/definitions.  Could we remove this empty line?

I can remove it, but I would like to have more discussion about this
issue.  The current implementation allows only a single migration
target with "target_nid", but users might want to provide fall back
migration target nids.

For example, if more than two CXL nodes exist in the system, users might
want to migrate cold pages to any CXL nodes.  In such cases, we might
have to make "target_nid" accept comma separated node IDs.  nodemask can
be better but we should provide a way to change the scanning order.

I would like to hear how you think about

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Krzysztof Kozlowski

On 08/04/2024 13:51, Ondřej Jirman wrote:
> Hi Krzysztof,
> 
> On Mon, Apr 08, 2024 at 01:17:32PM GMT, Krzysztof Kozlowski wrote:
>> On 08/04/2024 12:51, Pavel Machek wrote:
>>> Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
>>> but I did best I could.
>>>
>>> Signed-off-by: Pavel Machek 
>>
>> ...
>>
>>> +  cabledet-gpios:
>>> +maxItems: 1
>>> +description: GPIO controlling CABLE_DET (C3) pin.
>>> +
>>> +  avdd10-supply:
>>> +description: 1.0V power supply going to AVDD10 (A4, ...) pins
>>> +
>>> +  dvdd10-supply:
>>> +description: 1.0V power supply going to DVDD10 (D6, ...) pins
>>> +
>>> +  avdd18-supply:
>>> +description: 1.8V power supply going to AVDD18 (E3, ...) pins
>>> +
>>> +  dvdd18-supply:
>>> +description: 1.8V power supply going to DVDD18 (G4, ...) pins
>>> +
>>> +  avdd33-supply:
>>> +description: 3.3V power supply going to AVDD33 (C4, ...) pins
>>> +
>>> +  i2c-supply: true
>>> +  vconn-supply: true
>>
>> There are no such supplies like i2c and vconn on the schematics.
> 
> Which schematics?
> 
> ANX7688 has VCONN1/2_EN GPIOs that control a switching of VCONN power supply
> to resective CCx pins. That's just a switch signal. Power for VCONN needs
> to come from somewhere and the driver needs to enable the regulator at
> the appropriate time only.
> 
> On Pinephone it can't be an always on power supply and needs to be enabled
> only when used due to HW design of the circuit. (default without ANX driver
> initialized would be to shove 5V to both CC pins, which breaks Type-C spec)
> 
> I2C supply is needed for I2C bus to work, apparently. There's nothing
> that says that I2C pull-ups supply has to come from supplies powering the
> chip. I2C I/O is open drain and the device needs to enable a bus supply
> in order to communicate.

No, that's misunderstanding of DT. These are not supplies to anx7688.

Bus supply is not related to anx7688.

> 
> You can say that bus master should be managing the bus supply, but you'd still
> have a problem that each device may be behind a voltage translator, and
> logically, bus master driver should care only about its side of the bus then.
> Both side of level shifter need the pull-up power enabled.

Again, that's nothing related to anx7688. If you want to add it here,
why not to every device everywhere?

> 
> You can also make an argument that bus supply can be always on, but that
> causes several other issues on Pinephone due to shared nature of most
> resources like these. There are other devices on shared power rails, that
> need to be turned off during sleep, etc.
> 

No, do not add non-existing properties on this device as workaround for
other issues.

Please drop these two supplies *and all other which do not exist* on
anx7688.

Best regards,
Krzysztof

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Krzysztof Kozlowski

On 08/04/2024 13:52, Ondřej Jirman wrote:
> On Mon, Apr 08, 2024 at 01:24:03PM GMT, Krzysztof Kozlowski wrote:
>> On 08/04/2024 13:21, Pavel Machek wrote:
>>> Hi!
>>>
> Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
> but I did best I could.
>
> Signed-off-by: Pavel Machek 

 ...

> +  cabledet-gpios:
> +maxItems: 1
> +description: GPIO controlling CABLE_DET (C3) pin.
> +
> +  avdd10-supply:
> +description: 1.0V power supply going to AVDD10 (A4, ...) pins
> +
> +  dvdd10-supply:
> +description: 1.0V power supply going to DVDD10 (D6, ...) pins
> +
> +  avdd18-supply:
> +description: 1.8V power supply going to AVDD18 (E3, ...) pins
> +
> +  dvdd18-supply:
> +description: 1.8V power supply going to DVDD18 (G4, ...) pins
> +
> +  avdd33-supply:
> +description: 3.3V power supply going to AVDD33 (C4, ...) pins
> +
> +  i2c-supply: true
> +  vconn-supply: true

 There are no such supplies like i2c and vconn on the schematics.

 I think this represents some other part of component which was added
 here only for convenience.
>>>
>>> Can you give me pointer to documentation you are looking at?
>>
>> The schematics you linked in the document at the beginning. Page 13. Do
>> you see these pins there? I saw only VCONN1_EN, but that's not a supply.
> 
> The supply is U1308.

That's not a supply to anx7688.

Best regards,
Krzysztof

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Ondřej Jirman

On Mon, Apr 08, 2024 at 01:24:03PM GMT, Krzysztof Kozlowski wrote:
> On 08/04/2024 13:21, Pavel Machek wrote:
> > Hi!
> > 
> >>> Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
> >>> but I did best I could.
> >>>
> >>> Signed-off-by: Pavel Machek 
> >>
> >> ...
> >>
> >>> +  cabledet-gpios:
> >>> +maxItems: 1
> >>> +description: GPIO controlling CABLE_DET (C3) pin.
> >>> +
> >>> +  avdd10-supply:
> >>> +description: 1.0V power supply going to AVDD10 (A4, ...) pins
> >>> +
> >>> +  dvdd10-supply:
> >>> +description: 1.0V power supply going to DVDD10 (D6, ...) pins
> >>> +
> >>> +  avdd18-supply:
> >>> +description: 1.8V power supply going to AVDD18 (E3, ...) pins
> >>> +
> >>> +  dvdd18-supply:
> >>> +description: 1.8V power supply going to DVDD18 (G4, ...) pins
> >>> +
> >>> +  avdd33-supply:
> >>> +description: 3.3V power supply going to AVDD33 (C4, ...) pins
> >>> +
> >>> +  i2c-supply: true
> >>> +  vconn-supply: true
> >>
> >> There are no such supplies like i2c and vconn on the schematics.
> >>
> >> I think this represents some other part of component which was added
> >> here only for convenience.
> > 
> > Can you give me pointer to documentation you are looking at?
> 
> The schematics you linked in the document at the beginning. Page 13. Do
> you see these pins there? I saw only VCONN1_EN, but that's not a supply.

The supply is U1308.

regards,
o.

> Best regards,
> Krzysztof
>

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Ondřej Jirman

Hi Krzysztof,

On Mon, Apr 08, 2024 at 01:17:32PM GMT, Krzysztof Kozlowski wrote:
> On 08/04/2024 12:51, Pavel Machek wrote:
> > Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
> > but I did best I could.
> > 
> > Signed-off-by: Pavel Machek 
> 
> ...
> 
> > +  cabledet-gpios:
> > +maxItems: 1
> > +description: GPIO controlling CABLE_DET (C3) pin.
> > +
> > +  avdd10-supply:
> > +description: 1.0V power supply going to AVDD10 (A4, ...) pins
> > +
> > +  dvdd10-supply:
> > +description: 1.0V power supply going to DVDD10 (D6, ...) pins
> > +
> > +  avdd18-supply:
> > +description: 1.8V power supply going to AVDD18 (E3, ...) pins
> > +
> > +  dvdd18-supply:
> > +description: 1.8V power supply going to DVDD18 (G4, ...) pins
> > +
> > +  avdd33-supply:
> > +description: 3.3V power supply going to AVDD33 (C4, ...) pins
> > +
> > +  i2c-supply: true
> > +  vconn-supply: true
> 
> There are no such supplies like i2c and vconn on the schematics.

Which schematics?

ANX7688 has VCONN1/2_EN GPIOs that control a switching of VCONN power supply
to resective CCx pins. That's just a switch signal. Power for VCONN needs
to come from somewhere and the driver needs to enable the regulator at
the appropriate time only.

On Pinephone it can't be an always on power supply and needs to be enabled
only when used due to HW design of the circuit. (default without ANX driver
initialized would be to shove 5V to both CC pins, which breaks Type-C spec)

I2C supply is needed for I2C bus to work, apparently. There's nothing
that says that I2C pull-ups supply has to come from supplies powering the
chip. I2C I/O is open drain and the device needs to enable a bus supply
in order to communicate.

You can say that bus master should be managing the bus supply, but you'd still
have a problem that each device may be behind a voltage translator, and
logically, bus master driver should care only about its side of the bus then.
Both side of level shifter need the pull-up power enabled.

You can also make an argument that bus supply can be always on, but that
causes several other issues on Pinephone due to shared nature of most
resources like these. There are other devices on shared power rails, that
need to be turned off during sleep, etc.

Kind regards,
o.

> I think this represents some other part of component which was added
> here only for convenience.
> 
> 
> 
> Best regards,
> Krzysztof
>

Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-08 Thread Michael Ellerman

Paolo Bonzini  writes:
> The .change_pte() MMU notifier callback was intended as an
> optimization. The original point of it was that KSM could tell KVM to flip
> its secondary PTE to a new location without having to first zap it. At
> the time there was also an .invalidate_page() callback; both of them were
> *not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
> and .invalidate_page() also doubled as a fallback implementation of
> .change_pte().
>
> Later on, however, both callbacks were changed to occur within an
> invalidate_range_start/end() block.
>
> In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to
> set_pte_at_notify with invalidate_range_start and invalidate_range_end",
> 2012-10-09) did so to remove the fallback from .invalidate_page() to
> .change_pte() and allow sleepable .invalidate_page() hooks.
>
> This however made KVM's usage of the .change_pte() callback completely
> moot, because KVM unmaps the sPTEs during .invalidate_range_start()
> and therefore .change_pte() has no hope of finding a sPTE to change.
> Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
> well as all the architecture specific implementations.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/arm64/kvm/mmu.c  | 34 -
>  arch/loongarch/include/asm/kvm_host.h |  1 -
>  arch/loongarch/kvm/mmu.c  | 32 
>  arch/mips/kvm/mmu.c   | 30 ---
>  arch/powerpc/include/asm/kvm_ppc.h|  1 -
>  arch/powerpc/kvm/book3s.c |  5 ---
>  arch/powerpc/kvm/book3s.h |  1 -
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
>  arch/powerpc/kvm/book3s_hv.c  |  1 -
>  arch/powerpc/kvm/book3s_pr.c  |  7 
>  arch/powerpc/kvm/e500_mmu_host.c  |  6 ---

LGTM.

Acked-by: Michael Ellerman  (powerpc)

cheers

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Krzysztof Kozlowski

On 08/04/2024 13:21, Pavel Machek wrote:
> Hi!
> 
>>> Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
>>> but I did best I could.
>>>
>>> Signed-off-by: Pavel Machek 
>>
>> ...
>>
>>> +  cabledet-gpios:
>>> +maxItems: 1
>>> +description: GPIO controlling CABLE_DET (C3) pin.
>>> +
>>> +  avdd10-supply:
>>> +description: 1.0V power supply going to AVDD10 (A4, ...) pins
>>> +
>>> +  dvdd10-supply:
>>> +description: 1.0V power supply going to DVDD10 (D6, ...) pins
>>> +
>>> +  avdd18-supply:
>>> +description: 1.8V power supply going to AVDD18 (E3, ...) pins
>>> +
>>> +  dvdd18-supply:
>>> +description: 1.8V power supply going to DVDD18 (G4, ...) pins
>>> +
>>> +  avdd33-supply:
>>> +description: 3.3V power supply going to AVDD33 (C4, ...) pins
>>> +
>>> +  i2c-supply: true
>>> +  vconn-supply: true
>>
>> There are no such supplies like i2c and vconn on the schematics.
>>
>> I think this represents some other part of component which was added
>> here only for convenience.
> 
> Can you give me pointer to documentation you are looking at?

The schematics you linked in the document at the beginning. Page 13. Do
you see these pins there? I saw only VCONN1_EN, but that's not a supply.

Best regards,
Krzysztof

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Pavel Machek

Hi!

> > Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
> > but I did best I could.
> > 
> > Signed-off-by: Pavel Machek 
> 
> ...
> 
> > +  cabledet-gpios:
> > +maxItems: 1
> > +description: GPIO controlling CABLE_DET (C3) pin.
> > +
> > +  avdd10-supply:
> > +description: 1.0V power supply going to AVDD10 (A4, ...) pins
> > +
> > +  dvdd10-supply:
> > +description: 1.0V power supply going to DVDD10 (D6, ...) pins
> > +
> > +  avdd18-supply:
> > +description: 1.8V power supply going to AVDD18 (E3, ...) pins
> > +
> > +  dvdd18-supply:
> > +description: 1.8V power supply going to DVDD18 (G4, ...) pins
> > +
> > +  avdd33-supply:
> > +description: 3.3V power supply going to AVDD33 (C4, ...) pins
> > +
> > +  i2c-supply: true
> > +  vconn-supply: true
> 
> There are no such supplies like i2c and vconn on the schematics.
> 
> I think this represents some other part of component which was added
> here only for convenience.

Can you give me pointer to documentation you are looking at?

Best regards,
Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.


signature.asc
Description: PGP signature

Re: [PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Krzysztof Kozlowski

On 08/04/2024 12:51, Pavel Machek wrote:
> Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
> but I did best I could.
> 
> Signed-off-by: Pavel Machek 

...

> +  cabledet-gpios:
> +maxItems: 1
> +description: GPIO controlling CABLE_DET (C3) pin.
> +
> +  avdd10-supply:
> +description: 1.0V power supply going to AVDD10 (A4, ...) pins
> +
> +  dvdd10-supply:
> +description: 1.0V power supply going to DVDD10 (D6, ...) pins
> +
> +  avdd18-supply:
> +description: 1.8V power supply going to AVDD18 (E3, ...) pins
> +
> +  dvdd18-supply:
> +description: 1.8V power supply going to DVDD18 (G4, ...) pins
> +
> +  avdd33-supply:
> +description: 3.3V power supply going to AVDD33 (C4, ...) pins
> +
> +  i2c-supply: true
> +  vconn-supply: true

There are no such supplies like i2c and vconn on the schematics.

I think this represents some other part of component which was added
here only for convenience.



Best regards,
Krzysztof

Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory

2024-04-08 Thread Honggyu Kim

Hi SeongJae,

On Fri,  5 Apr 2024 12:28:00 -0700 SeongJae Park  wrote:
> Hello Honggyu,
> 
> On Fri,  5 Apr 2024 15:08:49 +0900 Honggyu Kim  wrote:
> 
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> > 
> > It says there is no implementation of the demote/promote DAMOS action
> > are made.  This RFC is about its implementation for physical address
> > space.
> > 
> > 
> > Changes from RFC v2:
> >   1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
> >   2. Create 'target_nid' to set the migration target node instead of
> >  depending on node distance based information.
> >   3. Instead of having page level access check in this patch series,
> >  delegate the job to a new DAMOS filter type YOUNG[2].
> >   4. Introduce vmstat counters "damon_migrate_{hot,cold}".
> >   5. Rebase from v6.7 to v6.8.
> 
> Thank you for patiently keeping discussion and making this great version!  I
> left comments on each patch, but found no special concerns.  Per-page access
> recheck for MIGRATE_HOT and vmstat change are taking my eyes, though.  I doubt
> if those really needed.  It would be nice if you could answer to the comments.

I will answer them where you made the comments.

> Once my comments on this version are addressed, I would have no reason to
> object at dropping the RFC tag from this patchset.

Thanks.  I will drop the RFC after addressing your comments.

> Nonetheless, I show some warnings and errors from checkpatch.pl.  I don't
> really care about those for RFC patches, so no problem at all.  But if you
> agree to my opinion about RFC tag dropping, and therefore if you will send 
> next
> version without RFC tag, please make sure you also run checkpatch.pl before
> posting.

Sure.  I will run checkpatch.pl from the next revision.

Thanks,
Honggyu

> 
> Thanks,
> SJ
> 
> [...]

[PATCHv3 2/2] usb: typec: anx7688: Add driver for ANX7688 USB-C HDMI bridge

2024-04-08 Thread Pavel Machek

From: Ondrej Jirman 

This is driver for ANX7688 USB-C HDMI, with flashing and debugging
features removed. ANX7688 is rather criticial piece on PinePhone,
there's no display and no battery charging without it.

There's likely more work to be done here, but having basic support
in mainline is needed to be able to work on the other stuff
(networking, cameras, power management).

Signed-off-by: Ondrej Jirman 
Co-developed-by: Martijn Braam 
Co-developed-by: Samuel Holland 
Signed-off-by: Pavel Machek 

---

v2: Fix checkpatch stuff. Some cleanups, adapt to dts format in 1/2.
v3: Turn down debugging, as requested during review.

diff --git a/drivers/usb/typec/Kconfig b/drivers/usb/typec/Kconfig
index 2f80c2792dbd..c9043ae61546 100644
--- a/drivers/usb/typec/Kconfig
+++ b/drivers/usb/typec/Kconfig
@@ -64,6 +64,17 @@ config TYPEC_ANX7411
  If you choose to build this driver as a dynamically linked module, the
  module will be called anx7411.ko.
 
+config TYPEC_ANX7688
+   tristate "Analogix ANX7688 Type-C DRP Port controller and mux driver"
+   depends on I2C
+   depends on USB_ROLE_SWITCH
+   help
+ Say Y or M here if your system has Analogix ANX7688 Type-C Bridge
+ controller driver.
+
+ If you choose to build this driver as a dynamically linked module, the
+ module will be called anx7688.ko.
+
 config TYPEC_RT1719
tristate "Richtek RT1719 Sink Only Type-C controller driver"
depends on USB_ROLE_SWITCH || !USB_ROLE_SWITCH
diff --git a/drivers/usb/typec/Makefile b/drivers/usb/typec/Makefile
index 7a368fea61bc..3f8ff94ad294 100644
--- a/drivers/usb/typec/Makefile
+++ b/drivers/usb/typec/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_TYPEC_TCPM)+= tcpm/
 obj-$(CONFIG_TYPEC_UCSI)   += ucsi/
 obj-$(CONFIG_TYPEC_TPS6598X)   += tipd/
 obj-$(CONFIG_TYPEC_ANX7411)+= anx7411.o
+obj-$(CONFIG_TYPEC_ANX7688)+= anx7688.o
 obj-$(CONFIG_TYPEC_HD3SS3220)  += hd3ss3220.o
 obj-$(CONFIG_TYPEC_STUSB160X)  += stusb160x.o
 obj-$(CONFIG_TYPEC_RT1719) += rt1719.o
diff --git a/drivers/usb/typec/anx7688.c b/drivers/usb/typec/anx7688.c
new file mode 100644
index ..407cbd5fedba
--- /dev/null
+++ b/drivers/usb/typec/anx7688.c
@@ -0,0 +1,1830 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ANX7688 USB-C HDMI bridge/PD driver
+ *
+ * Warning, this driver is somewhat PinePhone specific.
+ *
+ * How this works:
+ * - this driver allows to program firmware into ANX7688 EEPROM, and
+ *   initialize it
+ * - it then communicates with the firmware running on the OCM (on-chip
+ *   microcontroller)
+ * - it detects whether there is cable plugged in or not and powers
+ *   up or down the ANX7688 based on that
+ * - when the cable is connected the firmware on the OCM will handle
+ *   the detection of the nature of the device on the other end
+ *   of the USB-C cable
+ * - this driver then communicates with the USB phy to let it swap
+ *   data roles accordingly
+ * - it also enables VBUS and VCONN regulators as appropriate
+ * - USB phy driver (Allwinner) needs to know whether to switch to
+ *   device or host mode, or whether to turn off
+ * - when the firmware detects SRC.1.5A or SRC.3.0A via CC pins
+ *   or something else via PD, it notifies this driver via software
+ *   interrupt and this driver will determine how to update the TypeC
+ *   port status and what input current limit is appropriate
+ * - input current limit determination happens 500ms after cable
+ *   insertion or hard reset (delay is necessary to determine whether
+ *   the remote end is PD capable or not)
+ * - this driver tells to the PMIC driver that the input current limit
+ *   needs to be changed
+ * - this driver also monitors PMIC status and re-sets the input current
+ *   limit if it changes for some reason (due to PMIC internal decision
+ *   making) (this is disabled for now)
+ *
+ * ANX7688 FW behavior as observed:
+ *
+ * - DO NOT SET MORE THAN 1 SINK CAPABILITY! Firmware will ignore what
+ *   you set and send hardcoded PDO_BATT 5-21V 30W message!
+ *
+ * Product brief:
+ * 
https://www.analogix.com/en/system/files/AA-002281-PB-6-ANX7688_Product_Brief_0.pdf
+ * Development notes:
+ * https://xnux.eu/devices/feature/anx7688.html
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* firmware regs */
+
+#define ANX7688_REG_VBUS_OFF_DELAY_TIME 0x22
+#define ANX7688_REG_FEATURE_CTRL0x27
+#define ANX7688_REG_EEPROM_LOAD_STATUS1 0x11
+#define ANX7688_REG_EEPROM_LOAD_STATUS0 0x12
+#define ANX7688_REG_FW_VERSION1 0x15
+#define ANX7688_REG_FW_VERSION0 0x16
+
+#define ANX7688_EEPROM_FW_LOADED   0x01
+
+#define ANX7688_REG_STATUS_INT_MASK 0x17
+#define ANX7688_REG_STATUS_INT  0x28
+#define ANX7688_IRQS_RECEIVED_MSG   BIT(0)
+#define ANX7688_IRQS_RECEIVED_ACK   BIT(1)
+#define

[PATCHv3 1/2] dt-bindings: usb: typec: anx7688: start a binding document

2024-04-08 Thread Pavel Machek

Add binding for anx7688 usb type-c bridge. I don't have a datasheet,
but I did best I could.

Signed-off-by: Pavel Machek 

---

v2: implement review feedback
v3: fix single character pointed by robot

diff --git a/Documentation/devicetree/bindings/usb/analogix,anx7688.yaml 
b/Documentation/devicetree/bindings/usb/analogix,anx7688.yaml
new file mode 100644
index ..48b9ae936cb5
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/analogix,anx7688.yaml
@@ -0,0 +1,127 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/usb/analogix,anx7688.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+# Pin names can be deduced from
+# 
https://files.pine64.org/doc/PinePhone/PinePhone%20v1.2b%20Released%20Schematic.pdf
+
+title: Analogix ANX7688 Type-C controller
+
+maintainers:
+  - Pavel Machek 
+
+properties:
+  compatible:
+enum:
+  - analogix,anx7688
+
+  reg:
+maxItems: 1
+
+  interrupts:
+maxItems: 1
+
+  reset-gpios:
+maxItems: 1
+description: GPIO controlling RESET_N (B7) pin.
+
+  enable-gpios:
+maxItems: 1
+description: GPIO controlling POWER_EN (D2) pin.
+
+  cabledet-gpios:
+maxItems: 1
+description: GPIO controlling CABLE_DET (C3) pin.
+
+  avdd10-supply:
+description: 1.0V power supply going to AVDD10 (A4, ...) pins
+
+  dvdd10-supply:
+description: 1.0V power supply going to DVDD10 (D6, ...) pins
+
+  avdd18-supply:
+description: 1.8V power supply going to AVDD18 (E3, ...) pins
+
+  dvdd18-supply:
+description: 1.8V power supply going to DVDD18 (G4, ...) pins
+
+  avdd33-supply:
+description: 3.3V power supply going to AVDD33 (C4, ...) pins
+
+  i2c-supply: true
+  vconn-supply: true
+  hdmi-vt-supply: true
+  vbus-supply: true
+  vbus-in-supply: true
+
+  connector:
+type: object
+$ref: /schemas/connector/usb-connector.yaml
+
+description:
+  Properties for usb c connector.
+
+properties:
+  compatible:
+const: usb-c-connector
+
+required:
+  - compatible
+  - reg
+  - connector
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+
+i2c {
+#address-cells = <1>;
+#size-cells = <0>;
+
+typec@2c {
+compatible = "analogix,anx7688";
+reg = <0x2c>;
+interrupts = <8 IRQ_TYPE_EDGE_FALLING>;
+interrupt-parent = <>;
+
+enable-gpios = < 3 10 GPIO_ACTIVE_LOW>; /* PD10 */
+reset-gpios = < 3 6 GPIO_ACTIVE_HIGH>; /* PD6 */
+cabledet-gpios = <_pio 0 8 GPIO_ACTIVE_HIGH>; /* PL8 */
+
+avdd10-supply = <_anx1v0>;
+dvdd10-supply = <_anx1v0>;
+avdd18-supply = <_ldo_io1>;
+dvdd18-supply = <_ldo_io1>;
+avdd33-supply = <_dcdc1>;
+i2c-supply = <_ldo_io0>;
+vconn-supply = <_vconn5v0>;
+hdmi-vt-supply = <_dldo1>;
+
+vbus-supply = <_usb_5v>;
+vbus-in-supply = <_power_supply>;
+
+typec_con: connector {
+compatible = "usb-c-connector";
+power-role = "dual";
+data-role = "dual";
+try-power-role = "source";
+
+ports {
+#address-cells = <1>;
+#size-cells = <0>;
+port@0 {
+reg = <0>;
+typec_con_ep: endpoint {
+remote-endpoint = <_hs_ep>;
+};
+};
+};
+};
+};
+};
+...

-- 
People of Russia, stop Putin before his war on Ukraine escalates.


signature.asc
Description: PGP signature

[PATCH] tracing: Add new_exec tracepoint

2024-04-08 Thread Marco Elver

Add "new_exec" tracepoint, which is run right after the point of no
return but before the current task assumes its new exec identity.

Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
runs before flushing the old exec, i.e. while the task still has the
original state (such as original MM), but when the new exec either
succeeds or crashes (but never returns to the original exec).

Being able to trace this event can be helpful in a number of use cases:

  * allowing tracing eBPF programs access to the original MM on exec,
before current->mm is replaced;
  * counting exec in the original task (via perf event);
  * profiling flush time ("new_exec" to "sched_process_exec").

Example of tracing output ("new_exec" and "sched_process_exec"):

  $ cat /sys/kernel/debug/tracing/trace_pipe
  <...>-379 [003] .   179.626921: new_exec: filename=/usr/bin/sshd 
pid=379 comm=sshd
  <...>-379 [003] .   179.629131: sched_process_exec: 
filename=/usr/bin/sshd pid=379 old_pid=379
  <...>-381 [002] .   180.048580: new_exec: filename=/bin/bash 
pid=381 comm=sshd
  <...>-381 [002] .   180.053122: sched_process_exec: 
filename=/bin/bash pid=381 old_pid=381
  <...>-385 [001] .   180.068277: new_exec: filename=/usr/bin/tty 
pid=385 comm=bash
  <...>-385 [001] .   180.069485: sched_process_exec: 
filename=/usr/bin/tty pid=385 old_pid=385
  <...>-389 [006] .   192.020147: new_exec: filename=/usr/bin/dmesg 
pid=389 comm=bash
   bash-389 [006] .   192.021377: sched_process_exec: 
filename=/usr/bin/dmesg pid=389 old_pid=389

Signed-off-by: Marco Elver 
---
 fs/exec.c   |  2 ++
 include/trace/events/task.h | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 38bf71cbdf5e..ab778ae1fc06 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1268,6 +1268,8 @@ int begin_new_exec(struct linux_binprm * bprm)
if (retval)
return retval;
 
+   trace_new_exec(current, bprm);
+
/*
 * Ensure all future errors are fatal.
 */
diff --git a/include/trace/events/task.h b/include/trace/events/task.h
index 47b527464d1a..8853dc44783d 100644
--- a/include/trace/events/task.h
+++ b/include/trace/events/task.h
@@ -56,6 +56,36 @@ TRACE_EVENT(task_rename,
__entry->newcomm, __entry->oom_score_adj)
 );
 
+/**
+ * new_exec - called before setting up new exec
+ * @task:  pointer to the current task
+ * @bprm:  pointer to linux_binprm used for new exec
+ *
+ * Called before flushing the old exec, but at the point of no return during
+ * switching to the new exec.
+ */
+TRACE_EVENT(new_exec,
+
+   TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
+
+   TP_ARGS(task, bprm),
+
+   TP_STRUCT__entry(
+   __string(   filename,   bprm->filename  )
+   __field(pid_t,  pid )
+   __string(   comm,   task->comm  )
+   ),
+
+   TP_fast_assign(
+   __assign_str(filename, bprm->filename);
+   __entry->pid = task->pid;
+   __assign_str(comm, task->comm);
+   ),
+
+   TP_printk("filename=%s pid=%d comm=%s",
+ __get_str(filename), __entry->pid, __get_str(comm))
+);
+
 #endif
 
 /* This part must be outside protection */
-- 
2.44.0.478.gd926399ef9-goog

[PATCH v2] kprobes: Avoid possible warn in __arm_kprobe_ftrace()

2024-04-08 Thread Zheng Yejian

There is once warn in __arm_kprobe_ftrace() on:

 ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 0, 0);
 if (WARN_ONCE(..., "Failed to arm kprobe-ftrace at %pS (error %d)\n", ...)
   return ret;

This warning is generated because 'p->addr' is detected to be not a valid
ftrace location in ftrace_set_filter_ip(). The ftrace address check is done
by check_ftrace_location() at the beginning of check_kprobe_address_safe().
At that point, ftrace_location(addr) == addr should return true if the
module is loaded. Then the module is searched twice:
  1. in is_module_text_address(), we find that 'p->addr' is in a module;
  2. in __module_text_address(), we find the module;

If the module has just been unloaded before the second search, then
'*probed_mod' is NULL and we would not go to get the module refcount,
then the return value of check_kprobe_address_safe() would be 0, but
actually we need to return -EINVAL.

To fix it, originally we can simply check 'p->addr' is out of text again,
like below. But that would check twice respectively in kernel text and
module text, so finally I reduce them to be once.

  if (!(core_kernel_text((unsigned long) p->addr) ||
  is_module_text_address((unsigned long) p->addr)) || ...) {
ret = -EINVAL;
goto out;
  }
  ...
  *probed_mod = __module_text_address((unsigned long) p->addr);
  if (*probed_mod) {
...
  } else if (!core_kernel_text((unsigned long) p->addr)) { // check again!
ret = -EINVAL;
goto out;
  }

Signed-off-by: Zheng Yejian 
---
 kernel/kprobes.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

v2:
 - Update commit messages and comments as suggested by Masami.
   Link: 
https://lore.kernel.org/all/20240408115038.b0c85767bf1f249eccc32...@kernel.org/

v1:
 - Link: 
https://lore.kernel.org/all/20240407035904.2556645-1-zhengyeji...@huawei.com/

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 9d9095e81792..65adc815fc6e 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1567,10 +1567,17 @@ static int check_kprobe_address_safe(struct kprobe *p,
jump_label_lock();
preempt_disable();
 
-   /* Ensure it is not in reserved area nor out of text */
-   if (!(core_kernel_text((unsigned long) p->addr) ||
-   is_module_text_address((unsigned long) p->addr)) ||
-   in_gate_area_no_mm((unsigned long) p->addr) ||
+   /* Ensure the address is in a text area, and find a module if exists. */
+   *probed_mod = NULL;
+   if (!core_kernel_text((unsigned long) p->addr)) {
+   *probed_mod = __module_text_address((unsigned long) p->addr);
+   if (!(*probed_mod)) {
+   ret = -EINVAL;
+   goto out;
+   }
+   }
+   /* Ensure it is not in reserved area. */
+   if (in_gate_area_no_mm((unsigned long) p->addr) ||
within_kprobe_blacklist((unsigned long) p->addr) ||
jump_label_text_reserved(p->addr, p->addr) ||
static_call_text_reserved(p->addr, p->addr) ||
@@ -1580,8 +1587,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
goto out;
}
 
-   /* Check if 'p' is probing a module. */
-   *probed_mod = __module_text_address((unsigned long) p->addr);
+   /* Get module refcount and reject __init functions for loaded modules. 
*/
if (*probed_mod) {
/*
 * We must hold a refcount of the probed module while updating
-- 
2.25.1

[PATCH] [v4] module: don't ignore sysfs_create_link() failures

2024-04-08 Thread Arnd Bergmann

From: Arnd Bergmann 

The sysfs_create_link() return code is marked as __must_check, but the
module_add_driver() function tries hard to not care, by assigning the
return code to a variable. When building with 'make W=1', gcc still
warns because this variable is only assigned but not used:

drivers/base/module.c: In function 'module_add_driver':
drivers/base/module.c:36:6: warning: variable 'no_warn' set but not used 
[-Wunused-but-set-variable]

Rework the code to properly unwind and return the error code to the
caller. My reading of the original code was that it tries to
not fail when the links already exist, so keep ignoring -EEXIST
errors.

Fixes: e17e0f51aeea ("Driver core: show drivers in /sys/module/")
Reviewed-by: Greg Kroah-Hartman 
See-also: 4a7fb6363f2d ("add __must_check to device management code")
Signed-off-by: Arnd Bergmann 
---
Luis, can you merge this through the modules-next tree?

Cc: Luis Chamberlain 
Cc: linux-modu...@vger.kernel.org
Cc: "Rafael J. Wysocki" 
---
v4: minor style changes, based on feedback from Andy Shevchenko
v3: make error handling stricter, add unwinding,
 fix build fail with CONFIG_MODULES=n
v2: rework to actually handle the error. I have not tested the
error handling beyond build testing, so please review carefully.
---
 drivers/base/base.h   |  9 ++---
 drivers/base/bus.c|  9 -
 drivers/base/module.c | 42 +++---
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0738ccad08b2..db4f910e8e36 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -192,11 +192,14 @@ extern struct kset *devices_kset;
 void devices_kset_move_last(struct device *dev);
 
 #if defined(CONFIG_MODULES) && defined(CONFIG_SYSFS)
-void module_add_driver(struct module *mod, struct device_driver *drv);
+int module_add_driver(struct module *mod, struct device_driver *drv);
 void module_remove_driver(struct device_driver *drv);
 #else
-static inline void module_add_driver(struct module *mod,
-struct device_driver *drv) { }
+static inline int module_add_driver(struct module *mod,
+   struct device_driver *drv)
+{
+   return 0;
+}
 static inline void module_remove_driver(struct device_driver *drv) { }
 #endif
 
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index daee55c9b2d9..ffea0728b8b2 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -674,7 +674,12 @@ int bus_add_driver(struct device_driver *drv)
if (error)
goto out_del_list;
}
-   module_add_driver(drv->owner, drv);
+   error = module_add_driver(drv->owner, drv);
+   if (error) {
+   printk(KERN_ERR "%s: failed to create module links for %s\n",
+   __func__, drv->name);
+   goto out_detach;
+   }
 
error = driver_create_file(drv, _attr_uevent);
if (error) {
@@ -699,6 +704,8 @@ int bus_add_driver(struct device_driver *drv)
 
return 0;
 
+out_detach:
+   driver_detach(drv);
 out_del_list:
klist_del(>knode_bus);
 out_unregister:
diff --git a/drivers/base/module.c b/drivers/base/module.c
index 46ad4d636731..a1b55da07127 100644
--- a/drivers/base/module.c
+++ b/drivers/base/module.c
@@ -30,14 +30,14 @@ static void module_create_drivers_dir(struct module_kobject 
*mk)
mutex_unlock(_dir_mutex);
 }
 
-void module_add_driver(struct module *mod, struct device_driver *drv)
+int module_add_driver(struct module *mod, struct device_driver *drv)
 {
char *driver_name;
-   int no_warn;
struct module_kobject *mk = NULL;
+   int ret;
 
if (!drv)
-   return;
+   return 0;
 
if (mod)
mk = >mkobj;
@@ -56,17 +56,37 @@ void module_add_driver(struct module *mod, struct 
device_driver *drv)
}
 
if (!mk)
-   return;
+   return 0;
+
+   ret = sysfs_create_link(>p->kobj, >kobj, "module");
+   if (ret)
+   return ret;
 
-   /* Don't check return codes; these calls are idempotent */
-   no_warn = sysfs_create_link(>p->kobj, >kobj, "module");
driver_name = make_driver_name(drv);
-   if (driver_name) {
-   module_create_drivers_dir(mk);
-   no_warn = sysfs_create_link(mk->drivers_dir, >p->kobj,
-   driver_name);
-   kfree(driver_name);
+   if (!driver_name) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   module_create_drivers_dir(mk);
+   if (!mk->drivers_dir) {
+   ret = -EINVAL;
+   goto out;
}
+
+   ret = sysfs_create_link(mk->drivers_dir, >p->kobj, driver_name);
+   if (ret)
+   goto out;
+
+   kfree(driver_name);
+
+   return 0;
+out:
+   sysfs_remove_link(>p->kobj, "module");
+

Re: [PATCH] [v3] module: don't ignore sysfs_create_link() failures

2024-04-08 Thread Arnd Bergmann

On Tue, Mar 26, 2024, at 16:29, Andy Shevchenko wrote:
> On Tue, Mar 26, 2024 at 03:57:18PM +0100, Arnd Bergmann wrote:
>> From: Arnd Bergmann 
>> 
>> The sysfs_create_link() return code is marked as __must_check, but the
>> module_add_driver() function tries hard to not care, by assigning the
>> return code to a variable. When building with 'make W=1', gcc still
>> warns because this variable is only assigned but not used:
>> 
>> drivers/base/module.c: In function 'module_add_driver':
>> drivers/base/module.c:36:6: warning: variable 'no_warn' set but not used 
>> [-Wunused-but-set-variable]
>> 
>> Rework the code to properly unwind and return the error code to the
>> caller. My reading of the original code was that it tries to
>> not fail when the links already exist, so keep ignoring -EEXIST
>> errors.
>
>> Cc: Luis Chamberlain 
>> Cc: linux-modu...@vger.kernel.org
>> Cc: Greg Kroah-Hartman 
>> Cc: "Rafael J. Wysocki" 
>
> Wondering if you can move these to be after --- to avoid polluting commit
> message. This will have the same effect and be archived on lore. But on
> pros side it will unload the commit message(s) from unneeded noise.

Done

>
>> +error = module_add_driver(drv->owner, drv);
>> +if (error) {
>> +printk(KERN_ERR "%s: failed to create module links for %s\n",
>> +__func__, drv->name);
>
> What's wrong with pr_err()? Even if it's not a style used, in a new pieces of
> code this can be improved beforehand. So, we will reduce a technical debt, and
> not adding to it.

I think that would be more confusing, and would rather keep the
style consistent. There is no practical difference here.

>> +int module_add_driver(struct module *mod, struct device_driver *drv)
>>  {
>>  char *driver_name;
>> -int no_warn;
>> +int ret;
>
> I would move it...
>
>>  struct module_kobject *mk = NULL;
>
> ...to be here.

Done

 Arnd

Re: [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio

2024-04-08 Thread Michael S. Tsirkin

On Wed, Feb 28, 2024 at 10:41:20PM +0800, Hou Tao wrote:
> From: Hou Tao 
> 
> Hi,
> 
> The patch set aims to fix the warning related to an abnormal size
> parameter of kmalloc() in virtiofs. The warning occurred when attempting
> to insert a 10MB sized kernel module kept in a virtiofs with cache
> disabled. As analyzed in patch #1, the root cause is that the length of
> the read buffer is no limited, and the read buffer is passed directly to
> virtiofs through out_args[0].value. Therefore patch #1 limits the
> length of the read buffer passed to virtiofs by using max_pages. However
> it is not enough, because now the maximal value of max_pages is 256.
> Consequently, when reading a 10MB-sized kernel module, the length of the
> bounce buffer in virtiofs will be 40 + (256 * 4096), and kmalloc will
> try to allocate 2MB from memory subsystem. The request for 2MB of
> physically contiguous memory significantly stress the memory subsystem
> and may fail indefinitely on hosts with fragmented memory. To address
> this, patch #2~#5 use scattered pages in a bio_vec to replace the
> kmalloc-allocated bounce buffer when the length of the bounce buffer for
> KVEC_ITER dio is larger than PAGE_SIZE. The final issue with the
> allocation of the bounce buffer and sg array in virtiofs is that
> GFP_ATOMIC is used even when the allocation occurs in a kworker context.
> Therefore the last patch uses GFP_NOFS for the allocation of both sg
> array and bounce buffer when initiated by the kworker. For more details,
> please check the individual patches.
> 
> As usual, comments are always welcome.
> 
> Change Log:

Bernd should I just merge the patchset as is?
It seems to fix a real problem and no one has the
time to work on a better fix  WDYT?


> v2:
>   * limit the length of ITER_KVEC dio by max_pages instead of the
> newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
> dio being consistent with other rw operations.
>   * replace kmalloc-allocated bounce buffer by using a bounce buffer
> backed by scattered pages when the length of the bounce buffer for
> KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
> fragmented memory, the KVEC_ITER dio can be handled normally by
> virtiofs. (Bernd Schubert)
>   * merge the GFP_NOFS patch [1] into this patch-set and use
> memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
> (Benjamin Coddington)
> 
> v1: 
> https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/
> 
> [1]: 
> https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/
> 
> Hou Tao (6):
>   fuse: limit the length of ITER_KVEC dio by max_pages
>   virtiofs: move alloc/free of argbuf into separated helpers
>   virtiofs: factor out more common methods for argbuf
>   virtiofs: support bounce buffer backed by scattered pages
>   virtiofs: use scattered bounce buffer for ITER_KVEC dio
>   virtiofs: use GFP_NOFS when enqueuing request through kworker
> 
>  fs/fuse/file.c  |  12 +-
>  fs/fuse/virtio_fs.c | 336 +---
>  2 files changed, 296 insertions(+), 52 deletions(-)
> 
> -- 
> 2.29.2

Re: [PATCH v3] Documentation: Add reconnect process for VDUSE

2024-04-08 Thread Michael S. Tsirkin

On Thu, Apr 04, 2024 at 01:56:31PM +0800, Cindy Lu wrote:
> Add a document explaining the reconnect process, including what the
> Userspace App needs to do and how it works with the kernel.
> 
> Signed-off-by: Cindy Lu 
> ---
>  Documentation/userspace-api/vduse.rst | 41 +++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/Documentation/userspace-api/vduse.rst 
> b/Documentation/userspace-api/vduse.rst
> index bdb880e01132..7faa83462e78 100644
> --- a/Documentation/userspace-api/vduse.rst
> +++ b/Documentation/userspace-api/vduse.rst
> @@ -231,3 +231,44 @@ able to start the dataplane processing as follows:
> after the used ring is filled.
>  
>  For more details on the uAPI, please see include/uapi/linux/vduse.h.
> +
> +HOW VDUSE devices reconnection works
> +
> +1. What is reconnection?
> +
> +   When the userspace application loads, it should establish a connection
> +   to the vduse kernel device. Sometimes,the userspace application exists,
> +   and we want to support its restart and connect to the kernel device again
> +
> +2. How can I support reconnection in a userspace application?
> +
> +2.1 During initialization, the userspace application should first verify the
> +existence of the device "/dev/vduse/vduse_name".
> +If it doesn't exist, it means this is the first-time for connection. 
> goto step 2.2
> +If it exists, it means this is a reconnection, and we should goto step 
> 2.3
> +
> +2.2 Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
> +/dev/vduse/control.
> +When ioctl(VDUSE_CREATE_DEV) is called, kernel allocates memory for
> +the reconnect information. The total memory size is PAGE_SIZE*vq_mumber.

Confused. Where is that allocation, in code?

Thanks!

> +2.3 Check if the information is suitable for reconnect
> +If this is reconnection :
> +Before attempting to reconnect, The userspace application needs to use 
> the
> +ioctl(VDUSE_DEV_GET_CONFIG, VDUSE_DEV_GET_STATUS, 
> VDUSE_DEV_GET_FEATURES...)
> +to get the information from kernel.
> +Please review the information and confirm if it is suitable to reconnect.
> +
> +2.4 Userspace application needs to mmap the memory to userspace
> +The userspace application requires mapping one page for every vq. These 
> pages
> +should be used to save vq-related information during system running. 
> Additionally,
> +the application must define its own structure to store information for 
> reconnection.
> +
> +2.5 Completed the initialization and running the application.
> +While the application is running, it is important to store relevant 
> information
> +about reconnections in mapped pages. When calling the ioctl 
> VDUSE_VQ_GET_INFO to
> +get vq information, it's necessary to check whether it's a reconnection. 
> If it is
> +a reconnection, the vq-related information must be get from the mapped 
> pages.
> +
> +2.6 When the Userspace application exits, it is necessary to unmap all the
> +pages for reconnection
> -- 
> 2.43.0

Re: [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at()

2024-04-08 Thread David Hildenbrand


On 05.04.24 13:58, Paolo Bonzini wrote:

With the demise of the .change_pte() MMU notifier callback, there is no
notification happening in set_pte_at_notify().  It is a synonym of
set_pte_at() and can be replaced with it.

Signed-off-by: Paolo Bonzini 
---


A real joy seeing that gone

Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb

Re: [PATCH 3/4] mmu_notifier: remove the .change_pte() callback

2024-04-08 Thread David Hildenbrand


On 05.04.24 13:58, Paolo Bonzini wrote:

The scope of set_pte_at_notify() has reduced more and more through the
years.  Initially, it was meant for when the change to the PTE was
not bracketed by mmu_notifier_invalidate_range_{start,end}().  However,
that has not been so for over ten years.  During all this period
the only implementation of .change_pte() was KVM and it
had no actual functionality, because it was called after
mmu_notifier_invalidate_range_start() zapped the secondary PTE.

Now that this (nonfunctional) user of the .change_pte() callback is
gone, the whole callback can be removed.  For now, leave in place
set_pte_at_notify() even though it is just a synonym for set_pte_at().

Signed-off-by: Paolo Bonzini 
---


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb

Re: [PATCH v3 3/3] vhost: Improve vhost_get_avail_idx() with smp_rmb()

2024-04-08 Thread Michael S. Tsirkin

On Mon, Apr 08, 2024 at 02:15:24PM +1000, Gavin Shan wrote:
> Hi Michael,
> 
> On 3/30/24 19:02, Gavin Shan wrote:
> > On 3/28/24 19:31, Michael S. Tsirkin wrote:
> > > On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
> > > > All the callers of vhost_get_avail_idx() are concerned to the memory
> > > > barrier, imposed by smp_rmb() to ensure the order of the available
> > > > ring entry read and avail_idx read.
> > > > 
> > > > Improve vhost_get_avail_idx() so that smp_rmb() is executed when
> > > > the avail_idx is advanced. With it, the callers needn't to worry
> > > > about the memory barrier.
> > > > 
> > > > Suggested-by: Michael S. Tsirkin 
> > > > Signed-off-by: Gavin Shan 
> > > 
> > > Previous patches are ok. This one I feel needs more work -
> > > first more code such as sanity checking should go into
> > > this function, second there's actually a difference
> > > between comparing to last_avail_idx and just comparing
> > > to the previous value of avail_idx.
> > > I will pick patches 1-2 and post a cleanup on top so you can
> > > take a look, ok?
> > > 
> > 
> > Thanks, Michael. It's fine to me.
> > 
> 
> A kindly ping.
> 
> If it's ok to you, could you please merge PATCH[1-2]? Our downstream
> 9.4 need the fixes, especially for NVidia's grace-hopper and grace-grace
> platforms.
> 
> For PATCH[3], I also can help with the improvement if you don't have time
> for it. Please let me know.
> 
> Thanks,
> Gavin

The thing to do is basically diff with the patch I wrote :)
We can also do a bit more cleanups on top of *that*, like unifying
error handling.

-- 
MST

Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-08 Thread maobibo





On 2024/4/5 下午7:58, Paolo Bonzini wrote:

The .change_pte() MMU notifier callback was intended as an
optimization. The original point of it was that KSM could tell KVM to flip
its secondary PTE to a new location without having to first zap it. At
the time there was also an .invalidate_page() callback; both of them were
*not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
and .invalidate_page() also doubled as a fallback implementation of
change_pte().

Later on, however, both callbacks were changed to occur within an
invalidate_range_start/end() block.

In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end",
2012-10-09) did so to remove the fallback from .invalidate_page() to
change_pte() and allow sleepable .invalidate_page() hooks.

This however made KVM's usage of the .change_pte() callback completely
moot, because KVM unmaps the sPTEs during .invalidate_range_start()
and therefore .change_pte() has no hope of finding a sPTE to change.
Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
well as all the architecture specific implementations.

Signed-off-by: Paolo Bonzini 
---
  arch/arm64/kvm/mmu.c  | 34 -
  arch/loongarch/include/asm/kvm_host.h |  1 -
  arch/loongarch/kvm/mmu.c  | 32 
  arch/mips/kvm/mmu.c   | 30 ---
  arch/powerpc/include/asm/kvm_ppc.h|  1 -
  arch/powerpc/kvm/book3s.c |  5 ---
  arch/powerpc/kvm/book3s.h |  1 -
  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
  arch/powerpc/kvm/book3s_hv.c  |  1 -
  arch/powerpc/kvm/book3s_pr.c  |  7 
  arch/powerpc/kvm/e500_mmu_host.c  |  6 ---
  arch/riscv/kvm/mmu.c  | 20 --
  arch/x86/kvm/mmu/mmu.c| 54 +--
  arch/x86/kvm/mmu/spte.c   | 16 
  arch/x86/kvm/mmu/spte.h   |  2 -
  arch/x86/kvm/mmu/tdp_mmu.c| 46 ---
  arch/x86/kvm/mmu/tdp_mmu.h|  1 -
  include/linux/kvm_host.h  |  2 -
  include/trace/events/kvm.h| 15 
  virt/kvm/kvm_main.c   | 43 -
  20 files changed, 2 insertions(+), 327 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index dc04bc767865..ff17849be9f4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
kvm_gfn_range *range)
return false;
  }
  
-bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)

-{
-   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
-
-   if (!kvm->arch.mmu.pgt)
-   return false;
-
-   WARN_ON(range->end - range->start != 1);
-
-   /*
-* If the page isn't tagged, defer to user_mem_abort() for sanitising
-* the MTE tags. The S2 pte should have been unmapped by
-* mmu_notifier_invalidate_range_end().
-*/
-   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
-   return false;
-
-   /*
-* We've moved a page around, probably through CoW, so let's treat
-* it just like a translation fault and the map handler will clean
-* the cache to the PoC.
-*
-* The MMU notifiers will have unmapped a huge PMD before calling
-* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
-* therefore we never need to clear out a huge PMD through this
-* calling path and a memcache is not required.
-*/
-   kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
-  PAGE_SIZE, __pfn_to_phys(pfn),
-  KVM_PGTABLE_PROT_R, NULL, 0);
-
-   return false;
-}
-
  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
  {
u64 size = (range->end - range->start) << PAGE_SHIFT;
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..69305441f40d 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -203,7 +203,6 @@ void kvm_flush_tlb_all(void);
  void kvm_flush_tlb_gpa(struct kvm_vcpu *vcpu, unsigned long gpa);
  int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long badv, bool 
write);
  
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);

  int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end, bool blockable);
  int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
  int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index a556cff35740..98883aa23ab8 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -494,38 +494,6 @@ bool kvm_unmap_gfn_range(struct

Re: [PATCH v3 3/3] vhost: Improve vhost_get_avail_idx() with smp_rmb()

2024-04-08 Thread Michael S. Tsirkin

On Mon, Apr 08, 2024 at 02:15:24PM +1000, Gavin Shan wrote:
> Hi Michael,
> 
> On 3/30/24 19:02, Gavin Shan wrote:
> > On 3/28/24 19:31, Michael S. Tsirkin wrote:
> > > On Thu, Mar 28, 2024 at 10:21:49AM +1000, Gavin Shan wrote:
> > > > All the callers of vhost_get_avail_idx() are concerned to the memory
> > > > barrier, imposed by smp_rmb() to ensure the order of the available
> > > > ring entry read and avail_idx read.
> > > > 
> > > > Improve vhost_get_avail_idx() so that smp_rmb() is executed when
> > > > the avail_idx is advanced. With it, the callers needn't to worry
> > > > about the memory barrier.
> > > > 
> > > > Suggested-by: Michael S. Tsirkin 
> > > > Signed-off-by: Gavin Shan 
> > > 
> > > Previous patches are ok. This one I feel needs more work -
> > > first more code such as sanity checking should go into
> > > this function, second there's actually a difference
> > > between comparing to last_avail_idx and just comparing
> > > to the previous value of avail_idx.
> > > I will pick patches 1-2 and post a cleanup on top so you can
> > > take a look, ok?
> > > 
> > 
> > Thanks, Michael. It's fine to me.
> > 
> 
> A kindly ping.
> 
> If it's ok to you, could you please merge PATCH[1-2]? Our downstream
> 9.4 need the fixes, especially for NVidia's grace-hopper and grace-grace
> platforms.

Yes - in the next rc hopefully.

> For PATCH[3], I also can help with the improvement if you don't have time
> for it. Please let me know.
> 
> Thanks,
> Gavin


That would be great.

-- 
MST

Re: [PATCH net-next v5] virtio_net: Support RX hash XDP hint

2024-04-08 Thread Jason Wang

On Mon, Apr 1, 2024 at 11:38 AM Liang Chen  wrote:
>
> On Thu, Feb 29, 2024 at 4:37 PM Liang Chen  wrote:
> >
> > On Tue, Feb 27, 2024 at 4:42 AM John Fastabend  
> > wrote:
> > >
> > > Jason Wang wrote:
> > > > On Fri, Feb 23, 2024 at 9:42 AM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Fri, 09 Feb 2024 13:57:25 +0100, Paolo Abeni  
> > > > > wrote:
> > > > > > On Fri, 2024-02-09 at 18:39 +0800, Liang Chen wrote:
> > > > > > > On Wed, Feb 7, 2024 at 10:27 PM Paolo Abeni  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, 2024-02-07 at 10:54 +0800, Liang Chen wrote:
> > > > > > > > > On Tue, Feb 6, 2024 at 6:44 PM Paolo Abeni 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Sat, 2024-02-03 at 10:56 +0800, Liang Chen wrote:
> > > > > > > > > > > On Sat, Feb 3, 2024 at 12:20 AM Jesper Dangaard Brouer 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On 02/02/2024 13.11, Liang Chen wrote:
> > > > > > > > > > [...]
> > > > > > > > > > > > > @@ -1033,6 +1039,16 @@ static void 
> > > > > > > > > > > > > put_xdp_frags(struct xdp_buff *xdp)
> > > > > > > > > > > > >   }
> > > > > > > > > > > > >   }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +static void virtnet_xdp_save_rx_hash(struct 
> > > > > > > > > > > > > virtnet_xdp_buff *virtnet_xdp,
> > > > > > > > > > > > > +  struct net_device 
> > > > > > > > > > > > > *dev,
> > > > > > > > > > > > > +  struct 
> > > > > > > > > > > > > virtio_net_hdr_v1_hash *hdr_hash)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + if (dev->features & NETIF_F_RXHASH) {
> > > > > > > > > > > > > + virtnet_xdp->hash_value = 
> > > > > > > > > > > > > hdr_hash->hash_value;
> > > > > > > > > > > > > + virtnet_xdp->hash_report = 
> > > > > > > > > > > > > hdr_hash->hash_report;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > >
> > > > > > > > > > > > Would it be possible to store a pointer to hdr_hash in 
> > > > > > > > > > > > virtnet_xdp_buff,
> > > > > > > > > > > > with the purpose of delaying extracting this, until and 
> > > > > > > > > > > > only if XDP
> > > > > > > > > > > > bpf_prog calls the kfunc?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > That seems to be the way v1 works,
> > > > > > > > > > > https://lore.kernel.org/all/20240122102256.261374-1-liangchen.li...@gmail.com/
> > > > > > > > > > > . But it was pointed out that the inline header may be 
> > > > > > > > > > > overwritten by
> > > > > > > > > > > the xdp prog, so the hash is copied out to maintain its 
> > > > > > > > > > > integrity.
> > > > > > > > > >
> > > > > > > > > > Why? isn't XDP supposed to get write access only to the pkt
> > > > > > > > > > contents/buffer?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Normally, an XDP program accesses only the packet data. 
> > > > > > > > > However,
> > > > > > > > > there's also an XDP RX Metadata area, referenced by the 
> > > > > > > > > data_meta
> > > > > > > > > pointer. This pointer can be adjusted with 
> > > > > > > > > bpf_xdp_adjust_meta to
> > > > > > > > > point somewhere ahead of the data buffer, thereby granting 
> > > > > > > > > the XDP
> > > > > > > > > program access to the virtio header located immediately 
> > > > > > > > > before the
> > > > > > > >
> > > > > > > > AFAICS bpf_xdp_adjust_meta() does not allow moving the 
> > > > > > > > meta_data before
> > > > > > > > xdp->data_hard_start:
> > > > > > > >
> > > > > > > > https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4210
> > > > > > > >
> > > > > > > > and virtio net set such field after the virtio_net_hdr:
> > > > > > > >
> > > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/net/virtio_net.c#L1218
> > > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/net/virtio_net.c#L1420
> > > > > > > >
> > > > > > > > I don't see how the virtio hdr could be touched? Possibly even 
> > > > > > > > more
> > > > > > > > important: if such thing is possible, I think is should be 
> > > > > > > > somewhat
> > > > > > > > denied (for the same reason an H/W nic should prevent XDP from
> > > > > > > > modifying its own buffer descriptor).
> > > > > > >
> > > > > > > Thank you for highlighting this concern. The header layout differs
> > > > > > > slightly between small and mergeable mode. Taking 'mergeable 
> > > > > > > mode' as
> > > > > > > an example, after calling xdp_prepare_buff the layout of xdp_buff
> > > > > > > would be as depicted in the diagram below,
> > > > > > >
> > > > > > >   buf
> > > > > > >|
> > > > > > >v
> > > > > > > +--+--+-+
> > > > > > > | xdp headroom | virtio header| packet  |
> > > > > > > | (256 bytes)  | (20 bytes)   | content |
> > > > > > >

Re: Re: [PATCH v3] vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI

2024-04-08 Thread Jason Wang

On Wed, Apr 3, 2024 at 10:47 AM tab  wrote:
>
> > >
> > > On Fri, Mar 29, 2024 at 11:55:50AM +0800, Jason Wang wrote:
> > > > On Wed, Mar 27, 2024 at 5:08 PM Jason Wang  wrote:
> > > > >
> > > > > On Thu, Mar 21, 2024 at 3:00 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Mar 20, 2024 at 06:19:12PM +0800, Wang Rong wrote:
> > > > > > > From: Rong Wang 
> > > > > > >
> > > > > > > Once enable iommu domain for one device, the MSI
> > > > > > > translation tables have to be there for software-managed MSI.
> > > > > > > Otherwise, platform with software-managed MSI without an
> > > > > > > irq bypass function, can not get a correct memory write event
> > > > > > > from pcie, will not get irqs.
> > > > > > > The solution is to obtain the MSI phy base address from
> > > > > > > iommu reserved region, and set it to iommu MSI cookie,
> > > > > > > then translation tables will be created while request irq.
> > > > > > >
> > > > > > > Change log
> > > > > > > --
> > > > > > >
> > > > > > > v1->v2:
> > > > > > > - add resv iotlb to avoid overlap mapping.
> > > > > > > v2->v3:
> > > > > > > - there is no need to export the iommu symbol anymore.
> > > > > > >
> > > > > > > Signed-off-by: Rong Wang 
> > > > > >
> > > > > > There's in interest to keep extending vhost iotlb -
> > > > > > we should just switch over to iommufd which supports
> > > > > > this already.
> > > > >
> > > > > IOMMUFD is good but VFIO supports this before IOMMUFD. This patch
> > > > > makes vDPA run without a backporting of full IOMMUFD in the production
> > > > > environment. I think it's worth.
> > > > >
> > > > > If you worry about the extension, we can just use the vhost iotlb
> > > > > existing facility to do this.
> > > > >
> > > > > Thanks
> > > >
> > > > Btw, Wang Rong,
> > > >
> > > > It looks that Cindy does have the bandwidth in working for IOMMUFD 
> > > > support.
> > >
> > > I think you mean she does not.
> >
> > Yes, you are right.
> >
> > Thanks
>
> I need to discuss internally, and there may be someone else will do that.
>
> Thanks.

Ok, please let us know if you have a conclusion.

Thanks

>
> >
> > >
> > > > Do you have the will to do that?
> > > >
> > > > Thanks
> > >
>
>
>
>
> --
> 发自我的网易邮箱平板适配版
> 
>
>
> - Original Message -
> From: "Jason Wang" 
> To: "Michael S. Tsirkin" 
> Cc: "Wang Rong" , k...@vger.kernel.org, 
> virtualizat...@lists.linux.dev, net...@vger.kernel.org, 
> linux-kernel@vger.kernel.org, "Cindy Lu" 
> Sent: Fri, 29 Mar 2024 18:39:54 +0800
> Subject: Re: [PATCH v3] vhost/vdpa: Add MSI translation tables to iommu for 
> software-managed MSI
>
> On Fri, Mar 29, 2024 at 5:13 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Mar 29, 2024 at 11:55:50AM +0800, Jason Wang wrote:
> > > On Wed, Mar 27, 2024 at 5:08 PM Jason Wang  wrote:
> > > >
> > > > On Thu, Mar 21, 2024 at 3:00 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Wed, Mar 20, 2024 at 06:19:12PM +0800, Wang Rong wrote:
> > > > > > From: Rong Wang 
> > > > > >
> > > > > > Once enable iommu domain for one device, the MSI
> > > > > > translation tables have to be there for software-managed MSI.
> > > > > > Otherwise, platform with software-managed MSI without an
> > > > > > irq bypass function, can not get a correct memory write event
> > > > > > from pcie, will not get irqs.
> > > > > > The solution is to obtain the MSI phy base address from
> > > > > > iommu reserved region, and set it to iommu MSI cookie,
> > > > > > then translation tables will be created while request irq.
> > > > > >
> > > > > > Change log
> > > > > > --
> > > > > >
> > > > > > v1->v2:
> > > > > > - add resv iotlb to avoid overlap mapping.
> > > > > > v2->v3:
> > > > > > - there is no need to export the iommu symbol anymore.
> > > > > >
> > > > > > Signed-off-by: Rong Wang 
> > > > >
> > > > > There's in interest to keep extending vhost iotlb -
> > > > > we should just switch over to iommufd which supports
> > > > > this already.
> > > >
> > > > IOMMUFD is good but VFIO supports this before IOMMUFD. This patch
> > > > makes vDPA run without a backporting of full IOMMUFD in the production
> > > > environment. I think it's worth.
> > > >
> > > > If you worry about the extension, we can just use the vhost iotlb
> > > > existing facility to do this.
> > > >
> > > > Thanks
> > >
> > > Btw, Wang Rong,
> > >
> > > It looks that Cindy does have the bandwidth in working for IOMMUFD 
> > > support.
> >
> > I think you mean she does not.
>
> Yes, you are right.
>
> Thanks
>
> >
> > > Do you have the will to do that?
> > >
> > > Thanks
> >

Re: [PATCH 2/4] KVM: remove unused argument of kvm_handle_hva_range()

2024-04-08 Thread Philippe Mathieu-Daudé


On 5/4/24 13:58, Paolo Bonzini wrote:

The only user was kvm_mmu_notifier_change_pte(), which is now gone.

Signed-off-by: Paolo Bonzini 
---
  virt/kvm/kvm_main.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at()

2024-04-08 Thread Philippe Mathieu-Daudé


On 5/4/24 13:58, Paolo Bonzini wrote:

With the demise of the .change_pte() MMU notifier callback, there is no
notification happening in set_pte_at_notify().  It is a synonym of
set_pte_at() and can be replaced with it.

Signed-off-by: Paolo Bonzini 
---
  include/linux/mmu_notifier.h | 2 --
  kernel/events/uprobes.c  | 5 ++---
  mm/ksm.c | 4 ++--
  mm/memory.c  | 7 +--
  mm/migrate_device.c  | 8 ++--
  5 files changed, 7 insertions(+), 19 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 3/3] dt-bindings: remoteproc: qcom,sdm845-adsp-pil: Fix qcom,halt-regs definition

2024-04-08 Thread Krzysztof Kozlowski

On 07/04/2024 11:58, Luca Weiss wrote:
> Set the 'items' correctly for the qcom,halt-regs property and update the
> description to match what it should be.
> 
> Signed-off-by: Luca Weiss 
> ---
>  .../devicetree/bindings/remoteproc/qcom,sdm845-adsp-pil.yaml| 6 
> +-
>  1 file changed, 5 insertions(+), 1 deletion(-)


Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof

86 matches

Mail list logo