Re: [PATCH v5] vp_vdpa: don't allocate unused msix vectors

2024-04-25 Thread Gavin Liu
Hi mst.

> > > >I just have a question on this part. How come hardware sends 
> > > >interrupts does
> > not guest driver disable them?
> > >
> > >1:Assuming the guest OS's Virtio device is using PMD mode, QEMU 
> > > sets
> > the call fd to -1
> > >2:On the host side, the vhost_vdpa program will set
> > vp_vdpa->vring[i].cb.callback to invalid
> > >3:Before the modification, the vp_vdpa_request_irq function 
> > > does not
> > check whether
> > >   vp_vdpa->vring[i].cb.callback is valid. Instead, it enables 
> > > the
> > hardware's MSIX
> > > interrupts based on the number of queues of the device
> > >
> >
> > So MSIX is enabled but why would it trigger? virtio PMD in poll mode 
> > presumably suppresses interrupts after all.
>> Virtio pmd is in the guest,but in host side,the msix is enabled,then 
> >the device will triger Interrupt normally. I analysed this bug before,and I 
> >think gavin is right.
> >Did I make it clear?

>Not really. Guest disables interrupts presumably (it's polling) why does 
>device still send them?

The testing model is as follows:
testpmd   testpmd---
  ^^ 
  ||  
  ||
  ||
  vv
  vfio pci---   ---vfio pci---
 pci device -----pci device--
 guest os -   guest os --
   
   
---virtio device-----vfio device--
--qemu-----qemu---
^  ^
|  |
|  |
|  |
v  v
vhost_vdpa   vfio pci  
host os--   
--hw-
  VF1  VF2

1:The guest os uses PMD mode, so the guest os won't receive interrupts. We can 
reach a consensus on this point.

2:Note that the MSIX interrupts mentioned here refer to the interrupts received 
by PCI devices on the host from the hardware.

3:In the guest OS, Virtio devices use PMD mode. The host does not need to 
enable the MSIX capability of the PCI device, 
   nor does it need to register interrupt callbacks. Do you agree with this?

4:  Note that the patch is proposed to ensure that PCI devices on the host are 
not disturbed by interrupts.



- Forwarded Message -
From: Michael S. Tsirkin m...@redhat.com
Sent: April 26, 2024 6:10 AM
To: Angus Chen angus.c...@jaguarmicro.com
Cc: Gavin Liu gavin@jaguarmicro.com; jasow...@redhat.com; 
virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com; 
linux-kernel@vger.kernel.org; Heng Qi hen...@linux.alibaba.com
Subject: Re: Reply: [PATCH v5] vp_vdpa: don't allocate unused msix vectors


External Mail: This email originated from OUTSIDE of the organization!
Do not click links, open attachments or provide ANY information unless you 
recognize the sender and know the content is safe.


On Tue, Apr 23, 2024 at 08:42:57AM +, Angus Chen wrote:
> Hi mst.
>
> > -Original Message-
> > From: Michael S. Tsirkin 
> > Sent: Tuesday, April 23, 2024 4:35 PM
> > To: Gavin Liu 
> > Cc: jasow...@redhat.com; Angus Chen ; 
> > virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com; 
> > linux-kernel@vger.kernel.org; Heng Qi 
> > Subject: Re: 回复: [PATCH v5] vp_vdpa: don't allocate unused msix 
> > vectors
> >
> > On Tue, Apr 23, 2024 at 01:39:17AM +, Gavin Liu wrote:
> > > On Wed, Apr 10, 2024 at 11:30:20AM +0800, lyx634449800 wrote:
> > > > From: Yuxue Liu 
> > > >
> > > > When there is a ctlq and it doesn't require interrupt 
> > > > callbacks,the original method of calculating vectors wastes 
> > > > hardware msi or msix resources as well as system IRQ resources.
> > > >
> > > > When conducting performance testing using testpmd in the guest 
> > > > os, it was found that the performance was lower compared to 
> > > > directly using vfio-pci to passthrough the device
> > > >
> > > > In scenarios where the virtio device in the guest os does not 
> > > > utilize interrupts, the vdpa driver still configures the 
> > > > hardware's msix vector. Therefore, the hardware still sends interrupts 
> > > > to the host os.
> > >
> > > >I just have a question on this part. How come hardware sends 
> > > >interrupts does
> > not guest driver disable them?
> > >
> > >1:Assuming the guest OS's Virtio device is using PMD mode, QEMU 
> > > sets
> > the call fd to -1
> > >2:On the host side, the vhost_vdpa program will set
> > vp_vdpa->vring[i].cb.callback to invalid
> > >3:Before the modification, the vp_vdpa_request_irq 

Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

2024-04-25 Thread Guillaume Morin
On 25 Apr 21:56, David Hildenbrand wrote:
>
> On 25.04.24 17:19, Guillaume Morin wrote:
> > On 24 Apr 23:00, David Hildenbrand wrote:
> > > > One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
> > > > hugetlb mappings. However this was also on my TODO and I have a draft
> > > > patch that implements it.
> > > 
> > > Yes, I documented it back then and added sanity checks in GUP code to 
> > > fence
> > > it off. Shouldn't be too hard to implement (famous last words) and would 
> > > be
> > > the cleaner thing to use here once I manage to switch over to
> > > FOLL_WRITE|FOLL_FORCE to break COW.
> > 
> > Yes, my patch seems to be working. The hugetlb code is pretty simple.
> > And it allows ptrace and the proc pid mem file to work on the executable
> > private hugetlb mappings.
> > 
> > There is one thing I am unclear about though. hugetlb enforces that
> > huge_pte_write() is true on FOLL_WRITE in both the fault and
> > follow_page_mask paths. I am not sure if we can simply assume in the
> > hugetlb code that if the pte is not writable and this is a write fault
> > then we're in the FOLL_FORCE|FOLL_WRITE case.  Or do we want to keep the
> > checks simply not enforce it for FOLL_FORCE|FOLL_WRITE?
> > 
> > The latter is more complicated in the fault path because there is no
> > FAULT_FLAG_FORCE flag.
> > 
> 
> I just pushed something to
>   https://github.com/davidhildenbrand/linux/tree/uprobes_cow
> 
> Only very lightly tested so far. Expect the worst :)


I'll try it out and send you the hugetlb bits

> 
> I still detest having the zapping logic there, but to get it all right I
> don't see a clean way around that.
> 
> 
> For hugetlb, we'd primarily have to implement the
> mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE).

For FOLL_FORCE, heer is my draft. Let me know if this is what you had in
mind. 


diff --git a/mm/gup.c b/mm/gup.c
index 1611e73b1121..ac60e0ae64e8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1056,9 +1056,6 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
if (!(gup_flags & FOLL_FORCE))
return -EFAULT;
-   /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
-   if (is_vm_hugetlb_page(vma))
-   return -EFAULT;
/*
 * We used to let the write,force case do COW in a
 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3548eae42cf9..73f86eddf888 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5941,7 +5941,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct 
vm_area_struct *vma,
   struct folio *pagecache_folio, spinlock_t *ptl,
   struct vm_fault *vmf)
 {
-   const bool unshare = flags & FAULT_FLAG_UNSHARE;
+   const bool make_writable = !(flags & FAULT_FLAG_UNSHARE) &&
+   (vma->vm_flags & VM_WRITE);
pte_t pte = huge_ptep_get(ptep);
struct hstate *h = hstate_vma(vma);
struct folio *old_folio;
@@ -5959,16 +5960,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, 
struct vm_area_struct *vma,
 * can trigger this, because hugetlb_fault() will always resolve
 * uffd-wp bit first.
 */
-   if (!unshare && huge_pte_uffd_wp(pte))
+   if (make_writable && huge_pte_uffd_wp(pte))
return 0;
 
-   /*
-* hugetlb does not support FOLL_FORCE-style write faults that keep the
-* PTE mapped R/O such as maybe_mkwrite() would do.
-*/
-   if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
-   return VM_FAULT_SIGSEGV;
-
/* Let's take out MAP_SHARED mappings first. */
if (vma->vm_flags & VM_MAYSHARE) {
set_huge_ptep_writable(vma, haddr, ptep);
@@ -5989,7 +5983,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct 
vm_area_struct *vma,
folio_move_anon_rmap(old_folio, vma);
SetPageAnonExclusive(_folio->page);
}
-   if (likely(!unshare))
+   if (likely(make_writable))
set_huge_ptep_writable(vma, haddr, ptep);
 
delayacct_wpcopy_end();
@@ -6094,7 +6088,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct 
vm_area_struct *vma,
spin_lock(ptl);
ptep = hugetlb_walk(vma, haddr, huge_page_size(h));
if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
-   pte_t newpte = make_huge_pte(vma, _folio->page, !unshare);
+   pte_t newpte = make_huge_pte(vma, _folio->page,
+make_writable);
 
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, 

Re: [PATCH 3/4] vhost: Improve vhost_get_avail_head()

2024-04-25 Thread Gavin Shan

On 4/26/24 06:42, kernel test robot wrote:> kernel test robot noticed the 
following build warnings:


[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master v6.9-rc5 next-20240424]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Gavin-Shan/vhost-Drop-variable-last_avail_idx-in-vhost_get_vq_desc/20240423-112803
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:
https://lore.kernel.org/r/20240423032407.262329-4-gshan%40redhat.com
patch subject: [PATCH 3/4] vhost: Improve vhost_get_avail_head()
config: i386-randconfig-141-20240426 
(https://download.01.org/0day-ci/archive/20240426/202404260448.g7f06v7m-...@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 
6009708b4367171ccdbf4b5905cb6a803753fe18)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404260448.g7f06v7m-...@intel.com/

smatch warnings:
drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: unsigned 'head' is never 
less than zero.
drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: error code type promoted 
to positive: 'head'

vim +/head +2614 drivers/vhost/vhost.c

   2581 
   2582 /* This looks in the virtqueue and for the first available buffer, and 
converts
   2583  * it to an iovec for convenient access.  Since descriptors consist of 
some
   2584  * number of output then some number of input descriptors, it's 
actually two
   2585  * iovecs, but we pack them into one and note how many of each there 
were.
   2586  *
   2587  * This function returns the descriptor number found, or vq->num (which 
is
   2588  * never a valid descriptor number) if none was found.  A negative code 
is
   2589  * returned on error. */
   2590 int vhost_get_vq_desc(struct vhost_virtqueue *vq,
   2591   struct iovec iov[], unsigned int iov_size,
   2592   unsigned int *out_num, unsigned int *in_num,
   2593   struct vhost_log *log, unsigned int *log_num)
   2594 {
   2595 struct vring_desc desc;
   2596 unsigned int i, head, found = 0;
   2597 int ret, access;
   2598 
   2599 if (vq->avail_idx == vq->last_avail_idx) {
   2600 ret = vhost_get_avail_idx(vq);
   2601 if (unlikely(ret))
   2602 return ret;
   2603 
   2604 /* If there's nothing new since last we looked, return
   2605  * invalid.
   2606  */
   2607 if (vq->avail_idx == vq->last_avail_idx)
   2608 return vq->num;
   2609 }
   2610 
   2611 /* Grab the next descriptor number they're advertising, and 
increment
   2612  * the index we've seen. */
   2613 head = vhost_get_avail_head(vq);

  2614  if (unlikely(head < 0))

   2615 return head;


Thanks for the report. @head needs to be 'int' instead of 'unsigned int'
so that it can hold the error number from vhost_get_avail_head(). I would
give it more time to see if there are other review comments before I revise
it to fix it up.

Thanks,
Gavin




Re: [GIT PULL] virtio: bugfix

2024-04-25 Thread pr-tracker-bot
The pull request you sent on Thu, 25 Apr 2024 18:01:06 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c942a0cd3603e34dd2d7237e064d9318cb7f9654

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: 回复: [PATCH v5] vp_vdpa: don't allocate unused msix vectors

2024-04-25 Thread Michael S. Tsirkin
On Tue, Apr 23, 2024 at 08:42:57AM +, Angus Chen wrote:
> Hi mst.
> 
> > -Original Message-
> > From: Michael S. Tsirkin 
> > Sent: Tuesday, April 23, 2024 4:35 PM
> > To: Gavin Liu 
> > Cc: jasow...@redhat.com; Angus Chen ;
> > virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com;
> > linux-kernel@vger.kernel.org; Heng Qi 
> > Subject: Re: 回复: [PATCH v5] vp_vdpa: don't allocate unused msix vectors
> > 
> > On Tue, Apr 23, 2024 at 01:39:17AM +, Gavin Liu wrote:
> > > On Wed, Apr 10, 2024 at 11:30:20AM +0800, lyx634449800 wrote:
> > > > From: Yuxue Liu 
> > > >
> > > > When there is a ctlq and it doesn't require interrupt callbacks,the
> > > > original method of calculating vectors wastes hardware msi or msix
> > > > resources as well as system IRQ resources.
> > > >
> > > > When conducting performance testing using testpmd in the guest os, it
> > > > was found that the performance was lower compared to directly using
> > > > vfio-pci to passthrough the device
> > > >
> > > > In scenarios where the virtio device in the guest os does not utilize
> > > > interrupts, the vdpa driver still configures the hardware's msix
> > > > vector. Therefore, the hardware still sends interrupts to the host os.
> > >
> > > >I just have a question on this part. How come hardware sends interrupts 
> > > >does
> > not guest driver disable them?
> > >
> > >1:Assuming the guest OS's Virtio device is using PMD mode, QEMU sets
> > the call fd to -1
> > >2:On the host side, the vhost_vdpa program will set
> > vp_vdpa->vring[i].cb.callback to invalid
> > >3:Before the modification, the vp_vdpa_request_irq function does not
> > check whether
> > >   vp_vdpa->vring[i].cb.callback is valid. Instead, it enables the
> > hardware's MSIX
> > > interrupts based on the number of queues of the device
> > >
> > 
> > So MSIX is enabled but why would it trigger? virtio PMD in poll mode
> > presumably suppresses interrupts after all.
> Virtio pmd is in the guest,but in host side,the msix is enabled,then the 
> device will triger 
> Interrupt normally. I analysed this bug before,and I think gavin is right.
> Did I make it clear?

Not really. Guest disables interrupts presumably (it's polling)
why does device still send them?


> > 
> > >
> > >
> > > - Original Message -
> > > From: Michael S. Tsirkin m...@redhat.com
> > > Sent: April 22, 2024 20:09
> > > To: Gavin Liu gavin@jaguarmicro.com
> > > Cc: jasow...@redhat.com; Angus Chen angus.c...@jaguarmicro.com;
> > virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com;
> > linux-kernel@vger.kernel.org; Heng Qi hen...@linux.alibaba.com
> > > Subject: Re: [PATCH v5] vp_vdpa: don't allocate unused msix vectors
> > >
> > >
> > >
> > > External Mail: This email originated from OUTSIDE of the organization!
> > > Do not click links, open attachments or provide ANY information unless you
> > recognize the sender and know the content is safe.
> > >
> > >
> > > On Wed, Apr 10, 2024 at 11:30:20AM +0800, lyx634449800 wrote:
> > > > From: Yuxue Liu 
> > > >
> > > > When there is a ctlq and it doesn't require interrupt callbacks,the
> > > > original method of calculating vectors wastes hardware msi or msix
> > > > resources as well as system IRQ resources.
> > > >
> > > > When conducting performance testing using testpmd in the guest os, it
> > > > was found that the performance was lower compared to directly using
> > > > vfio-pci to passthrough the device
> > > >
> > > > In scenarios where the virtio device in the guest os does not utilize
> > > > interrupts, the vdpa driver still configures the hardware's msix
> > > > vector. Therefore, the hardware still sends interrupts to the host os.
> > >
> > > I just have a question on this part. How come hardware sends interrupts 
> > > does
> > not guest driver disable them?
> > >
> > > > Because of this unnecessary
> > > > action by the hardware, hardware performance decreases, and it also
> > > > affects the performance of the host os.
> > > >
> > > > Before modification:(interrupt mode)
> > > >  32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
> > > >  33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
> > > >  34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
> > > >  35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config
> > > >
> > > > After modification:(interrupt mode)
> > > >  32:  0  0  1  7   PCI-MSI 32768-edge  vp-vdpa[:00:02.0]-0
> > > >  33: 36  0  3  0   PCI-MSI 32769-edge  vp-vdpa[:00:02.0]-1
> > > >  34:  0  0  0  0   PCI-MSI 32770-edge  vp-vdpa[:00:02.0]-config
> > > >
> > > > Before modification:(virtio pmd mode for guest os)
> > > >  32:  0   0  0  0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0
> > > >  33:  0   0  0  0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1
> > > >  34:  0   0  0  0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2
> > > >  35:  0   0  0  0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config
> > > >
> > > > 

[GIT PULL] virtio: bugfix

2024-04-25 Thread Michael S. Tsirkin
The following changes since commit 0bbac3facb5d6cc0171c45c9873a2dc96bea9680:

  Linux 6.9-rc4 (2024-04-14 13:38:39 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 98a821546b3919a10a58faa12ebe5e9a55cd638e:

  vDPA: code clean for vhost_vdpa uapi (2024-04-22 17:07:13 -0400)


virtio: bugfix

enum renames for vdpa uapi - we better do this now before
the names have been in any releases.

Signed-off-by: Michael S. Tsirkin 


Zhu Lingshan (1):
  vDPA: code clean for vhost_vdpa uapi

 drivers/vdpa/vdpa.c   | 6 +++---
 include/uapi/linux/vdpa.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)




Re: [PATCH v7 4/6] soc: qcom: qmi: add a way to remove running service

2024-04-25 Thread Chris Lew




On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:

Add qmi_del_server(), a pair to qmi_add_server(), a way to remove
running server from the QMI socket. This is e.g. necessary for
pd-mapper, which needs to readd a server each time the DSP is started or


s/readd/read/


stopped.

Tested-by: Neil Armstrong  # on SM8550-QRD
Signed-off-by: Dmitry Baryshkov 
---





+/**
+ * qmi_del_server() - register a service with the name service


Update comment to describe removal of service instead of 'register'.



Re: [PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets

2024-04-25 Thread Pablo Neira Ayuso
Hi,

On Thu, Apr 25, 2024 at 06:28:40PM +0200, Ismael Luceno wrote:
> It was observed in the wild that pairs of consecutive packets would leave
> the IPVS with the same wrong checksum, and the issue only went away when
> disabling GSO.
> 
> IPVS needs to avoid computing the SCTP checksum when using GSO.

This patch went already upstream.

It was no clear to me that a v3 was needed.

You will have to post a follow up.



Re: [PATCH 3/4] vhost: Improve vhost_get_avail_head()

2024-04-25 Thread kernel test robot
Hi Gavin,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master v6.9-rc5 next-20240424]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Gavin-Shan/vhost-Drop-variable-last_avail_idx-in-vhost_get_vq_desc/20240423-112803
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:
https://lore.kernel.org/r/20240423032407.262329-4-gshan%40redhat.com
patch subject: [PATCH 3/4] vhost: Improve vhost_get_avail_head()
config: i386-randconfig-141-20240426 
(https://download.01.org/0day-ci/archive/20240426/202404260448.g7f06v7m-...@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 
6009708b4367171ccdbf4b5905cb6a803753fe18)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404260448.g7f06v7m-...@intel.com/

smatch warnings:
drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: unsigned 'head' is never 
less than zero.
drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: error code type promoted 
to positive: 'head'

vim +/head +2614 drivers/vhost/vhost.c

  2581  
  2582  /* This looks in the virtqueue and for the first available buffer, and 
converts
  2583   * it to an iovec for convenient access.  Since descriptors consist of 
some
  2584   * number of output then some number of input descriptors, it's 
actually two
  2585   * iovecs, but we pack them into one and note how many of each there 
were.
  2586   *
  2587   * This function returns the descriptor number found, or vq->num (which 
is
  2588   * never a valid descriptor number) if none was found.  A negative code 
is
  2589   * returned on error. */
  2590  int vhost_get_vq_desc(struct vhost_virtqueue *vq,
  2591struct iovec iov[], unsigned int iov_size,
  2592unsigned int *out_num, unsigned int *in_num,
  2593struct vhost_log *log, unsigned int *log_num)
  2594  {
  2595  struct vring_desc desc;
  2596  unsigned int i, head, found = 0;
  2597  int ret, access;
  2598  
  2599  if (vq->avail_idx == vq->last_avail_idx) {
  2600  ret = vhost_get_avail_idx(vq);
  2601  if (unlikely(ret))
  2602  return ret;
  2603  
  2604  /* If there's nothing new since last we looked, return
  2605   * invalid.
  2606   */
  2607  if (vq->avail_idx == vq->last_avail_idx)
  2608  return vq->num;
  2609  }
  2610  
  2611  /* Grab the next descriptor number they're advertising, and 
increment
  2612   * the index we've seen. */
  2613  head = vhost_get_avail_head(vq);
> 2614  if (unlikely(head < 0))
  2615  return head;
  2616  
  2617  /* When we start there are none of either input nor output. */
  2618  *out_num = *in_num = 0;
  2619  if (unlikely(log))
  2620  *log_num = 0;
  2621  
  2622  i = head;
  2623  do {
  2624  unsigned iov_count = *in_num + *out_num;
  2625  if (unlikely(i >= vq->num)) {
  2626  vq_err(vq, "Desc index is %u > %u, head = %u",
  2627 i, vq->num, head);
  2628  return -EINVAL;
  2629  }
  2630  if (unlikely(++found > vq->num)) {
  2631  vq_err(vq, "Loop detected: last one at %u "
  2632 "vq size %u head %u\n",
  2633 i, vq->num, head);
  2634  return -EINVAL;
  2635  }
  2636  ret = vhost_get_desc(vq, , i);
  2637  if (unlikely(ret)) {
  2638  vq_err(vq, "Failed to get descriptor: idx %d 
addr %p\n",
  2639 i, vq->desc + i);
  2640  return -EFAULT;
  2641  }
  2642  if (desc.flags & cpu_to_vhost16(vq, 
VRING_DESC_F_INDIRECT)) {
  2643  ret = get_indirect(vq, iov, iov_size,
  2644 out_num, in_num,
  2645 log, log_num, );
  2646  if (unlikely(ret < 0)) {
  2647  if (ret != -EAGAIN)
  2648  vq_err(vq, "Failure detected "
  2649 

Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

2024-04-25 Thread Andrii Nakryiko
On Mon, Apr 15, 2024 at 5:49 AM Masami Hiramatsu (Google)
 wrote:
>
> Hi,
>
> Here is the 9th version of the series to re-implement the fprobe on
> function-graph tracer. The previous version is;
>
> https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
>
> This version is ported on the latest kernel (v6.9-rc3 + probes/for-next)
> and fixed some bugs + performance optimization patch[36/36].
>  - [12/36] Fix to clear fgraph_array entry in registration failure, also
>return -ENOSPC when fgraph_array is full.
>  - [28/36] Add new store_fprobe_entry_data() for fprobe.
>  - [31/36] Remove DIV_ROUND_UP() and fix entry data address calculation.
>  - [36/36] Add new flag to skip timestamp recording.
>
> Overview
> 
> This series does major 2 changes, enable multiple function-graphs on
> the ftrace (e.g. allow function-graph on sub instances) and rewrite the
> fprobe on this function-graph.
>
> The former changes had been sent from Steven Rostedt 4 years ago (*),
> which allows users to set different setting function-graph tracer (and
> other tracers based on function-graph) in each trace-instances at the
> same time.
>
> (*) https://lore.kernel.org/all/20190525031633.811342...@goodmis.org/
>
> The purpose of latter change are;
>
>  1) Remove dependency of the rethook from fprobe so that we can reduce
>the return hook code and shadow stack.
>
>  2) Make 'ftrace_regs' the common trace interface for the function
>boundary.
>
> 1) Currently we have 2(or 3) different function return hook codes,
>  the function-graph tracer and rethook (and legacy kretprobe).
>  But since this  is redundant and needs double maintenance cost,
>  I would like to unify those. From the user's viewpoint, function-
>  graph tracer is very useful to grasp the execution path. For this
>  purpose, it is hard to use the rethook in the function-graph
>  tracer, but the opposite is possible. (Strictly speaking, kretprobe
>  can not use it because it requires 'pt_regs' for historical reasons.)
>
> 2) Now the fprobe provides the 'pt_regs' for its handler, but that is
>  wrong for the function entry and exit. Moreover, depending on the
>  architecture, there is no way to accurately reproduce 'pt_regs'
>  outside of interrupt or exception handlers. This means fprobe should
>  not use 'pt_regs' because it does not use such exceptions.
>  (Conversely, kprobe should use 'pt_regs' because it is an abstract
>   interface of the software breakpoint exception.)
>
> This series changes fprobe to use function-graph tracer for tracing
> function entry and exit, instead of mixture of ftrace and rethook.
> Unlike the rethook which is a per-task list of system-wide allocated
> nodes, the function graph's ret_stack is a per-task shadow stack.
> Thus it does not need to set 'nr_maxactive' (which is the number of
> pre-allocated nodes).
> Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'.
> Since eBPF mulit_kprobe/multi_kretprobe events still use 'pt_regs' as
> their register interface, this changes it to convert 'ftrace_regs' to
> 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs',
> so users must access only registers for function parameters or
> return value.
>
> Design
> --
> Instead of using ftrace's function entry hook directly, the new fprobe
> is built on top of the function-graph's entry and return callbacks
> with 'ftrace_regs'.
>
> Since the fprobe requires access to 'ftrace_regs', the architecture
> must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and
> CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables to call function-graph
> entry callback with 'ftrace_regs', and also
> CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
> return_to_handler.
>
> All fprobes share a single function-graph ops (means shares a common
> ftrace filter) similar to the kprobe-on-ftrace. This needs another
> layer to find corresponding fprobe in the common function-graph
> callbacks, but has much better scalability, since the number of
> registered function-graph ops is limited.
>
> In the entry callback, the fprobe runs its entry_handler and saves the
> address of 'fprobe' on the function-graph's shadow stack as data. The
> return callback decodes the data to get the 'fprobe' address, and runs
> the exit_handler.
>
> The fprobe introduces two hash-tables, one is for entry callback which
> searches fprobes related to the given function address passed by entry
> callback. The other is for a return callback which checks if the given
> 'fprobe' data structure pointer is still valid. Note that it is
> possible to unregister fprobe before the return callback runs. Thus
> the address validation must be done before using it in the return
> callback.
>
> This series can be applied against the probes/for-next branch, which
> is based on v6.9-rc3.
>
> This series can also be found below branch.
>
> 

Re: [PATCH v9 36/36] fgraph: Skip recording calltime/rettime if it is not nneeded

2024-04-25 Thread Andrii Nakryiko
On Mon, Apr 15, 2024 at 6:25 AM Masami Hiramatsu (Google)
 wrote:
>
> From: Masami Hiramatsu (Google) 
>
> Skip recording calltime and rettime if the fgraph_ops does not need it.
> This is a kind of performance optimization for fprobe. Since the fprobe
> user does not use these entries, recording timestamp in fgraph is just
> a overhead (e.g. eBPF, ftrace). So introduce the skip_timestamp flag,
> and all fgraph_ops sets this flag, skip recording calltime and rettime.
>
> Suggested-by: Jiri Olsa 
> Signed-off-by: Masami Hiramatsu (Google) 
> ---
>  Changes in v9:
>   - Newly added.
> ---
>  include/linux/ftrace.h |2 ++
>  kernel/trace/fgraph.c  |   46 +++---
>  kernel/trace/fprobe.c  |1 +
>  3 files changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index d845a80a3d56..06fc7cbef897 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -1156,6 +1156,8 @@ struct fgraph_ops {
> struct ftrace_ops   ops; /* for the hash lists */
> void*private;
> int idx;
> +   /* If skip_timestamp is true, this does not record timestamps. */
> +   boolskip_timestamp;
>  };
>
>  void *fgraph_reserve_data(int idx, int size_bytes);
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 7556fbbae323..a5722537bb79 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -131,6 +131,7 @@ DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
>  int ftrace_graph_active;
>
>  static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
> +static bool fgraph_skip_timestamp;
>
>  /* LRU index table for fgraph_array */
>  static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
> @@ -475,7 +476,7 @@ void ftrace_graph_stop(void)
>  static int
>  ftrace_push_return_trace(unsigned long ret, unsigned long func,
>  unsigned long frame_pointer, unsigned long *retp,
> -int fgraph_idx)
> +int fgraph_idx, bool skip_ts)
>  {
> struct ftrace_ret_stack *ret_stack;
> unsigned long long calltime;
> @@ -498,8 +499,12 @@ ftrace_push_return_trace(unsigned long ret, unsigned 
> long func,
> ret_stack = get_ret_stack(current, current->curr_ret_stack, );
> if (ret_stack && ret_stack->func == func &&
> get_fgraph_type(current, index + FGRAPH_RET_INDEX) == 
> FGRAPH_TYPE_BITMAP &&
> -   !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, 
> fgraph_idx))
> +   !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, 
> fgraph_idx)) {
> +   /* If previous one skips calltime, update it. */
> +   if (!skip_ts && !ret_stack->calltime)
> +   ret_stack->calltime = trace_clock_local();
> return index + FGRAPH_RET_INDEX;
> +   }
>
> val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_RET_INDEX;
>
> @@ -517,7 +522,10 @@ ftrace_push_return_trace(unsigned long ret, unsigned 
> long func,
> return -EBUSY;
> }
>
> -   calltime = trace_clock_local();
> +   if (skip_ts)

would it be ok to add likely() here to keep the least-overhead code path linear?

> +   calltime = 0LL;
> +   else
> +   calltime = trace_clock_local();
>
> index = READ_ONCE(current->curr_ret_stack);
> ret_stack = RET_STACK(current, index);
> @@ -601,7 +609,8 @@ int function_graph_enter_regs(unsigned long ret, unsigned 
> long func,
> trace.func = func;
> trace.depth = ++current->curr_ret_depth;
>
> -   index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0);
> +   index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0,
> +fgraph_skip_timestamp);
> if (index < 0)
> goto out;
>
> @@ -654,7 +663,8 @@ int function_graph_enter_ops(unsigned long ret, unsigned 
> long func,
> return -ENODEV;
>
> /* Use start for the distance to ret_stack (skipping over reserve) */
> -   index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 
> gops->idx);
> +   index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 
> gops->idx,
> +gops->skip_timestamp);
> if (index < 0)
> return index;
> type = get_fgraph_type(current, index);
> @@ -732,6 +742,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
> unsigned long *ret,
> *ret = ret_stack->ret;
> trace->func = ret_stack->func;
> trace->calltime = ret_stack->calltime;
> +   trace->rettime = 0;
> trace->overrun = atomic_read(>trace_overrun);
> trace->depth = current->curr_ret_depth;
> /*
> @@ -792,7 +803,6 @@ __ftrace_return_to_handler(struct 

Re: [PATCH v9 29/36] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled

2024-04-25 Thread Andrii Nakryiko
On Mon, Apr 15, 2024 at 6:22 AM Masami Hiramatsu (Google)
 wrote:
>
> From: Masami Hiramatsu (Google) 
>
> Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
> converted from ftrace_regs by ftrace_partial_regs(), thus some registers
> may always returns 0. But it should be enough for function entry (access
> arguments) and exit (access return value).
>
> Signed-off-by: Masami Hiramatsu (Google) 
> Acked-by: Florent Revest 
> ---
>  Changes from previous series: NOTHING, Update against the new series.
> ---
>  kernel/trace/bpf_trace.c |   22 +-
>  1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e51a6ef87167..57b1174030c9 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void)
>  fs_initcall(bpf_event_init);
>  #endif /* CONFIG_MODULES */
>
> -#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
> +#ifdef CONFIG_FPROBE
>  struct bpf_kprobe_multi_link {
> struct bpf_link link;
> struct fprobe fp;
> @@ -2600,6 +2600,8 @@ struct user_syms {
> char *buf;
>  };
>
> +static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs);

this is a waste if CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, right?
Can we guard it?


> +
>  static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, 
> u32 cnt)
>  {
> unsigned long __user usymbol;
> @@ -2792,13 +2794,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct 
> bpf_run_ctx *ctx)
>
>  static int
>  kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> -  unsigned long entry_ip, struct pt_regs *regs)
> +  unsigned long entry_ip, struct ftrace_regs *fregs)
>  {
> struct bpf_kprobe_multi_run_ctx run_ctx = {
> .link = link,
> .entry_ip = entry_ip,
> };
> struct bpf_run_ctx *old_run_ctx;
> +   struct pt_regs *regs;
> int err;
>
> if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> @@ -2809,6 +2812,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link 
> *link,
>
> migrate_disable();
> rcu_read_lock();
> +   regs = ftrace_partial_regs(fregs, 
> this_cpu_ptr(_kprobe_multi_pt_regs));

and then pass NULL if defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)?


> old_run_ctx = bpf_set_run_ctx(_ctx.run_ctx);
> err = bpf_prog_run(link->link.prog, regs);
> bpf_reset_run_ctx(old_run_ctx);
> @@ -2826,13 +2830,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned 
> long fentry_ip,
>   void *data)
>  {
> struct bpf_kprobe_multi_link *link;
> -   struct pt_regs *regs = ftrace_get_regs(fregs);
> -
> -   if (!regs)
> -   return 0;
>
> link = container_of(fp, struct bpf_kprobe_multi_link, fp);
> -   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
> +   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
> return 0;
>  }
>
> @@ -2842,13 +2842,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, 
> unsigned long fentry_ip,
>void *data)
>  {
> struct bpf_kprobe_multi_link *link;
> -   struct pt_regs *regs = ftrace_get_regs(fregs);
> -
> -   if (!regs)
> -   return;
>
> link = container_of(fp, struct bpf_kprobe_multi_link, fp);
> -   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
> +   kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
>  }
>
>  static int symbols_cmp_r(const void *a, const void *b, const void *priv)
> @@ -3107,7 +3103,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr 
> *attr, struct bpf_prog *pr
> kvfree(cookies);
> return err;
>  }
> -#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> +#else /* !CONFIG_FPROBE */
>  int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog 
> *prog)
>  {
> return -EOPNOTSUPP;
>
>



Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

2024-04-25 Thread David Hildenbrand

On 25.04.24 17:19, Guillaume Morin wrote:

On 24 Apr 23:00, David Hildenbrand wrote:

One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
hugetlb mappings. However this was also on my TODO and I have a draft
patch that implements it.


Yes, I documented it back then and added sanity checks in GUP code to fence
it off. Shouldn't be too hard to implement (famous last words) and would be
the cleaner thing to use here once I manage to switch over to
FOLL_WRITE|FOLL_FORCE to break COW.


Yes, my patch seems to be working. The hugetlb code is pretty simple.
And it allows ptrace and the proc pid mem file to work on the executable
private hugetlb mappings.

There is one thing I am unclear about though. hugetlb enforces that
huge_pte_write() is true on FOLL_WRITE in both the fault and
follow_page_mask paths. I am not sure if we can simply assume in the
hugetlb code that if the pte is not writable and this is a write fault
then we're in the FOLL_FORCE|FOLL_WRITE case.  Or do we want to keep the
checks simply not enforce it for FOLL_FORCE|FOLL_WRITE?

The latter is more complicated in the fault path because there is no
FAULT_FLAG_FORCE flag.



I just pushed something to
https://github.com/davidhildenbrand/linux/tree/uprobes_cow

Only very lightly tested so far. Expect the worst :)

I still detest having the zapping logic there, but to get it all right I 
don't see a clean way around that.



For hugetlb, we'd primarily have to implement the 
mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE).


Likely vaddr and PAGE_SIZE in uprobe_write_opcode() would have to be 
expanded to cover the full hugetlb page.


--
Cheers,

David / dhildenb




Re: [PATCH v7 1/6] soc: qcom: pdr: protect locator_addr with the main mutex

2024-04-25 Thread Chris Lew



On 4/24/2024 2:27 AM, Dmitry Baryshkov wrote:

If the service locator server is restarted fast enough, the PDR can
rewrite locator_addr fields concurrently. Protect them by placing
modification of those fields under the main pdr->lock.

Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
Tested-by: Neil Armstrong  # on SM8550-QRD
Signed-off-by: Dmitry Baryshkov 
---
  drivers/soc/qcom/pdr_interface.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
index a1b6a4081dea..19cfe4b41235 100644
--- a/drivers/soc/qcom/pdr_interface.c
+++ b/drivers/soc/qcom/pdr_interface.c
@@ -76,12 +76,12 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
  locator_hdl);
struct pdr_service *pds;
  
+	mutex_lock(>lock);

/* Create a local client port for QMI communication */
pdr->locator_addr.sq_family = AF_QIPCRTR;
pdr->locator_addr.sq_node = svc->node;
pdr->locator_addr.sq_port = svc->port;
  
-	mutex_lock(>lock);

pdr->locator_init_complete = true;
mutex_unlock(>lock);
  
@@ -104,10 +104,10 @@ static void pdr_locator_del_server(struct qmi_handle *qmi,
  
  	mutex_lock(>lock);

pdr->locator_init_complete = false;
-   mutex_unlock(>lock);
  
  	pdr->locator_addr.sq_node = 0;

pdr->locator_addr.sq_port = 0;
+   mutex_unlock(>lock);
  }
  
  static const struct qmi_ops pdr_locator_ops = {




These two functions are provided as qmi_ops handlers in pdr_locator_ops. 
Aren't they serialized in the qmi handle's workqueue since it as an 
ordered_workqueue? Even in a fast pdr scenario I don't think we would 
see a race condition between these two functions.


The other access these two functions do race against is in the 
pdr_notifier_work. I think you would need to protect locator_addr in 
pdr_get_domain_list since the qmi_send_request there uses 
'pdr->locator_addr'.


Thanks!
Chris



[PATCH 2/2] dt-bindings: soc: qcom,smp2p: Mark qcom,ipc as deprecated

2024-04-25 Thread Luca Weiss
Deprecate the qcom,ipc way of accessing the mailbox in favor of the
'mboxes' property.

Update the example to use mboxes.

Signed-off-by: Luca Weiss 
---
 Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml 
b/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml
index 58500529b90f..141d666dc3f7 100644
--- a/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml
+++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml
@@ -41,6 +41,7 @@ properties:
 description:
   Three entries specifying the outgoing ipc bit used for signaling the
   remote end of the smp2p edge.
+deprecated: true
 
   qcom,local-pid:
 $ref: /schemas/types.yaml#/definitions/uint32
@@ -128,7 +129,7 @@ examples:
 compatible = "qcom,smp2p";
 qcom,smem = <431>, <451>;
 interrupts = ;
-qcom,ipc = < 8 18>;
+mboxes = < 18>;
 qcom,local-pid = <0>;
 qcom,remote-pid = <4>;
 

-- 
2.44.0




[PATCH 1/2] dt-bindings: remoteproc: qcom,smd-edge: Mark qcom,ipc as deprecated

2024-04-25 Thread Luca Weiss
Deprecate the qcom,ipc way of accessing the mailbox in favor of the
'mboxes' property.

Update the example to use mboxes.

Signed-off-by: Luca Weiss 
---
 Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml 
b/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml
index 02c85b420c1a..63500b1a0f6f 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml
@@ -61,6 +61,7 @@ properties:
 description:
   Three entries specifying the outgoing ipc bit used for signaling the
   remote processor.
+deprecated: true
 
   qcom,smd-edge:
 $ref: /schemas/types.yaml#/definitions/uint32
@@ -111,7 +112,7 @@ examples:
 smd-edge {
 interrupts = ;
 
-qcom,ipc = < 8 8>;
+mboxes = < 8>;
 qcom,smd-edge = <1>;
 };
 };

-- 
2.44.0




[PATCH 0/2] Mark qcom,ipc as deprecated in two schemas

2024-04-25 Thread Luca Weiss
The mboxes property has been supported in those bindings since a while
and was always meant to the replace qcom,ipc properties, so let's mark
qcom,ipc as deprecated - and update the examples to use mboxes.

Related:
https://lore.kernel.org/linux-arm-msm/20240424-apcs-mboxes-v1-0-6556c47cb...@z3ntu.xyz/

Signed-off-by: Luca Weiss 
---
Luca Weiss (2):
  dt-bindings: remoteproc: qcom,smd-edge: Mark qcom,ipc as deprecated
  dt-bindings: soc: qcom,smp2p: Mark qcom,ipc as deprecated

 Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml | 3 ++-
 Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml  | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)
---
base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
change-id: 20240425-qcom-ipc-deprecate-37afbe33cadc

Best regards,
-- 
Luca Weiss 




Re: [PATCH v8 1/4] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs

2024-04-25 Thread Andrew Davis

On 4/25/24 12:15 PM, Conor Dooley wrote:

On Wed, Apr 24, 2024 at 03:36:39PM -0500, Rob Herring wrote:


On Wed, 24 Apr 2024 14:06:09 -0500, Andrew Davis wrote:

From: Hari Nagalla 

K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain.
The remote processor's life cycle management and IPC mechanisms are
similar across the R5F and M4F cores from remote processor driver
point of view. However, there are subtle differences in image loading
and starting the M4F subsystems.

The YAML binding document provides the various node properties to be
configured by the consumers of the M4F subsystem.

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
  .../bindings/remoteproc/ti,k3-m4f-rproc.yaml  | 126 ++
  1 file changed, 126 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml



My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:

dtschema/dtc warnings/errors:


doc reference errors (make refcheckdocs):
Warning: Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml 
references a file that doesn't exist: 
Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml: 
Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml


The file is now in dt-schema:
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml


So should I use "reserved-memory/reserved-memory.yaml" here, or just
drop this line completely?

Andrew



Re: [PATCH RFC 1/2] dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc

2024-04-25 Thread Luca Weiss
On Donnerstag, 25. April 2024 18:17:15 MESZ Rob Herring wrote:
> On Wed, Apr 24, 2024 at 07:21:51PM +0200, Luca Weiss wrote:
> > The qcom,ipc-N properties are essentially providing a reference to a
> > mailbox, so allow using the mboxes property to do the same in a more
> > structured way.
> 
> Can we mark qcom,ipc-N as deprecated then?

Yes, that should be ok. Will also send a similar change to the other bindings
that support both qcom,ipc and mboxes.

>  
> > Since multiple SMSM hosts are supported, we need to be able to provide
> > the correct mailbox for each host. The old qcom,ipc-N properties map to
> > the mboxes property by index, starting at 0 since that's a valid SMSM
> > host also.
> > 
> > The new example shows how an smsm node with just qcom,ipc-3 should be
> > specified with the mboxes property.
> > 
> > Signed-off-by: Luca Weiss 
> > ---
> >  .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 48 
> > ++
> >  1 file changed, 40 insertions(+), 8 deletions(-)
> > 
> > diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml 
> > b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml
> > index db67cf043256..b12589171169 100644
> > --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml
> > +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml
> > @@ -33,6 +33,13 @@ properties:
> >specifier of the column in the subscription matrix representing the 
> > local
> >processor.
> >  
> > +  mboxes:
> > +minItems: 1
> > +maxItems: 5
> 
> Need to define what each entry is.

The entry is (description from qcom,ipc-N)

  "the outgoing ipc bit used for signaling the N:th remote processor."

So you want me to add 5 times e.g.

- the IPC mailbox used for signaling the 0th remote processor
- the IPC mailbox used for signaling the 1st remote processor

etc? I don't really have any extra knowledge on smsm to be able to write
something better there..

Also what are your thoughts on this binding vs the alternative I wrote
in the cover letter? I'm not really happy about how the properties are
represented.

Regards
Luca


> 
> > +description:
> > +  Reference to the mailbox representing the outgoing doorbell in APCS 
> > for
> > +  this client.
> > +
> >'#size-cells':
> >  const: 0
> >  
> > @@ -98,15 +105,18 @@ required:
> >- '#address-cells'
> >- '#size-cells'
> >  
> > -anyOf:
> > +oneOf:
> >- required:
> > -  - qcom,ipc-1
> > -  - required:
> > -  - qcom,ipc-2
> > -  - required:
> > -  - qcom,ipc-3
> > -  - required:
> > -  - qcom,ipc-4
> > +  - mboxes
> > +  - anyOf:
> > +  - required:
> > +  - qcom,ipc-1
> > +  - required:
> > +  - qcom,ipc-2
> > +  - required:
> > +  - qcom,ipc-3
> > +  - required:
> > +  - qcom,ipc-4
> >  
> >  additionalProperties: false
> >  
> > @@ -136,3 +146,25 @@ examples:
> >  #interrupt-cells = <2>;
> >  };
> >  };
> > +  # Example using mboxes property
> > +  - |
> > +#include 
> > +
> > +shared-memory {
> > +compatible = "qcom,smsm";
> > +#address-cells = <1>;
> > +#size-cells = <0>;
> > +mboxes = <0>, <0>, <0>, < 19>;
> > +
> > +apps@0 {
> > +reg = <0>;
> > +#qcom,smem-state-cells = <1>;
> > +};
> > +
> > +wcnss@7 {
> > +reg = <7>;
> > +interrupts = ;
> > +interrupt-controller;
> > +#interrupt-cells = <2>;
> > +};
> > +};
> > 
> 







Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task

2024-04-25 Thread Jiri Olsa
On Thu, Apr 25, 2024 at 02:05:31AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:977b1ef51866 Merge tag 'block-6.9-20240420' of git://git.k..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=17080d2098
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f47e5e015c177e57
> dashboard link: https://syzkaller.appspot.com/bug?extid=83e7f982ca045ab4405c
> compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
> 2.40
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/549d1add1da9/disk-977b1ef5.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/3e8e501c8aa2/vmlinux-977b1ef5.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/d02f7cb905b8/bzImage-977b1ef5.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+83e7f982ca045ab44...@syzkaller.appspotmail.com
> 
> ==
> WARNING: possible circular locking dependency detected
> 6.9.0-rc4-syzkaller-00266-g977b1ef51866 #0 Not tainted
> --
> syz-executor.0/11241 is trying to acquire lock:
> 888020a2c0d8 (>siglock){-.-.}-{2:2}, at: 
> force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334
> 
> but task is already holding lock:
> 8880b943e658 (>__lock){-.-.}-{2:2}, at: 
> raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (>__lock){-.-.}-{2:2}:
>lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
>_raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
>raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
>raw_spin_rq_lock kernel/sched/sched.h:1385 [inline]
>_raw_spin_rq_lock_irqsave kernel/sched/sched.h:1404 [inline]
>rq_lock_irqsave kernel/sched/sched.h:1683 [inline]
>class_rq_lock_irqsave_constructor kernel/sched/sched.h:1737 [inline]
>sched_mm_cid_exit_signals+0x17b/0x4b0 kernel/sched/core.c:12005
>exit_signals+0x2a1/0x5c0 kernel/signal.c:3016
>do_exit+0x6a8/0x27e0 kernel/exit.c:837
>__do_sys_exit kernel/exit.c:994 [inline]
>__se_sys_exit kernel/exit.c:992 [inline]
>__pfx___ia32_sys_exit+0x0/0x10 kernel/exit.c:992
>do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
>entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> -> #0 (>siglock){-.-.}-{2:2}:
>check_prev_add kernel/locking/lockdep.c:3134 [inline]
>check_prevs_add kernel/locking/lockdep.c:3253 [inline]
>validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
>__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
>lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
>__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
>force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334
>force_sig_fault_to_task kernel/signal.c:1733 [inline]
>force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738
>__bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
>handle_page_fault arch/x86/mm/fault.c:1505 [inline]
>exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563
>asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
>strncpy_from_user+0x2c6/0x2f0 lib/strncpy_from_user.c:138
>strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186
>bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:216 [inline]
>bpf_probe_read_compat_str kernel/trace/bpf_trace.c:311 [inline]
>bpf_probe_read_compat_str+0xe9/0x180 kernel/trace/bpf_trace.c:307
>bpf_prog_e42f6260c1b72fb3+0x3d/0x3f
>bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
>__bpf_prog_run include/linux/filter.h:657 [inline]
>bpf_prog_run include/linux/filter.h:664 [inline]
>__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
>bpf_trace_run4+0x25a/0x490 kernel/trace/bpf_trace.c:2422
>__traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222
>trace_sched_switch include/trace/events/sched.h:222 [inline]
>__schedule+0x2535/0x4a00 kernel/sched/core.c:6743
>preempt_schedule_irq+0xfb/0x1c0 kernel/sched/core.c:7068
>irqentry_exit+0x5e/0x90 kernel/entry/common.c:354
>asm_sysvec_apic_timer_interrupt+0x1a/0x20 
> arch/x86/include/asm/idtentry.h:702
>force_sig_fault+0x0/0x1d0
>__bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
>handle_page_fault arch/x86/mm/fault.c:1505 [inline]
>

Re: [PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets

2024-04-25 Thread Jakub Kicinski
On Thu, 25 Apr 2024 18:28:40 +0200 Ismael Luceno wrote:
> Changes since v2:
> * Use only skb_is_gso, no need to check for GSO type

v2 is already in the tree, if the change is important you need to send
an incremental fix.



Re: [PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets

2024-04-25 Thread Julian Anastasov

Hello,

On Thu, 25 Apr 2024, Ismael Luceno wrote:

> It was observed in the wild that pairs of consecutive packets would leave
> the IPVS with the same wrong checksum, and the issue only went away when
> disabling GSO.
> 
> IPVS needs to avoid computing the SCTP checksum when using GSO.
> 
> Fixes: 90017accff61 ("sctp: Add GSO support", 2016-06-02)
> Co-developed-by: Firo Yang 
> Signed-off-by: Ismael Luceno 
> Tested-by: Andreas Taschner 
> CC: Michal Kubeček 
> CC: Simon Horman 
> CC: Julian Anastasov 
> CC: lvs-de...@vger.kernel.org
> CC: netfilter-de...@vger.kernel.org
> CC: net...@vger.kernel.org
> CC: coret...@netfilter.org
> ---
> 
> Notes:
> Changes since v2:
> * Use only skb_is_gso, no need to check for GSO type

v2 is already applied. I acked it because sctp_gso_segment()
checks for skb_is_gso_sctp(). If v3 is just an optimization
better to live with v2? Is it possible to see skb_is_gso() but
not skb_is_gso_sctp() while working with SCTP packet?

> Changes since v1:
> * Added skb_is_gso before skb_is_gso_sctp.
> * Added "Fixes" tag.
> 
>  net/netfilter/ipvs/ip_vs_proto_sctp.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c 
> b/net/netfilter/ipvs/ip_vs_proto_sctp.c
> index a0921adc31a9..83e452916403 100644
> --- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
> +++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
> @@ -126,7 +126,8 @@ sctp_snat_handler(struct sk_buff *skb, struct 
> ip_vs_protocol *pp,
>   if (sctph->source != cp->vport || payload_csum ||
>   skb->ip_summed == CHECKSUM_PARTIAL) {
>   sctph->source = cp->vport;
> - sctp_nat_csum(skb, sctph, sctphoff);
> + if (!skb_is_gso(skb))
> + sctp_nat_csum(skb, sctph, sctphoff);
>   } else {
>   skb->ip_summed = CHECKSUM_UNNECESSARY;
>   }
> @@ -174,7 +175,8 @@ sctp_dnat_handler(struct sk_buff *skb, struct 
> ip_vs_protocol *pp,
>   (skb->ip_summed == CHECKSUM_PARTIAL &&
>!(skb_dst(skb)->dev->features & NETIF_F_SCTP_CRC))) {
>   sctph->dest = cp->dport;
> - sctp_nat_csum(skb, sctph, sctphoff);
> + if (!skb_is_gso(skb))
> + sctp_nat_csum(skb, sctph, sctphoff);
>   } else if (skb->ip_summed != CHECKSUM_PARTIAL) {
>   skb->ip_summed = CHECKSUM_UNNECESSARY;
>   }
> -- 
> 2.43.0

Regards

--
Julian Anastasov 

Re: [PATCH v8 1/4] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs

2024-04-25 Thread Conor Dooley
On Wed, Apr 24, 2024 at 02:06:09PM -0500, Andrew Davis wrote:
> From: Hari Nagalla 
> 
> K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain.
> The remote processor's life cycle management and IPC mechanisms are
> similar across the R5F and M4F cores from remote processor driver
> point of view. However, there are subtle differences in image loading
> and starting the M4F subsystems.
> 
> The YAML binding document provides the various node properties to be
> configured by the consumers of the M4F subsystem.
> 
> Signed-off-by: Martyn Welch 
> Signed-off-by: Hari Nagalla 
> Signed-off-by: Andrew Davis 
> +  mboxes:
> +description: |

> +  memory-region:
> +description: |

Not sure that either chomping operator is needed here, but that's a nit.
With the incorrect link fixed up,
Reviewed-by: Conor Dooley 

Cheers,
Conor.



signature.asc
Description: PGP signature


Re: [PATCH v8 1/4] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs

2024-04-25 Thread Conor Dooley
On Wed, Apr 24, 2024 at 03:36:39PM -0500, Rob Herring wrote:
> 
> On Wed, 24 Apr 2024 14:06:09 -0500, Andrew Davis wrote:
> > From: Hari Nagalla 
> > 
> > K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain.
> > The remote processor's life cycle management and IPC mechanisms are
> > similar across the R5F and M4F cores from remote processor driver
> > point of view. However, there are subtle differences in image loading
> > and starting the M4F subsystems.
> > 
> > The YAML binding document provides the various node properties to be
> > configured by the consumers of the M4F subsystem.
> > 
> > Signed-off-by: Martyn Welch 
> > Signed-off-by: Hari Nagalla 
> > Signed-off-by: Andrew Davis 
> > ---
> >  .../bindings/remoteproc/ti,k3-m4f-rproc.yaml  | 126 ++
> >  1 file changed, 126 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml
> > 
> 
> My bot found errors running 'make dt_binding_check' on your patch:
> 
> yamllint warnings/errors:
> 
> dtschema/dtc warnings/errors:
> 
> 
> doc reference errors (make refcheckdocs):
> Warning: Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml 
> references a file that doesn't exist: 
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
> Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml: 
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml

The file is now in dt-schema:
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml


signature.asc
Description: PGP signature


Re: [PATCH v21 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-25 Thread Vincent Donnefort
On Wed, Apr 24, 2024 at 10:55:54PM +0200, David Hildenbrand wrote:
> On 24.04.24 22:31, Vincent Donnefort wrote:
> > Hi David,
> > 
> > Thanks for your quick response.
> > 
> > On Wed, Apr 24, 2024 at 05:26:39PM +0200, David Hildenbrand wrote:
> > > 
> > > I gave it some more thought, and I think we are still missing something (I
> > > wish PFNMAP/MIXEDMAP wouldn't be that hard).
> > > 
> > > > +
> > > > +/*
> > > > + *   +--+  pgoff == 0
> > > > + *   |   meta page  |
> > > > + *   +--+  pgoff == 1
> > > > + *   | subbuffer 0  |
> > > > + *   |  |
> > > > + *   +--+  pgoff == (1 + (1 << subbuf_order))
> > > > + *   | subbuffer 1  |
> > > > + *   |  |
> > > > + * ...
> > > > + */
> > > > +#ifdef CONFIG_MMU
> > > > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> > > > +   struct vm_area_struct *vma)
> > > > +{
> > > > +   unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = 
> > > > vma->vm_pgoff;
> > > > +   unsigned int subbuf_pages, subbuf_order;
> > > > +   struct page **pages;
> > > > +   int p = 0, s = 0;
> > > > +   int err;
> > > > +
> > > 
> > > I'd add some comments here like
> > > 
> > > /* Refuse any MAP_PRIVATE or writable mappings. */
> > > > +   if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
> > > > +   !(vma->vm_flags & VM_MAYSHARE))
> > > > +   return -EPERM;
> > > > +
> > > 
> > > /*
> > >   * Make sure the mapping cannot become writable later. Also, tell the VM
> > >   * to not touch these pages pages (VM_DONTCOPY | VM_DONTDUMP) and tell
> > >   * GUP to leave them alone as well (VM_IO).
> > >   */
> > > > +   vm_flags_mod(vma,
> > > > +VM_MIXEDMAP | VM_PFNMAP |
> > > > +VM_DONTCOPY | VM_DONTDUMP | VM_DONTEXPAND | VM_IO,
> > > > +VM_MAYWRITE);
> > > 
> > > I am still really unsure about VM_PFNMAP ... it's not a PFNMAP at all and,
> > > as stated, vm_insert_pages() even complains quite a lot when it would have
> > > to set VM_MIXEDMAP and VM_PFNMAP is already set, likely for a very good
> > > reason.
> > > 
> > > Can't we limit ourselves to VM_IO?
> > > 
> > > But then, I wonder if it really helps much regarding GUP: yes, it blocks
> > > ordinary GUP (see check_vma_flags()) but as insert_page_into_pte_locked()
> > > does *not* set pte_special(), GUP-fast (gup_fast_pte_range()) will not
> > > reject it.
> > > 
> > > Really, if you want GUP-fast to reject it, remap_pfn_range() and friends 
> > > are
> > > the way to go, that will set pte_special() such that also GUP-fast will
> > > leave it alone, just like vm_normal_page() would.
> > > 
> > > So ... I know Linus recommended VM_PFNMAP/VM_IO to stop GUP, but it alone
> > > won't stop all of GUP. We really have to mark the PTE as special, which
> > > vm_insert_page() must not do (because it is refcounted!).
> > 
> > Hum, apologies, I am not sure to follow the connection here. Why do you 
> > think
> > the recommendation was to prevent GUP?
> 
> Ah, I'm hallucinating! :) "not let people play games with the mapping" to me
> implied "make sure nobody touches it". If GUP is acceptable that makes stuff
> a lot easier. VM_IO will block some GUP, but not all of it.
> 
> > 
> > > 
> > > Which means: do we really have to stop GUP from grabbing that page?
> > > 
> > > Using vm_insert_page() only with VM_MIXEDMAP (and without VM_PFNMAP|VM_IO)
> > > would be better.
> > 
> > Under the assumption we do not want to stop all GUP, why not using VM_IO 
> > over
> > VM_MIXEDMAP which is I believe more restrictive?
> 
> VM_MIXEDMAP will be implicitly set by vm_insert_page(). There is a lengthy 
> comment
> for vm_normal_page() that explains all this madness. VM_MIXEDMAP is primarily
> relevant for COW mappings, which you just forbid completely.
> 
> remap_pfn_range_notrack() documents the semantics of some of the other flags:
> 
>*   VM_IO tells people not to look at these pages
>*  (accesses can have side effects).
>*   VM_PFNMAP tells the core MM that the base pages are just
>*  raw PFN mappings, and do not have a "struct page" associated
>*  with them.
>*   VM_DONTEXPAND
>*  Disable vma merging and expanding with mremap().
>*   VM_DONTDUMP
>*  Omit vma from core dump, even when VM_IO turned off.
> 
> VM_PFNMAP is very likely really not what we want, unless we really perform raw
> PFN mappings ... VM_IO we can set without doing much harm.
> 
> So I would suggest dropping VM_PFNMAP when using vm_insert_pages(), using 
> only VM_IO
> and likely just letting vm_insert_pages() set VM_MIXEDMAP for you.

Sounds good, I will do that in v22.

> 
> [...]
> 
> > > 
> > > vm_insert_pages() documents: "In case of error, we may have mapped a 
> > > subset
> > > of the provided pages. It is the caller's responsibility to account for 
> > > this
> > > case."

[PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets

2024-04-25 Thread Ismael Luceno
It was observed in the wild that pairs of consecutive packets would leave
the IPVS with the same wrong checksum, and the issue only went away when
disabling GSO.

IPVS needs to avoid computing the SCTP checksum when using GSO.

Fixes: 90017accff61 ("sctp: Add GSO support", 2016-06-02)
Co-developed-by: Firo Yang 
Signed-off-by: Ismael Luceno 
Tested-by: Andreas Taschner 
CC: Michal Kubeček 
CC: Simon Horman 
CC: Julian Anastasov 
CC: lvs-de...@vger.kernel.org
CC: netfilter-de...@vger.kernel.org
CC: net...@vger.kernel.org
CC: coret...@netfilter.org
---

Notes:
Changes since v2:
* Use only skb_is_gso, no need to check for GSO type

Changes since v1:
* Added skb_is_gso before skb_is_gso_sctp.
* Added "Fixes" tag.

 net/netfilter/ipvs/ip_vs_proto_sctp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c 
b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index a0921adc31a9..83e452916403 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -126,7 +126,8 @@ sctp_snat_handler(struct sk_buff *skb, struct 
ip_vs_protocol *pp,
if (sctph->source != cp->vport || payload_csum ||
skb->ip_summed == CHECKSUM_PARTIAL) {
sctph->source = cp->vport;
-   sctp_nat_csum(skb, sctph, sctphoff);
+   if (!skb_is_gso(skb))
+   sctp_nat_csum(skb, sctph, sctphoff);
} else {
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
@@ -174,7 +175,8 @@ sctp_dnat_handler(struct sk_buff *skb, struct 
ip_vs_protocol *pp,
(skb->ip_summed == CHECKSUM_PARTIAL &&
 !(skb_dst(skb)->dev->features & NETIF_F_SCTP_CRC))) {
sctph->dest = cp->dport;
-   sctp_nat_csum(skb, sctph, sctphoff);
+   if (!skb_is_gso(skb))
+   sctp_nat_csum(skb, sctph, sctphoff);
} else if (skb->ip_summed != CHECKSUM_PARTIAL) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
-- 
2.43.0




Re: [PATCH RFC 1/2] dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc

2024-04-25 Thread Rob Herring
On Wed, Apr 24, 2024 at 07:21:51PM +0200, Luca Weiss wrote:
> The qcom,ipc-N properties are essentially providing a reference to a
> mailbox, so allow using the mboxes property to do the same in a more
> structured way.

Can we mark qcom,ipc-N as deprecated then?
 
> Since multiple SMSM hosts are supported, we need to be able to provide
> the correct mailbox for each host. The old qcom,ipc-N properties map to
> the mboxes property by index, starting at 0 since that's a valid SMSM
> host also.
> 
> The new example shows how an smsm node with just qcom,ipc-3 should be
> specified with the mboxes property.
> 
> Signed-off-by: Luca Weiss 
> ---
>  .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 48 
> ++
>  1 file changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml 
> b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml
> index db67cf043256..b12589171169 100644
> --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml
> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml
> @@ -33,6 +33,13 @@ properties:
>specifier of the column in the subscription matrix representing the 
> local
>processor.
>  
> +  mboxes:
> +minItems: 1
> +maxItems: 5

Need to define what each entry is.

> +description:
> +  Reference to the mailbox representing the outgoing doorbell in APCS for
> +  this client.
> +
>'#size-cells':
>  const: 0
>  
> @@ -98,15 +105,18 @@ required:
>- '#address-cells'
>- '#size-cells'
>  
> -anyOf:
> +oneOf:
>- required:
> -  - qcom,ipc-1
> -  - required:
> -  - qcom,ipc-2
> -  - required:
> -  - qcom,ipc-3
> -  - required:
> -  - qcom,ipc-4
> +  - mboxes
> +  - anyOf:
> +  - required:
> +  - qcom,ipc-1
> +  - required:
> +  - qcom,ipc-2
> +  - required:
> +  - qcom,ipc-3
> +  - required:
> +  - qcom,ipc-4
>  
>  additionalProperties: false
>  
> @@ -136,3 +146,25 @@ examples:
>  #interrupt-cells = <2>;
>  };
>  };
> +  # Example using mboxes property
> +  - |
> +#include 
> +
> +shared-memory {
> +compatible = "qcom,smsm";
> +#address-cells = <1>;
> +#size-cells = <0>;
> +mboxes = <0>, <0>, <0>, < 19>;
> +
> +apps@0 {
> +reg = <0>;
> +#qcom,smem-state-cells = <1>;
> +};
> +
> +wcnss@7 {
> +reg = <7>;
> +interrupts = ;
> +interrupt-controller;
> +#interrupt-cells = <2>;
> +};
> +};
> 
> -- 
> 2.44.0
> 



Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

2024-04-25 Thread David Hildenbrand

On 25.04.24 17:19, Guillaume Morin wrote:

On 24 Apr 23:00, David Hildenbrand wrote:

One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
hugetlb mappings. However this was also on my TODO and I have a draft
patch that implements it.


Yes, I documented it back then and added sanity checks in GUP code to fence
it off. Shouldn't be too hard to implement (famous last words) and would be
the cleaner thing to use here once I manage to switch over to
FOLL_WRITE|FOLL_FORCE to break COW.


Yes, my patch seems to be working. The hugetlb code is pretty simple.
And it allows ptrace and the proc pid mem file to work on the executable
private hugetlb mappings.

There is one thing I am unclear about though. hugetlb enforces that
huge_pte_write() is true on FOLL_WRITE in both the fault and
follow_page_mask paths. I am not sure if we can simply assume in the
hugetlb code that if the pte is not writable and this is a write fault
then we're in the FOLL_FORCE|FOLL_WRITE case.  Or do we want to keep the
checks simply not enforce it for FOLL_FORCE|FOLL_WRITE?

The latter is more complicated in the fault path because there is no
FAULT_FLAG_FORCE flag.



handle_mm_fault()->sanitize_fault_flags() makes sure that we'll only 
proceed with a fault either if

* we have VM_WRITE set
* we are in a COW mapping (MAP_PRIVATE with at least VM_MAYWRITE)

Once you see FAULT_FLAG_WRITE and you do have VM_WRITE, you don't care 
about FOLL_FORCE, it's simply a write fault.


Once you see FAULT_FLAG_WRITE and you *don't* have VM_WRITE, you must 
have VM_MAYWRITE and are essentially in FOLL_FORCE.


In a VMA without VM_WRITE, you must never map a PTE writable. In 
ordinary COW code, that's done in wp_page_copy(), where we *always* use 
maybe_mkwrite(), to do exactly what a write fault would do, but without 
mapping the PTE writable.


That's what the whole can_follow_write_pmd()/can_follow_write_pte() is 
about: writing to PTEs that are not writable.


You'll have to follow the exact same model in hugetlb 
(can_follow_write_pmd(), hugetlb_maybe_mkwrite(), ...).


--
Cheers,

David / dhildenb




Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

2024-04-25 Thread Guillaume Morin
On 24 Apr 23:00, David Hildenbrand wrote:
> > One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
> > hugetlb mappings. However this was also on my TODO and I have a draft
> > patch that implements it.
> 
> Yes, I documented it back then and added sanity checks in GUP code to fence
> it off. Shouldn't be too hard to implement (famous last words) and would be
> the cleaner thing to use here once I manage to switch over to
> FOLL_WRITE|FOLL_FORCE to break COW.

Yes, my patch seems to be working. The hugetlb code is pretty simple.
And it allows ptrace and the proc pid mem file to work on the executable
private hugetlb mappings.

There is one thing I am unclear about though. hugetlb enforces that
huge_pte_write() is true on FOLL_WRITE in both the fault and
follow_page_mask paths. I am not sure if we can simply assume in the
hugetlb code that if the pte is not writable and this is a write fault
then we're in the FOLL_FORCE|FOLL_WRITE case.  Or do we want to keep the
checks simply not enforce it for FOLL_FORCE|FOLL_WRITE?

The latter is more complicated in the fault path because there is no
FAULT_FLAG_FORCE flag.

-- 
Guillaume Morin 



Re: [PATCH net-next v6] net/ipv4: add tracepoint for icmp_send

2024-04-25 Thread Jakub Kicinski
On Tue, 23 Apr 2024 17:23:39 +0800 (CST) xu.xi...@zte.com.cn wrote:
> +#include 
> \ No newline at end of file

Please fix.
-- 
pw-bot: cr



Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

2024-04-25 Thread Google
On Wed, 24 Apr 2024 15:35:15 +0200
Florent Revest  wrote:

> Neat! :) I had a look at mostly the "high level" part (fprobe and
> arm64 specific bits) and this seems to be in a good state to me.
> 

Thanks for the review this long series!

> Thanks for all that work, that is quite a refactoring :)
> 
> On Mon, Apr 15, 2024 at 2:49 PM Masami Hiramatsu (Google)
>  wrote:
> >
> > Hi,
> >
> > Here is the 9th version of the series to re-implement the fprobe on
> > function-graph tracer. The previous version is;
> >
> > https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
> >
> > This version is ported on the latest kernel (v6.9-rc3 + probes/for-next)
> > and fixed some bugs + performance optimization patch[36/36].
> >  - [12/36] Fix to clear fgraph_array entry in registration failure, also
> >return -ENOSPC when fgraph_array is full.
> >  - [28/36] Add new store_fprobe_entry_data() for fprobe.
> >  - [31/36] Remove DIV_ROUND_UP() and fix entry data address calculation.
> >  - [36/36] Add new flag to skip timestamp recording.
> >
> > Overview
> > 
> > This series does major 2 changes, enable multiple function-graphs on
> > the ftrace (e.g. allow function-graph on sub instances) and rewrite the
> > fprobe on this function-graph.
> >
> > The former changes had been sent from Steven Rostedt 4 years ago (*),
> > which allows users to set different setting function-graph tracer (and
> > other tracers based on function-graph) in each trace-instances at the
> > same time.
> >
> > (*) https://lore.kernel.org/all/20190525031633.811342...@goodmis.org/
> >
> > The purpose of latter change are;
> >
> >  1) Remove dependency of the rethook from fprobe so that we can reduce
> >the return hook code and shadow stack.
> >
> >  2) Make 'ftrace_regs' the common trace interface for the function
> >boundary.
> >
> > 1) Currently we have 2(or 3) different function return hook codes,
> >  the function-graph tracer and rethook (and legacy kretprobe).
> >  But since this  is redundant and needs double maintenance cost,
> >  I would like to unify those. From the user's viewpoint, function-
> >  graph tracer is very useful to grasp the execution path. For this
> >  purpose, it is hard to use the rethook in the function-graph
> >  tracer, but the opposite is possible. (Strictly speaking, kretprobe
> >  can not use it because it requires 'pt_regs' for historical reasons.)
> >
> > 2) Now the fprobe provides the 'pt_regs' for its handler, but that is
> >  wrong for the function entry and exit. Moreover, depending on the
> >  architecture, there is no way to accurately reproduce 'pt_regs'
> >  outside of interrupt or exception handlers. This means fprobe should
> >  not use 'pt_regs' because it does not use such exceptions.
> >  (Conversely, kprobe should use 'pt_regs' because it is an abstract
> >   interface of the software breakpoint exception.)
> >
> > This series changes fprobe to use function-graph tracer for tracing
> > function entry and exit, instead of mixture of ftrace and rethook.
> > Unlike the rethook which is a per-task list of system-wide allocated
> > nodes, the function graph's ret_stack is a per-task shadow stack.
> > Thus it does not need to set 'nr_maxactive' (which is the number of
> > pre-allocated nodes).
> > Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'.
> > Since eBPF mulit_kprobe/multi_kretprobe events still use 'pt_regs' as
> > their register interface, this changes it to convert 'ftrace_regs' to
> > 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs',
> > so users must access only registers for function parameters or
> > return value.
> >
> > Design
> > --
> > Instead of using ftrace's function entry hook directly, the new fprobe
> > is built on top of the function-graph's entry and return callbacks
> > with 'ftrace_regs'.
> >
> > Since the fprobe requires access to 'ftrace_regs', the architecture
> > must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and
> > CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables to call function-graph
> > entry callback with 'ftrace_regs', and also
> > CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
> > return_to_handler.
> >
> > All fprobes share a single function-graph ops (means shares a common
> > ftrace filter) similar to the kprobe-on-ftrace. This needs another
> > layer to find corresponding fprobe in the common function-graph
> > callbacks, but has much better scalability, since the number of
> > registered function-graph ops is limited.
> >
> > In the entry callback, the fprobe runs its entry_handler and saves the
> > address of 'fprobe' on the function-graph's shadow stack as data. The
> > return callback decodes the data to get the 'fprobe' address, and runs
> > the exit_handler.
> >
> > The fprobe introduces two hash-tables, one is for entry callback which
> > searches fprobes related to the given function address passed by entry
> > callback. The other is for a 

Re: [PATCH v10 00/12] Initial Marvell PXA1908 support

2024-04-25 Thread Rob Herring


On Wed, 24 Apr 2024 20:42:27 +0200, Duje Mihanović wrote:
> Hello,
> 
> This series adds initial support for the Marvell PXA1908 SoC and
> "samsung,coreprimevelte", a smartphone using the SoC.
> 
> USB works and the phone can boot a rootfs from an SD card, but there are
> some warnings in the dmesg:
> 
> During SMP initialization:
> [0.006519] CPU features: SANITY CHECK: Unexpected variation in 
> SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU1: 0x00
> [0.006542] CPU features: Unsupported CPU feature variation detected.
> [0.006589] CPU1: Booted secondary processor 0x01 [0x410fd032]
> [0.010710] Detected VIPT I-cache on CPU2
> [0.010716] CPU features: SANITY CHECK: Unexpected variation in 
> SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU2: 0x00
> [0.010758] CPU2: Booted secondary processor 0x02 [0x410fd032]
> [0.014849] Detected VIPT I-cache on CPU3
> [0.014855] CPU features: SANITY CHECK: Unexpected variation in 
> SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU3: 0x00
> [0.014895] CPU3: Booted secondary processor 0x03 [0x410fd032]
> 
> SMMU probing fails:
> [0.101798] arm-smmu c001.iommu: probing hardware configuration...
> [0.101809] arm-smmu c001.iommu: SMMUv1 with:
> [0.101816] arm-smmu c001.iommu: no translation support!
> 
> A 3.14 based Marvell tree is available on GitHub
> acorn-marvell/brillo_pxa_kernel, and a Samsung one on GitHub
> CoderCharmander/g361f-kernel.
> 
> Andreas Färber attempted to upstream support for this SoC in 2017:
> https://lore.kernel.org/lkml/20170222022929.10540-1-afaer...@suse.de/
> 
> Signed-off-by: Duje Mihanović 
> 
> Changes in v10:
> - Update trailers
> - Rebase on v6.9-rc5
> - Clock driver changes:
>   - Add a couple of forgotten clocks in APBC
> - The clocks are thermal_clk, ipc_clk, ssp0_clk, ssp2_clk and swjtag
> - The IDs and register offsets were already present, but I forgot to
>   actually register them
>   - Split each controller block into own file
>   - Drop unneeded -of in clock driver filenames
>   - Simplify struct pxa1908_clk_unit
>   - Convert to platform driver
>   - Add module metadata
> - DTS changes:
>   - Properly name pinctrl nodes
>   - Drop pinctrl #size-cells, #address-cells, ranges and #gpio-size-cells
>   - Fix pinctrl input-schmitt configuration
> - Link to v9: 
> https://lore.kernel.org/20240402-pxa1908-lkml-v9-0-25a003e83...@skole.hr
> 
> Changes in v9:
> - Update trailers and rebase on v6.9-rc2, no changes
> - Link to v8: 
> https://lore.kernel.org/20240110-pxa1908-lkml-v8-0-fea768a59...@skole.hr
> 
> Changes in v8:
> - Drop SSPA patch
> - Drop broken-cd from eMMC node
> - Specify S-Boot hardcoded initramfs location in device tree
> - Add ARM PMU node
> - Correct inverted modem memory base and size
> - Update trailers
> - Rebase on next-20240110
> - Link to v7: 
> https://lore.kernel.org/20231102-pxa1908-lkml-v7-0-cabb1a0cb...@skole.hr
>   and https://lore.kernel.org/20231102152033.5511-1-duje.mihano...@skole.hr
> 
> Changes in v7:
> - Suppress SND_MMP_SOC_SSPA on ARM64
> - Update trailers
> - Rebase on v6.6-rc7
> - Link to v6: 
> https://lore.kernel.org/r/20231010-pxa1908-lkml-v6-0-b2fe09240...@skole.hr
> 
> Changes in v6:
> - Address maintainer comments:
>   - Add "marvell,pxa1908-padconf" binding to pinctrl-single driver
> - Drop GPIO patch as it's been pulled
> - Update trailers
> - Rebase on v6.6-rc5
> - Link to v5: 
> https://lore.kernel.org/r/20230812-pxa1908-lkml-v5-0-a5d51937e...@skole.hr
> 
> Changes in v5:
> - Address maintainer comments:
>   - Move *_NR_CLKS to clock driver from dt binding file
> - Allocate correct number of clocks for each block instead of blindly
>   allocating 50 for each
> - Link to v4: 
> https://lore.kernel.org/r/20230807-pxa1908-lkml-v4-0-cb387d73b...@skole.hr
> 
> Changes in v4:
> - Address maintainer comments:
>   - Relicense clock binding file to BSD-2
> - Add pinctrl-names to SD card node
> - Add vgic registers to GIC node
> - Rebase on v6.5-rc5
> - Link to v3: 
> https://lore.kernel.org/r/20230804-pxa1908-lkml-v3-0-8e48fca37...@skole.hr
> 
> Changes in v3:
> - Address maintainer comments:
>   - Drop GPIO dynamic allocation patch
>   - Move clock register offsets into driver (instead of bindings file)
>   - Add missing Tested-by trailer to u32_fract patch
>   - Move SoC binding to arm/mrvl/mrvl.yaml
> - Add serial0 alias and stdout-path to board dts to enable UART
>   debugging
> - Rebase on v6.5-rc4
> - Link to v2: 
> https://lore.kernel.org/r/20230727162909.6031-1-duje.mihano...@skole.hr
> 
> Changes in v2:
> - Remove earlycon patch as it's been merged into tty-next
> - Address maintainer comments:
>   - Clarify GPIO regressions on older PXA platforms
>   - Add Fixes tag to commit disabling GPIO pinctrl calls for this SoC
>   - Add missing includes to clock driver
>   - Clock driver uses HZ_PER_MHZ, u32_fract and GENMASK
>   - Dual license clock bindings

Re: [PATCH v5 3/5] vduse: Add function to get/free the pages for reconnection

2024-04-25 Thread Michael S. Tsirkin
On Thu, Apr 25, 2024 at 09:35:58AM +0800, Jason Wang wrote:
> On Wed, Apr 24, 2024 at 5:51 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Apr 24, 2024 at 08:44:10AM +0800, Jason Wang wrote:
> > > On Tue, Apr 23, 2024 at 4:42 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Apr 23, 2024 at 11:09:59AM +0800, Jason Wang wrote:
> > > > > On Tue, Apr 23, 2024 at 4:05 AM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Apr 18, 2024 at 08:57:51AM +0800, Jason Wang wrote:
> > > > > > > On Wed, Apr 17, 2024 at 5:29 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 12, 2024 at 09:28:23PM +0800, Cindy Lu wrote:
> > > > > > > > > Add the function vduse_alloc_reconnnect_info_mem
> > > > > > > > > and vduse_alloc_reconnnect_info_mem
> > > > > > > > > These functions allow vduse to allocate and free memory for 
> > > > > > > > > reconnection
> > > > > > > > > information. The amount of memory allocated is vq_num pages.
> > > > > > > > > Each VQS will map its own page where the reconnection 
> > > > > > > > > information will be saved
> > > > > > > > >
> > > > > > > > > Signed-off-by: Cindy Lu 
> > > > > > > > > ---
> > > > > > > > >  drivers/vdpa/vdpa_user/vduse_dev.c | 40 
> > > > > > > > > ++
> > > > > > > > >  1 file changed, 40 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> > > > > > > > > b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > index ef3c9681941e..2da659d5f4a8 100644
> > > > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > > > > > @@ -65,6 +65,7 @@ struct vduse_virtqueue {
> > > > > > > > >   int irq_effective_cpu;
> > > > > > > > >   struct cpumask irq_affinity;
> > > > > > > > >   struct kobject kobj;
> > > > > > > > > + unsigned long vdpa_reconnect_vaddr;
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > >  struct vduse_dev;
> > > > > > > > > @@ -1105,6 +1106,38 @@ static void 
> > > > > > > > > vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)
> > > > > > > > >
> > > > > > > > >   vq->irq_effective_cpu = curr_cpu;
> > > > > > > > >  }
> > > > > > > > > +static int vduse_alloc_reconnnect_info_mem(struct vduse_dev 
> > > > > > > > > *dev)
> > > > > > > > > +{
> > > > > > > > > + unsigned long vaddr = 0;
> > > > > > > > > + struct vduse_virtqueue *vq;
> > > > > > > > > +
> > > > > > > > > + for (int i = 0; i < dev->vq_num; i++) {
> > > > > > > > > + /*page 0~ vq_num save the reconnect info for 
> > > > > > > > > vq*/
> > > > > > > > > + vq = dev->vqs[i];
> > > > > > > > > + vaddr = get_zeroed_page(GFP_KERNEL);
> > > > > > > >
> > > > > > > >
> > > > > > > > I don't get why you insist on stealing kernel memory for 
> > > > > > > > something
> > > > > > > > that is just used by userspace to store data for its own use.
> > > > > > > > Userspace does not lack ways to persist data, for example,
> > > > > > > > create a regular file anywhere in the filesystem.
> > > > > > >
> > > > > > > Good point. So the motivation here is to:
> > > > > > >
> > > > > > > 1) be self contained, no dependency for high speed persist data
> > > > > > > storage like tmpfs
> > > > > >
> > > > > > No idea what this means.
> > > > >
> > > > > I mean a regular file may slow down the datapath performance, so
> > > > > usually the application will try to use tmpfs and other which is a
> > > > > dependency for implementing the reconnection.
> > > >
> > > > Are we worried about systems without tmpfs now?
> > >
> > > Yes.
> >
> > Why? Who ships these?
> 
> Not sure, but it could be disabled or unmounted. I'm not sure make
> VDUSE depends on TMPFS is a good idea.
> 
> Thanks

Don't disable or unmount it then?
The use-case needs to be much clearer if we are adding a way for
userspace to pin kernel memory for unlimited time.

-- 
MST




[syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task

2024-04-25 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:977b1ef51866 Merge tag 'block-6.9-20240420' of git://git.k..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17080d2098
kernel config:  https://syzkaller.appspot.com/x/.config?x=f47e5e015c177e57
dashboard link: https://syzkaller.appspot.com/bug?extid=83e7f982ca045ab4405c
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/549d1add1da9/disk-977b1ef5.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/3e8e501c8aa2/vmlinux-977b1ef5.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/d02f7cb905b8/bzImage-977b1ef5.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+83e7f982ca045ab44...@syzkaller.appspotmail.com

==
WARNING: possible circular locking dependency detected
6.9.0-rc4-syzkaller-00266-g977b1ef51866 #0 Not tainted
--
syz-executor.0/11241 is trying to acquire lock:
888020a2c0d8 (>siglock){-.-.}-{2:2}, at: 
force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334

but task is already holding lock:
8880b943e658 (>__lock){-.-.}-{2:2}, at: 
raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (>__lock){-.-.}-{2:2}:
   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
   _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
   raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
   raw_spin_rq_lock kernel/sched/sched.h:1385 [inline]
   _raw_spin_rq_lock_irqsave kernel/sched/sched.h:1404 [inline]
   rq_lock_irqsave kernel/sched/sched.h:1683 [inline]
   class_rq_lock_irqsave_constructor kernel/sched/sched.h:1737 [inline]
   sched_mm_cid_exit_signals+0x17b/0x4b0 kernel/sched/core.c:12005
   exit_signals+0x2a1/0x5c0 kernel/signal.c:3016
   do_exit+0x6a8/0x27e0 kernel/exit.c:837
   __do_sys_exit kernel/exit.c:994 [inline]
   __se_sys_exit kernel/exit.c:992 [inline]
   __pfx___ia32_sys_exit+0x0/0x10 kernel/exit.c:992
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (>siglock){-.-.}-{2:2}:
   check_prev_add kernel/locking/lockdep.c:3134 [inline]
   check_prevs_add kernel/locking/lockdep.c:3253 [inline]
   validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
   __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
   force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334
   force_sig_fault_to_task kernel/signal.c:1733 [inline]
   force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738
   __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
   handle_page_fault arch/x86/mm/fault.c:1505 [inline]
   exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563
   asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
   strncpy_from_user+0x2c6/0x2f0 lib/strncpy_from_user.c:138
   strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186
   bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:216 [inline]
   bpf_probe_read_compat_str kernel/trace/bpf_trace.c:311 [inline]
   bpf_probe_read_compat_str+0xe9/0x180 kernel/trace/bpf_trace.c:307
   bpf_prog_e42f6260c1b72fb3+0x3d/0x3f
   bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
   __bpf_prog_run include/linux/filter.h:657 [inline]
   bpf_prog_run include/linux/filter.h:664 [inline]
   __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
   bpf_trace_run4+0x25a/0x490 kernel/trace/bpf_trace.c:2422
   __traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222
   trace_sched_switch include/trace/events/sched.h:222 [inline]
   __schedule+0x2535/0x4a00 kernel/sched/core.c:6743
   preempt_schedule_irq+0xfb/0x1c0 kernel/sched/core.c:7068
   irqentry_exit+0x5e/0x90 kernel/entry/common.c:354
   asm_sysvec_apic_timer_interrupt+0x1a/0x20 
arch/x86/include/asm/idtentry.h:702
   force_sig_fault+0x0/0x1d0
   __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
   handle_page_fault arch/x86/mm/fault.c:1505 [inline]
   exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563
   asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
   __put_user_handle_exception+0x0/0x10
   __do_sys_gettimeofday kernel/time/time.c:147 [inline]
   

Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-25 Thread Dmitry Baryshkov
On Thu, 25 Apr 2024 at 10:08, Steev Klimaszewski  wrote:
>
> Hi Dmitry,
>
> On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov
>  wrote:
> >
> > Protection domain mapper is a QMI service providing mapping between
> > 'protection domains' and services supported / allowed in these domains.
> > For example such mapping is required for loading of the WiFi firmware or
> > for properly starting up the UCSI / altmode / battery manager support.
> >
> > The existing userspace implementation has several issue. It doesn't play
> > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> > firmware location is changed (or if the firmware was not available at
> > the time pd-mapper was started but the corresponding directory is
> > mounted later), etc.
> >
> > However this configuration is largely static and common between
> > different platforms. Provide in-kernel service implementing static
> > per-platform data.
> >
> > Unlike previous revisions of the patchset, this iteration uses static
> > configuration per platform, rather than building it dynamically from the
> > list of DSPs being started.
> >
> > To: Bjorn Andersson 
> > To: Konrad Dybcio 
> > To: Sibi Sankar 
> > To: Mathieu Poirier 
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-remotep...@vger.kernel.org
> > Cc: Johan Hovold 
> > Cc: Xilin Wu 
> > Cc: "Bryan O'Donoghue" 
> > --
> >
> > Changes in v7:
> > - Fixed modular build (Steev)
> > - Link to v6: 
> > https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
> >
> > Changes in v6:
> > - Reworked mutex to fix lockdep issue on deregistration
> > - Fixed dependencies between PD-mapper and remoteproc to fix modular
> >   builds (Krzysztof)
> > - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> > - Fixed kerneldocs (Krzysztof)
> > - Removed extra pr_debug messages (Krzysztof)
> > - Fixed wcss build (Krzysztof)
> > - Added platforms which do not require protection domain mapping to
> >   silence the notice on those platforms
> > - Link to v5: 
> > https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
> >
> > Changes in v5:
> > - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> > Lew)
> > - pd_mapper: reworked to provide static configuration per platform
> >   (Bjorn)
> > - Link to v4: 
> > https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
> >
> > Changes in v4:
> > - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> > - Added configuration for sm6350 (Thanks to Luca)
> > - Removed RFC tag (Konrad)
> > - Link to v3: 
> > https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
> >
> > Changes in RFC v3:
> > - Send start / stop notifications when PD-mapper domain list is changed
> > - Reworked the way PD-mapper treats protection domains, register all of
> >   them in a single batch
> > - Added SC7180 domains configuration based on TCL Book 14 GO
> > - Link to v2: 
> > https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
> >
> > Changes in RFC v2:
> > - Swapped num_domains / domains (Konrad)
> > - Fixed an issue with battery not working on sc8280xp
> > - Added missing configuration for QCS404
> >
> > ---
> > Dmitry Baryshkov (6):
> >   soc: qcom: pdr: protect locator_addr with the main mutex
> >   soc: qcom: pdr: fix parsing of domains lists
> >   soc: qcom: pdr: extract PDR message marshalling data
> >   soc: qcom: qmi: add a way to remove running service
> >   soc: qcom: add pd-mapper implementation
> >   remoteproc: qcom: enable in-kernel PD mapper
> >
> >  drivers/remoteproc/Kconfig  |   4 +
> >  drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
> >  drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
> >  drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
> >  drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
> >  drivers/soc/qcom/Kconfig|  14 +
> >  drivers/soc/qcom/Makefile   |   2 +
> >  drivers/soc/qcom/pdr_interface.c|   6 +-
> >  drivers/soc/qcom/pdr_internal.h | 318 ++---
> >  drivers/soc/qcom/qcom_pd_mapper.c   | 656 
> > 
> >  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
> >  drivers/soc/qcom/qmi_interface.c|  67 
> >  include/linux/soc/qcom/pd_mapper.h  |  28 ++
> >  include/linux/soc/qcom/qmi.h|   2 +
> >  14 files changed, 1193 insertions(+), 302 deletions(-)
> > ---
> > base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
> > change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
> >
> > Best regards,
> > --
> > Dmitry Baryshkov 
> >
> >
> I've tested this series over a large number of reboots, and the p-d
> devices(?) do always seem to come up (with the pd-mapper service
> disabled) on my Thinkpad X13s.  One less service to run in userland!
> Tested-by: Steev Klimaszewski 

Thank you!

-- 
With best wishes
Dmitry



Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-25 Thread Steev Klimaszewski
Hi Dmitry,

On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov
 wrote:
>
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
>
> The existing userspace implementation has several issue. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide in-kernel service implementing static
> per-platform data.
>
> Unlike previous revisions of the patchset, this iteration uses static
> configuration per platform, rather than building it dynamically from the
> list of DSPs being started.
>
> To: Bjorn Andersson 
> To: Konrad Dybcio 
> To: Sibi Sankar 
> To: Mathieu Poirier 
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold 
> Cc: Xilin Wu 
> Cc: "Bryan O'Donoghue" 
> --
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: 
> https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: 
> https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: 
> https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: 
> https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: 
> https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (6):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: qmi: add a way to remove running service
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/Kconfig  |   4 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
>  drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
>  drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
>  drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
>  drivers/soc/qcom/Kconfig|  14 +
>  drivers/soc/qcom/Makefile   |   2 +
>  drivers/soc/qcom/pdr_interface.c|   6 +-
>  drivers/soc/qcom/pdr_internal.h | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 656 
> 
>  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
>  drivers/soc/qcom/qmi_interface.c|  67 
>  include/linux/soc/qcom/pd_mapper.h  |  28 ++
>  include/linux/soc/qcom/qmi.h|   2 +
>  14 files changed, 1193 insertions(+), 302 deletions(-)
> ---
> base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,
> --
> Dmitry Baryshkov 
>
>
I've tested this series over a large number of reboots, and the p-d
devices(?) do always seem to come up (with the pd-mapper service
disabled) on my Thinkpad X13s.  One less service to run in userland!
Tested-by: Steev Klimaszewski