[PATCH] eventfs: Directly return NULL to avoid NULL pointer dereference

2024-05-10 Thread hao . ge
From: Hao Ge 

When the condition ei->is_freed holds, return NULL directly so that
update_events_attr() does not dereference a NULL ei pointer.

Fixes: 8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Hao Ge 
---
 fs/tracefs/event_inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index a878cea70f4c..da2827c6acc2 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -346,8 +346,7 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
 * doesn't matter.
 */
if (ei->is_freed) {
-   ei = NULL;
-   break;
+   return NULL;
}
// Walk upwards until you find the events inode
} while (!ei->is_events);
-- 
2.25.1




Re: [PATCH v3 2/2] virtiofs: use GFP_NOFS when enqueuing request through kworker

2024-05-10 Thread Hou Tao
Hi,

On 5/10/2024 7:19 PM, Miklos Szeredi wrote:
> On Fri, 26 Apr 2024 at 16:38, Hou Tao  wrote:
>> From: Hou Tao 
>>
>> When invoking virtio_fs_enqueue_req() through kworker, both the
>> allocation of the sg array and the bounce buffer still use GFP_ATOMIC.
>> Considering the size of the sg array may be greater than PAGE_SIZE, use
>> GFP_NOFS instead of GFP_ATOMIC to lower the possibility of memory
>> allocation failure and to avoid unnecessarily depleting the atomic
>> reserves. GFP_NOFS is not passed to virtio_fs_enqueue_req() directly,
>> GFP_KERNEL and memalloc_nofs_{save|restore} helpers are used instead.
> Makes sense.
>
> However, I don't understand why the GFP_NOFS behavior is optional. It
> should work when queuing the request for the first time as well, no?

No. fuse_request_queue_background() may call queue_request_and_unlock()
with fc->bg_lock held, and bg_lock is a spin-lock, so for now it is not
safe to call kmalloc(GFP_NOFS) while holding a spin-lock. The acquisition
of fc->bg_lock in fuse_request_queue_background() could perhaps be
optimized, but I will leave that for future work.
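For reference, the scoped no-FS pattern discussed above looks roughly like the
sketch below. The helper name and size argument are illustrative only; the
patch itself applies the save/restore around the call path that ends up in
virtio_fs_enqueue_req(), as described in the commit message.

    /*
     * Minimal sketch: inside a memalloc_nofs_save()/restore() section,
     * GFP_KERNEL allocations are implicitly treated as GFP_NOFS, so the
     * allocation may sleep but will not recurse into the filesystem.
     */
    #include <linux/sched/mm.h>
    #include <linux/slab.h>

    static void *alloc_sg_array_nofs(size_t size)
    {
            unsigned int flags;
            void *buf;

            flags = memalloc_nofs_save();
            buf = kmalloc(size, GFP_KERNEL);
            memalloc_nofs_restore(flags);

            return buf;
    }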
> Thanks,
> Miklos
> .




Re: [PATCH] modules: Drop the .export_symbol section from the final modules

2024-05-10 Thread Ainux Wang
Hi, everyone:

Could someone help to review this patch?

Best regards,
 Ainux Wang.


Ainux Wang  wrote on Tue, 7 May 2024 at 13:15:
>
> Hi, everyone:
>
> Could someone review this patch?
>
> Best regards,
> Ainux Wang.
>
>  wrote on Wed, 17 Apr 2024 at 13:35:
>
> >
> > From: Wang Yao 
> >
> > Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")
> > forgot to drop the .export_symbol section from the final modules.
> >
> > Signed-off-by: Wang Yao 
> > ---
> >  scripts/module.lds.S | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/scripts/module.lds.S b/scripts/module.lds.S
> > index bf5bcf2836d8..89ff01a22634 100644
> > --- a/scripts/module.lds.S
> > +++ b/scripts/module.lds.S
> > @@ -13,6 +13,7 @@ SECTIONS {
> > /DISCARD/ : {
> > *(.discard)
> > *(.discard.*)
> > +   *(.export_symbol)
> > }
> >
> > __ksymtab   0 : { *(SORT(___ksymtab+*)) }
> > --
> > 2.27.0
> >



[PATCH v2 2/2] drivers: remoteproc: xlnx: add sram support

2024-05-10 Thread Tanmay Shah
The AMD-Xilinx ZynqMP platform contains on-chip SRAM memory (OCM).
The R5 cores can access OCM; access is faster than DDR memory but slower
than the available TCM memories. An SRAM region may optionally have
multiple power-domains.

Signed-off-by: Tanmay Shah 
---

Changes in v2:
  - Fix integer assignment to variable, which also fixes the following sparse
    warning:

drivers/remoteproc/xlnx_r5_remoteproc.c:995:26: sparse: warning: Using plain integer as NULL pointer

 drivers/remoteproc/xlnx_r5_remoteproc.c | 221 +++-
 1 file changed, 220 insertions(+), 1 deletion(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 039370cffa32..e0a78a5d4ad5 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -56,6 +56,21 @@ struct mem_bank_data {
char *bank_name;
 };
 
+/**
+ * struct zynqmp_sram_bank - sram bank description
+ *
+ * @sram_res: sram address region information
+ * @power_domains: Array of pm domain id
+ * @num_pd: total pm domain id count
+ * @da: device address of sram
+ */
+struct zynqmp_sram_bank {
+   struct resource sram_res;
+   int *power_domains;
+   int num_pd;
+   u32 da;
+};
+
 /**
  * struct mbox_info
  *
@@ -109,6 +124,8 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
  * struct zynqmp_r5_core
  *
  * @rsc_tbl_va: resource table virtual address
+ * @sram: Array of sram memories assigned to this core
+ * @num_sram: number of sram for this core
  * @dev: device of RPU instance
  * @np: device node of RPU instance
  * @tcm_bank_count: number TCM banks accessible to this RPU
@@ -120,6 +137,8 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
  */
 struct zynqmp_r5_core {
struct resource_table *rsc_tbl_va;
+   struct zynqmp_sram_bank **sram;
+   int num_sram;
struct device *dev;
struct device_node *np;
int tcm_bank_count;
@@ -483,6 +502,69 @@ static int add_mem_regions_carveout(struct rproc *rproc)
return 0;
 }
 
+static int add_sram_carveouts(struct rproc *rproc)
+{
+   struct zynqmp_r5_core *r5_core = rproc->priv;
+   struct rproc_mem_entry *rproc_mem;
+   struct zynqmp_sram_bank *sram;
+   dma_addr_t dma_addr;
+   int da, i, j, ret;
+   size_t len;
+
+   for (i = 0; i < r5_core->num_sram; i++) {
+   sram = r5_core->sram[i];
+
+   dma_addr = (dma_addr_t)sram->sram_res.start;
+		len = resource_size(&sram->sram_res);
+   da = sram->da;
+
+   for (j = 0; j < sram->num_pd; j++) {
+			ret = zynqmp_pm_request_node(sram->power_domains[j],
+						     ZYNQMP_PM_CAPABILITY_ACCESS, 0,
+						     ZYNQMP_PM_REQUEST_ACK_BLOCKING);
+   if (ret < 0) {
+   dev_err(r5_core->dev,
+   "failed to request on SRAM pd 0x%x",
+   sram->power_domains[j]);
+   goto fail_sram;
+   } else {
+   pr_err("sram pd 0x%x request success\n",
+  sram->power_domains[j]);
+   }
+   }
+
+   /* Register associated reserved memory regions */
+		rproc_mem = rproc_mem_entry_init(&rproc->dev, NULL,
+(dma_addr_t)dma_addr,
+len, da,
+zynqmp_r5_mem_region_map,
+zynqmp_r5_mem_region_unmap,
+sram->sram_res.name);
+
+   rproc_add_carveout(rproc, rproc_mem);
+   rproc_coredump_add_segment(rproc, da, len);
+
+		dev_err(&rproc->dev, "sram carveout %s addr=%llx, da=0x%x, size=0x%lx",
+			sram->sram_res.name, dma_addr, da, len);
+   }
+
+   return 0;
+
+fail_sram:
+   /* Release current sram pd. */
+   while (--j >= 0)
+   zynqmp_pm_release_node(sram->power_domains[j]);
+
+   /* Release previously requested sram pd. */
+   while (--i >= 0) {
+   sram = r5_core->sram[i];
+   for (j = 0; j < sram->num_pd; j++)
+   zynqmp_pm_release_node(sram->power_domains[j]);
+   }
+
+   return ret;
+}
+
 /*
  * tcm_mem_unmap()
  * @rproc: single R5 core's corresponding rproc instance
@@ -659,6 +741,12 @@ static int zynqmp_r5_rproc_prepare(struct rproc *rproc)
return ret;
}
 
+   ret = add_sram_carveouts(rproc);
+   if (ret) {
+		dev_err(&rproc->dev, "failed to get sram carveout %d\n", ret);
+   return ret;
+   }
+
return 0;
 }
 
@@ -673,8 +761,9 @@ static 

[PATCH v2 1/2] drivers: remoteproc: xlnx: add attach detach support

2024-05-10 Thread Tanmay Shah
It is possible that the remote processor is already running before
Linux boots or before the remoteproc platform driver is probed. Implement
the required remoteproc framework ops to provide the resource table address
and to connect or disconnect with the remote processor in such a case.

Signed-off-by: Tanmay Shah 
---

Changes in v2:
  - Fix following sparse warnings

drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: expected struct rsc_tbl_data *rsc_data_va
drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: expected struct resource_table *rsc_addr
drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: expected void volatile [noderef] __iomem *addr

 drivers/remoteproc/xlnx_r5_remoteproc.c | 164 +++-
 1 file changed, 160 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 84243d1dff9f..039370cffa32 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -25,6 +25,10 @@
 /* RX mailbox client buffer max length */
 #define MBOX_CLIENT_BUF_MAX	(IPI_BUF_LEN_MAX + \
 sizeof(struct zynqmp_ipi_message))
+
+#define RSC_TBL_XLNX_MAGIC ((uint32_t)'x' << 24 | (uint32_t)'a' << 16 | \
+(uint32_t)'m' << 8 | (uint32_t)'p')
+
 /*
  * settings for RPU cluster mode which
  * reflects possible values of xlnx,cluster-mode dt-property
@@ -73,6 +77,15 @@ struct mbox_info {
struct mbox_chan *rx_chan;
 };
 
+/* Xilinx Platform specific data structure */
+struct rsc_tbl_data {
+   const int version;
+   const u32 magic_num;
+   const u32 comp_magic_num;
+   const u32 rsc_tbl_size;
+   const uintptr_t rsc_tbl;
+} __packed;
+
 /*
  * Hardcoded TCM bank values. This will stay in driver to maintain backward
  * compatibility with device-tree that does not have TCM information.
@@ -95,20 +108,24 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
 /**
  * struct zynqmp_r5_core
  *
+ * @rsc_tbl_va: resource table virtual address
  * @dev: device of RPU instance
  * @np: device node of RPU instance
  * @tcm_bank_count: number TCM banks accessible to this RPU
  * @tcm_banks: array of each TCM bank data
  * @rproc: rproc handle
+ * @rsc_tbl_size: resource table size retrieved from remote
  * @pm_domain_id: RPU CPU power domain id
  * @ipi: pointer to mailbox information
  */
 struct zynqmp_r5_core {
+   struct resource_table *rsc_tbl_va;
struct device *dev;
struct device_node *np;
int tcm_bank_count;
struct mem_bank_data **tcm_banks;
struct rproc *rproc;
+   u32 rsc_tbl_size;
u32 pm_domain_id;
struct mbox_info *ipi;
 };
@@ -621,10 +638,19 @@ static int zynqmp_r5_rproc_prepare(struct rproc *rproc)
 {
int ret;
 
-   ret = add_tcm_banks(rproc);
-   if (ret) {
-		dev_err(&rproc->dev, "failed to get TCM banks, err %d\n", ret);
-   return ret;
+   /**
+* For attach/detach use case, Firmware is already loaded so
+* TCM isn't really needed at all. Also, for security TCM can be
+* locked in such case and linux may not have access at all.
+* So avoid adding TCM banks. TCM power-domains requested during attach
+* callback.
+*/
+   if (rproc->state != RPROC_DETACHED) {
+   ret = add_tcm_banks(rproc);
+   if (ret) {
+			dev_err(&rproc->dev, "failed to get TCM banks, err %d\n", ret);
+   return ret;
+   }
}
 
ret = add_mem_regions_carveout(rproc);
@@ -662,6 +688,123 @@ static int zynqmp_r5_rproc_unprepare(struct rproc *rproc)
return 0;
 }
 
+static struct resource_table *zynqmp_r5_get_loaded_rsc_table(struct rproc *rproc,
+							      size_t *size)
+{
+   struct zynqmp_r5_core *r5_core;
+
+   r5_core = rproc->priv;
+
+   *size = r5_core->rsc_tbl_size;
+
+   return r5_core->rsc_tbl_va;
+}
+
+static int zynqmp_r5_get_rsc_table_va(struct zynqmp_r5_core *r5_core)
+{
+   struct device *dev = r5_core->dev;
+   struct rsc_tbl_data *rsc_data_va;
+   struct resource_table *rsc_addr;
+   struct resource res_mem;
+   struct device_node *np;
+   int ret;
+
+   /**
+* It is expected from remote processor firmware to provide resource
+* table address via struct rsc_tbl_data data structure.
+* Start address of first entry under "memory-region" property list
+* contains that data structure which holds resource table address, size
+* and some magic number to validate correct resource table entry.
+*/
+   np = of_parse_phandle(r5_core->np, "memory-region", 0);
+   if (!np) {
+   dev_err(dev, "failed to get memory region dev node\n");
+   return -EINVAL;
+   }
+
+   ret = of_address_to_resource(np, 0, 

[PATCH v2 0/2] remoteproc: xlnx: Add attach detach ops and sram support

2024-05-10 Thread Tanmay Shah
Attach and detach ops are needed to connect to a remote processor that is
already running before the remoteproc driver is probed. Implement the
remoteproc framework ops that enable such a use case on AMD-Xilinx platforms.

The remote processor can also use On-Chip sram Memory (OCM) for various
purposes, for example for faster code execution or data access compared to
DDR memory. Such an sram region is made available to remoteproc nodes via the
"sram" property. Add support in the driver to parse and use OCM memory via
the sram property.
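For readers less familiar with the remoteproc framework, the attach/detach use
case above is wired up through the generic rproc_ops callbacks. The sketch
below is illustrative only (the names and the private struct are made up, not
taken from this series); the real callbacks are added in patch 1/2:

    #include <linux/remoteproc.h>

    /*
     * Illustrative private state, mirroring the rsc_tbl_va/rsc_tbl_size
     * fields this series adds to struct zynqmp_r5_core.
     */
    struct demo_rproc {
            struct resource_table *rsc_tbl_va;
            size_t rsc_tbl_size;
    };

    static struct resource_table *demo_get_loaded_rsc_table(struct rproc *rproc,
                                                             size_t *size)
    {
            struct demo_rproc *priv = rproc->priv;

            *size = priv->rsc_tbl_size;     /* discovered at probe time */
            return priv->rsc_tbl_va;
    }

    static int demo_attach(struct rproc *rproc)
    {
            return 0;   /* firmware is already running, nothing to boot */
    }

    static int demo_detach(struct rproc *rproc)
    {
            return 0;   /* leave the firmware running on detach */
    }

    static const struct rproc_ops demo_ops = {
            .attach                 = demo_attach,
            .detach                 = demo_detach,
            .get_loaded_rsc_table   = demo_get_loaded_rsc_table,
    };

At probe time the driver would report the core as already running by setting
rproc->state = RPROC_DETACHED before rproc_add(), so the remoteproc core
attaches to the firmware instead of booting it.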

Changes in v2:
  - Fix following sparse warnings

drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: expected struct rsc_tbl_data *rsc_data_va
drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: expected struct resource_table *rsc_addr
drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: expected void volatile [noderef] __iomem *addr
drivers/remoteproc/xlnx_r5_remoteproc.c:995:26: sparse: warning: Using plain integer as NULL pointer

Tanmay Shah (2):
  drivers: remoteproc: xlnx: add attach detach support
  drivers: remoteproc: xlnx: add sram support

 drivers/remoteproc/xlnx_r5_remoteproc.c | 385 +++-
 1 file changed, 380 insertions(+), 5 deletions(-)


base-commit: c8d8f841e95bcc07ac8c5621fc171a24f1fd5cdb
-- 
2.25.1




Re: [PATCH 2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race

2024-05-10 Thread Reinette Chatre
Hi Dmitrii,

Thank you very much for uncovering and fixing this issue.

On 4/30/2024 7:38 AM, Dmitrii Kuvaiskii wrote:
> On Mon, Apr 29, 2024 at 04:11:03PM +0300, Jarkko Sakkinen wrote:
>> On Mon Apr 29, 2024 at 1:43 PM EEST, Dmitrii Kuvaiskii wrote:
>>> Two enclave threads may try to add and remove the same enclave page
>>> simultaneously (e.g., if the SGX runtime supports both lazy allocation
>>> and `MADV_DONTNEED` semantics). Consider this race:
>>>
>>> 1. T1 performs page removal in sgx_encl_remove_pages() and stops right
>>>after removing the page table entry and right before re-acquiring the
>>>enclave lock to EREMOVE and xa_erase(&encl->page_array) the page.
>>> 2. T2 tries to access the page, and #PF[not_present] is raised. The
>>>condition to EAUG in sgx_vma_fault() is not satisfied because the
>>>page is still present in encl->page_array, thus the SGX driver
>>>assumes that the fault happened because the page was swapped out. The
>>>driver continues on a code path that installs a page table entry
>>>*without* performing EAUG.
>>> 3. The enclave page metadata is in inconsistent state: the PTE is
>>>installed but there was no EAUG. Thus, T2 in userspace infinitely
>>>receives SIGSEGV on this page (and EACCEPT always fails).
>>>
>>> Fix this by making sure that T1 (the page-removing thread) always wins
>>> this data race. In particular, the page-being-removed is marked as such,
>>> and T2 retries until the page is fully removed.
>>>
>>> Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
>>> Cc: sta...@vger.kernel.org
>>> Signed-off-by: Dmitrii Kuvaiskii 
>>> ---
>>>  arch/x86/kernel/cpu/sgx/encl.c  | 3 ++-
>>>  arch/x86/kernel/cpu/sgx/encl.h  | 3 +++
>>>  arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
>>>  3 files changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>>> index 41f14b1a3025..7ccd8b2fce5f 100644
>>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>>> @@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
>>>  
>>> /* Entry successfully located. */
>>> if (entry->epc_page) {
>>> -   if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
>>> +   if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
>>> +  SGX_ENCL_PAGE_BEING_REMOVED))
>>> return ERR_PTR(-EBUSY);
>>>  
>>> return entry;
>>> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
>>> index f94ff14c9486..fff5f2293ae7 100644
>>> --- a/arch/x86/kernel/cpu/sgx/encl.h
>>> +++ b/arch/x86/kernel/cpu/sgx/encl.h
>>> @@ -25,6 +25,9 @@
>>>  /* 'desc' bit marking that the page is being reclaimed. */
>>>  #define SGX_ENCL_PAGE_BEING_RECLAIMED  BIT(3)
>>>  
>>> +/* 'desc' bit marking that the page is being removed. */
>>> +#define SGX_ENCL_PAGE_BEING_REMOVEDBIT(2)
>>> +
>>>  struct sgx_encl_page {
>>> unsigned long desc;
>>> unsigned long vm_max_prot_bits:8;
>>> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
>>> index b65ab214bdf5..c542d4dd3e64 100644
>>> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
>>> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
>>> @@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
>>>  * Do not keep encl->lock because of dependency on
>>>  * mmap_lock acquired in sgx_zap_enclave_ptes().
>>>  */
>>> +   entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
>>> mutex_unlock(&encl->lock);
>>>  
>>> sgx_zap_enclave_ptes(encl, addr);
>>
>> It is somewhat trivial to NAK this as the commit message does
>> not make any effort to describe the new flag. By default at least
>> I have strong opposition against any new flags related to
>> reclaiming even if it needs a bit of extra synchronization
>> work in the user space.
>>
>> One way to describe concurrency scenarios would be to take
>> example from https://www.kernel.org/doc/Documentation/memory-barriers.txt
>>
>> I.e. see the examples with CPU 1 and CPU 2.
> 
> Thank you for the suggestion. Here is my new attempt at describing the racy
> scenario:
> 
> Consider some enclave page added to the enclave. User space decides to
> temporarily remove this page (e.g., emulating the MADV_DONTNEED semantics)
> on CPU1. At the same time, user space performs a memory access on the same
> page on CPU2, which results in a #PF and ultimately in sgx_vma_fault().
> Scenario proceeds as follows:
> 
> /*
>  * CPU1: User space performs
>  * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
>  * on a single enclave page
>  */
> sgx_encl_remove_pages() {
> 
>   mutex_lock(&encl->lock);
> 
>   entry = sgx_encl_load_page(encl);
>   /*
>* verify that page is
>* trimmed and accepted
>*/
> 
>   mutex_unlock(&encl->lock);
> 
>   /*
>* remove PTE entry; cannot
>* be performed under lock
>*/
>   

Re: [PATCH 1/2] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS

2024-05-10 Thread Reinette Chatre
Hi Dmitrii,

Thank you so much for finding as well as fixing this issue.

On 4/30/2024 7:37 AM, Dmitrii Kuvaiskii wrote:
> On Mon, Apr 29, 2024 at 04:04:24PM +0300, Jarkko Sakkinen wrote:
>> On Mon Apr 29, 2024 at 1:43 PM EEST, Dmitrii Kuvaiskii wrote:
>>> Two enclave threads may try to access the same non-present enclave page
>>> simultaneously (e.g., if the SGX runtime supports lazy allocation). The
>>> threads will end up in sgx_encl_eaug_page(), racing to acquire the
>>> enclave lock. The winning thread will perform EAUG, set up the page
>>> table entry, and insert the page into encl->page_array. The losing
>>> thread will then get -EBUSY on xa_insert(>page_array) and proceed
>>> to error handling path.
>>
>> And that path removes page. Not sure I got gist of this tbh.
> 
> Well, this is not about a redundant EREMOVE performed. This is about the
> enclave page becoming inaccessible due to a bug triggered with a data race.
> 
> Consider some enclave page not yet added to the enclave. The enclave
> performs a memory access to it at the same time on CPU1 and CPU2. Since the
> page does not yet have a corresponding PTE, the #PF handler on both CPUs
> calls sgx_vma_fault(). Scenario proceeds as follows:
> 
> /*
>  * Fault on CPU1
>  */
> sgx_vma_fault() {
> 
>   xa_load(&encl->page_array) == NULL ->
> 
>   sgx_encl_eaug_page() {
> 
> .../*
> * Fault on CPU2
> */
>sgx_vma_fault() {
> 
>  xa_load(&encl->page_array) == NULL ->
> 
>  sgx_encl_eaug_page() {
> 
>...
> 

Up to here it may be helpful to have the CPU1 and CPU2 code run concurrently
to highlight the race. First one to get the mutex "wins".

>mutex_lock(&encl->lock);
>/*
> * alloc encl_page
> */

Please note that encl_page is allocated before mutex is obtained.

>/*
> * alloc EPC page
> */
>epc_page = sgx_alloc_epc_page(...);
>/*
> * add page_to enclave's xarray

"page_to" -> "page to" ?

> */
>xa_insert(&encl->page_array, ...);
>/*
> * add page to enclave via EAUG
> * (page is in pending state)
> */
>/*
> * add PTE entry
> */
>vmf_insert_pfn(...);
> 
>mutex_unlock(&encl->lock);
>return VM_FAULT_NOPAGE;
>  }
>}

A brief comment under CPU2 essentially stating that this is a "good"
flow may help. Something like: "All good up to here. Enclave page successfully
added to enclave, ready for EACCEPT from user space". (please feel free to
improve)

>  mutex_lock(&encl->lock);
>  /*
>   * alloc encl_page
>   */

This should be outside mutex_lock(). It can even be shown earlier how
CPU1 and CPU2 can allocate encl_page concurrently (which is fine to do).

>  /*
>   * alloc EPC page
>   */
>  epc_page = sgx_alloc_epc_page(...);
>  /*
>   * add page_to enclave's xarray,

hmmm ... is page_to actually intended?

>   * this fails with -EBUSY

It may help to highlight that this failure is because CPU1 and CPU2 are both
attempting to access the same page thus the page was already added in CPU2 flow.

>   */
>  xa_insert(&encl->page_array, ...);
> 
>err_out_shrink:
>  sgx_encl_free_epc_page(epc_page) {
>/*
> * remove page via EREMOVE
> */

This needs emphasis that this is *BAD*. Something like:
"BUG: Enclave page added from CPU2 is yanked (via EREMOVE)
from enclave while it remains "accessible" from OS perspective
(PTE installed with entry in OS's page_array).

(please feel free to improve)

>/*
> * free EPC page
> */
>sgx_free_epc_page(epc_page);
>  }
> 
>   mutex_unlock(&encl->lock);
>   return VM_FAULT_SIGBUS;

This needs emphasis that this is *BAD*. "BUG: SIGBUS is
returned for a valid enclave page."  (please feel free to
improve)

> }
>   }
> 
> CPU2 added the enclave page (in pending state) to the enclave and installed
> the PTE. The kernel gives control back to the user space, without raising a
> 

[ANNOUNCE] 5.10.216-rt108

2024-05-10 Thread Luis Claudio R. Goncalves
Hello RT-list!

I'm pleased to announce the 5.10.216-rt108 stable release.

This release is just an update to the new stable 5.10.216 version,
without any RT specific changes.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v5.10-rt
  Head SHA1: f0ba9d06ef6b1edce9a62b662415d70e000efdd8

Or to build 5.10.216-rt108 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.tar.xz

  https://www.kernel.org/pub/linux/kernel/v5.x/patch-5.10.216.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/5.10/older/patch-5.10.216-rt108.patch.xz

Signing key fingerprint:

  9354 0649 9972 8D31 D464  D140 F394 A423 F8E6 7C26

All keys used for the above files and repositories can be found on the
following git repository:

   git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Enjoy!
Luis




Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-10 Thread Dave Hansen
On 5/10/24 12:06, Dongli Zhang wrote:
>   } else {
> + /*
> +  * This call borrows from the comments and implementation
> +  * of apic_update_vector(): "If the target CPU is offline
> +  * then the regular release mechanism via the cleanup
> +  * vector is not possible and the vector can be immediately
> +  * freed in the underlying matrix allocator.".
> +  */
> + irq_matrix_free(vector_matrix, apicd->prev_cpu,
> + apicd->prev_vector, apicd->is_managed);
>   apicd->prev_vector = 0;
>   }

I know it's just two sites, but I'd much rather spend the space on a
helper function than a copy-and-pasted comment.  Wouldn't something like
this make it like stupidly obvious what's going on:

if (cpu_online(cpu)) {
...
} else {
irq_matrix_free_offline(apicd->prev_cpu,
apicd->prev_vector,
apicd->is_managed);
apicd->prev_vector = 0;
}

/* Free a vector when the target CPU is offline */
static void irq_matrix_free_offline(...)
{
lockdep_assert_held(&vector_lock);
WARN_ON_ONCE(!cpu_offline(apicd->prev_cpu));

/*
 * The regular release mechanism via the cleanup vector is not
 * possible.  Immediately free the vector in the underlying
 * matrix allocator.
 */
irq_matrix_free(vector_matrix, cpu, vector, managed);
}

It would also be rather hard to screw up even if someone called it on an
online CPU because you'd get a nice warning.



[PATCH RESEND 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-10 Thread Dongli Zhang
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.

When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but if the old vector on the original CPU remains online, it is not
immediately reclaimed. Instead, apicd->move_in_progress is flagged, and the
reclaiming process is delayed until the next trigger of the interrupt on
the new CPU.

Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming vectors.

However, if the old CPU is offline before the interrupt triggers again on
the new CPU, irq_complete_move() simply resets both apicd->move_in_progress
and apicd->prev_vector to 0. Consequently, the vector remains unreclaimed
in vector_matrix, resulting in a CPU vector leak.

To address this issue, the fix borrows from the comments and implementation
of apic_update_vector(): "If the target CPU is offline then the regular
release mechanism via the cleanup vector is not possible and the vector can
be immediately freed in the underlying matrix allocator.".

Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
Correct the email address of b...@alien8.de. Sorry for the mistake.

 arch/x86/kernel/apic/vector.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 185738c72766..aad189a3bac9 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -1036,6 +1036,15 @@ static void __vector_schedule_cleanup(struct apic_chip_data *apicd)
add_timer_on(&cl->timer, cpu);
}
} else {
+   /*
+* This call borrows from the comments and implementation
+* of apic_update_vector(): "If the target CPU is offline
+* then the regular release mechanism via the cleanup
+* vector is not possible and the vector can be immediately
+* freed in the underlying matrix allocator.".
+*/
+   irq_matrix_free(vector_matrix, apicd->prev_cpu,
+   apicd->prev_vector, apicd->is_managed);
apicd->prev_vector = 0;
}
raw_spin_unlock(&vector_lock);
-- 
2.39.3




[PATCH 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-10 Thread Dongli Zhang
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.

When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but if the old vector on the original CPU remains online, it is not
immediately reclaimed. Instead, apicd->move_in_progress is flagged, and the
reclaiming process is delayed until the next trigger of the interrupt on
the new CPU.

Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming vectors.

However, if the old CPU is offline before the interrupt triggers again on
the new CPU, irq_complete_move() simply resets both apicd->move_in_progress
and apicd->prev_vector to 0. Consequently, the vector remains unreclaimed
in vector_matrix, resulting in a CPU vector leak.

To address this issue, the fix borrows from the comments and implementation
of apic_update_vector(): "If the target CPU is offline then the regular
release mechanism via the cleanup vector is not possible and the vector can
be immediately freed in the underlying matrix allocator.".

Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 arch/x86/kernel/apic/vector.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 185738c72766..aad189a3bac9 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -1036,6 +1036,15 @@ static void __vector_schedule_cleanup(struct apic_chip_data *apicd)
add_timer_on(&cl->timer, cpu);
}
} else {
+   /*
+* This call borrows from the comments and implementation
+* of apic_update_vector(): "If the target CPU is offline
+* then the regular release mechanism via the cleanup
+* vector is not possible and the vector can be immediately
+* freed in the underlying matrix allocator.".
+*/
+   irq_matrix_free(vector_matrix, apicd->prev_cpu,
+   apicd->prev_vector, apicd->is_managed);
apicd->prev_vector = 0;
}
raw_spin_unlock(&vector_lock);
-- 
2.39.3




Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-10 Thread Steven Rostedt
On Fri, 10 May 2024 12:03:12 +0100
Vincent Donnefort  wrote:

> > I'm not particularly happy about us calling vm_insert_pages with NULL
> > pointers stored in pages.
> > 
> > Should we instead do
> > 
> > if (WARN_ON_ONCE(s >= nr_subbufs)) {
> > err = -EINVAL;
> > goto out;
> > }
> > 
> > ?  
> 
> I could also set nr_pages = p in the event of s >= nr_subbufs... but that
> really shouldn't happen, so let's return an error.

I'm good with this. It should never happen anyway.

-- Steve



Re: [PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-05-10 Thread Mathieu Poirier
On Thu, May 09, 2024 at 10:54:51AM -0500, Andrew Davis wrote:
> On 5/9/24 10:22 AM, Mathieu Poirier wrote:
> > On Wed, 8 May 2024 at 09:36, Andrew Davis  wrote:
> > > 
> > > On 5/6/24 3:46 PM, Mathieu Poirier wrote:
> > > > Good day,
> > > > 
> > > > I have started reviewing this patchset.  Comments will be scattered over
> > > > multiple days and as such, I will explicitly inform you when I am done
> > > > with the review.
> > > > 
> > > > On Fri, Apr 26, 2024 at 02:18:08PM -0500, Andrew Davis wrote:
> > > > > From: Martyn Welch 
> > > > > 
> > > > > The AM62x and AM64x SoCs of the TI K3 family has a Cortex M4F core in
> > > > > the MCU domain. This core is typically used for safety applications 
> > > > > in a
> > > > > stand alone mode. However, some application (non safety related) may
> > > > > want to use the M4F core as a generic remote processor with IPC to the
> > > > > host processor. The M4F core has internal IRAM and DRAM memories and 
> > > > > are
> > > > > exposed to the system bus for code and data loading.
> > > > > 
> > > > > A remote processor driver is added to support this subsystem, 
> > > > > including
> > > > > being able to load and boot the M4F core. Loading includes to M4F
> > > > > internal memories and predefined external code/data memories. The
> > > > > carve outs for external contiguous memory is defined in the M4F device
> > > > > node and should match with the external memory declarations in the M4F
> > > > > image binary. The M4F subsystem has two resets. One reset is for the
> > > > > entire subsystem i.e including the internal memories and the other, a
> > > > > local reset is only for the M4F processing core. When loading the 
> > > > > image,
> > > > > the driver first releases the subsystem reset, loads the firmware 
> > > > > image
> > > > > and then releases the local reset to let the M4F processing core run.
> > > > > 
> > > > > Signed-off-by: Martyn Welch 
> > > > > Signed-off-by: Hari Nagalla 
> > > > > Signed-off-by: Andrew Davis 
> > > > > ---
> > > > >drivers/remoteproc/Kconfig   |  13 +
> > > > >drivers/remoteproc/Makefile  |   1 +
> > > > >drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 
> > > > > +++
> > > > >3 files changed, 799 insertions(+)
> > > > >create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c
> > > > > 
> > > > > diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> > > > > index 48845dc8fa852..1a7c0330c91a9 100644
> > > > > --- a/drivers/remoteproc/Kconfig
> > > > > +++ b/drivers/remoteproc/Kconfig
> > > > > @@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
> > > > > It's safe to say N here if you're not interested in utilizing
> > > > > the DSP slave processors.
> > > > > 
> > > > > +config TI_K3_M4_REMOTEPROC
> > > > > +tristate "TI K3 M4 remoteproc support"
> > > > > +depends on ARCH_K3 || COMPILE_TEST
> > > > > +select MAILBOX
> > > > > +select OMAP2PLUS_MBOX
> > > > > +help
> > > > > +  Say m here to support TI's M4 remote processor subsystems
> > > > > +  on various TI K3 family of SoCs through the remote processor
> > > > > +  framework.
> > > > > +
> > > > > +  It's safe to say N here if you're not interested in utilizing
> > > > > +  a remote processor.
> > > > > +
> > > > >config TI_K3_R5_REMOTEPROC
> > > > >   tristate "TI K3 R5 remoteproc support"
> > > > >   depends on ARCH_K3
> > > > > diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> > > > > index 91314a9b43cef..5ff4e2fee4abd 100644
> > > > > --- a/drivers/remoteproc/Makefile
> > > > > +++ b/drivers/remoteproc/Makefile
> > > > > @@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)+= 
> > > > > st_remoteproc.o
> > > > >obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
> > > > >obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
> > > > >obj-$(CONFIG_TI_K3_DSP_REMOTEPROC) += ti_k3_dsp_remoteproc.o
> > > > > +obj-$(CONFIG_TI_K3_M4_REMOTEPROC)   += ti_k3_m4_remoteproc.o
> > > > >obj-$(CONFIG_TI_K3_R5_REMOTEPROC)  += ti_k3_r5_remoteproc.o
> > > > >obj-$(CONFIG_XLNX_R5_REMOTEPROC)   += xlnx_r5_remoteproc.o
> > > > > diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
> > > > > b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> > > > > new file mode 100644
> > > > > index 0..0030e509f6b5d
> > > > > --- /dev/null
> > > > > +++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> > > > > @@ -0,0 +1,785 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > > +/*
> > > > > + * TI K3 Cortex-M4 Remote Processor(s) driver
> > > > > + *
> > > > > + * Copyright (C) 2021-2024 Texas Instruments Incorporated - 
> > > > > https://www.ti.com/
> > > > > + *  Hari Nagalla 
> > > > > + */
> > > > > +
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > +#include 
> > > > > 

Re: [PATCH v23 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-10 Thread David Hildenbrand

On 10.05.24 16:04, Vincent Donnefort wrote:

In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

   ring_buffer_{map,unmap}()

And controls on the ring-buffer:

   ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

   A unique ID for each subbuf of the ring-buffer, currently they are
   only identified through their in-kernel VA.

   A meta-page, where are stored ring-buffer statistics and a
   description for the current reader

The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.

CC: 
Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dc5ae4e96aee..96d2140b471e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
  #include 
  #include 
  
+#include 

+
  struct trace_buffer;
  struct ring_buffer_iter;
  
@@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);

  #define trace_rb_cpu_prepare  NULL
  #endif
  
+int ring_buffer_map(struct trace_buffer *buffer, int cpu,

+   struct vm_area_struct *vma);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
  #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index ..b682e9925539
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include 
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size:Size of this meta-page.
+ * @meta_struct_len:   Size of this structure.
+ * @subbuf_size:   Size of each sub-buffer.
+ * @nr_subbufs:	Number of subbufs in the ring-buffer, including the reader.
+ * @reader.lost_events:	Number of events lost at the time of the reader swap.
+ * @reader.id:	subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1]
+ * @reader.read:   Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries:   Number of entries in the ring-buffer.
+ * @overrun:   Number of entries lost in the ring-buffer.
+ * @read:  Number of entries that have been read.
+ * @Reserved1: Internal use only.
+ * @Reserved2: Internal use only.
+ */
+struct trace_buffer_meta {
+   __u32   meta_page_size;
+   __u32   meta_struct_len;
+
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+
+   struct {
+   __u64   lost_events;
+   __u32   id;
+   __u32   read;
+   } reader;
+
+   __u64   flags;
+
+   __u64   entries;
+   __u64   overrun;
+   __u64   read;
+
+   __u64   Reserved1;
+   __u64   Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cc9ebe593571..849f6942bb96 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -26,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -338,6 +340,7 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
+   u32  id;/* ID for external mapping */
struct buffer_data_page *page;  /* Actual data page */
  };
  
@@ -484,6 +487,12 @@ struct ring_buffer_per_cpu {

u64 read_stamp;
/* pages removed since last reset */
unsigned long   pages_removed;
+
+   unsigned intmapped;
+   struct mutexmapping_lock;
+   unsigned long   *subbuf_ids;/* ID to subbuf VA */
+   struct trace_buffer_meta*meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
longnr_pages_to_update;
struct list_headnew_pages; /* new pages to add */
@@ -1599,6 +1608,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
init_waitqueue_head(&cpu_buffer->irq_work.waiters);

[PATCH v23 4/5] Documentation: tracing: Add ring-buffer mapping

2024-05-10 Thread Vincent Donnefort
It is now possible to mmap() a ring-buffer to stream its content. Add
some documentation and a code example.

Signed-off-by: Vincent Donnefort 

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 5092d6c13af5..0b300901fd75 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -29,6 +29,7 @@ Linux Tracing Technologies
timerlat-tracer
intel_th
ring-buffer-design
+   ring-buffer-map
stm
sys-t
coresight/index
diff --git a/Documentation/trace/ring-buffer-map.rst b/Documentation/trace/ring-buffer-map.rst
new file mode 100644
index ..8e296bcc0d7f
--- /dev/null
+++ b/Documentation/trace/ring-buffer-map.rst
@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Tracefs ring-buffer memory mapping
+==================================
+
+:Author: Vincent Donnefort 
+
+Overview
+========
+
+Tracefs ring-buffer memory map provides an efficient method to stream data
+as no memory copy is necessary. The application mapping the ring-buffer becomes
+then a consumer for that ring-buffer, in a similar fashion to trace_pipe.
+
+Memory mapping setup
+====================
+
+The mapping works with a mmap() of the trace_pipe_raw interface.
+
+The first system page of the mapping contains ring-buffer statistics and
+description. It is referred to as the meta-page. One of the most important
+fields of the meta-page is the reader. It contains the sub-buffer ID which can
+be safely read by the mapper (see ring-buffer-design.rst).
+
+The meta-page is followed by all the sub-buffers, ordered by ascending ID. It is
+therefore effortless to know where the reader starts in the mapping:
+
+.. code-block:: c
+
+reader_id = meta->reader->id;
+reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
+
+When the application is done with the current reader, it can get a new one using
+the trace_pipe_raw ioctl() TRACE_MMAP_IOCTL_GET_READER. This ioctl also updates
+the meta-page fields.
+
+Limitations
+===========
+When a mapping is in place on a Tracefs ring-buffer, it is not possible to
+resize it (neither the overall size of the ring-buffer nor the size of each
+subbuf). It is also not possible to use snapshot, and splice will copy the
+ring buffer data instead of using the copyless swap from the ring buffer.
+
+Concurrent readers (either another application mapping that ring-buffer or the
+kernel with trace_pipe) are allowed but not recommended. They will compete for
+the ring-buffer and the output is unpredictable, just like concurrent readers on
+trace_pipe would be.
+
+Example
+=======
+
+.. code-block:: c
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+
+#define TRACE_PIPE_RAW "/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw"
+
+int main(void)
+{
+int page_size = getpagesize(), fd, reader_id;
+unsigned long meta_len, data_len;
+struct trace_buffer_meta *meta;
+void *map, *reader, *data;
+
+fd = open(TRACE_PIPE_RAW, O_RDONLY | O_NONBLOCK);
+if (fd < 0)
+exit(EXIT_FAILURE);
+
+map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
+if (map == MAP_FAILED)
+exit(EXIT_FAILURE);
+
+meta = (struct trace_buffer_meta *)map;
+meta_len = meta->meta_page_size;
+
+printf("entries:%llu\n", meta->entries);
+printf("overrun:%llu\n", meta->overrun);
+printf("read:   %llu\n", meta->read);
+printf("nr_subbufs: %u\n", meta->nr_subbufs);
+
+data_len = meta->subbuf_size * meta->nr_subbufs;
+data = mmap(NULL, data_len, PROT_READ, MAP_SHARED, fd, meta_len);
+if (data == MAP_FAILED)
+exit(EXIT_FAILURE);
+
+if (ioctl(fd, TRACE_MMAP_IOCTL_GET_READER) < 0)
+exit(EXIT_FAILURE);
+
+reader_id = meta->reader.id;
+reader = data + meta->subbuf_size * reader_id;
+
+printf("Current reader address: %p\n", reader);
+
+munmap(data, data_len);
+munmap(meta, meta_len);
+close (fd);
+
+return 0;
+}
-- 
2.45.0.118.g7fe29c98d7-goog




[PATCH v23 3/5] tracing: Allow user-space mapping of the ring-buffer

2024-05-10 Thread Vincent Donnefort
Currently, user-space extracts data from the ring-buffer via splice,
which is handy for storage or network sharing. However, due to splice
limitations, it is impossible to do real-time analysis without a copy.

A solution for that problem is to let the user-space map the ring-buffer
directly.

The mapping is exposed via the per-CPU file trace_pipe_raw. The first
element of the mapping is the meta-page. It is followed by each
subbuffer constituting the ring-buffer, ordered by their unique page ID:

  * Meta-page -- include/uapi/linux/trace_mmap.h for a description
  * Subbuf ID 0
  * Subbuf ID 1
 ...

It is therefore easy to translate a subbuf ID into an offset in the
mapping:

  reader_id = meta->reader->id;
  reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;

When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
point to the next reader containing unread data.

Mapping will prevent snapshot and buffer size modifications.

CC: 
Signed-off-by: Vincent Donnefort 

diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
index b682e9925539..bd1066754220 100644
--- a/include/uapi/linux/trace_mmap.h
+++ b/include/uapi/linux/trace_mmap.h
@@ -43,4 +43,6 @@ struct trace_buffer_meta {
__u64   Reserved2;
 };
 
+#define TRACE_MMAP_IOCTL_GET_READER	_IO('T', 0x1)
+
 #endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 233d1af39fff..a35e7f598233 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1191,6 +1191,12 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
return;
}
 
+   if (tr->mapped) {
+   trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
+   trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
+   return;
+   }
+
local_irq_save(flags);
update_max_tr(tr, current, smp_processor_id(), cond_data);
local_irq_restore(flags);
@@ -1323,7 +1329,7 @@ static int tracing_arm_snapshot_locked(struct trace_array *tr)
lockdep_assert_held(&trace_types_lock);
 
spin_lock(&tr->snapshot_trigger_lock);
-   if (tr->snapshot == UINT_MAX) {
+   if (tr->snapshot == UINT_MAX || tr->mapped) {
spin_unlock(&tr->snapshot_trigger_lock);
return -EBUSY;
}
@@ -6068,7 +6074,7 @@ static void tracing_set_nop(struct trace_array *tr)
 {
if (tr->current_trace == _trace)
return;
-   
+
tr->current_trace->enabled--;
 
if (tr->current_trace->reset)
@@ -8194,15 +8200,32 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
 }
 
-/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
 static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
+   int err;
+
+   if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
+   if (!(file->f_flags & O_NONBLOCK)) {
+   err = ring_buffer_wait(iter->array_buffer->buffer,
+  iter->cpu_file,
+  iter->tr->buffer_percent,
+  NULL, NULL);
+   if (err)
+   return err;
+   }
 
-   if (cmd)
-   return -ENOIOCTLCMD;
+   return ring_buffer_map_get_reader(iter->array_buffer->buffer,
+ iter->cpu_file);
+   } else if (cmd) {
+   return -ENOTTY;
+   }
 
+   /*
+* An ioctl call with cmd 0 to the ring buffer file will wake up all
+* waiters
+*/
mutex_lock(&trace_types_lock);
 
/* Make sure the waiters see the new wait_index */
@@ -8214,6 +8237,76 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
return 0;
 }
 
+#ifdef CONFIG_TRACER_MAX_TRACE
+static int get_snapshot_map(struct trace_array *tr)
+{
+   int err = 0;
+
+   /*
+* Called with mmap_lock held. lockdep would be unhappy if we would now
+* take trace_types_lock. Instead use the specific
+* snapshot_trigger_lock.
+*/
+	spin_lock(&tr->snapshot_trigger_lock);
+
+   if (tr->snapshot || tr->mapped == UINT_MAX)
+   err = -EBUSY;
+   else
+   tr->mapped++;
+
+	spin_unlock(&tr->snapshot_trigger_lock);
+
+   /* Wait for update_max_tr() to observe iter->tr->mapped */
+   if (tr->mapped == 1)
+   synchronize_rcu();
+
+   return err;
+
+}
+static void put_snapshot_map(struct trace_array *tr)
+{
+	spin_lock(&tr->snapshot_trigger_lock);
+   if (!WARN_ON(!tr->mapped))
+  

[PATCH v23 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-10 Thread Vincent Donnefort
In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

  ring_buffer_{map,unmap}()

And controls on the ring-buffer:

  ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

  A unique ID for each subbuf of the ring-buffer, currently they are
  only identified through their in-kernel VA.

  A meta-page, where are stored ring-buffer statistics and a
  description for the current reader

The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.

CC: 
Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dc5ae4e96aee..96d2140b471e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+#include 
+
 struct trace_buffer;
 struct ring_buffer_iter;
 
@@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
 #define trace_rb_cpu_prepare   NULL
 #endif
 
+int ring_buffer_map(struct trace_buffer *buffer, int cpu,
+   struct vm_area_struct *vma);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index ..b682e9925539
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include 
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size:Size of this meta-page.
+ * @meta_struct_len:   Size of this structure.
+ * @subbuf_size:   Size of each sub-buffer.
+ * @nr_subbufs:	Number of subbufs in the ring-buffer, including the reader.
+ * @reader.lost_events:	Number of events lost at the time of the reader swap.
+ * @reader.id:	subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1]
+ * @reader.read:   Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries:   Number of entries in the ring-buffer.
+ * @overrun:   Number of entries lost in the ring-buffer.
+ * @read:  Number of entries that have been read.
+ * @Reserved1: Internal use only.
+ * @Reserved2: Internal use only.
+ */
+struct trace_buffer_meta {
+   __u32   meta_page_size;
+   __u32   meta_struct_len;
+
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+
+   struct {
+   __u64   lost_events;
+   __u32   id;
+   __u32   read;
+   } reader;
+
+   __u64   flags;
+
+   __u64   entries;
+   __u64   overrun;
+   __u64   read;
+
+   __u64   Reserved1;
+   __u64   Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cc9ebe593571..849f6942bb96 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -26,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -338,6 +340,7 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
+   u32  id;/* ID for external mapping */
struct buffer_data_page *page;  /* Actual data page */
 };
 
@@ -484,6 +487,12 @@ struct ring_buffer_per_cpu {
u64 read_stamp;
/* pages removed since last reset */
unsigned long   pages_removed;
+
+   unsigned intmapped;
+   struct mutexmapping_lock;
+   unsigned long   *subbuf_ids;/* ID to subbuf VA */
+   struct trace_buffer_meta*meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
longnr_pages_to_update;
struct list_headnew_pages; /* new pages to add */
@@ -1599,6 +1608,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
init_waitqueue_head(&cpu_buffer->irq_work.waiters);
init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
+   

[PATCH v23 1/5] ring-buffer: Allocate sub-buffers with __GFP_COMP

2024-05-10 Thread Vincent Donnefort
In preparation for the ring-buffer memory mapping, allocate compound
pages for the ring-buffer sub-buffers to enable us to map them to
user-space with vm_insert_pages().

Acked-by: David Hildenbrand 
Signed-off-by: Vincent Donnefort 

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 25476ead681b..cc9ebe593571 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1524,7 +1524,7 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
list_add(&bpage->list, pages);
 
page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
-   mflags | __GFP_ZERO,
+   mflags | __GFP_COMP | __GFP_ZERO,
cpu_buffer->buffer->subbuf_order);
if (!page)
goto free_pages;
@@ -1609,7 +1609,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 
cpu_buffer->reader_page = bpage;
 
-   page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_ZERO,
+	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_COMP | __GFP_ZERO,
cpu_buffer->buffer->subbuf_order);
if (!page)
goto fail_free_reader;
@@ -5579,7 +5579,7 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
goto out;
 
page = alloc_pages_node(cpu_to_node(cpu),
-   GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO,
+				GFP_KERNEL | __GFP_NORETRY | __GFP_COMP | __GFP_ZERO,
cpu_buffer->buffer->subbuf_order);
if (!page) {
kfree(bpage);
-- 
2.45.0.118.g7fe29c98d7-goog




[PATCH v23 0/5] Introducing trace buffer mapping by user-space

2024-05-10 Thread Vincent Donnefort
The tracing ring-buffers can be stored on disk or sent to network
without any copy via splice. However, the latter doesn't allow real-time
processing of the traces. A solution is to give userspace direct access
to the ring-buffer pages via a mapping. An application can now become a
consumer of the ring-buffer, in a similar fashion to what trace_pipe
offers.

Support for this new feature can already be found in libtracefs from
version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.

Vincent

v22 -> v23:
  * Remove VM_IO (it prevents ptrace and does not bring any other useful
    protection).
  * Bring back DONTDUMP (As we removed VM_IO).
  * Add a check for root user in map_test.
  * Make the s/nr_subbufs check a WARN.

v21 -> v22:
  * Remove DONTDUMP (VM_IO implies DONTDUMP already)
  * Remove MIXEDMAP (implicit when using vm_insert_page)
  * Remove PFNMAP (We do not perform raw PFN mappings and MIXEDMAP is
    already implicitly set)
  * Add comments to justify the VM_* flags

v20 -> v21:
  * Collect Ack
  * Add .gitignore
  * Few nits
  * Remove meta-page padding (zero-page not supported by vm_insert_pages)
  * Remove single-usage macros
  * Move vma flags handling into ring-buffer.c

v19 -> v20:
  * Fix typos in documentation.
  * Remove useless mmap open and fault callbacks.
  * add mm.h include for vm_insert_pages

v18 -> v19:
  * Use VM_PFNMAP and vm_insert_pages
  * Allocate ring-buffer subbufs with __GFP_COMP
  * Pad the meta-page with the zero-page to align on the subbuf_order
  * Extend the ring-buffer test with mmap() dedicated suite

v17 -> v18:
  * Fix lockdep_assert_held
  * Fix spin_lock_init typo
  * Fix CONFIG_TRACER_MAX_TRACE typo

v16 -> v17:
  * Documentation and comments improvements.
  * Create get/put_snapshot_map() for clearer code.
  * Replace kzalloc with kcalloc.
  * Fix -ENOMEM handling in rb_alloc_meta_page().
  * Move flush(cpu_buffer->reader_page) behind the reader lock.
  * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
mutex. (removes READ_ONCE/WRITE_ONCE accesses).

v15 -> v16:
  * Add comment for the dcache flush.
  * Remove now unnecessary WRITE_ONCE for the meta-page.

v14 -> v15:
  * Add meta-page and reader-page flush. Intends to fix the mapping
for VIVT and aliasing-VIPT data caches.
  * -EPERM on VM_EXEC.
  * Fix build warning !CONFIG_TRACER_MAX_TRACE.

v13 -> v14:
  * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
  * on unmap, sync meta-page teardown with the reader_lock instead of
the synchronize_rcu.
  * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
(intends to fix a lockdep issue)
  * Add kerneldoc for flags and Reserved fields.
  * Add kselftest for snapshot/map mutual exclusion.

v12 -> v13:
  * Swap subbufs_{touched,lost} for Reserved fields.
  * Add a flag field in the meta-page.
  * Fix CONFIG_TRACER_MAX_TRACE.
  * Rebase on top of trace/urgent.
  * Add a comment for try_unregister_trigger()

v11 -> v12:
  * Fix code sample mmap bug.
  * Add logging in sample code.
  * Reset tracer in selftest.
  * Add a refcount for the snapshot users.
  * Prevent mapping when there are snapshot users and vice versa.
  * Refine the meta-page.
  * Fix types in the meta-page.
  * Collect Reviewed-by.

v10 -> v11:
  * Add Documentation and code sample.
  * Add a selftest.
  * Move all the update to the meta-page into a single
rb_update_meta_page().
  * rb_update_meta_page() is now called from
ring_buffer_map_get_reader() to fix NOBLOCK callers.
  * kerneldoc for struct trace_meta_page.
  * Add a patch to zero all the ring-buffer allocations.

v9 -> v10:
  * Refactor rb_update_meta_page()
  * In-loop declaration for foreach_subbuf_page()
  * Check for cpu_buffer->mapped overflow

v8 -> v9:
  * Fix the unlock path in ring_buffer_map()
  * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer
  * Rebase on linux-trace/for-next (3cb3091138ca0921c4569bcf7ffa062519639b6a)

v7 -> v8:
  * Drop the subbufs renaming into bpages
  * Use subbuf as a name when relevant

v6 -> v7:
  * Rebase onto lore.kernel.org/lkml/20231215175502.106587...@goodmis.org/
  * Support for subbufs
  * Rename subbufs into bpages

v5 -> v6:
  * Rebase on next-20230802.
  * (unsigned long) -> (void *) cast for virt_to_page().
  * Add a wait for the GET_READER_PAGE ioctl.
  * Move writer fields update (overrun/pages_lost/entries/pages_touched)
in the irq_work.
  * Rearrange id in struct buffer_page.
  * Rearrange the meta-page.
  * ring_buffer_meta_page -> trace_buffer_meta_page.
  * Add meta_struct_len into the meta-page.

v4 -> v5:
  * Trivial rebase onto 6.5-rc3 (previously 6.4-rc3)

v3 -> v4:
  * Add to the meta-page:
   - pages_lost / pages_read (allows computing how full the
 ring-buffer is)
   - read (allows computing how many entries can be read)
   - A reader_page struct.
  * Rename ring_buffer_meta_header -> ring_buffer_meta
  * Rename ring_buffer_get_reader_page -> 

Re: [PATCH 0/2] Add basic APR sound support for SC7280 SoC

2024-05-10 Thread Rob Herring (Arm)


On Fri, 10 May 2024 14:27:07 +0200, Luca Weiss wrote:
> Validated on Fairphone 5 (QCM6490) smartphone by using DisplayPort over
> USB-C audio, connected to a TV, with a basic UCM to enable
> 'DISPLAY_PORT_RX Audio Mixer MultiMedia1':
> https://gitlab.com/postmarketOS/pmaports/-/tree/master/device/testing/device-fairphone-fp5/ucm
> 
> Unfortunately all the device-specific things can't be enabled yet
> upstream as detailed in the second patch, but the SoC parts should be
> good to go.
> 
> As an extra note, I'm not sure how this will behave on SC7280 devices
> that seem to use GPR (q6apm + q6prm) / "audioreach" as added in this
> series from mid 2023 which was never applied:
> https://lore.kernel.org/linux-arm-msm/20230616103534.4031331-1-quic_m...@quicinc.com/
> 
> Signed-off-by: Luca Weiss 
> ---
> Luca Weiss (2):
>   arm64: dts: qcom: sc7280: Add APR nodes for sound
>   [DNM] arm64: dts: qcom: qcm6490-fairphone-fp5: Add DisplayPort sound 
> support
> 
>  arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 36 +++
>  arch/arm64/boot/dts/qcom/sc7280.dtsi   | 73 
> ++
>  2 files changed, 109 insertions(+)
> ---
> base-commit: 940d65ef852b4a58c9115eb82b07844c999b8356
> change-id: 20240510-sc7280-apr-c6d10ac2c331
> 
> Best regards,
> --
> Luca Weiss 
> 
> 
> 


My bot found new DTB warnings on the .dts files added or changed in this
series.

Some warnings may be from an existing SoC .dtsi. Or perhaps the warnings
are fixed by another series. Ultimately, it is up to the platform
maintainer whether these warnings are acceptable or not. No need to reply
unless the platform maintainer has comments.

If you already ran DT checks and didn't see these error(s), then
make sure dt-schema is up to date:

  pip3 install dtschema --upgrade


New warnings running 'make CHECK_DTBS=y qcom/qcm6490-fairphone-fp5.dtb' for 
20240510-sc7280-apr-v1-0-e9eabda05...@fairphone.com:

arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dtb: /sound: failed to match any 
schema with compatible: ['fairphone,fp5-sndcard']








[PATCH net-next] virtio_net: Fix error code in __virtnet_get_hw_stats()

2024-05-10 Thread Dan Carpenter
The virtnet_send_command_reply() function returns true on success or
false on failure.  The "ok" variable is true/false depending on whether
it succeeds or not.  It's up to the caller to translate the true/false
into -EINVAL on failure or zero for success.

The bug is that __virtnet_get_hw_stats() returns false (i.e. zero) for
both errors and success, so the caller cannot detect failure.  Separately,
it's not a bug, but it is confusing that the caller virtnet_get_hw_stats()
uses an "ok" variable to store negative error codes.

Fix the bug and clean things up so that it's clear that
__virtnet_get_hw_stats() returns zero on success or negative error codes
on failure.

Fixes: 941168f8b40e ("virtio_net: support device stats")
Signed-off-by: Dan Carpenter 
---
 drivers/net/virtio_net.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 218a446c4c27..4fc0fcdad259 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -4016,7 +4016,7 @@ static int __virtnet_get_hw_stats(struct virtnet_info *vi,
_out, _in);
 
if (!ok)
-   return ok;
+   return -EINVAL;
 
for (p = reply; p - reply < res_size; p += le16_to_cpu(hdr->size)) {
hdr = p;
@@ -4053,7 +4053,7 @@ static int virtnet_get_hw_stats(struct virtnet_info *vi,
struct virtio_net_ctrl_queue_stats *req;
bool enable_cvq;
void *reply;
-   int ok;
+   int err;
 
if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_DEVICE_STATS))
return 0;
@@ -4100,12 +4100,12 @@ static int virtnet_get_hw_stats(struct virtnet_info *vi,
if (enable_cvq)
virtnet_make_stat_req(vi, ctx, req, vi->max_queue_pairs * 2, 
);
 
-   ok = __virtnet_get_hw_stats(vi, ctx, req, sizeof(*req) * j, reply, 
res_size);
+   err = __virtnet_get_hw_stats(vi, ctx, req, sizeof(*req) * j, reply, 
res_size);
 
kfree(req);
kfree(reply);
 
-   return ok;
+   return err;
 }
 
 static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 
*data)




[PATCH 1/2] arm64: dts: qcom: sc7280: Add APR nodes for sound

2024-05-10 Thread Luca Weiss
Add the different services found on APR on some devices with the SC7280
SoC. Additionally, add an empty sound node in the root node, as seen in
other SoC dtsi files, so device DTs can easily make use of it.

Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/sc7280.dtsi | 73 
 1 file changed, 73 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index fc9ec367e3a5..659212bb38c1 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3762,6 +3763,75 @@ IPCC_MPROC_SIGNAL_GLINK_QMP
label = "lpass";
qcom,remote-pid = <2>;
 
+   apr {
+   compatible = "qcom,apr-v2";
+   qcom,glink-channels = "apr_audio_svc";
+   qcom,domain = ;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   service@3 {
+   reg = ;
+   compatible = "qcom,q6core";
+   qcom,protection-domain = 
"avs/audio", "msm/adsp/audio_pd";
+   };
+
+   q6afe: service@4 {
+   compatible = "qcom,q6afe";
+   reg = ;
+   qcom,protection-domain = 
"avs/audio", "msm/adsp/audio_pd";
+
+   q6afedai: dais {
+   compatible = 
"qcom,q6afe-dais";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   #sound-dai-cells = <1>;
+   };
+
+   q6afecc: clock-controller {
+   compatible = 
"qcom,q6afe-clocks";
+   #clock-cells = <2>;
+   };
+   };
+
+   q6asm: service@7 {
+   compatible = "qcom,q6asm";
+   reg = ;
+   qcom,protection-domain = 
"avs/audio", "msm/adsp/audio_pd";
+
+   q6asmdai: dais {
+   compatible = 
"qcom,q6asm-dais";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   #sound-dai-cells = <1>;
+   iommus = <_smmu 
0x1801 0x0>;
+
+   dai@0 {
+   reg = <0>;
+   };
+
+   dai@1 {
+   reg = <1>;
+   };
+
+   dai@2 {
+   reg = <2>;
+   };
+   };
+   };
+
+   q6adm: service@8 {
+   compatible = "qcom,q6adm";
+   reg = ;
+   qcom,protection-domain = 
"avs/audio", "msm/adsp/audio_pd";
+
+   q6routing: routing {
+   compatible = 
"qcom,q6adm-routing";
+   #sound-dai-cells = <0>;
+   };
+   };
+   };
+
fastrpc {
compatible = "qcom,fastrpc";
qcom,glink-channels = 
"fastrpcglink-apps-dsp";
@@ -5991,6 +6061,9 @@ cpufreq_hw: cpufreq@18591000 {
 

[PATCH DNM 2/2] arm64: dts: qcom: qcm6490-fairphone-fp5: Add DisplayPort sound support

2024-05-10 Thread Luca Weiss
Add the required nodes for sound playback via a connected external
display (DisplayPort over USB-C).

Signed-off-by: Luca Weiss 
---
Depends on a bunch of patches upstream doing bringup of display (DSI),
DisplayPort and GPU before audio could finally land. But we're blocked
on DPU 1:1:1 topology for all of that, unfortunately.

Also, the machine driver for sound only exists as a bit of a hack for now.
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 36 ++
 1 file changed, 36 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 05bbf1da5cb8..2bbbcaeff95e 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -14,6 +14,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "sc7280.dtsi"
 #include "pm7250b.dtsi"
 #include "pm7325.dtsi"
@@ -774,6 +776,12 @@ _resin {
status = "okay";
 };
 
+ {
+   dai@104 {
+   reg = ;
+   };
+};
+
 _spi13_cs {
drive-strength = <6>;
bias-disable;
@@ -847,6 +855,34 @@ _2 {
status = "okay";
 };
 
+ {
+   compatible = "fairphone,fp5-sndcard";
+   model = "Fairphone 5";
+
+   mm1-dai-link {
+   link-name = "MultiMedia1";
+   cpu {
+   sound-dai = < MSM_FRONTEND_DAI_MULTIMEDIA1>;
+   };
+   };
+
+   displayport-rx-dai-link {
+   link-name = "DisplayPort Playback";
+
+   cpu {
+   sound-dai = < DISPLAY_PORT_RX>;
+   };
+
+   platform {
+   sound-dai = <>;
+   };
+
+   codec {
+   sound-dai = <_dp>;
+   };
+   };
+};
+
  {
status = "okay";
 

-- 
2.45.0




[PATCH 0/2] Add basic APR sound support for SC7280 SoC

2024-05-10 Thread Luca Weiss
Validated on Fairphone 5 (QCM6490) smartphone by using DisplayPort over
USB-C audio, connected to a TV, with a basic UCM to enable
'DISPLAY_PORT_RX Audio Mixer MultiMedia1':
https://gitlab.com/postmarketOS/pmaports/-/tree/master/device/testing/device-fairphone-fp5/ucm

Unfortunately all the device-specific things can't be enabled yet
upstream as detailed in the second patch, but the SoC parts should be
good to go.

As an extra note, I'm not sure how this will behave on SC7280 devices
that seem to use GPR (q6apm + q6prm) / "audioreach" as added in this
series from mid 2023 which was never applied:
https://lore.kernel.org/linux-arm-msm/20230616103534.4031331-1-quic_m...@quicinc.com/

Signed-off-by: Luca Weiss 
---
Luca Weiss (2):
  arm64: dts: qcom: sc7280: Add APR nodes for sound
  [DNM] arm64: dts: qcom: qcm6490-fairphone-fp5: Add DisplayPort sound 
support

 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 36 +++
 arch/arm64/boot/dts/qcom/sc7280.dtsi   | 73 ++
 2 files changed, 109 insertions(+)
---
base-commit: 940d65ef852b4a58c9115eb82b07844c999b8356
change-id: 20240510-sc7280-apr-c6d10ac2c331

Best regards,
-- 
Luca Weiss 




Re: [PATCH v3 2/2] virtiofs: use GFP_NOFS when enqueuing request through kworker

2024-05-10 Thread Miklos Szeredi
On Fri, 26 Apr 2024 at 16:38, Hou Tao  wrote:
>
> From: Hou Tao 
>
> When invoking virtio_fs_enqueue_req() through kworker, both the
> allocation of the sg array and the bounce buffer still use GFP_ATOMIC.
> Considering the size of the sg array may be greater than PAGE_SIZE, use
> GFP_NOFS instead of GFP_ATOMIC to lower the possibility of memory
> allocation failure and to avoid unnecessarily depleting the atomic
> reserves. GFP_NOFS is not passed to virtio_fs_enqueue_req() directly,
> GFP_KERNEL and memalloc_nofs_{save|restore} helpers are used instead.

Makes sense.

However, I don't understand why the GFP_NOFS behavior is optional. It
should work when queuing the request for the first time as well, no?
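
(For reference, the scoped-NOFS pattern the description refers to is roughly
the following generic sketch, not the actual virtiofs code:

	unsigned int flags = memalloc_nofs_save();

	/* Any allocation in this scope behaves as if GFP_NOFS was passed. */
	buf = kmalloc(size, GFP_KERNEL);

	memalloc_nofs_restore(flags);
)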

Thanks,
Miklos



Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-10 Thread Vincent Donnefort
[...]

> > > +
> > > + while (s < nr_subbufs && p < nr_pages) {
> > > + struct page *page = virt_to_page(cpu_buffer->subbuf_ids[s]);
> > > + int off = 0;
> > > +
> > > + for (; off < (1 << (subbuf_order)); off++, page++) {
> > > + if (p >= nr_pages)
> > > + break;
> > > +
> > > + pages[p++] = page;
> > > + }
> > > + s++;
> > > + }
> > 
> > The above can be made to:
> > 
> > while (p < nr_pages) {
> > struct page *page;
> > int off = 0;
> > 
> > if (WARN_ON_ONCE(s >= nr_subbufs))
> > break;
> 
> I'm not particularly happy about us calling vm_insert_pages with NULL
> pointers stored in pages.
> 
> Should we instead do
> 
> if (WARN_ON_ONCE(s >= nr_subbufs)) {
>   err = -EINVAL;
>   goto out;
> }
> 
> ?

I could also set nr_pages = p in the event of s >= nr_subbufs... but that
really shouldn't happen, so let's return an error.
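
Something like this, then (untested, just to spell it out):

	while (p < nr_pages) {
		struct page *page;
		int off = 0;

		if (WARN_ON_ONCE(s >= nr_subbufs)) {
			err = -EINVAL;
			goto out;
		}

		page = virt_to_page(cpu_buffer->subbuf_ids[s]);

		for (; off < (1 << (subbuf_order)); off++, page++) {
			if (p >= nr_pages)
				break;

			pages[p++] = page;
		}
		s++;
	}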

> 
> -- 
> Cheers,
> 
> David / dhildenb
> 



Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-10 Thread Vincent Donnefort
On Fri, May 10, 2024 at 11:15:59AM +0200, David Hildenbrand wrote:
> On 09.05.24 13:05, Vincent Donnefort wrote:
> > On Tue, May 07, 2024 at 10:34:02PM -0400, Steven Rostedt wrote:
> > > On Tue, 30 Apr 2024 12:13:51 +0100
> > > Vincent Donnefort  wrote:
> > > 
> > > > +#ifdef CONFIG_MMU
> > > > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> > > > +   struct vm_area_struct *vma)
> > > > +{
> > > > +   unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = 
> > > > vma->vm_pgoff;
> > > > +   unsigned int subbuf_pages, subbuf_order;
> > > > +   struct page **pages;
> > > > +   int p = 0, s = 0;
> > > > +   int err;
> > > > +
> > > > +   /* Refuse MP_PRIVATE or writable mappings */
> > > > +   if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
> > > > +   !(vma->vm_flags & VM_MAYSHARE))
> > > > +   return -EPERM;
> > > > +
> > > > +   /*
> > > > +* Make sure the mapping cannot become writable later. Also 
> > > > tell the VM
> > > > +* to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). 
> > > > Finally,
> > > > +* prevent migration, GUP and dump (VM_IO).
> > > > +*/
> > > > +   vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_IO, 
> > > > VM_MAYWRITE);
> > > 
> > > Do we really need the VM_IO?
> > > 
> > > When testing this in gdb, I would get:
> > > 
> > > (gdb) p tmap->map->subbuf_size
> > > Cannot access memory at address 0x77fc2008
> > > 
> > > It appears that you can't ptrace IO memory. When I removed that flag,
> > > gdb has no problem reading that memory.
> > 
> > Yeah, VM_IO indeed implies DONTDUMP. VM_IO was part of Linus 
> > recommendations.
> 
> Yes, the VM should recognize that memory to some degree as being special
> already due to VM_MIXEDMAP and VM_DONTEXPAND.
> 
> #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
> 
> So any of these flags achieves that (e.g., mlock_fixup() checks VM_SPECIAL).
> KSM similarly skips VM_DONTEXPAND and VM_MIXEDMAP (likely we should be using
> VM_SPECIAL in vma_ksm_compatible()). Not sure about page migration, likely
> it's fine.
> 
> Thinking about MADV_DONTNEED, I can spot in
> madvise_dontneed_free_valid_vma() only that we disallow primarily VM_PFNMAP.
> 
> ... I assume if user space MADV_DONTNEED's some pages we'll simply get a
> page fault later on access that will SIGBUS, handling that gracefully (we
> should double-check!).

I've just tested and indeed, I get a SIGBUS! All good there.

> 
> 
> > But perhaps, VM_DONTEXPAND and MIXEDMAP (implicitly set by
> > vm_insert_pages) are
> > enough protection?
> 
> Do we want to dump these pages? VM_DONTDUMP might be reasonable then.

Somehow I thought this would prevent ptrace as well, but I've just tested it and
this is not the case either. So let's keep DONTDUMP.
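
So the flag setup becomes something like (untested):

	/*
	 * Make sure the mapping cannot become writable later. Also tell the VM
	 * to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND) and to not
	 * dump them (VM_DONTDUMP).
	 */
	vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP, VM_MAYWRITE);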

Thanks!

> 
> > 
> > I don't see how anything could use GUP there and as David pointed out on the
> > previous version, it doesn't even prevent the GUP-fast path.
> 
> Yes, GUP-fast would still have worked under some conditions.
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 



Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-10 Thread David Hildenbrand

On 08.05.24 04:34, Steven Rostedt wrote:

On Tue, 30 Apr 2024 12:13:51 +0100
Vincent Donnefort  wrote:


+#ifdef CONFIG_MMU
+static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
+   struct vm_area_struct *vma)
+{
+   unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff;
+   unsigned int subbuf_pages, subbuf_order;
+   struct page **pages;
+   int p = 0, s = 0;
+   int err;
+
+   /* Refuse MP_PRIVATE or writable mappings */
+   if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
+   !(vma->vm_flags & VM_MAYSHARE))
+   return -EPERM;
+
+   /*
+* Make sure the mapping cannot become writable later. Also tell the VM
+* to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). Finally,
+* prevent migration, GUP and dump (VM_IO).
+*/
+   vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_IO, VM_MAYWRITE);


Do we really need the VM_IO?

When testing this in gdb, I would get:

(gdb) p tmap->map->subbuf_size
Cannot access memory at address 0x77fc2008

It appears that you can't ptrace IO memory. When I removed that flag,
gdb has no problem reading that memory.

I think we should drop that flag.

Can you send a v23 with that removed, Shuah's update, and also the
change below:


+
+   lockdep_assert_held(_buffer->mapping_lock);
+
+   subbuf_order = cpu_buffer->buffer->subbuf_order;
+   subbuf_pages = 1 << subbuf_order;
+
+   nr_subbufs = cpu_buffer->nr_pages + 1; /* + reader-subbuf */
+   nr_pages = ((nr_subbufs) << subbuf_order) - pgoff + 1; /* + meta-page */
+
+   vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+   if (!vma_pages || vma_pages > nr_pages)
+   return -EINVAL;
+
+   nr_pages = vma_pages;
+
+   pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
+   return -ENOMEM;
+
+   if (!pgoff) {
+   pages[p++] = virt_to_page(cpu_buffer->meta_page);
+
+   /*
+* TODO: Align sub-buffers on their size, once
+* vm_insert_pages() supports the zero-page.
+*/
+   } else {
+   /* Skip the meta-page */
+   pgoff--;
+
+   if (pgoff % subbuf_pages) {
+   err = -EINVAL;
+   goto out;
+   }
+
+   s += pgoff / subbuf_pages;
+   }
+
+   while (s < nr_subbufs && p < nr_pages) {
+   struct page *page = virt_to_page(cpu_buffer->subbuf_ids[s]);
+   int off = 0;
+
+   for (; off < (1 << (subbuf_order)); off++, page++) {
+   if (p >= nr_pages)
+   break;
+
+   pages[p++] = page;
+   }
+   s++;
+   }


The above can be made to:

while (p < nr_pages) {
struct page *page;
int off = 0;

if (WARN_ON_ONCE(s >= nr_subbufs))
break;


I'm not particularly happy about us calling vm_insert_pages with NULL 
pointers stored in pages.


Should we instead do

if (WARN_ON_ONCE(s >= nr_subbufs)) {
err = -EINVAL;
goto out;
}

?

--
Cheers,

David / dhildenb




Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-05-10 Thread David Hildenbrand

On 09.05.24 13:05, Vincent Donnefort wrote:

On Tue, May 07, 2024 at 10:34:02PM -0400, Steven Rostedt wrote:

On Tue, 30 Apr 2024 12:13:51 +0100
Vincent Donnefort  wrote:


+#ifdef CONFIG_MMU
+static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
+   struct vm_area_struct *vma)
+{
+   unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff;
+   unsigned int subbuf_pages, subbuf_order;
+   struct page **pages;
+   int p = 0, s = 0;
+   int err;
+
+   /* Refuse MP_PRIVATE or writable mappings */
+   if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
+   !(vma->vm_flags & VM_MAYSHARE))
+   return -EPERM;
+
+   /*
+* Make sure the mapping cannot become writable later. Also tell the VM
+* to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). Finally,
+* prevent migration, GUP and dump (VM_IO).
+*/
+   vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_IO, VM_MAYWRITE);


Do we really need the VM_IO?

When testing this in gdb, I would get:

(gdb) p tmap->map->subbuf_size
Cannot access memory at address 0x77fc2008

It appears that you can't ptrace IO memory. When I removed that flag,
gdb has no problem reading that memory.


Yeah, VM_IO indeed implies DONTDUMP. VM_IO was part of Linus recommendations.


Yes, the VM should recognize that memory to some degree as being special 
already due to VM_MIXEDMAP and VM_DONTEXPAND.


#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

So any of these flags achieves that (e.g., mlock_fixup() checks
VM_SPECIAL). KSM similarly skips VM_DONTEXPAND and VM_MIXEDMAP (likely 
we should be using VM_SPECIAL in vma_ksm_compatible()). Not sure about 
page migration, likely it's fine.


Thinking about MADV_DONTNEED, I can spot in 
madvise_dontneed_free_valid_vma() only that we disallow primarily VM_PFNMAP.


... I assume if user space MADV_DONTNEED's some pages we'll simply get a 
page fault later on access that will SIGBUS, handling that gracefully 
(we should double-check!).




But perhaps, VM_DONTEXPAND and MIXEDMAP (implicitly set by vm_insert_pages) are
enough protection?


Do we want to dump these pages? VM_DONTDUMP might be reasonable then.



I don't see how anything could use GUP there and as David pointed out on the
previous version, it doesn't even prevent the GUP-fast path.


Yes, GUP-fast would still have worked under some conditions.

--
Cheers,

David / dhildenb




Re: [PATCH 1/2] objpool: enable inlining objpool_push() and objpool_pop() operations

2024-05-10 Thread wuqiang.matt

On 2024/5/10 16:20, Vlastimil Babka wrote:

On 5/10/24 9:59 AM, wuqiang.matt wrote:

On 2024/5/7 21:55, Vlastimil Babka wrote:

  >>

+   } while (!try_cmpxchg_acquire(>tail, , tail + 1));
+
+   /* now the tail position is reserved for the given obj */
+   WRITE_ONCE(slot->entries[tail & slot->mask], obj);
+   /* update sequence to make this obj available for pop() */
+   smp_store_release(>last, tail + 1);
+
+   return 0;
+}
   
   /**

* objpool_push() - reclaim the object and return back to objpool
@@ -134,7 +219,19 @@ void *objpool_pop(struct objpool_head *pool);
* return: 0 or error code (it fails only when user tries to push
* the same object multiple times or wrong "objects" into objpool)
*/
-int objpool_push(void *obj, struct objpool_head *pool);
+static inline int objpool_push(void *obj, struct objpool_head *pool)
+{
+   unsigned long flags;
+   int rc;
+
+   /* disable local irq to avoid preemption & interruption */
+   raw_local_irq_save(flags);
+   rc = __objpool_try_add_slot(obj, pool, raw_smp_processor_id());


And IIUC, we could in theory objpool_pop() on one cpu, then later another
cpu might do objpool_push() and cause the latter cpu's pool to go over
capacity? Is there some implicit requirements of objpool users to take care
of having matched cpu for pop and push? Are the current objpool users
obeying this requirement? (I can see the selftests do, not sure about the
actual users).
Or am I missing something? Thanks.


The objects are all pre-allocated along with creation of the new objpool
and the total number of objects never exceeds the capacity on local node.


Aha, I see, the capacity of entries is enough to hold objects from all nodes
in the most unfortunate case they all end up freed from a single cpu.


So objpool_push() would always find an available slot from the ring-array
for the given object to insert back. objpool_pop() would try looping all
the percpu slots until an object is found or whole objpool is empty.


So it's correct, but seems rather wasteful to have the whole capacity for
entries replicated on every cpu? It does make objpool_push() simple and
fast, but as you say, objpool_pop() still has to search potentially all
non-local percpu slots, with disabled irqs, which is far from ideal.


Yes, it's a trade-off between performance and memory usage, with a slight
increase in memory consumption for a significant improvement in performance.

The reason for disabling local irqs is that objpool uses a 32-bit sequence
number as the state description of each element. In extreme cases it could
overflow and wrap around to the same value. A 64-bit value would eliminate
the collision but seems too heavy.


And the "abort if the slot was already full" comment for
objpool_try_add_slot() seems still misleading? Maybe that was your initial
idea but changed later?


Right, the comments were just left unchanged across iterations. The original
implementation kept each percpu ring-array very compact, and objpool_push
would try looping over all cpu nodes to return the given object to the
objpool.

Actually my new update removes objpool_try_add_slot and integrates its
functionality into objpool_push. I'll submit the new patch when I finish the
verification.




Currently kretprobe is the only actual usecase of objpool.

I'm testing an updated objpool in our HIDS project for critical paths,
which is widely deployed on servers inside my company. The new version
eliminates the raw_local_irq_save and raw_local_irq_restore pair of
objpool_push and gains up to 5% of performance boost.


Mind Ccing me and linux-mm once you are posting that?


Sure, I'll make sure to let you know.


Thanks,
Vlastimil



Regards,
Matt Wu



[PATCH v3] module: create weak dependecies

2024-05-10 Thread Jose Ignacio Tornos Martinez
It has been seen that for some network mac drivers (e.g. lan78xx) the
related module for the phy is loaded dynamically depending on the current
hardware. In this case, the associated phy is read using the mdio bus and
then the associated phy module is loaded at runtime (kernel function
phy_request_driver_module). However, no software dependency is defined, so
the user tools will not be able to get this dependency. For example, if
dracut is used and the hardware is present, lan78xx will be included but no
phy module will be added, and on the next restart the device will not work
from boot because no related phy will be found during the initramfs stage.

In order to solve this, we could define a normal 'pre' software dependency
in the lan78xx module with all the possible phy modules (there may be
several), but proceeding that way, all the possible phy modules would be
loaded while only one is necessary.

The idea is to create a new type of dependency, which we are going to call
'weak', to be used only by the user tools that need to detect this
situation. That way, for example, dracut could check the 'weak'
dependencies of the modules involved in order to install those dependencies
in the initramfs too. That is, for the lan78xx module mentioned above, by
defining the 'weak' dependency with the list of possible phy modules, only
the necessary phy would still be loaded on demand (keeping the same
behavior), but all the possible phy modules would be available from the
initramfs.
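
A driver could then declare, for example (the module names here are purely
illustrative, not the actual list lan78xx would use):

  MODULE_WEAKDEP("microchip");
  MODULE_WEAKDEP("smsc");

so that tools like dracut can copy those modules into the initramfs without
modprobe ever loading them unconditionally.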

The 'weak' dependency support has been included in kmod:
https://github.com/kmod-project/kmod/commit/05828b4a6e9327a63ef94df544a042b5e9ce4fe7
But, take into account that this can only be used if depmod is new enough.
If it isn't, depmod will have the same behavior as always (keeping backward
compatibility) and the information for the 'weak' dependency will not be
provided.

Signed-off-by: Jose Ignacio Tornos Martinez 
---
V2 -> V3:
- Include note about backward compatibility.
- Balance the /* and */.
V1 -> V2:
- Include reference to 'weak' dependency support in kmod.

 include/linux/module.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..2a056017df5b 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -173,6 +173,12 @@ extern void cleanup_module(void);
  */
 #define MODULE_SOFTDEP(_softdep) MODULE_INFO(softdep, _softdep)
 
+/*
+ * Weak module dependencies. See man modprobe.d for details.
+ * Example: MODULE_WEAKDEP("module-foo")
+ */
+#define MODULE_WEAKDEP(_weakdep) MODULE_INFO(weakdep, _weakdep)
+
 /*
  * MODULE_FILE is used for generating modules.builtin
  * So, make it no-op when this is being built as a module
-- 
2.44.0




Re: [PATCH v2] module: create weak dependecies

2024-05-10 Thread Jose Ignacio Tornos Martinez
> I think it's important to add a note about backward compatibility. If a system
> doesn't have a new-enough depmod, it will basically not create the new
> weakdep file and initrd generators won't be able to use that. The only
> downside is not being able to use the new feature, but it should still
> work as before.
Ok, good idea, I will send a new version of the patch with the note.

> Documentation/process/coding-style.rst section 8 says to balance the /*
> and */, but up to Luis what he enforces in the module tree.
Ok, I will fix that too.

Thanks

Best regards
José Ignacio




Re: [PATCH 1/2] objpool: enable inlining objpool_push() and objpool_pop() operations

2024-05-10 Thread Vlastimil Babka
On 5/10/24 9:59 AM, wuqiang.matt wrote:
> On 2024/5/7 21:55, Vlastimil Babka wrote:
 >>
>>> +   } while (!try_cmpxchg_acquire(>tail, , tail + 1));
>>> +
>>> +   /* now the tail position is reserved for the given obj */
>>> +   WRITE_ONCE(slot->entries[tail & slot->mask], obj);
>>> +   /* update sequence to make this obj available for pop() */
>>> +   smp_store_release(>last, tail + 1);
>>> +
>>> +   return 0;
>>> +}
>>>   
>>>   /**
>>>* objpool_push() - reclaim the object and return back to objpool
>>> @@ -134,7 +219,19 @@ void *objpool_pop(struct objpool_head *pool);
>>>* return: 0 or error code (it fails only when user tries to push
>>>* the same object multiple times or wrong "objects" into objpool)
>>>*/
>>> -int objpool_push(void *obj, struct objpool_head *pool);
>>> +static inline int objpool_push(void *obj, struct objpool_head *pool)
>>> +{
>>> +   unsigned long flags;
>>> +   int rc;
>>> +
>>> +   /* disable local irq to avoid preemption & interruption */
>>> +   raw_local_irq_save(flags);
>>> +   rc = __objpool_try_add_slot(obj, pool, raw_smp_processor_id());
>> 
>> And IIUC, we could in theory objpool_pop() on one cpu, then later another
>> cpu might do objpool_push() and cause the latter cpu's pool to go over
>> capacity? Is there some implicit requirements of objpool users to take care
>> of having matched cpu for pop and push? Are the current objpool users
>> obeying this requirement? (I can see the selftests do, not sure about the
>> actual users).
>> Or am I missing something? Thanks.
> 
> The objects are all pre-allocated along with creation of the new objpool
> and the total number of objects never exceeds the capacity on local node.

Aha, I see, the capacity of entries is enough to hold objects from all nodes
in the most unfortunate case they all end up freed from a single cpu.

> So objpool_push() would always find an available slot from the ring-array
> for the given object to insert back. objpool_pop() would try looping all
> the percpu slots until an object is found or whole objpool is empty.

So it's correct, but seems rather wasteful to have the whole capacity for
entries replicated on every cpu? It does make objpool_push() simple and
fast, but as you say, objpool_pop() still has to search potentially all
non-local percpu slots, with disabled irqs, which is far from ideal.

And the "abort if the slot was already full" comment for
objpool_try_add_slot() seems still misleading? Maybe that was your initial
idea but changed later?

> Currently kretprobe is the only actual usecase of objpool.
> 
> I'm testing an updated objpool in our HIDS project for critical paths,
> which is widely deployed on servers inside my company. The new version
> eliminates the raw_local_irq_save and raw_local_irq_restore pair of
> objpool_push and gains up to 5% of performance boost.

Mind Ccing me and linux-mm once you are posting that?

Thanks,
Vlastimil




Re: [PATCH 1/2] objpool: enable inlining objpool_push() and objpool_pop() operations

2024-05-10 Thread wuqiang.matt

On 2024/5/7 21:55, Vlastimil Babka wrote:

On 4/24/24 11:52 PM, Andrii Nakryiko wrote:

objpool_push() and objpool_pop() are very performance-critical functions
and can be called very frequently in kretprobe triggering path.

As such, it makes sense to allow compiler to inline them completely to
eliminate function calls overhead. Luckily, their logic is quite well
isolated and doesn't have any sprawling dependencies.

This patch moves both objpool_push() and objpool_pop() into
include/linux/objpool.h and marks them as static inline functions,
enabling inlining. To avoid anyone using internal helpers
(objpool_try_get_slot, objpool_try_add_slot), rename them to use leading
underscores.

We used kretprobe microbenchmark from BPF selftests (bench trig-kprobe
and trig-kprobe-multi benchmarks) running no-op BPF kretprobe/kretprobe.multi
programs in a tight loop to evaluate the effect. BPF own overhead in
this case is minimal and it mostly stresses the rest of in-kernel
kretprobe infrastructure overhead. Results are in millions of calls per
second. This is not super scientific, but shows the trend nevertheless.

BEFORE
==
kretprobe  :9.794 ± 0.086M/s
kretprobe-multi:   10.219 ± 0.032M/s

AFTER
=
kretprobe  :9.937 ± 0.174M/s (+1.5%)
kretprobe-multi:   10.440 ± 0.108M/s (+2.2%)

Cc: Matt (Qiang) Wu 
Signed-off-by: Andrii Nakryiko 


Hello,


Hi Vlastimil,


this question is not specific to your patch, but since it's a recent thread,
I'll ask it here instead of digging up the original objpool patches.

I'm trying to understand how objpool works and if it could be integrated
into SLUB, for the LSF/MM discussion next week:

https://lore.kernel.org/all/b929d5fb-8e88-4f23-8ec7-6bdaf61f8...@suse.cz/


+/* adding object to slot, abort if the slot was already full */


I don't see any actual abort in the code (not in this code nor in the
deleted code - it's the same code, just moved for inlining purposes).


+static inline int
+__objpool_try_add_slot(void *obj, struct objpool_head *pool, int cpu)
+{
+   struct objpool_slot *slot = pool->cpu_slots[cpu];
+   uint32_t head, tail;
+
+   /* loading tail and head as a local snapshot, tail first */
+   tail = READ_ONCE(slot->tail);
+
+   do {
+   head = READ_ONCE(slot->head);
+   /* fault caught: something must be wrong */
+   WARN_ON_ONCE(tail - head > pool->nr_objs);


So this will only WARN if we go over the capacity, but continue and
overwrite a pointer that was already there, effectively leaking said object, no?


Yes, the WARN is only there to catch errors on the caller side (for
example, trying to push the same object multiple times).




+   } while (!try_cmpxchg_acquire(>tail, , tail + 1));
+
+   /* now the tail position is reserved for the given obj */
+   WRITE_ONCE(slot->entries[tail & slot->mask], obj);
+   /* update sequence to make this obj available for pop() */
+   smp_store_release(>last, tail + 1);
+
+   return 0;
+}
  
  /**

   * objpool_push() - reclaim the object and return back to objpool
@@ -134,7 +219,19 @@ void *objpool_pop(struct objpool_head *pool);
   * return: 0 or error code (it fails only when user tries to push
   * the same object multiple times or wrong "objects" into objpool)
   */
-int objpool_push(void *obj, struct objpool_head *pool);
+static inline int objpool_push(void *obj, struct objpool_head *pool)
+{
+   unsigned long flags;
+   int rc;
+
+   /* disable local irq to avoid preemption & interruption */
+   raw_local_irq_save(flags);
+   rc = __objpool_try_add_slot(obj, pool, raw_smp_processor_id());


And IIUC, we could in theory objpool_pop() on one cpu, then later another
cpu might do objpool_push() and cause the latter cpu's pool to go over
capacity? Is there some implicit requirements of objpool users to take care
of having matched cpu for pop and push? Are the current objpool users
obeying this requirement? (I can see the selftests do, not sure about the
actual users).
Or am I missing something? Thanks.


The objects are all pre-allocated along with creation of the new objpool
and the total number of objects never exceeds the capacity on local node.
So objpool_push() would always find an available slot from the ring-array
for the given object to insert back. objpool_pop() would try looping all
the percpu slots until an object is found or whole objpool is empty.
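
Roughly, the pop path scans the slots like this (a simplified sketch of the
scan order, not the exact code):

	raw_local_irq_save(flags);

	cpu = raw_smp_processor_id();
	for (i = 0; i < num_possible_cpus(); i++) {
		obj = __objpool_try_get_slot(pool, cpu);
		if (obj)
			break;
		cpu = cpumask_next_wrap(cpu, cpu_possible_mask, -1, 1);
	}

	raw_local_irq_restore(flags);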

Currently kretprobe is the only actual usecase of objpool.

I'm testing an updated objpool in our HIDS project for critical paths,
which is widely deployed on servers inside my company. The new version
eliminates the raw_local_irq_save and raw_local_irq_restore pair of
objpool_push and gains up to 5% of performance boost.




+   raw_local_irq_restore(flags);
+
+   return rc;
+}
+
  
  /**

   * objpool_drop() - discard the object and deref objpool
diff --git a/lib/objpool.c b/lib/objpool.c
index cfdc02420884..f696308fc026 100644
--- a/lib/objpool.c
+++